#
9a9677ec |
|
19-May-2024 |
Ricardo Branco <rbranco@suse.de> |
linux: Update linux manpage to mention mqueuefs Reviewed by: imp, kib Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
|
#
e30621d5 |
|
18-May-2024 |
Ricardo Branco <rbranco@suse.de> |
mqueue: Introduce kern_kmq_timedreceive & kern_kmq_timedsend Reviewed by: imp, kib Pull Request: https://github.com/freebsd/freebsd-src/pull/1248
|
#
f5449510 |
|
19-Mar-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
sys/syscallsubr.h: align definition of kern_fcntl_freebsd() on 32bit Fixes: d0efabdf15d956e9bc0414356ed798ca3c846e08
|
#
18cb4223 |
|
31-Jan-2024 |
John Baldwin <jhb@FreeBSD.org> |
timerfd: Move kern_timerfd_* prototypes to <sys/syscallsubr.h>
|
#
14505c92 |
|
30-Jan-2024 |
John Baldwin <jhb@FreeBSD.org> |
syscallsubr.h: Sort kern_membarrier prototype alphabetically
|
#
c662306e |
|
20-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kern_openatfp(9) Reviewed by: markj, pjd Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43529
|
#
2a284076 |
|
20-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_openat(): rename fd argument to dirfd Reviewed by: markj, pjd Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43529
|
#
d8decc9a |
|
19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kcmp(2) kernel bits This is based purely on reading the Linux kcmp(2) man page. In addition to the Linux set of comparators, I also added KCMP_FILEOBJ to compare underlying file objects. Tested by: manu Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
0fac350c |
|
30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on getpeername/getsockname Just like it was done for accept(2) in cfb1e92912b4, use the same approach for two simpler syscalls that return socket addresses. Although these two syscalls aren't performance critical, this change generalizes some code between 3 syscalls, trimming code size. Following the example of accept(2), provide VNET-aware and INVARIANT-checking wrappers sopeeraddr() and sosockaddr() around protosw methods. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D42694
|
#
cfb1e929 |
|
30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on accept(2) Let the accept functions provide stack memory for protocols to fill it in. Generic code should provide sockaddr_storage, specialized code may provide smaller structure. While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting required length in case if provided length was insufficient. Our manual page accept(2) and POSIX don't explicitly require that, but one can read the text as they do. Linux also does that. Update tests accordingly. Reviewed by: rscheff, tuexen, zlei, dchagin Differential Revision: https://reviews.freebsd.org/D42635
|
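The in/out `addrlen` behaviour described in cfb1e929 can be illustrated outside the kernel. The following is a minimal sketch, not the committed code: `copy_addr_out` is a hypothetical helper, and the truncate-then-report-full-length behaviour is an assumption modelled on the commit message and the usual accept(2) convention.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a FreeBSD function): copy at most *addrlen
 * bytes of an address into the caller's buffer, then store the length
 * that would have been needed for the whole address back into *addrlen.
 */
static void
copy_addr_out(const void *src, size_t srclen, void *dst, size_t *addrlen)
{
	size_t n = srclen < *addrlen ? srclen : *addrlen;

	memcpy(dst, src, n);	/* truncate if the buffer is too small */
	*addrlen = srclen;	/* report the required length */
}
```

A caller that sees `*addrlen` grow past the size it passed in knows the address was truncated and how large a buffer to retry with.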
#
3555be01 |
|
08-Sep-2023 |
John Baldwin <jhb@FreeBSD.org> |
Move kern_extattr_* prototypes to <sys/syscallsubr.h> All of the kern_* prototypes belong in this header. While here, sort the prototypes by function name. Reviewed by: dchagin Fixes: 6453d4240f6b vfs: Export exattr methods to reuse by Linuxulator Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D41766
|
#
4a69fc16 |
|
07-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Add membarrier(2) This is an attempt at clean-room implementation of the Linux' membarrier(2) syscall. For documentation, you would need to read both membarrier(2) Linux man page, the comments in Linux kernel/sched/membarrier.c implementation and possibly look at actual uses. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32360
|
#
95ee2897 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: two-line .h pattern Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
#
9b65fa69 |
|
29-Jul-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
linuxolator: implement Linux' PROT_GROWSDOWN From the Linux man page for mprotect(2): PROT_GROWSDOWN Apply the protection mode down to the beginning of a mapping that grows downward (which should be a stack segment or a segment mapped with the MAP_GROWSDOWN flag set). Reported by: dchagin Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D41099
|
#
07c0b6e5 |
|
29-May-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
vfs: Retire kern_alternate_path() as it is no longer used From now on a non-native ABI should use the pwd_altroot() ability to tell namei() its root directory to dynamically reroot lookups. Differential Revision: https://reviews.freebsd.org/D40093 MFC after: 2 months
|
#
4d846d26 |
|
10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
#
cb858340 |
|
28-Apr-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
linux(4): Add a dedicated statat() implementation Get rid of calling the Linux stat translation hook and the Linux-specific handling of non-vnode dirfd from kern_statat(). Reviewed by: kib, mjg Differential revision: https://reviews.freebsd.org/D35474
|
#
d46174cd |
|
28-May-2022 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Finish cpuset_getaffinity() after f35093f8 Split cpuset_getaffinity() into two counterparts, where user_cpuset_getaffinity() is intended to operate on the cpuset_t from user va, while kern_cpuset_getaffinity() expects the cpuset from kernel va. Accordingly, the code that clears the high bits is moved to user_cpuset_getaffinity(). The Linux sched_getaffinity() syscall returns the size of the set copied to user space, and then the glibc wrapper clears the high bits. MFC after: 2 weeks
|
#
47a57144 |
|
12-May-2022 |
Justin Hibbits <jhibbits@FreeBSD.org> |
cpuset: Byte swap cpuset for compat32 on big endian architectures Summary: BITSET uses long as its basic underlying type, which is dependent on the compile type, meaning on 32-bit builds the basic type is 32 bits, but on 64-bit builds it's 64 bits. On little endian architectures this doesn't matter, because the LSB is always at the low bit, so the words get effectively concatenated moving between 32-bit and 64-bit, but on big-endian architectures it throws a wrench in, as setting bit 0 in 32-bit mode is equivalent to setting bit 32 in 64-bit mode. To demonstrate: 32-bit mode: BIT_SET(foo, 0): 0x00000001 64-bit sees: 0x0000000100000000 cpuset is the only system interface that uses bitsets, so solve this by swapping the integer sub-components at the copyin/copyout points. Reviewed by: kib MFC after: 3 days Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D35225
|
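The half-word swap described in 47a57144 can be sketched in isolation. This is an illustrative sketch only: `swap_halves` is a hypothetical name, and the committed code operates on whole cpuset arrays at the copyin/copyout points rather than on a single word.

```c
#include <stdint.h>

/*
 * Swap the two 32-bit halves of a 64-bit word.  On a big-endian machine
 * this converts between the "two 32-bit longs in memory" view and the
 * "one 64-bit long" view of the same bitset word.
 */
static uint64_t
swap_halves(uint64_t w)
{
	return ((w << 32) | (w >> 32));
}
```

Applied to the example in the commit message, the word a 64-bit big-endian kernel sees for a 32-bit `BIT_SET(foo, 0)`, `0x0000000100000000`, becomes `0x0000000000000001`, which is bit 0 as intended.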
#
f35093f8 |
|
11-May-2022 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Use Linux semantics for the thread affinity syscalls. Linux has more tolerant checks of the user supplied cpuset_t's. Minimum cpuset_t size that the Linux kernel permits in case of getaffinity() is the maximum CPU id, present in the system / NBBY, the maximum size is not limited. For setaffinity(), Linux does not limit the size of the user-provided cpuset_t, internally using only the meaningful part of the set, where the upper bound is the maximum CPU id, present in the system, no larger than the size of the kernel cpuset_t. Unlike FreeBSD, Linux ignores high bits if set in the setaffinity(), so clear it in the sched_setaffinity() and Linuxulator itself. Reviewed by: Pau Amma (man pages) In collaboration with: jhb Differential revision: https://reviews.freebsd.org/D34849 MFC after: 2 weeks
|
#
f04534f5 |
|
06-May-2022 |
Dmitry Chagin <dchagin@FreeBSD.org> |
sysvsem: Add a timeout argument to the semop. For future use in the Linux emulation layer for the semtimedop syscall split the sys_semop syscall into two counterparts and add struct timespec *timeout argument to the last one. Reviewed by: jhb, kib Differential revision: https://reviews.freebsd.org/D35121 MFC after: 2 weeks
|
#
f3f3e3c4 |
|
03-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: add close_range(..., CLOSE_RANGE_CLOEXEC) For compatibility with Linux. MFC after: 3 days Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D34424
|
#
c40fee6f |
|
25-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the always curthread argument from kern_alternate_path
|
#
2b9d052d |
|
17-Nov-2021 |
Brooks Davis <brooks@FreeBSD.org> |
freebsd32: fix getfsstat sign extension bugs Add freebsd32 versions of getfsstat and freebsd11_getfsstat so that bufsize is properly sign-extended if a negative value is passed. Reject negative values before passing to kern_getfsstat as a size_t. Reviewed by: kevans
|
#
e02f64d9 |
|
17-Nov-2021 |
Brooks Davis <brooks@FreeBSD.org> |
freebsd32: add real abort2 Previously, the code would copy twice as many pointers as specified and print pairs of them as a single 64-bit pointer. abort2 doesn't return, so make the return type void. freebsd32_abort2 is in its own file with a 2-clause BSD license based on a discussion with Wojciech many years ago. Reviewed by: kevans
|
#
b7fd8611 |
|
17-Nov-2021 |
Brooks Davis <brooks@FreeBSD.org> |
syscalls: sprinkle in const values Add missing const qualifiers to a number of syscall arguments. Obtained from: CheriBSD Reviewed by: kevans
|
#
01ce7fca |
|
15-Nov-2021 |
Brooks Davis <brooks@FreeBSD.org> |
ommap: fix signed len and pos arguments 4.3 BSD's mmap took an int len and long pos. Reject negative lengths and in freebsd32 sign-extend pos correctly rather than mis-handling negative positions as large positive ones. Reviewed by: kib
|
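The difference between mis-handling a negative 32-bit position as a large positive one and sign-extending it correctly, as described in 01ce7fca, can be shown with two small conversions. This is a sketch with hypothetical function names; the cast from `uint32_t` to `int32_t` is implementation-defined in C, and the sketch assumes the two's-complement behaviour of common compilers.

```c
#include <stdint.h>

/* Widen a 32-bit position the wrong way: negative values become huge. */
static int64_t
widen_zero_extend(uint32_t pos32)
{
	return ((int64_t)pos32);
}

/* Widen it the intended way: reinterpret as signed first, then extend. */
static int64_t
widen_sign_extend(uint32_t pos32)
{
	return ((int64_t)(int32_t)pos32);
}
```

With the bit pattern `0xFFFFFFFF`, the first function yields 4294967295 while the second yields -1, which a range check can then reject.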
#
3225fd22 |
|
04-Nov-2021 |
John Baldwin <jhb@FreeBSD.org> |
kern_utimensat: Update name of last arg in prototype. The last argument is a mask of AT_* flags, not a namei cnp flag as 'int follow' implies in other kern_* functions. Obtained from: CheriBSD Sponsored by: The University of Cambridge, Google Inc.
|
#
0dc332bf |
|
05-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
|
#
e884512a |
|
10-Jun-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Split kern_poll() into two counterparts. The kern_poll_kfds() operates on clear kernel data, kfds points to an array in the kernel, while kern_poll() operates on user supplied pollfd. Move nfds check to kern_poll_maxfds(). No functional changes, it's for future use in the Linux emulation layer. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30690 MFC after: 2 weeks
|
#
5d1d844a |
|
25-Apr-2021 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
kern_linkat: modify to accept AT_ flags instead of FOLLOW/NOFOLLOW This makes this API match other kern_xxxat() functions. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D29776
|
#
7a1591c1 |
|
22-Jan-2021 |
Brooks Davis <brooks@FreeBSD.org> |
Rename kern_mmap_req to kern_mmap Replace all uses of kern_mmap with kern_mmap_req, remove the old kern_mmap, and rename kern_mmap_req to kern_mmap. The helper saved some code churn initially, but having multiple interfaces is sub-optimal. Obtained from: CheriBSD Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28292
|
#
7a202823 |
|
23-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Expose eventfd in the native API/ABI using a new __specialfd syscall eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox. A native implementation would avoid these problems and help with porting Linux software. Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does. Submitted by: greg@unrelenting.technology Reviewed by: markj (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26668
|
#
be2535b0 |
|
04-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kern_ntp_adjtime(9). Reviewed by: brooks, cy Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D27471
|
#
15eaec6a |
|
21-Nov-2020 |
Kyle Evans <kevans@FreeBSD.org> |
_umtx_op: move compat32 definitions back in These are reasonably compact, and a future commit will blur the compat32 lines by supporting 32-bit operations with the native _umtx_op.
|
#
de774e42 |
|
17-Nov-2020 |
Conrad Meyer <cem@FreeBSD.org> |
linux(4): Implement name_to_handle_at(), open_by_handle_at() They are similar to our getfhat(2) and fhopen(2) syscalls. Differential Revision: https://reviews.freebsd.org/D27111
|
#
63ecb272 |
|
16-Nov-2020 |
Kyle Evans <kevans@FreeBSD.org> |
umtx_op: reduce redundancy required for compat32 All of the compat32 variants are substantially the same, save for copyin/copyout (mostly). Apply the same kind of technique used with kevent here by having the syscall routines supply a umtx_copyops describing the operations needed. umtx_copyops carries the bare minimum needed- size of timespec and _umtx_time are used for determining if copyout is needed in the sem2_wait case. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27222
|
#
aaf78c16 |
|
23-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not leak oldvmspace if image activation failed and current address space is already destroyed, so kern_execve() terminates the process. While there, clean up some internals of post_execve() inlined in init_main. Reported by: Peter <pmc@citylink.dinoex.sub.org> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26525
|
#
67a659d2 |
|
08-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kern_mmap_racct_check(), a helper to verify limits in vm_mmap*(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652
|
#
52c81be1 |
|
20-Jun-2020 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add linux_madvise(2) instead of having Linux apps call the native FreeBSD madvise(2) directly. While some of the flag values match, most don't. PR: kern/230160 Reported by: markj Reviewed by: markj Discussed with: brooks, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25272
|
#
472ced39 |
|
12-Apr-2020 |
Kyle Evans <kevans@FreeBSD.org> |
Implement a close_range(2) syscall close_range(min, max, flags) allows for a range of descriptors to be closed. The Python folk have indicated that they would much prefer this interface to closefrom(2), as the case may be that they/someone have special fds dup'd to higher in the range and they can't necessarily closefrom(min) because they don't want to hit the upper range, but relocating them to lower isn't necessarily feasible. sys_closefrom has been rewritten to use kern_close_range() using ~0U to indicate closing to the end of the range. This was chosen rather than requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the call to kern_close_range for simplicity. The flags argument of close_range(2) is currently unused, so any flags set is currently EINVAL. It was added to the interface in Linux so that future flags could be added for, e.g., "halt on first error" and things of this nature. This patch is based on a syscall of the same design that is expected to be merged into Linux. Reviewed by: kib, markj, vangyzen (all slightly earlier revisions) Differential Revision: https://reviews.freebsd.org/D21627
|
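The relationship between closefrom and close_range described in 472ced39 can be modelled with a toy descriptor table. This is a sketch under stated assumptions: the table, `NFDS`, and both `toy_*` functions are hypothetical, and the real kern_close_range() works on the process file descriptor table with its own locking and error codes.

```c
#include <stdbool.h>

#define NFDS 8	/* size of the toy descriptor table (assumption) */

/* Toy close_range: mark every slot in [lowfd, highfd] as closed. */
static int
toy_close_range(bool open[NFDS], unsigned int lowfd, unsigned int highfd)
{
	if (highfd < lowfd)
		return (-1);		/* invalid range */
	if (highfd >= NFDS)
		highfd = NFDS - 1;	/* clamp, so ~0U means "to the end" */
	for (unsigned int fd = lowfd; fd <= highfd; fd++)
		open[fd] = false;
	return (0);
}

/* closefrom expressed in terms of close_range, passing ~0U as the top. */
static int
toy_closefrom(bool open[NFDS], unsigned int lowfd)
{
	return (toy_close_range(open, lowfd, ~0U));
}
```

Using a sentinel upper bound means the caller never needs to know the current table size, which is the simplification the commit message points to.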
#
d718de81 |
|
04-Mar-2020 |
Brooks Davis <brooks@FreeBSD.org> |
Introduce kern_mmap_req(). This presents an extensible interface to the generic mmap(2) implementation via a struct pointer intended to use a designated initializer or compound literal. We take advantage of the mandatory zeroing of fields not listed in the initializer. Remove kern_mmap_fpcheck() and use kern_mmap_req(). The motivation for this change is a desire to keep the core implementation from growing an ever-increasing number of arguments that must be specified in the correct order for the lowest-level implementations. In CheriBSD we have already added two more arguments. Reviewed by: kib Discussed with: kevans Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D23164
|
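The calling style that d718de81 describes, a request struct filled in with a designated initializer or compound literal, can be sketched as follows. The structure, its field names, and both functions are hypothetical stand-ins chosen for illustration; they are not the real kern_mmap_req() interface.

```c
#include <stddef.h>

/* Hypothetical request structure; field names are illustrative only. */
struct example_req {
	void	*addr;
	size_t	 len;
	int	 prot;
	int	 flags;
	int	 fd;
	long	 pos;
};

/* Stand-in implementation: reject a zero length, accept anything else. */
static int
example_map(const struct example_req *req)
{
	return (req->len == 0 ? -1 : 0);
}

static int
example_caller(size_t len)
{
	/* Fields not named here (addr, flags, pos) are zero-initialized. */
	return (example_map(&(struct example_req){
		.len = len,
		.prot = 1,
		.fd = -1,
	}));
}
```

Adding a field to the structure later does not require touching existing callers, since any field they omit defaults to zero.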
#
cbc10891 |
|
03-Feb-2020 |
Dmitry Chagin <dchagin@FreeBSD.org> |
For code reuse in Linuxulator rename get_proccess_cputime() and get_thread_cputime() and add prototypes for them to <sys/syscallsubr.h>. As both functions become a public interface add process lock assert to ensure that the process is not exiting under it. Fix whitespace nit while here. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23340 MFC after: 2 weeks
|
#
7739d927 |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: replace kern___getcwd with vn_getcwd The previous routine was resulting in extra data copies most notably in linux_getcwd.
|
#
b3fb13eb |
|
24-Jan-2020 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_unmount() and use in Linuxulator. No functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22646
|
#
ca603bb1 |
|
12-Jan-2020 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_getpriority(), make Linuxulator use it. Reviewed by: kib, emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22842
|
#
7a0ef283 |
|
12-Jan-2020 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_setpriority(), use it in Linuxulator. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22841
|
#
535b1df9 |
|
04-Jan-2020 |
Kyle Evans <kevans@FreeBSD.org> |
shm: correct KPI mistake introduced around memfd_create When file sealing and shm_open2 were introduced, we should have grown a new kern_shm_open2 helper that did the brunt of the work with the new interface while kern_shm_open remains the same. Instead, more complexity was introduced to kern_shm_open to handle the additional features and consumers had to keep changing in somewhat awkward ways, and a kern_shm_open2 was added to wrap kern_shm_open. Backpedal on this and correct the situation- kern_shm_open returns to the interface it had prior to file sealing being introduced, and neither function needs an initial_seals argument anymore as it's handled in kern_shm_open2 based on the shmflags.
|
#
18348a23 |
|
04-Jan-2020 |
Kyle Evans <kevans@FreeBSD.org> |
kern_mmap: add a variant that allows caller to inspect fp Linux mmap rejects mmap() on a write-only file with EACCES. linux_mmap_common currently does a fun dance to grab the fp associated with the passed in fd, validates it, then drops the reference and calls into kern_mmap(). Doing so is perhaps both fragile and premature; there's still plenty of chance for the request to get rejected with a more appropriate error, and it's prone to a race where the file we ultimately mmap has changed after it drops its reference. This change alleviates the need to do this by providing a kern_mmap variant that allows the caller to inspect the fp just before calling into the fileop layer. The callback takes flags, prot, and maxprot as one could imagine scenarios where any of these, in conjunction with the file itself, may influence a caller's decision. The file type check in the linux compat layer has been removed; EINVAL is seemingly not an appropriate response to the file not being a vnode or device. The fileop layer will reject the operation with ENODEV if it's not supported, which more closely matches the common linux description of mmap(2) return values. If we discover that we're allowing an mmap() on a file type that Linux normally wouldn't, we should restrict those explicitly. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D22977
|
#
34ad5ac2 |
|
13-Dec-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_kill() and use it in Linuxulator. It's just a cleanup, no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22645
|
#
be2cfdbc |
|
13-Dec-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_getsid() and use it in Linuxulator; no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22647
|
#
d6fee74a |
|
12-Dec-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_sync(9), and make kernel code call it instead of going via sys_sync(2). Minor cleanup, no functional changes. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19366
|
#
20f70576 |
|
25-Sep-2019 |
Kyle Evans <kevans@FreeBSD.org> |
Add a shm_open2 syscall to support upcoming memfd_create shm_open2 allows a little more flexibility than the original shm_open. shm_open2 doesn't enforce CLOEXEC on its callers, and it has a separate shmflag argument that can be expanded later. Currently the only shmflag is to allow file sealing on the returned fd. shm_open and memfd_create will both be implemented in libc to use this new syscall. __FreeBSD_version is bumped to indicate the presence. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21393
|
#
0cd95859 |
|
25-Sep-2019 |
Kyle Evans <kevans@FreeBSD.org> |
[2/3] Add an initial seal argument to kern_shm_open() Now that flags may be set on posixshm, add an argument to kern_shm_open() for the initial seals. To maintain past behavior where callers of shm_open(2) are guaranteed to not have any seals applied to the fd they're given, apply F_SEAL_SEAL for existing callers of kern_shm_open. A special flag could be opened later for shm_open(2) to indicate that sealing should be allowed. We currently restrict initial seals to F_SEAL_SEAL. We cannot error out if F_SEAL_SEAL is re-applied, as this would easily break shm_open() twice to a shmfd that already existed. A note's been added about the assumptions we've made here as a hint towards anyone wanting to allow other seals to be applied at creation. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D21392
|
#
bbbbeca3 |
|
24-Jul-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add kernel support for a Linux compatible copy_file_range(2) syscall. This patch adds support to the kernel for a Linux compatible copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9). This syscall/VOP can be used by the NFSv4.2 client to implement the Copy operation against an NFSv4.2 server to do file copies locally on the server. The vn_generic_copy_file_range() function in this patch can be used by the NFSv4.2 server to implement the Copy operation. Fuse may also be able to use the VOP_COPY_FILE_RANGE() method. vn_generic_copy_file_range() attempts to maintain holes in the output file in the range to be copied, but may fail to do so if the input and output files are on different file systems with different _PC_MIN_HOLE_SIZE values. Separate commits will be done for the generated syscall files and userland changes. A commit for a compat32 syscall will be done later. Reviewed by: kib, asomers (plus comments by brooks, jilles) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20584
|
#
5dc7e31a |
|
02-Jul-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Control implicit PROT_MAX() using procctl(2) and the FreeBSD note feature bit. In particular, allocate the bit to opt-out the image from implicit PROTMAX enablement. Provide procctl(2) verbs to set and query implicit PROTMAX handling. The knobs mimic the same per-image flag and per-process controls for ASLR. Reviewed by: emaste, markj (previous version) Discussed with: brooks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D20795
|
#
77555b84 |
|
10-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
Change the check for 'size' wrapping around to zero in kern_mmap to account for both the lower and upper bound modifications. Change the error returned to ENOMEM. Rename the parameter size to len and make size a local variable that stores the value of len after it has been modified. This addresses concerns expressed by Bruce Evans after r348843. Reported by: brde@optusnet.com.au Reviewed by: kib, markj (mentors) MFC after: 3 days Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20592
|
#
a1304030 |
|
06-Apr-2019 |
Mariusz Zaborski <oshogbo@FreeBSD.org> |
Introduce funlinkat syscall that allows us to check if we are removing the file associated with the given file descriptor. Reviewed by: kib, asomers Reviewed by: cem, jilles, brooks (they reviewed previous version) Discussed with: pjd, and many others Differential Revision: https://reviews.freebsd.org/D14567
|
#
318f0d77 |
|
06-Nov-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Use declared types for caddr_t arguments. Leave ptrace(2) alone for the moment as it's defined to take a caddr_t. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17852
|
#
12e69f96 |
|
02-Nov-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Add const to input-only char * arguments. These arguments are mostly paths handled by NAMEI*() macros which already take const char * arguments. This change improves the match between syscalls.master and the public declarations of system calls. Reviewed by: kib (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17812
|
#
4f77f488 |
|
25-Oct-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement O_BENEATH and AT_BENEATH. Flags prevent open(2) and *at(2) vfs syscalls name lookup from escaping the starting directory. Supposedly the interface is similar to the same proposed Linux flags. Reviewed by: jilles (code, previous version of manpages), 0mp (manpages) Discussed with: allanjude, emaste, jonathan Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17547
|
#
c542c43e |
|
16-Aug-2018 |
Jamie Gritton <jamie@FreeBSD.org> |
Revert r337922, except for some documentation-only bits. This needs to wait until user is changed to stop using jail(2). Differential Revision: D14791
|
#
284001a2 |
|
16-Aug-2018 |
Jamie Gritton <jamie@FreeBSD.org> |
Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating jails since FreeBSD 7. Along with the system call, put the various security.jail.allow_foo and security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or BURN_BRIDGES). These sysctls had two disparate uses: on the system side, they were global permissions for jails created via jail(2) which lacked fine-grained permission controls; inside a jail, they're read-only descriptions of what the current jail is allowed to do. The first use is obsolete along with jail(2), but keep them for the second-read-only use. Differential Revision: D14791
|
#
e8a1ec3e |
|
27-Jun-2018 |
Ed Maste <emaste@FreeBSD.org> |
Split kern_break from sys_break and use it in linuxulator Previously the linuxulator's linux_brk invoked the FreeBSD sys_break syscall implementation directly. Instead, move the bulk of the existing implementation to kern_break, and call that from both sys_break and linux_brk. This also addresses a minor bug in linux_brk in that we now return the actual (rounded up) break address, rather than the requested value. Reviewed by: brooks (earlier version) Sponsored by: Turing Robotic Industries Differential Revision: https://reviews.freebsd.org/D16019
|
#
34a77b97 |
|
27-Mar-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Move uio enums to sys/_uio.h. Include _uio.h instead of uio.h in several headers to reduce header pollution. Fix a few places that relied on header pollution to get the uio.h header. I have not moved struct uio as many more things that use it rely on header pollution to get other definitions from uio.h. Reviewed by: cem, kib, markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14811
|
#
b1288166 |
|
17-Jan-2018 |
John Baldwin <jhb@FreeBSD.org> |
Use long for the last argument to VOP_PATHCONF rather than a register_t. pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf() functions now accept a pointer to a long value rather than modifying td_retval directly. Instead, the system calls explicitly store the returned long value in td_retval[0]. Requested by: bde Reviewed by: kib Sponsored by: Chelsio Communications
|
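The calling convention that b1288166 describes, where the helper returns its result through a `long *` and only the system call layer stores it in `td_retval[0]`, can be sketched with toy types. Everything here is hypothetical: `struct toy_thread`, both functions, and the made-up variable number and limit are illustrative and are not the kernel's definitions.

```c
#include <errno.h>

/* Toy stand-in for the part of struct thread that holds return values. */
struct toy_thread {
	long	td_retval[2];
};

/* Helper reports its result through *valuep and returns an errno value. */
static int
toy_kern_pathconf(int name, long *valuep)
{
	if (name != 1)
		return (EINVAL);
	*valuep = 255;	/* made-up limit for the made-up variable 1 */
	return (0);
}

/* System call layer: the only place that touches td_retval. */
static int
toy_sys_pathconf(struct toy_thread *td, int name)
{
	long value;
	int error;

	error = toy_kern_pathconf(name, &value);
	if (error == 0)
		td->td_retval[0] = value;
	return (error);
}
```

Keeping the helper free of `td_retval` lets other in-kernel callers use the result directly without disturbing the calling thread's return value.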
#
3f289c3f |
|
12-Jan-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Cleanup some header pollution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403
|
#
dd688800 |
|
19-Dec-2017 |
John Baldwin <jhb@FreeBSD.org> |
Add a custom VOP_PATHCONF method for fdescfs. The method handles NAME_MAX and LINK_MAX explicitly. For all other pathconf variables, the method passes the request down to the underlying file descriptor. This requires splitting a kern_fpathconf() syscallsubr routine out of sys_fpathconf(). Also, to avoid lock order reversals with vnode locks, the fdescfs vnode is unlocked around the call to kern_fpathconf(), but with the usecount of the vnode bumped. MFC after: 1 month Sponsored by: Chelsio Communications
|
#
c4e20cad |
|
27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
|
#
69921123 |
|
23-May-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). 
The heavy lifting of coordinating all these efforts and bringing the project to completion was done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
|
#
f19351aa |
|
05-May-2017 |
Brooks Davis <brooks@FreeBSD.org> |
Provide a freebsd32 implementation of sigqueue() The previous misuse of sys_sigqueue() was sending random register or stack garbage to 64-bit targets. The freebsd32 implementation preserves the sival_int member of value when signaling a 64-bit process. Document the mixed ABI implementation of union sigval and the incompatibility of sival_ptr with pointer integrity schemes. Reviewed by: kib, wblock MFC after: 1 week Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10605
|
#
a3b7d0fb |
|
06-Apr-2017 |
Brooks Davis <brooks@FreeBSD.org> |
Regen after r316594.
|
#
46dc8e9d |
|
30-Mar-2017 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add kern_mincore() helper for mincore() syscall. Suggested by: kib@ Reviewed by: kib@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D10143
|
#
3f8455b0 |
|
18-Mar-2017 |
Eric van Gyzen <vangyzen@FreeBSD.org> |
Add clock_nanosleep() Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it. Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.) Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page. Reviewed by: kib, ngie, jilles MFC after: 3 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D10020
|
#
3fdcf9ef |
|
13-Feb-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Order alphabetically. Noted by: alc MFC after: 3 days
|
#
496ab053 |
|
13-Feb-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Rework r313352. Rename kern_vm_* functions to kern_*. Move the prototypes to syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as needed, to avoid headers pollution. Requested by: alc, jhb Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9535
|
#
96ee4310 |
|
05-Feb-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_cpuset_getaffinity() and kern_cpuset_setaffinity(), and use them in compats instead of their sys_*() counterparts. Reviewed by: kib, jhb, dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9383
|
#
b38b22b0 |
|
31-Jan-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_pread() and kern_pwrite(), and use them in compats instead of their sys_*() counterparts. The svr4 is left unchanged. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9379
|
#
ea2ebdc1 |
|
31-Jan-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_cpuset_getid() and kern_cpuset_setid(), and use them in compat32 instead of their sys_*() counterparts. Reviewed by: jhb@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9382
|
#
d293f35c |
|
29-Jan-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_listen(), kern_shutdown(), and kern_socket(), and use them instead of their sys_*() counterparts in various compats. The svr4 is left untouched, because there's no point. Reviewed by: ed@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9367
|
#
f67d6b5f |
|
29-Jan-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add kern_lseek() and use it instead of sys_lseek() in various compats. I didn't touch svr4/, there's no point. Reviewed by: ed@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9366
|
#
142c750a |
|
28-Jan-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unused kern_sendfile() declaration.
|
#
34ed0c63 |
|
27-Dec-2016 |
John Baldwin <jhb@FreeBSD.org> |
Rename the 'flags' argument to getfsstat() to 'mode' and validate it. This argument is not a bitmask of flags, but only accepts a single value. Fail with EINVAL if an invalid value is passed to 'mode'. Rename the 'flags' argument to getmntinfo(3) to 'mode' as well to match. This is a followup to r308088. Reviewed by: kib MFC after: 1 month
|
#
93d9ebd8 |
|
15-Aug-2016 |
Ed Schouten <ed@FreeBSD.org> |
Eliminate use of sys_fsync() and sys_fdatasync(). Make the kern_fsync() function public, so that it can be used by other parts of the kernel. Fix up existing consumers to make use of it. Requested by: kib
|
#
0acf5d0b |
|
25-Feb-2016 |
Mark Johnston <markj@FreeBSD.org> |
Improve error handling for posix_fallocate(2) and posix_fadvise(2). - Set td_errno so that ktrace and dtrace can obtain the syscall error number in the usual way. - Pass negative error numbers directly to the syscall layer, as they're not intended to be returned to userland. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5425
|
#
e26f6b5f |
|
11-Aug-2015 |
Ed Schouten <ed@FreeBSD.org> |
Add support for anonymous kqueues. CloudABI's polling system calls merge the concept of one-shot polling (poll, select) and stateful polling (kqueue). They share the same data structures. Extend FreeBSD's kqueue to provide support for waiting for events on an anonymous kqueue. Unlike stateful polling, there is no need to support timeouts, as an additional timer event could be used instead. Furthermore, it makes no sense to use a different number of input and output kevents. Merge this into a single argument. Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3307
|
#
aa04a06d |
|
11-Aug-2015 |
Ed Schouten <ed@FreeBSD.org> |
Introduce kern_cap_rights_limit(). The existing sys_cap_rights_limit() expects that a cap_rights_t object lives in userspace. It is therefore hard to call into it from kernelspace. Move the interesting bits of sys_cap_rights_limit() into kern_cap_rights_limit(), so that we can call into it from the CloudABI compatibility layer. Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3314
|
#
a2034cc9 |
|
05-Aug-2015 |
Ed Schouten <ed@FreeBSD.org> |
Allow the creation of kqueues with a restricted set of Capsicum rights. On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems. By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending on OS-specific behaviour. Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL. Differential Revision: https://reviews.freebsd.org/D3259
|
#
7ee1b208 |
|
01-Aug-2015 |
Ed Schouten <ed@FreeBSD.org> |
Add kern_shm_open(). This allows you to specify the capabilities that the new file descriptor should have. This allows us to create shared memory objects that only have the rights we're interested in. The idea behind restricting the rights is that it makes it a lot easier for CloudABI to get consistent behaviour across different operating systems. We only need to make sure that a shared memory implementation consistently implements the operations that are whitelisted. Approved by: kib Obtained from: https://github.com/NuxiNL/freebsd
|
#
8328babd |
|
29-Jul-2015 |
Ed Schouten <ed@FreeBSD.org> |
Make pipes in CloudABI work. Summary: Pipes in CloudABI are unidirectional. The reason for this is that CloudABI attempts to provide a uniform runtime environment across different flavours of UNIX. Instead of implementing a custom pipe that is unidirectional, we can simply reuse Capsicum permission bits to support this. This is nice, because CloudABI already attempts to restrict permission bits to correspond with the operations that apply to a certain file descriptor. Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes a pair of filecaps. These filecaps are passed to the newly introduced falloc_caps() function that creates the descriptors with rights in place. Test Plan: CloudABI pipes seem to be created with proper rights in place: https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44 Reviewers: jilles, mjg Reviewed By: mjg Subscribers: imp Differential Revision: https://reviews.freebsd.org/D3236
|
#
ea566832 |
|
10-Jul-2015 |
Ed Schouten <ed@FreeBSD.org> |
Add missing const keyword to kern_sigaction()'s 'act' parameter. This structure is not modified by the function. Also add const to sigact_flag_test(), as it is called by kern_sigaction().
|
#
5fe97c20 |
|
10-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: split kern_dup flags argument into actual flags and a mode Tidy up the code inside to switch on the mode.
|
#
2491302a |
|
09-Jul-2015 |
Ed Schouten <ed@FreeBSD.org> |
Add implementations for some of the CloudABI file descriptor system calls. All of the CloudABI system calls that operate on file descriptors of an arbitrary type are prefixed with fd_. This change adds wrappers for most of these system calls around their FreeBSD equivalents. The dup2() system call present on CloudABI deviates from POSIX, in the sense that it can only be used to replace an existing file descriptor. It cannot be used to create new ones. The reason for this is that this is inherently thread-unsafe. Furthermore, there is no need on CloudABI to use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no special meaning. This change exposes kern_dup() through <sys/syscallsubr.h> and puts the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag, FDDUP_MUSTREPLACE, to force that file descriptors are replaced -- not allocated. Differential Revision: https://reviews.freebsd.org/D3035 Reviewed by: mjg
|
#
7236f2c2 |
|
24-May-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
For future use in the Linuxulator: 1. Add a kern_kqueue() counterpart for kqueue() with flags parameter. 2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp() counterpart for kern_kevent() with a file pointer parameter instead of a file descriptor and pass the buck to it. Suggested by: mjg [2] Differential Revision: https://reviews.freebsd.org/D1091 Reviewed by: trasz
|
#
a93e83c8 |
|
24-May-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
In preparation for switching the Linuxulator to use the native 1:1 threads, split sys_sched_getparam(), sys_sched_setparam(), sys_sched_getscheduler(), sys_sched_setscheduler() into their kern_* counterparts and add a targettd parameter to allow specifying the target thread directly by callee. Differential Revision: https://reviews.freebsd.org/D1034 Reviewed by: trasz
|
#
1aa90eca |
|
24-May-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
In preparation for switching the Linuxulator to use the native 1:1 threads, refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specifying the target thread directly by callee (new Linuxulator). The Linuxulator temporarily uses the first thread in proc. Move linux_sched_rr_get_interval() to the MI part. Differential Revision: https://reviews.freebsd.org/D1032 Reviewed by: trasz
|
#
09baafb4 |
|
24-May-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
In preparation for switching the Linuxulator to use the native 1:1 threads, introduce kern_thr_alloc(), which will be used later in linux_clone(). Differential Revision: https://reviews.freebsd.org/D1029 Reviewed by: trasz
|
#
95be6d2b |
|
24-May-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
In preparation for switching the Linuxulator to use the native 1:1 threads, split sys_thr_exit() up into sys_thr_exit() and kern_thr_exit(). The second will be used in the linux_exit() system call later. Differential Revision: https://reviews.freebsd.org/D1028 Reviewed by: trasz
|
#
6289b482 |
|
21-Apr-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024. Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
1c73bcab |
|
15-Apr-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Rewrite linprocfs_domtab() as a wrapper around kern_getfsstat(). This adds missing jail and MAC checks. Differential Revision: https://reviews.freebsd.org/D2193 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
2205e0d1 |
|
23-Jan-2015 |
Jilles Tjoelker <jilles@FreeBSD.org> |
Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes
|
#
9f7a06f2 |
|
04-Jan-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char * paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week
|
#
6e646651 |
|
13-Nov-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
186d9c34 |
|
12-Nov-2014 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month
|
#
07b384cb |
|
21-Oct-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Eliminate unnecessary memory allocation in sys_getgroups and its ibcs2 counterpart.
|
#
f69261f2 |
|
25-Sep-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix fcntl(2) compat32 after r270691. The copyin and copyout of the struct flock are done in sys_fcntl(), which means that compat32 used direct access to userland pointers. Move code from sys_fcntl() to a new wrapper, kern_fcntl_freebsd(), which performs the necessary userland memory accesses, and use it from both native and compat32 fcntl syscalls. Reported by: jhibbits Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
abd386ba |
|
24-Aug-2014 |
Mateusz Guzik <mjg@FreeBSD.org> |
Fix getppid for traced processes. Traced processes always have the tracer set as the parent. Utilize proc_realparent to obtain the right process when needed. Reviewed by: kib MFC after: 1 week
|
#
e346f8c4 |
|
07-Aug-2014 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Split up sys_ktimer_getoverrun() into a sys_ and a kern_ variant and export the kern_ version needed by an upcoming linuxolator change. MFC after: 3 days Sponsored by: DARPA,AFRL
|
#
55648840 |
|
19-Sep-2013 |
John Baldwin <jhb@FreeBSD.org> |
Extend the support for exempting processes from being killed when swap is exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
|
#
b12698e1 |
|
18-Sep-2013 |
Roman Divacky <rdivacky@FreeBSD.org> |
Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)
|
#
253c75c0 |
|
18-Sep-2013 |
Roman Divacky <rdivacky@FreeBSD.org> |
Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>
|
#
0dac22d8 |
|
18-Aug-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Implement 32bit versions of the cap_ioctls_limit(2) and cap_ioctls_get(2) system calls as unsigned longs have different size on i386 and amd64. Reported by: jilles Sponsored by: The FreeBSD Foundation
|
#
643ee871 |
|
21-Jul-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement compat32 wrappers for the ktimer_* syscalls. Reported, reviewed and tested by: Petr Salinger <Petr.Salinger@seznam.cz> Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
d31e4b3a |
|
20-Jul-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
id_t is 64bit, provide the compat32 wrapper for clock_getcpuclockid2(2). Reported and tested by: Petr Salinger <Petr.Salinger@seznam.cz> PR: threads/180652 Sponsored by: The FreeBSD Foundation
|
#
da7d2afb |
|
01-May-2013 |
Jilles Tjoelker <jilles@FreeBSD.org> |
Add accept4() system call. The accept4() function, compared to accept(), allows setting the new file descriptor atomically close-on-exec and explicitly controlling the non-blocking status on the new socket. (Note that the latter point means that accept() is not equivalent to any form of accept4().) The linuxulator's accept4 implementation leaves a race window where the new file descriptor is not close-on-exec because it calls sys_accept(). This implementation leaves no such race window (by using falloc() flags). The linuxulator could be fixed and simplified by using the new code. Like accept(), accept4() is async-signal-safe, a cancellation point and permitted in capability mode.
|
#
d289dc7b |
|
31-Mar-2013 |
Jilles Tjoelker <jilles@FreeBSD.org> |
Rename do_pipe() to kern_pipe2() and declare it properly.
|
#
e2d55f48 |
|
15-Nov-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Move the definition of the idtype_t from sys/types.h to sys/wait.h. Fix the bug, use #if __BSD_VISIBLE instead of #if defined(__BSD_VISIBLE), since __BSD_VISIBLE is always defined. Reformat the comments from the Solaris style to KNF. Reported and reviewed by: bde MFC after: 28 days
|
#
43bdcf93 |
|
15-Nov-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Alphabetically reorder the forward-declarations of the structures. Add the declaration for enum idtype, to be used later. Reported and reviewed by: bde MFC after: 28 days
|
#
f13b5a0f |
|
12-Nov-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Add the wait6(2) system call. It takes POSIX waitid()-like process designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
|
#
76dcec5d |
|
24-May-2012 |
Gleb Kurtsou <gleb@FreeBSD.org> |
Add kern_fhstat(), adjust sys_fhstat() to use it. Extend kern_getdirentries() to accept uio segflag and optionally return buffer residue. Sponsored by: Google Summer of Code 2011
|
#
7edec621 |
|
14-Nov-2011 |
John Baldwin <jhb@FreeBSD.org> |
- Split out a kern_posix_fadvise() from the posix_fadvise() system call so it can be used by in-kernel consumers. - Make kern_posix_fallocate() public. - Use kern_posix_fadvise() and kern_posix_fallocate() to implement the freebsd32 wrappers for the two system calls.
|
#
7332c129 |
|
01-Apr-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Add support for executing the FreeBSD 1/i386 a.out binaries on amd64. In particular: - implement compat shims for old stat(2) variants and ogetdirentries(2); - implement delivery of signals with ancient stack frame layout and corresponding sigreturn(2); - implement old getpagesize(2); - provide a user-mode trampoline and LDT call gate for lcall $7,$0; - port a.out image activator and connect it to the build as a module on amd64. The changes are hidden under COMPAT_43. MFC after: 1 month
|
#
86665509 |
|
30-Mar-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Provide compat32 shims for kldstat(2). Requested and tested by: jpaetzel MFC after: 1 week
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
fc0de8f0 |
|
30-Jun-2010 |
John Baldwin <jhb@FreeBSD.org> |
Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
|
#
e268f54c |
|
11-Jan-2010 |
Kirk McKusick <mckusick@FreeBSD.org> |
Background: When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are: setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it. As with all other fsck sysctls, these new ones may only be used by processes with appropriate privilege. Reported by: jeff Security issues: rwatson
|
#
7e767511 |
|
19-Dec-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r198508, r198509: Reimplement pselect() in kernel, making change of sigmask and sleep atomic. MFC r198538: Move pselect(3) man page to section 2.
|
#
3134e115 |
|
19-Dec-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r198506: In kern_sigsuspend(), manipulate thread signal mask using kern_sigprocmask(). Also, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. MFC r199136: Use cpu_set_syscall_retval(9) to set syscall result, and return EJUSTRETURN from kern_sigsuspend() to prevent syscall return code from modifying wrong frame. Take care of possibility that pending SIGCONT might be cancelled by SIGSTOP, causing postsig() not to deliver any caught signal.
|
#
066d836b |
|
27-Oct-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Current pselect(3) is implemented in usermode and thus vulnerable to a well-known race condition, whose elimination was the reason for the function's appearance in the first place. If the sigmask supplied as an argument to pselect() enables a signal, the signal might be delivered before the thread has called select(2), causing a lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since the signal shall be delivered to usermode, but the sigmask restored, set TDP_OLDMASK and save the old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case the signal was not delivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
84440afb |
|
27-Oct-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
In kern_sigsuspend(), better manipulate thread signal mask using kern_sigprocmask() to properly notify other possible candidate threads for signal delivery. Since sigsuspend() shall only return to usermode after a signal was delivered, do cursig/postsig loop immediately after waiting for signal, repeating the wait if wakeup was spurious due to race with other thread fetching signal from the process queue before us. Add thread_suspend_check() call to allow the thread to be stopped or killed while in loop. Modify last argument of kern_sigprocmask() from boolean to flags, allowing the function to be called with locked proc. Conversion of the callers that supplied 1 to the old argument will be done in the next commit, and due to SIGPROCMASK_OLD value equal to 1, code is formally correct in between. Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
9f1fab50 |
|
16-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r197049: Calculate the amount of bytes to copy for select filedescriptor masks taking into account size of fd_set for the current process ABI. Approved by: re (kensmith)
|
#
b55ef216 |
|
09-Sep-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since 32bit processes' longs are 4 bytes, a 64bit kernel may copy in or out 4 bytes more than the process expected. Calculate the amount of bytes to copy taking into account size of fd_set for the current process ABI. Diagnosed and tested by: Peter Jeremy <peterjeremy acm org> Reviewed by: jhb MFC after: 1 week
|
#
c3889811 |
|
08-Jul-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
There is an optimization in chmod(1) that makes it skip the call to chmod(2) if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while 'chmod -h' doesn't. This change adds lpathconf(3) to make it possible to solve that problem in a clean way. Reviewed by: rwatson (earlier version) Approved by: re (kib)
|
#
4202e1be |
|
30-May-2009 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Split the native socketpair() syscall into kern_socketpair(), which should be used by kernel consumers, and socketpair() itself. Approved by: kib (mentor) MFC after: 1 month
|
#
0304c731 |
|
27-May-2009 |
Jamie Gritton <jamie@FreeBSD.org> |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
|
#
b38ff370 |
|
29-Apr-2009 |
Jamie Gritton <jamie@FreeBSD.org> |
Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)
|
#
0eee862a |
|
20-Feb-2009 |
Ed Schouten <ed@FreeBSD.org> |
Don't make Linux stat() open character devices to resolve its name. The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made: - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode. - Remove unneeded printf's from stat() and statfs(). - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at(). - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev. Result: crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2 Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers. Reviewed by: kib, netchild, rdivacky (thanks!)
|
#
ab0d10f6 |
|
11-Nov-2008 |
Ed Schouten <ed@FreeBSD.org> |
Several cleanups related to pipe(2). - Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors. - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed. - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2). - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call. Approved by: rdivacky Discussed on: arch
|
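The `fildes[2]' prototype above reflects how pipe(2) is used from userland: the call fills a two-element array. A minimal sketch, assuming a POSIX system; the function name `pipe_roundtrip` is made up for illustration.

```c
#include <unistd.h>

/*
 * Send one byte through a pipe and read it back.
 * Returns the byte read, or -1 on any failure.
 */
static int
pipe_roundtrip(unsigned char byte)
{
    int fildes[2];      /* [0] is the read end, [1] the write end */
    unsigned char out;
    int result = -1;

    if (pipe(fildes) == -1)
        return (-1);
    if (write(fildes[1], &byte, 1) == 1 &&
        read(fildes[0], &out, 1) == 1)
        result = out;
    close(fildes[0]);
    close(fildes[1]);
    return (result);
}
```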
#
63f8fe9e |
|
22-Oct-2008 |
John Baldwin <jhb@FreeBSD.org> |
Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
48b05c3f |
|
08-Apr-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement the linux syscalls openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho
|
#
e4193f25 |
|
30-Mar-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement the openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
|
#
5f56182b |
|
12-Feb-2008 |
Ruslan Ermilov <ru@FreeBSD.org> |
Change readlink(2)'s return type and type of the last argument to match POSIX. Prodded by: Alexey Lyashkov
|
#
e4650294 |
|
07-Jan-2008 |
John Baldwin <jhb@FreeBSD.org> |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
|
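The dispatch described above can be modelled in userland as a table of function pointers. This is a simplified sketch, not the kernel's real definitions: every `model_`-prefixed name and the `f_size` field are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct model_file;

/* Simplified stand-in for 'struct fileops' with only one method. */
struct model_fileops {
    int (*fo_truncate)(struct model_file *fp, long length);
};

struct model_file {
    const struct model_fileops *f_ops;
    long f_size;        /* stand-in for vnode-backed state */
};

/* Vnode-like files honor the request. */
static int
model_vn_truncate(struct model_file *fp, long length)
{
    if (length < 0)
        return (EINVAL);
    fp->f_size = length;
    return (0);
}

/* Non-vnode file types reject truncation with EINVAL. */
static int
model_inval_truncate(struct model_file *fp, long length)
{
    (void)fp;
    (void)length;
    return (EINVAL);
}

static const struct model_fileops model_vnode_ops = {
    .fo_truncate = model_vn_truncate
};
static const struct model_fileops model_socket_ops = {
    .fo_truncate = model_inval_truncate
};

/* ftruncate() reduces to a dispatch through the file's method table. */
static int
model_ftruncate(struct model_file *fp, long length)
{
    return (fp->f_ops->fo_truncate(fp, length));
}
```

Supporting truncation on a new file type then only requires supplying a different method in that type's table.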
#
a66fde8d |
|
07-Jun-2007 |
John Baldwin <jhb@FreeBSD.org> |
- Remove unused variable from create_thread().
- Move kern_thr_*() prototype to <sys/syscallsubr.h> where all the other kern_*() prototypes live.
|
#
4e4de5e4 |
|
20-Dec-2006 |
Jung-uk Kim <jkim@FreeBSD.org> |
MFP4: (part of) 110058 copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and it is done from its wrapper functions to support 32-bit emulations. After I implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes copyin()/copyout() function pointers from wrappers. Darwin passes size of message type as an argument, which is actually similar to my first implementation (P4 109706). We may revisit these implementations later.
|
#
f30e89ce |
|
27-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
Fix a file descriptor race I reintroduced when I split accept1() up into kern_accept() and accept1(). If another thread closed the new file descriptor and the first thread later got an error trying to copyout the socket address, then it would attempt to close the wrong file object. To fix, add a struct file ** argument to kern_accept(). If it is non-NULL, then on success kern_accept() will store a pointer to the new file object there and not release any of the references. It is up to the calling code to drop the references appropriately (including a call to fdclose() in case of error to safely handle the aforementioned race). While I'm at it, go ahead and fix the svr4 streams code to not leak the accept fd if it gets an error trying to copyout the streams structures.
|
#
c870740e |
|
10-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
- Split out kern_accept(), kern_getpeername(), and kern_getsockname() for use by ABI emulators.
- Alter the interface of kern_recvit() somewhat. Specifically, go ahead and hard code UIO_USERSPACE in the uio as that's what all the callers specify. In place, add a new uioseg to indicate what type of pointer is in mp->msg_name. Previously it was always a userland address, but ABI emulators may pass in kernel-side sockaddrs. Also, remove the namelenp field and instead require the two places that used it to explicitly copy mp->msg_namelen out to userland.
- Use the patched kern_recvit() to replace svr4_recvit() and the stock kern_sendit() to replace svr4_sendit().
- Use kern_bind() instead of stackgap use in ti_bind().
- Use kern_getpeername() and kern_getsockname() instead of stackgap in svr4_stream_ti_ioctl().
- Use kern_connect() instead of stackgap in svr4_do_putmsg().
- Use kern_getpeername() and kern_accept() instead of stackgap in svr4_do_getmsg().
- Retire the stackgap from SVR4 compat as it is no longer used.
|
#
d9f46233 |
|
08-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
- Split ioctl() up into ioctl() and kern_ioctl(). kern_ioctl() assumes that the 'data' pointer is already set up to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fo_ioctl().
- Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
|
#
c1cccebe |
|
08-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
Add a kern_close() so that the ABIs can close a file descriptor w/o having to populate a close_args struct and change some of the places that do.
|
#
b1ee5b65 |
|
08-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
Rework kern_semctl a bit to always assume the UIO_SYSSPACE case. This mostly consists of pushing a few copyin's and copyout's up into __semctl() as all the other callers were already doing the UIO_SYSSPACE case. This also changes kern_semctl() to set the return value in a passed in pointer to a register_t rather than td->td_retval[0] directly so that callers can only set td->td_retval[0] if all the various copyout's succeed. As a result of these changes, kern_semctl() no longer does copyin/copyout (except for GETALL/SETALL) so simplify the locking to acquire the semakptr mutex before the MAC check and hold it all the way until the end of the big switch statement. The GETALL/SETALL cases have to temporarily drop it while they do copyin/malloc and copyout. Also, simplify the SETALL case to remove handling for a non-existent race condition.
|
#
3cb83e71 |
|
06-Jul-2006 |
John Baldwin <jhb@FreeBSD.org> |
Add kern_setgroups() and kern_getgroups() and use them to implement ibcs2_[gs]etgroups() rather than using the stackgap. This also makes ibcs2_[gs]etgroups() MPSAFE. Also, it cleans up one bit of weirdness in the old setgroups() where it allocated an entire credential just so it had a place to copy the group list into. Now setgroups just allocates a NGROUPS_MAX array on the stack that it copies into and then passes to kern_setgroups().
|
#
49d409a1 |
|
27-Jun-2006 |
John Baldwin <jhb@FreeBSD.org> |
- Add a kern_semctl() helper function for __semctl(). It accepts a pointer to a copied-in copy of the 'union semun' and a uioseg to indicate which memory space the 'buf' pointer of the union points to. This is then used in linux_semctl() and svr4_sys_semctl() to eliminate use of the stackgap.
- Mark linux_ipc() and svr4_sys_semsys() MPSAFE.
|
#
d5388587 |
|
13-Jun-2006 |
John Baldwin <jhb@FreeBSD.org> |
- Add a kern_kldload() that is most of the previous kldload() and push Giant down in it.
- Push Giant down in kern_kldunload() and reorganize it slightly to avoid using gotos. Also, expose this function to the rest of the kernel.
|
#
fa545f43 |
|
28-Feb-2006 |
Paul Saab <ps@FreeBSD.org> |
Fix 32bit sendfile by implementing kern_sendfile so that it takes the header and trailers as iovec arguments instead of copying them in inside of sendfile. Reviewed by: jhb MFC after: 3 weeks
|
#
809f984b |
|
06-Feb-2006 |
John Baldwin <jhb@FreeBSD.org> |
Add a kern_eaccess() function and use it to implement xenix_eaccess() rather than kern_access(). Suggested by: rwatson
|
#
ecc44de7 |
|
31-Oct-2005 |
Paul Saab <ps@FreeBSD.org> |
Reformat socket control messages on input/output for 32bit compatibility on 64bit systems. Submitted by: ps, ups Reviewed by: jhb
|
#
a372f822 |
|
14-Oct-2005 |
Paul Saab <ps@FreeBSD.org> |
Implement the 32bit versions of recvmsg, recvfrom, sendmsg Partially obtained from: jhb
|
#
f0b479cd |
|
14-Oct-2005 |
Paul Saab <ps@FreeBSD.org> |
Implement 32bit wrappers for clock_gettime, clock_settime, and clock_getres.
|
#
bcd9e0dd |
|
07-Jul-2005 |
John Baldwin <jhb@FreeBSD.org> |
- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
|
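The preadv() semantics described above (scatter read at an explicit offset, file position untouched) can be observed from userland. A minimal sketch, assuming a system that provides preadv() in <sys/uio.h> (FreeBSD, or Linux with glibc); the function name `preadv_demo` and the sample data are made up for illustration.

```c
#define _DEFAULT_SOURCE     /* exposes preadv() on glibc; harmless elsewhere */
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write "hello world" to a temporary file, then use preadv() to scatter
 * the five bytes starting at offset 6 into two buffers.  Returns the file
 * offset observed after preadv(), or -1 on failure.
 */
static off_t
preadv_demo(char first[3], char second[4])
{
    FILE *tmp = tmpfile();
    struct iovec iov[2];
    off_t after = -1;
    int fd;

    if (tmp == NULL)
        return (-1);
    fd = fileno(tmp);
    if (write(fd, "hello world", 11) == 11) {
        iov[0].iov_base = first;
        iov[0].iov_len = 2;
        iov[1].iov_base = second;
        iov[1].iov_len = 3;
        if (preadv(fd, iov, 2, 6) == 5) {
            first[2] = '\0';
            second[3] = '\0';
            /* The offset is still where write() left it. */
            after = lseek(fd, 0, SEEK_CUR);
        }
    }
    fclose(tmp);
    return (after);
}
```

A plain readv() at this point would read from the end of the file instead, because it starts at the current file position.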
#
3a996d6e |
|
11-Jun-2005 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Do not allocate memory based on an unchecked argument from userland. It can be used to panic the kernel by passing too large a value. Fix it by moving allocation and size verification into kern_getfsstat(). This even simplifies kern_getfsstat() consumers, but destroys symmetry: memory is allocated inside kern_getfsstat(), but has to be freed by the caller. Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/ Reported by: Peter Holm <peter@holm.cc>
|
#
13a82b96 |
|
09-Jun-2005 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Avoid code duplication in several places by introducing a universal kern_getfsstat() function. Obtained from: jhb
|
#
efe5beca |
|
03-Jun-2005 |
Paul Saab <ps@FreeBSD.org> |
Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall. Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks
|
#
b88ec951 |
|
31-Mar-2005 |
John Baldwin <jhb@FreeBSD.org> |
Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
|
#
c1aa81b6 |
|
01-Mar-2005 |
Paul Saab <ps@FreeBSD.org> |
regen
|
#
1a88a252 |
|
13-Feb-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Backout previous change (disabling of security checks for signals delivered in emulation layers), since it appears to be too broad. Requested by: rwatson
|
#
d8ff44b7 |
|
13-Feb-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Split out the kill(2) syscall service routine into a user-level and a kernel part; the former is callable from user space and the latter from within the kernel. Make the kernel version take an additional argument which tells whether the call should check for additional restrictions on sending signals to suid/sugid applications. Make all emulation layers use the non-checked version, since signal numbers in emulation layers can have a different meaning than in native mode and such protection can cause misbehaviour. As a result, remove LIBTHR from the signals allowed to be delivered to a suid/sugid application. Requested (sorta) by: rwatson MFC after: 2 weeks
|
#
fee4a6af |
|
07-Feb-2005 |
John Baldwin <jhb@FreeBSD.org> |
Implement a kern_pathconf() wrapper for pathconf() which can take the filename from either a user space or a kernel space pointer.
|
#
76951d21 |
|
07-Feb-2005 |
John Baldwin <jhb@FreeBSD.org> |
- Tweak kern_msgctl() to return a copy of the requested message queue id structure in the struct pointed to by the 3rd argument for IPC_STAT and get rid of the 4th argument. The old way returned a pointer into the kernel array that the calling function would then access afterwards without holding the appropriate locks and doing non-lock-safe things like copyout() with the data anyways. This change removes that unsafeness and resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(), fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of code duplication for freebsd4 and netbsd compatibility system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename under an alternate prefix and determines which filename should be used. This is basically a more general version of linux_emul_convpath() that can be shared by all the ABIs thus allowing for further reduction of code duplication.
|
#
a6886ef1 |
|
30-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Extend kern_sendit() to take another enum uio_seg argument, which specifies where the buffer to send lies and use it to eliminate yet another stackgap in linuxlator. MFC after: 2 weeks
|
#
610ecfe0 |
|
29-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
o Split out kernel part of execve(2) syscall into two parts: one that copies arguments into the kernel space and one that operates completely in the kernel space;
o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386.
Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
|
#
f4b6eb04 |
|
25-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Split out kernel side of msgctl(2) into two parts: the first that pops data from the userland and pushes results back and the second which does actual processing. Use the latter to eliminate stackgap in the linux wrapper of that syscall. MFC after: 2 weeks
|
#
23af91dc |
|
25-Jan-2005 |
Maxim Sobolev <sobomax@FreeBSD.org> |
Move kern_{get,set}itimer() to where they belong. Submitted by: dwmalone MFC after: 2 weeks
|
#
efa42cbc |
|
19-Jan-2005 |
Paul Saab <ps@FreeBSD.org> |
move kern_nanosleep to sys/syscallsubr.h Requested by: jhb
|
#
60727d8b |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for license, minor formatting changes
|
#
c8837938 |
|
05-Jan-2005 |
John Baldwin <jhb@FreeBSD.org> |
- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live.
- Resort kern_execve() while I'm there.
|
#
84e0b075 |
|
07-Oct-2004 |
David Xu <davidxu@FreeBSD.org> |
Add an execve command for kse_thr_interrupt to allow libpthread to restore signal mask correctly, this is required by POSIX. Reviewed by: deischen
|
#
78c85e8d |
|
05-Oct-2004 |
John Baldwin <jhb@FreeBSD.org> |
Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand.
- Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console.
Submitted by: bde (mostly) MFC after: 1 month
|
#
8914e6f4 |
|
24-Sep-2004 |
John Baldwin <jhb@FreeBSD.org> |
Sort forward declared structures.
|
#
e140eb43 |
|
17-Jul-2004 |
David Malone <dwmalone@FreeBSD.org> |
Add a kern_setsockopt and kern_getsockopt which can read the option values from either user land or from the kernel. Use them for [gs]etsockopt and to clean up some calls to [gs]etsockopt in the Linux emulation code that uses the stackgap.
|
#
2332251c |
|
04-Nov-2003 |
Max Khon <fjoe@FreeBSD.org> |
Back out the following revisions:
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but the wantrem parameter is useless because vmware3 uses SYSV shared memory to talk with the X server and the X server is a native application. The patch worked because the check for wantrem was not valid (wantrem and SHMSEG_REMOVED were never checked for SHMSEG_ALLOCATED segments). Add a kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which, when set to 1, allows shm_find_segment_by_shmid() and shm_find_segment_by_shmidx() to return removed segments. MFC after: 1 week
|
#
710c5645 |
|
05-May-2003 |
David Malone <dwmalone@FreeBSD.org> |
Split sendit into two parts. The first part, still called sendit, does the copyin stuff and then calls the second part, kern_sendit, to do the hard work. Don't bother holding Giant during the copyin phase. The intent of this is to allow the Linux emulator to implement send* syscalls without using the stackgap.
|
#
f130dcf2 |
|
05-May-2003 |
Martin Blapp <mbr@FreeBSD.org> |
Change the semantics of sysv shm emulation to take an additional argument to the functions shm{at,ctl}1 and shm_find_segment_by_shmid{x}. The BSD semantics didn't allow the usage of a shared segment after it had been marked for removal through IPC_RMID. The patch involves the following functions:
- shmat
- shmctl
- shm_find_segment_by_shmid
- shm_find_segment_by_shmidx
- linux_shmat
- linux_shmctl
Submitted by: Orlando Bassotto <orlando.bassotto@ieo-research.it> Reviewed by: marcel
|
#
e77daab1 |
|
18-Apr-2003 |
John Baldwin <jhb@FreeBSD.org> |
Rename do_sigprocmask() to kern_sigprocmask() and make it a global symbol so that it can be used by binary emulators.
|
#
12e4397e |
|
03-Feb-2003 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
Break out the bind and connect syscalls to make calling these syscalls internally easy. This is preparation for forthcoming IPv6 support for Linuxlator. Submitted by: dwmalone MFC after: 10 days
|
#
23eeeff7 |
|
25-Oct-2002 |
Peter Wemm <peter@FreeBSD.org> |
Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound. Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc. Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
|
#
012e544f |
|
04-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split up ptrace() into a wrapper that does the copying to and from user space and a kern_ptrace() implementation. Use the kern_*() version in the Linux emulation code to remove more stack gap uses. Approved by: des
|
#
48b52b7a |
|
02-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split up __getcwd so that kernel callers of the internal version can specify whether the buffer is in user or system space.
|
#
49c2ff15 |
|
02-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split fcntl() into a wrapper and a kernel-callable kern_fcntl() implementation. The wrapper is responsible for copying additional structure arguments (struct flock) to and from userland.
|
#
8f19eb88 |
|
01-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropriate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
|
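The wrapper/kern_*() split described above recurs throughout this history. A minimal userland model of the idea follows. Everything here is an assumption made for illustration: the `_model` names, the `MODEL_` constants (standing in for the kernel's UIO_USERSPACE and UIO_SYSSPACE) and the toy "path length" operation are not real kernel interfaces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's 'enum uio_seg'. */
enum uio_seg_model { MODEL_USERSPACE, MODEL_SYSSPACE };

#define MODEL_PATH_MAX 64

/* Stand-in for copyinstr(9): bounded copy of a "user" string. */
static int
copyinstr_model(const char *uaddr, char *kaddr, size_t len)
{
    size_t n;

    if (uaddr == NULL)
        return (EFAULT);
    n = strlen(uaddr);
    if (n >= len)
        return (ENAMETOOLONG);
    memcpy(kaddr, uaddr, n + 1);
    return (0);
}

/* Kernel-internal version: the caller says where the path lives. */
static int
kern_pathlen_model(const char *path, enum uio_seg_model seg, size_t *lenp)
{
    char kpath[MODEL_PATH_MAX];
    int error;

    if (seg == MODEL_USERSPACE) {
        error = copyinstr_model(path, kpath, sizeof(kpath));
        if (error != 0)
            return (error);
        path = kpath;
    }
    *lenp = strlen(path);
    return (0);
}

/* Syscall wrapper: always treats its argument as a user pointer. */
static int
sys_pathlen_model(const char *upath, size_t *lenp)
{
    return (kern_pathlen_model(upath, MODEL_USERSPACE, lenp));
}
```

An emulation layer that has already translated a path in kernel memory would call the internal version with the system-space selector, which is what removes the need to copy translated arguments back out to a stack gap.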