#
360331 |
|
26-Apr-2020 |
hselasky |
MFC r359968: Cast all ioctl command arguments through uint32_t internally.
Hide debug print showing use of sign extended ioctl command argument under INVARIANTS. The print is available to all and can easily fill up the logs.
No functional change intended.
Sponsored by: Mellanox Technologies
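Illustrative sketch (not the committed code): the cast described above can be pictured as narrowing the command word to 32 bits, which discards any bits produced by sign extension. The helper names and the 64-bit input type are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of an ioctl command. */
static uint32_t
sk_ioctl_cmd(uint64_t com)
{
	return ((uint32_t)com);
}

/* Hypothetical helper: report whether the caller passed bits above bit 31,
 * which is what a sign-extended command looks like on a 64-bit kernel. */
static int
sk_ioctl_has_high_bits(uint64_t com)
{
	return (com > 0xffffffffULL);
}
```

A caller that built the command as a negative int and had it widened would arrive with the upper 32 bits set; after the cast both forms compare equal.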
|
#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370, because that commit has since been MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
326684 |
|
08-Dec-2017 |
kib |
MFC r326429: Destroy seltd st_mtx and st_wait in seltdfini().
|
#
315553 |
|
19-Mar-2017 |
trasz |
MFC r313018:
Add kern_pread() and kern_pwrite(), and use them in the compat layers instead of their sys_*() counterparts. The svr4 compat is left unchanged.
Sponsored by: DARPA, AFRL
|
#
315462 |
|
17-Mar-2017 |
mmokhi |
MFC r314996 Fix NULL pointer dereference and panic with shm file pread/pwrite.
Approved by: dchagin
|
#
314334 |
|
27-Feb-2017 |
kib |
MFC kern_mmap(9) and related helpers.
MFC r302514 (by rwatson): Audit file-descriptor arguments to I/O system calls such as read(2), write(2), dup(2), and mmap(2).
MFC r302524 (by rwatson): When mmap(2) is used with a vnode, capture vnode attributes in the audit trail.
MFC r313352 (by trasz): Add kern_vm_mmap2(), kern_vm_mprotect(), kern_vm_msync(), kern_vm_munlock(), kern_vm_munmap(), and kern_vm_madvise().
MFC r313655: Change type of the prot parameter for kern_vm_mmap() from vm_prot_t to int.
MFC r313696: Rework r313352.
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
|
#
297493 |
|
01-Apr-2016 |
jhb |
Cap IOSIZE_MAX to INT_MAX for 32-bit processes.
Previously, freebsd32 binaries could submit read/write requests with lengths greater than INT_MAX that a native kernel would have rejected.
Reviewed by: kib
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5788
|
#
296060 |
|
25-Feb-2016 |
markj |
Improve error handling for posix_fallocate(2) and posix_fadvise(2).
- Set td_errno so that ktrace and dtrace can obtain the syscall error number in the usual way.
- Pass negative error numbers directly to the syscall layer, as they're not intended to be returned to userland.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5425
|
#
285310 |
|
09-Jul-2015 |
kib |
Cover a race between doselwakeup() and selfdfree(). If the doselwakeup() loop finds a selfd entry and clears its sf_si pointer while selfdfree() is handling the same entry in parallel, the NULL sf_si makes selfdfree() free the memory. The result is a race and accesses to freed memory.
Refcount the selfd ownership. One reference is for the sf_link linkage, which is unconditionally dereferenced by selfdfree(). The other reference is for sf_threads; selfdfree() and doselwakeup() race to drop it, and the winner unlinks and then frees the selfd entry.
Reported by: Larry Rosenman <ler@lerctr.org>
Tested by: Larry Rosenman <ler@lerctr.org>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
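Illustrative sketch (not the committed code): the ownership scheme above can be modelled as a counter that starts at two, where each side drops one reference and only the side that drops the last one may free the entry. The structure and function names are hypothetical, and C11 atomics stand in for the kernel's refcount primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the selfd entry; only the counter is modelled. */
struct sk_selfd {
	atomic_uint refs;
};

/* Two owners: one for the list linkage, one for the thread linkage. */
static void
sk_selfd_init(struct sk_selfd *sf)
{
	atomic_init(&sf->refs, 2);
}

/* Drop one reference. Returns true only for the caller that dropped the
 * last reference, i.e. the only caller allowed to free the entry. */
static bool
sk_selfd_release(struct sk_selfd *sf)
{
	return (atomic_fetch_sub(&sf->refs, 1) == 1);
}
```

Because the decrement is atomic, exactly one of two racing callers observes the transition to zero, so the entry cannot be freed twice or while the other side still holds it.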
|
#
281714 |
|
18-Apr-2015 |
kib |
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x kernels which required padding before the off_t parameter. The fcntl(2) contains compatibility code to handle kernels before the struct flock was changed during the 8.x CURRENT development. The shims were reasonable to allow easier revert to the older kernel at that time.
Now, two or three major releases later, shims do not serve any purpose. Such old kernels cannot handle current libc, so revert the compatibility code.
Make padded syscalls support conditional under the COMPAT6 config option. For COMPAT32, the syscalls were under COMPAT6 already.
Remove the WITHOUT_SYSCALL_COMPAT build option, whose only purpose was to (partially) disable the removed shims.
Reviewed by: jhb, imp (previous versions)
Discussed with: peter
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
|
#
278930 |
|
17-Feb-2015 |
mjg |
filedesc: simplify fget_unlocked & friends
Introduce fget_fcntl which performs appropriate checks when needed. This removes a branch from fget_unlocked.
Introduce fget_mmap dealing with cap_rights_to_vmprot conversion. This removes a branch from _fget.
Modify fget_unlocked to pass the sequence counter to interested callers so that they can perform their own checks and make sure the result was obtained from stable & current state.
Reviewed by: silence on -hackers
|
#
275205 |
|
28-Nov-2014 |
hselasky |
Style changes:
- Move two IOCTL related defines to the top of the C-file
- Add more comments describing the recently added IOCTL small size and small align macros
|
#
274462 |
|
13-Nov-2014 |
dchagin |
Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change.
Differential Revision: https://reviews.freebsd.org/D1133
Reviewed by: kib, wblock
MFC after: 1 month
|
#
274088 |
|
04-Nov-2014 |
hselasky |
Simplify logic a bit. Ensure data buffer is properly aligned, especially for platforms where unaligned access is not allowed. Make it possible to override the small buffer size.
A simple continuous read string test using libusb showed a reduction in CPU usage from roughly 10% to less than 1% using a dual-core GHz CPU, when the malloc() operation was skipped for small buffers.
MFC after: 2 weeks
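Illustrative sketch (not the committed code): the idea of skipping malloc() for small requests while keeping the buffer aligned can be shown with a union whose first member forces maximum alignment. The 128-byte threshold and all names are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SK_SMALL_SIZE 128	/* assumed threshold, overridable in spirit */

/* The max_align_t member forces the byte array to be suitably aligned. */
union sk_small_buf {
	max_align_t	align;
	unsigned char	bytes[SK_SMALL_SIZE];
};

/* Copy a request into a working buffer. Returns 1 if the on-stack buffer
 * was used, 0 if the heap was used, and -1 on allocation failure. */
static int
sk_handle_request(const void *arg, size_t size)
{
	union sk_small_buf small;
	int on_stack = (size <= sizeof(small.bytes));
	void *data = on_stack ? (void *)small.bytes : malloc(size);

	if (data == NULL)
		return (-1);
	memcpy(data, arg, size);
	/* A real handler would operate on 'data' here. */
	if (!on_stack)
		free(data);
	return (on_stack);
}
```

Small requests never touch the allocator in this sketch, which is the effect the CPU usage measurement above attributes the improvement to.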
|
#
274017 |
|
03-Nov-2014 |
mjg |
Provide an on-stack temporary buffer for small ioctl requests.
|
#
273555 |
|
23-Oct-2014 |
mjg |
In selfdfree re-evaluate sf_si after taking the lock.
Otherwise we can race with doselwakeup.
This is a fixup to r273549
Reviewed by: jhb
Reported by: everyone and their dog
|
#
273549 |
|
23-Oct-2014 |
mjg |
Avoid taking the lock in selfdfree when not needed.
|
#
267710 |
|
21-Jun-2014 |
mjg |
fd: replace fd_nfiles with fd_lastfile where appropriate
fd_lastfile is guaranteed to be the biggest open fd, so when the intent is to iterate over active fds or lookup one, there is no point in looking beyond that limit.
Few places are left unpatched for now.
MFC after: 1 week
|
#
264388 |
|
12-Apr-2014 |
davide |
Hide internal details of the sbintime_t implementation by wrapping INT64_MAX into SBT_MAX, to make it more robust in case the internal type representation changes in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now on use this interface.
Requested by: bapt, jmg, Ryan Lortie (implicitly)
Reviewed by: mav, bde
|
#
263233 |
|
16-Mar-2014 |
rwatson |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h.
MFC after: 3 weeks
|
#
258181 |
|
15-Nov-2013 |
pjd |
Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had a very hard time to fully understand) with much more intuitive rights:
CAP_EVENT - When set on a descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2).
CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the eventlist argument set to a non-NULL value; in other words, the given kqueue descriptor can be used to monitor other descriptors.
CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the changelist argument set to a non-NULL value; in other words, it allows modifying the events monitored with the given kqueue descriptor.
Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE.
Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
|
#
256503 |
|
15-Oct-2013 |
kib |
By default, allow up to SSIZE_MAX i/o for non-devfs files.
Sponsored by: The FreeBSD Foundation
Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after: 1 month
X-MFC-note: stable/10 only
|
#
256502 |
|
15-Oct-2013 |
kib |
Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files.
Sponsored by: The FreeBSD Foundation
Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after: 1 week
|
#
255230 |
|
05-Sep-2013 |
sbruno |
Restore builds on architectures that don't support CAPABILITIES (mips).
|
#
255219 |
|
04-Sep-2013 |
pjd |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights {
	uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine a few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \
	__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions.
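Illustrative sketch (not the committed code): the bit layout described above, with two version bits on top and five one-hot index bits below them, can be mimicked with a macro that places the index bit at position 57 + idx. The SK_ names and the shift constant are assumptions derived from that description, not the header's actual definitions.

```c
#include <stdint.h>

#define SK_IDX_SHIFT	57	/* 64 bits - 2 version bits - 5 index bits */
#define SK_CAPRIGHT(idx, bit) \
	((1ULL << (SK_IDX_SHIFT + (idx))) | (bit))

/* Number of array elements encoded in the top two bits of element 0. */
static unsigned
sk_nelems(uint64_t first)
{
	return ((unsigned)(first >> 62) + 2);
}

/* Return the array index a right belongs to, or -1 if zero or several
 * index bits are set (rights from different elements were OR-ed). */
static int
sk_right_to_index(uint64_t right)
{
	uint64_t idxbits = (right >> SK_IDX_SHIFT) & 0x1fULL;
	int idx = 0;

	if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
		return (-1);
	while ((idxbits >>= 1) != 0)
		idx++;
	return (idx);
}
```

With this encoding, OR-ing a right from element 0 with one from element 1 leaves two index bits set, which is how mixing such as CAP_LOOKUP | CAP_PDKILL can be detected.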
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
#
252367 |
|
29-Jun-2013 |
peter |
Help out gcc. clang understands.
sys_generic.c:1510: warning: 'precision' may be used uninitialized
*** [sys_generic.o] Error code 1
|
#
252356 |
|
28-Jun-2013 |
davide |
- Trim an unused and bogus Makefile for mount_smbfs.
- Reconnect with some minor modifications, in particular now selsocket() internals are adapted to use sbintime units after recent'ish calloutng switch.
|
#
248092 |
|
09-Mar-2013 |
mav |
Rework the overflow checks of r247898 so that an overly "intelligent" compiler cannot optimize them out.
Submitted by: bde
|
#
247898 |
|
06-Mar-2013 |
mav |
Fix time math overflows and improve zero intervals handling in poll(), select(), nanosleep() and kevent() functions after calloutng changes.
Reported by: bde
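Illustrative sketch (not the committed code): one way to avoid overflow when converting a seconds/microseconds timeout into a 32.32 fixed-point value is to check the seconds against INT32_MAX before shifting and to saturate otherwise. The type, the saturation policy and the function name are assumptions for this example.

```c
#include <stdint.h>

typedef int64_t sk_sbintime_t;	/* assumed 32.32 fixed-point format */
#define SK_SBT_MAX INT64_MAX

/* Convert a timeout to fixed point. Returns -1 for invalid input and
 * SK_SBT_MAX when the seconds would not fit in the upper 32 bits. */
static sk_sbintime_t
sk_timeout_to_sbt(int64_t sec, int32_t usec)
{
	if (sec < 0 || usec < 0 || usec >= 1000000)
		return (-1);
	if (sec > INT32_MAX)
		return (SK_SBT_MAX);	/* shifting would overflow */
	return ((sk_sbintime_t)((uint64_t)sec << 32) +
	    (((int64_t)usec << 32) / 1000000));
}
```

The range check happens before any arithmetic on the value, so the comparison cannot be folded away as the result of an overflow that has already occurred.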
|
#
247801 |
|
04-Mar-2013 |
davide |
MFcalloutng: Fix kern_select() and sys_poll() so that they can handle sub-tick precision for timeouts (in the same fashion it was done for nanosleep() in r247797).
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil
|
#
247602 |
|
01-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If the CAP_IOCTL capability right is present we can further reduce the allowed ioctls list with the new cap_ioctls_limit(2) syscall. The list of allowed ioctls can be retrieved with the cap_ioctls_get(2) syscall.
- If the CAP_FCNTL capability right is present we can further reduce the fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was heavily modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.

Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).

Added CAP_SYMLINKAT:
- Allow for symlinkat(2).

Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).

Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.

Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.

Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this call.

Removed CAP_MAPEXEC.

CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.

Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNOD to CAP_MKNODAT.

CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).

CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).
Added convenient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \
	(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
	 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
	(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
	 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
	 CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib
|
#
244643 |
|
23-Dec-2012 |
kib |
Do not force a writer to the devfs file to drain the buffer writes.
Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after: 2 weeks
|
#
241680 |
|
18-Oct-2012 |
attilio |
Disconnect non-MPSAFE SMBFS from the build in preparation for dropping GIANT from VFS. In addition, disconnect also netsmb, which is a base requirement for SMBFS.
In the meantime, regular SMBFS users can use the FUSE interface and the smbnetfs port to work with their SMBFS partitions.
Also, there are ongoing efforts by vendor to support in-kernel smbfs, so there are good chances that it will get relinked once properly locked.
This is not targeted for MFC.
|
#
237195 |
|
17-Jun-2012 |
davide |
The variable 'error' in sys_poll() is initialized in declaration to value zero but in any case is overwritten by successive copyin(), making the previous initialization useless. Remove this. As an added bonus this fixes a style(9) bug.
Discussed with: kib
Approved by: gnn (mentor)
MFC after: 3 days
|
#
232494 |
|
04-Mar-2012 |
kib |
Instead of incomplete handling of read(2)/write(2) return values that do not fit into registers, declare that we do not support this case using CTASSERT(), and remove the endianness-unsafe code to split the return value into td_retval.
While there, change the style of the sysctl debug.iosize_max_clamp definition.
Requested by: bde
MFC after: 3 weeks
|
#
231949 |
|
20-Feb-2012 |
kib |
Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows SSIZE_MAX-sized i/o requests from usermode.
Discussed with: bde, das (previous versions)
MFC after: 1 month
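Illustrative sketch (not the committed code): the effect of the clamp can be pictured as choosing between two upper bounds for a request length. The helper names are hypothetical, and INT64_MAX is used as a stand-in for SSIZE_MAX under the assumption of a 64-bit ssize_t.

```c
#include <limits.h>
#include <stdint.h>

#define SK_SSIZE_MAX INT64_MAX	/* assumption: 64-bit ssize_t */

/* Largest request length accepted for the given clamp setting. */
static uint64_t
sk_iosize_max(int clamp)
{
	return (clamp ? (uint64_t)INT_MAX : (uint64_t)SK_SSIZE_MAX);
}

/* Return 1 if a request of 'len' bytes would be accepted. */
static int
sk_io_len_ok(uint64_t len, int clamp)
{
	return (len <= sk_iosize_max(clamp));
}
```

With the clamp enabled, a request one byte larger than INT_MAX is rejected; with it disabled, the same request is accepted.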
|
#
227485 |
|
13-Nov-2011 |
kib |
To limit amount of the kernel memory allocated, and to optimize the iteration over the fdsets, kern_select() limits the length of the fdsets copied in by the last valid file descriptor index. If any bit is set in a mask above the limit, current implementation ignores the filedescriptor, instead of returning EBADF.
Fix the issue by scanning the tails of fdset before entering the select loop and returning EBADF if any bit above last valid filedescriptor index is set. The performance impact of the additional check is only imposed on the (somewhat) buggy applications that pass bad file descriptors to select(2) or pselect(2).
PR: kern/155606, kern/162379
Discussed with: cognet, glebius
Tested by: andreast (powerpc, all 64/32bit ABI combinations, big-endian), marius (sparc64, big-endian)
MFC after: 2 weeks
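Illustrative sketch (not the committed code): the tail scan described above amounts to checking whether any bit at or beyond the last valid descriptor index is set in the caller's mask. The function name and the use of 64-bit words are assumptions; the sketch walks bit by bit for clarity rather than speed.

```c
#include <stdint.h>

#define SK_WORD_BITS 64

/* Return 1 if any bit with index in [first, nd) is set in 'words'.
 * A caller would map a positive result to EBADF. */
static int
sk_fdset_tail_has_bits(const uint64_t *words, int first, int nd)
{
	for (int fd = first; fd < nd; fd++) {
		if (words[fd / SK_WORD_BITS] &
		    ((uint64_t)1 << (fd % SK_WORD_BITS)))
			return (1);
	}
	return (0);
}
```

Well-behaved callers pass masks with an empty tail, so the extra work only produces an error for applications that set bits for descriptors that cannot exist.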
|
#
225617 |
|
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal to kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
|
#
225177 |
|
25-Aug-2011 |
attilio |
Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and is then quickly destroyed before the waiters have had the opportunity to wake up, at the next iteration they will find the selinfo object destroyed, causing a page fault.
That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. This race is quite rare in practice, because it requires a selrecord(), a poll request by another thread, and a quick destruction of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine, which should be called before destroying selinfo objects (in order to avoid such a case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, as happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the file descriptors should already be closed, thus there cannot be a race. For this case, the mfi(4) device driver can be taken as an example, as it implements fully correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks
|
#
224910 |
|
16-Aug-2011 |
jonathan |
poll(2) implementation for capabilities.
When calling poll(2) on a capability, unwrap first and then poll the underlying object.
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
|
#
224797 |
|
12-Aug-2011 |
jonathan |
Rename CAP_*_KEVENT to CAP_*_EVENT.
Change the names of a couple of capability rights to be less FreeBSD-specific.
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
|
#
224778 |
|
11-Aug-2011 |
rwatson |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.
Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc
|
#
211941 |
|
28-Aug-2010 |
kib |
For some file types, the select code registers two selfd structures. E.g., for a socket, when POLLIN|POLLOUT is specified in events, you would have one selfd registered for the receiving socket buffer, and one for the sending buffer. Now, if both events are not ready to fire at the time of the initial scan, but are simultaneously ready after the sleep, pollrescan() would iterate over the pollfd struct twice. Since both times revents is not zero, the returned value would be off by one.
Fix this by recalculating the return value in pollout().
PR: kern/143029
MFC after: 2 weeks
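Illustrative sketch (not the committed code): recalculating the return value means counting each pollfd entry with a non-zero revents exactly once, regardless of how many internal registrations fired for it. The function name is hypothetical; struct pollfd is the standard one from <poll.h>.

```c
#include <poll.h>
#include <stddef.h>

/* Count the entries that have at least one returned event. An entry is
 * counted once even if several of its requested events are ready. */
static int
sk_count_ready(const struct pollfd *fds, size_t nfds)
{
	int n = 0;

	for (size_t i = 0; i < nfds; i++) {
		if (fds[i].revents != 0)
			n++;
	}
	return (n);
}
```

Deriving the count from the final array, instead of incrementing it each time a descriptor is visited, makes a double visit harmless.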
|
#
209595 |
|
29-Jun-2010 |
jhb |
Send SIGPIPE to the thread that issued the offending system call rather than to the entire process.
Reported by: Anit Chakraborty
Reviewed by: kib, deischen (concept)
MFC after: 1 week
|
#
208374 |
|
21-May-2010 |
kib |
Remove POLLHUP from the flags used to test whether to set exceptfds fd_set bits in select(2). It seems that the historical behaviour is to not report an exception on EOF, and several applications are broken.
Reported by: Yoshihiko Sarumaru <ysarumaru gmail com>
Discussed with: bde
PR: ports/140934
MFC after: 2 weeks
|
#
205014 |
|
11-Mar-2010 |
nwhitehorn |
Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms.
Reviewed by: kib, jhb
|
#
198508 |
|
27-Oct-2009 |
kib |
The current pselect(3) is implemented in usermode and is thus vulnerable to a well-known race condition, whose elimination was the reason for the function's introduction in the first place. If the sigmask supplied as an argument to pselect() enables a signal, the signal might be delivered before the thread calls select(2), causing a lost wakeup. Reimplement pselect() in the kernel, making the change of sigmask and the sleep atomic.
Since the signal shall be delivered to usermode, but the sigmask restored, set TDP_OLDMASK and save the old mask in td_oldsigmask. TDP_OLDMASK should be cleared by ast() in case the signal was not delivered during syscall execution.
Reviewed by: davidxu
Tested by: pho
MFC after: 1 month
|
#
197049 |
|
09-Sep-2009 |
kib |
kern_select(9) copies fd_set in and out of userspace in quantities of longs. Since for 32-bit processes longs are 4 bytes, a 64-bit kernel may copy in or out 4 bytes more than the process expected.
Calculate the number of bytes to copy taking into account the size of fd_set for the current process ABI.
Diagnosed and tested by: Peter Jeremy <peterjeremy acm org>
Reviewed by: jhb
MFC after: 1 week
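Illustrative sketch (not the committed code): the size calculation can be expressed as rounding the descriptor count up to whole words of the calling process's long size. The function name and parameters are assumptions made for this example.

```c
#include <stddef.h>

/* Bytes of fd_set to copy for 'nd' descriptors when the process ABI
 * uses longs of 'abi_long_bytes' bytes (4 for 32-bit, 8 for 64-bit). */
static size_t
sk_fdset_copy_bytes(int nd, size_t abi_long_bytes)
{
	size_t bits_per_word = abi_long_bytes * 8;
	size_t nwords = ((size_t)nd + bits_per_word - 1) / bits_per_word;

	return (nwords * abi_long_bytes);
}
```

For 65 descriptors this gives 12 bytes under a 32-bit ABI but 16 bytes under a 64-bit one, which is the 4-byte overrun the entry above describes.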
|
#
196460 |
|
23-Aug-2009 |
kib |
Fix the conformance of poll(2) for sockets after r195423 by returning POLLHUP instead of POLLIN for several cases. Now, the tools/regression/poll results for FreeBSD are closer to those of Solaris and Linux.
Also, improve the POSIX conformance by explicitly clearing POLLOUT when POLLHUP is reported in pollscan(), making the fix global.
Submitted by: bde
Reviewed by: rwatson
MFC after: 1 week
|
#
195281 |
|
02-Jul-2009 |
rwatson |
Audit file descriptor and command arguments to ioctl(2).
Approved by: re (audit argument blanket)
MFC after: 1 week
|
#
195259 |
|
01-Jul-2009 |
jeff |
- Use fd_lastfile + 1 as the upper bound on nd. This is more correct than using the size of the descriptor array.
- A lock is not needed to fetch fd_lastfile. The results are stale the instant it is dropped.
- Use a private mutex pool for select since the pool mutex is not used as a leaf.
- Fetch the si_mtx pointer first before resorting to hashing to compute the mutex address.
Reviewed by: McKusick
Approved by: re (kib)
|
#
195104 |
|
27-Jun-2009 |
rwatson |
Replace AUDIT_ARG() with variable argument macros with a set of more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr).
In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules.
Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week
|
#
192080 |
|
14-May-2009 |
jeff |
- Implement a lockless file descriptor lookup algorithm in fget_unlocked().
- Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders.
- Mark the file zone as NOFREE so we may attempt to reference potentially freed files.
- Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.
|
#
189708 |
|
11-Mar-2009 |
rwatson |
When writing out updated pollfd records when returning from poll(), only copy out the revents field, not the whole pollfd structure. Otherwise, if the events field is updated concurrently by another thread, that update may be lost.
This issue apparently causes problems for the JDK on FreeBSD, which expects the Linux behavior of not updating all fields (somewhat oddly, Solaris does not implement the required behavior, but presumably our adaptation of the JDK is based on the Linux port?).
MFC after: 2 weeks
PR: kern/130924
Submitted by: Kurt Miller <kurt @ intricatesoftware.com>
Discussed with: kib
|
#
189450 |
|
06-Mar-2009 |
kib |
Extract the no_poll() and vop_nopoll() code into the common routine poll_no_poll(). Return a poll_no_poll() result from devfs_poll_f() when filedescriptor does not reference the live cdev, instead of ENXIO.
Noted and tested by: hps
MFC after: 1 week
|
#
187996 |
|
02-Feb-2009 |
sepotvin |
Fix select on platforms where sizeof(long) != sizeof(int). This used to work by accident before the cleanup done in revision 187693.
Approved by: kan (mentor)
|
#
187693 |
|
25-Jan-2009 |
jeff |
- bit has to be fd_mask to work properly on 64bit platforms. Constants must also be cast even though the result ultimately is promoted to 64bit.
- Correct a loop index upper bound in selscan().
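Illustrative sketch (not the committed code): the need to cast the constant comes from C evaluating 1 << n in int before any promotion, so the shift must be performed on a value that already has the mask type. The sk_fd_mask typedef is an assumed 64-bit stand-in for fd_mask.

```c
#include <stdint.h>

typedef uint64_t sk_fd_mask;	/* assumed stand-in for a 64-bit fd_mask */
#define SK_NFDBITS 64

/* Build the mask bit for a descriptor. The cast is applied to the
 * constant itself; writing (sk_fd_mask)(1 << n) would still shift an
 * int and is undefined for n >= 31. */
static sk_fd_mask
sk_fd_bit(int fd)
{
	return ((sk_fd_mask)1 << (fd % SK_NFDBITS));
}

/* Test whether 'fd' is set in an array of mask words. */
static int
sk_fd_isset(const sk_fd_mask *words, int fd)
{
	return ((words[fd / SK_NFDBITS] & sk_fd_bit(fd)) != 0);
}
```

Descriptors whose bit position within a word is 32 or higher are the ones that break when the shift is done in int.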
|
#
187682 |
|
25-Jan-2009 |
jeff |
- Correct a typo in a comment.
Noticed by: danger
|
#
187677 |
|
25-Jan-2009 |
jeff |
Fix errors introduced when I rewrote select.
- Restructure selscan() and selrescan() to avoid producing extra selfds when we have a fd in multiple sets. As described below multiple selfds may still exist for other reasons.
- Make selrescan() tolerate multiple selfds for a given descriptor set since sockets use two selinfos per fd. If an event on each selinfo fires selrescan() will see the descriptor twice. This could result in select() returning 2x the number of fds actually existing in fd sets.
Reported by: mgleason@ncftp.com
|
#
183297 |
|
23-Sep-2008 |
obrien |
Reverse if() logic to improve readability.
Reviewed by: ru
|
#
177374 |
|
19-Mar-2008 |
jeff |
- Remove stale comment.
- In the last revision the code was changed to use maxfilesperproc rather than the per-process file limit to restrict the size of the poll array. This eliminates a significant source of process lock contention in multithreaded programs and is cheaper. This had been committed with the wrong batch of changes.
|
#
177368 |
|
19-Mar-2008 |
jeff |
- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
|
#
175140 |
|
07-Jan-2008 |
jhb |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
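The dispatch through fo_truncate described in this entry can be pictured with a small userland mock. This is only a sketch under stated assumptions: every mock_* name and the f_size field are invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userland stand-ins; these are NOT the kernel structures. */
struct mock_file;

struct mock_fileops {
	int (*fo_truncate)(struct mock_file *fp, off_t length);
};

struct mock_file {
	const struct mock_fileops *f_ops;
	off_t f_size;	/* only meaningful for the vnode-like type */
};

/* Vnode-like file type: truncation changes the recorded size. */
static int
mock_vn_truncate(struct mock_file *fp, off_t length)
{
	if (length < 0)
		return (EINVAL);
	fp->f_size = length;
	return (0);
}

/* Non-vnode file type: truncation is rejected with EINVAL. */
static int
mock_invfo_truncate(struct mock_file *fp, off_t length)
{
	(void)fp;
	(void)length;
	return (EINVAL);
}

static const struct mock_fileops mock_vnops = {
	.fo_truncate = mock_vn_truncate
};
static const struct mock_fileops mock_pipeops = {
	.fo_truncate = mock_invfo_truncate
};

/* Generic entry point: it only dispatches through the ops vector. */
static int
mock_ftruncate(struct mock_file *fp, off_t length)
{
	assert(fp != NULL && fp->f_ops != NULL);
	return (fp->f_ops->fo_truncate(fp, length));
}
```

Calling mock_ftruncate() on a file whose f_ops is mock_vnops updates f_size, while the same call on a file using mock_pipeops returns EINVAL without touching anything. The generic caller never needs to know which file type it was handed.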
|
#
174988 |
|
29-Dec-2007 |
jeff |
Remove explicit locking of struct file.
- Introduce a finit() which is used to initialize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.
Tested by: kris, pho
|
#
174647 |
|
16-Dec-2007 |
jeff |
Refactor select to reduce contention and hide internal implementation details from consumers.
- Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescanned.
- Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve API compatibility with existing code which does nothing but bzero() to setup selinfo structures.
- Use a per-thread wait channel rather than a global wait channel.
- Hide select implementation details in a seltd structure which is opaque to the rest of the kernel.
- Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details.
Tested by: kris Reviewed on: arch
|
#
173600 |
|
14-Nov-2007 |
julian |
Generally we are interested in which thread did something as opposed to which process. Since threads by default have the name of the process unless overwritten with more useful information, just print the thread name instead.
|
#
171212 |
|
04-Jul-2007 |
peter |
Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate
Approved by: re (kensmith)
|
#
170307 |
|
04-Jun-2007 |
jeff |
Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization.
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
#
169161 |
|
01-May-2007 |
alc |
Remove unneeded include files.
|
#
168355 |
|
04-Apr-2007 |
rwatson |
Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are important for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular acquisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio).
Tested by: kris Discussed with: jhb, kris, attilio, jeff
|
#
167232 |
|
05-Mar-2007 |
rwatson |
Further system call comment cleanup:
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
|
#
167211 |
|
04-Mar-2007 |
rwatson |
Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly.
Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
|
#
167150 |
|
01-Mar-2007 |
bms |
Do not dispatch SIGPIPE from the generic write path for a socket; with this patch the code behaves according to the comment on the line above.
Without this patch, a socket could cause SIGPIPE to be delivered to its process, once with SO_NOSIGPIPE set, and twice without.
With this patch, the kernel now passes the sigpipe regression test.
Tested by: Anton Yuzhaninov MFC after: 1 week
|
#
163355 |
|
14-Oct-2006 |
ru |
Prevent IOC_IN with zero size argument (this is only supported if backward compatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug, and previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it.
Security: local DoS Reported by: Peter Holm MFC after: 3 days
|
#
162711 |
|
27-Sep-2006 |
ru |
Fix our ioctl(2) implementation when the argument is "int". New ioctls passing integer arguments should use the _IOWINT() macro. This fixes a lot of ioctl's not working on sparc64, most notable being keyboard/syscons ioctls.
Full ABI compatibility is provided, with the bonus of fixing the handling of old ioctls on sparc64.
Reviewed by: bde (with contributions) Tested by: emax, marius MFC after: 1 week
|
#
160192 |
|
08-Jul-2006 |
jhb |
- Split ioctl() up into ioctl() and kern_ioctl(). The kern_ioctl() assumes that the 'data' pointer is already setup to point to a valid KVM buffer or contains the copied-in data from userland as appropriate (ioctl(2) still does this). kern_ioctl() takes care of looking up a file pointer, implementing FIONCLEX and FIOCLEX, and calling fo_ioctl().
- Use kern_ioctl() to implement xenix_rdchk() instead of using the stackgap and mark xenix_rdchk() MPSAFE.
|
#
154073 |
|
06-Jan-2006 |
jhb |
Return error from fget_write() rather than hardcoding EBADF now that fget_write() DTRT.
Requested by: bde
|
#
154064 |
|
05-Jan-2006 |
jhb |
Remove XXX comments complaining that write(2) on a read-only descriptor returns EBADF. That errno is correct and is mandated by POSIX. It also goes back to revision 1.1 of our CVS history (i.e. 4.4BSD).
The _fget() function should probably also be updated as it currently returns EINVAL in that case rather than EBADF. (It does return EBADF for reads on a write-only descriptor without any XXX comments oddly enough.)
Discussed with: scottl, grog, mjacob, bde
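The errno behaviour discussed in this entry is visible from userland. The following is a minimal sketch, assuming a POSIX system; the helper name errno_for_write_on_rdonly() is made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open `path` read-only, attempt a one-byte write, and return the errno
 * reported by write(2). Returns 0 if the write unexpectedly succeeded and
 * -1 if the file could not be opened at all.
 */
static int
errno_for_write_on_rdonly(const char *path)
{
	ssize_t n;
	int fd, err;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	errno = 0;
	n = write(fd, "x", 1);
	err = (n == -1) ? errno : 0;
	close(fd);
	return (err);
}
```

On a conforming system, errno_for_write_on_rdonly("/dev/null") is expected to yield EBADF, because the descriptor was not opened for writing.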
|
#
147813 |
|
07-Jul-2005 |
jhb |
- Add two new system calls: preadv() and pwritev() which are like readv() and writev() except that they take an additional offset argument and do not change the current file position. In SAT speak: preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old kern_foov() and dofilefoo() functions into new dofilefoo() functions that are called by kern_foov() and kern_pfoov(). The non-v functions now all generate a simple uio on the stack from the passed in arguments and then call kern_foov(). For example, read() now just builds a uio and calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362 Submitted by: Marc Olzheim marcolz at stack dot nl (1) Approved by: re (scottl) MFC after: 1 week
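As a userland illustration of the "do not change the current file position" property, here is a hedged sketch. The helper name offset_after_preadv(), the temporary file path, and the buffer sizes are assumptions made for this example only.

```c
#define _DEFAULT_SOURCE	/* exposes preadv() on glibc; harmless elsewhere */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write "hello world" to a scratch file, then use preadv() to scatter the
 * 5 bytes at offset 6 into two buffers. The bytes are copied to `out` as a
 * NUL-terminated string. Returns the descriptor's file offset as observed
 * after preadv(), or -1 on any failure.
 */
static off_t
offset_after_preadv(char out[6])
{
	char path[] = "/tmp/preadv-demo-XXXXXX";
	char a[2], b[3];
	struct iovec iov[2];
	off_t after = -1;
	int fd;

	assert(out != NULL);
	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);

	/* This write() advances the file offset to 11. */
	if (write(fd, "hello world", 11) != 11)
		goto done;

	iov[0].iov_base = a;
	iov[0].iov_len = sizeof(a);
	iov[1].iov_base = b;
	iov[1].iov_len = sizeof(b);
	if (preadv(fd, iov, 2, 6) != 5)
		goto done;

	memcpy(out, a, sizeof(a));
	memcpy(out + sizeof(a), b, sizeof(b));
	out[5] = '\0';

	/* Query the offset without moving it. */
	after = lseek(fd, 0, SEEK_CUR);
done:
	close(fd);
	return (after);
}
```

The returned offset should still be the one left behind by write(), even though preadv() read from the middle of the file. A plain readv() would instead have read from, and advanced, the current position.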
|
#
147676 |
|
29-Jun-2005 |
peter |
Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious ioctl numbers in backwards compatibility mode, e.g. an IOC_IN ioctl with a size of zero. Traditionally this was what you did before IOC_VOID existed, and we had some established users of this in the tree, namely procfs. Certain 3rd party drivers with binary userland components also have this too.
This is necessary to have 4.x and 5.x binaries use these ioctl's. We found this at work when trying to run 4.x binaries.
Approved by: re
|
#
144445 |
|
31-Mar-2005 |
jhb |
Implement kern_adjtime(), kern_readv(), kern_sched_rr_get_interval(), kern_settimeofday(), and kern_writev() to allow for further stackgap reduction in the compat ABIs.
|
#
141664 |
|
10-Feb-2005 |
cperciva |
Declare "cnt" (a number of bytes to read or write) as an "ssize_t", not as a "long" in dofileread() and dofilewrite().
Discussed with: jhb
|
#
140800 |
|
25-Jan-2005 |
phk |
Previously a read of zero bytes got handled in devfs:vop_read() but I missed that when the vnode bypass was introduced.
Deal with zero length transfers before we even get to fo_ops->fo_read().
Found by: Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru> PR: 75758
|
#
140406 |
|
18-Jan-2005 |
phk |
Detect sign-extension bugs in the ioctl(2) command argument: Truncate to 32 bits and print warning.
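A sign-extension bug of this kind arises when a command value held in a signed 32-bit int is widened to a 64-bit word. The sketch below shows one way to detect and truncate it; the helper name ioctl_cmd_truncate() and the use of fixed-width types are assumptions for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: reduce a command word to its low 32 bits and report
 * whether any higher bits were set, as happens when a negative int is
 * widened to a 64-bit value.
 */
static uint32_t
ioctl_cmd_truncate(uint64_t com, bool *had_high_bits)
{
	assert(had_high_bits != NULL);
	*had_high_bits = (com >> 32) != 0;
	return ((uint32_t)com);
}
```

A caller that sees had_high_bits set could log a warning, which is the spirit of the change described in this entry, and then continue with the truncated value.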
|
#
139804 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
137806 |
|
17-Nov-2004 |
phk |
Push Giant down through ioctl.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
mtx_lock(&Giant) in the opencrypto code. (This may actually not be needed, but better safe than sorry).
Devfs grabs Giant if the driver is marked as needing Giant.
|
#
137805 |
|
17-Nov-2004 |
phk |
Push Giant down through select and poll.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
Devfs grabs Giant if the driver is marked as needing Giant.
|
#
137773 |
|
16-Nov-2004 |
phk |
Polish code to correctly reflect structure.
|
#
137689 |
|
14-Nov-2004 |
phk |
Rearrange memory management for ioctl arguments to use stronger checks for illegal values and don't store them on the stack any more.
|
#
137687 |
|
14-Nov-2004 |
phk |
style polish.
|
#
137647 |
|
13-Nov-2004 |
phk |
Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not an issue.
The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
|
#
134404 |
|
27-Aug-2004 |
andre |
Poll() uses the array smallbits that is big enough to hold 32 struct pollfd's to avoid calling malloc() on small numbers of fd's. Because the elements of smallbits have type char, its address might be misaligned for a struct pollfd. Change the array of char to an array of struct pollfd.
PR: kern/58214 Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at> Reviewed by: bde (a long time ago) MFC after: 3 days
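The pattern in this entry, a correctly typed on-stack array for small requests with a malloc() fallback, might look like the following in userland. This is a sketch only: pollfd_scratch_demo() and SMALL_NFDS are hypothetical names, and the function merely zeroes the slots rather than polling anything.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_NFDS 32	/* hypothetical threshold, mirroring the "32" above */

/*
 * Zero `nfds` pollfd slots, using a struct pollfd array on the stack for
 * small requests and malloc() otherwise. Returns 1 if the heap was used,
 * 0 if the stack array sufficed, or -1 on allocation failure.
 */
static int
pollfd_scratch_demo(size_t nfds)
{
	/* Declared as struct pollfd, so the compiler aligns it properly. */
	struct pollfd smallfds[SMALL_NFDS];
	struct pollfd *fds;
	int used_heap;

	if (nfds > SMALL_NFDS) {
		fds = malloc(nfds * sizeof(*fds));
		if (fds == NULL)
			return (-1);
		used_heap = 1;
	} else {
		fds = smallfds;
		used_heap = 0;
	}

	/* Either buffer is suitably aligned for struct pollfd. */
	assert((uintptr_t)fds % _Alignof(struct pollfd) == 0);
	memset(fds, 0, nfds * sizeof(*fds));

	if (used_heap)
		free(fds);
	return (used_heap);
}
```

Had smallfds been declared as a char array of the same byte size, nothing would guarantee the alignment that the assertion checks, which is the problem this entry fixes.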
|
#
131897 |
|
10-Jul-2004 |
phk |
Clean up and wash struct iovec and struct uio handling.
Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees.
Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate.
Add cloneuio() which returns a malloc'ed copy. Caller frees.
Use them throughout.
|
#
127911 |
|
05-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999.
Approved by: core
|
#
126909 |
|
13-Mar-2004 |
rwatson |
Add annotations to mtx_lock(&Giant) in kern_select() and poll() that we always grab Giant, even if we're actually only polling objects that don't require giant. Once socket locking is merged, there will be strong motivation to fix this.
|
#
126326 |
|
27-Feb-2004 |
jhb |
Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep queues in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is solely a property of msleep() which explicitly calls sched_prio() before blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.
|
#
125454 |
|
04-Feb-2004 |
jhb |
Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it.
- Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
#
124736 |
|
19-Jan-2004 |
ache |
pread/pwrite: follow lseek spirit - return EINVAL on negative offset for non-VCHR
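The rule in this entry can be observed from userland on a regular file. A minimal sketch, assuming a POSIX system; the helper names open_scratch_file() and pread_errno() and the temporary path are invented for this example.

```c
#define _DEFAULT_SOURCE	/* exposes mkstemp()/pread() under strict modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an empty, already-unlinked regular file; returns its fd or -1. */
static int
open_scratch_file(void)
{
	char path[] = "/tmp/pread-demo-XXXXXX";
	int fd;

	fd = mkstemp(path);
	if (fd != -1)
		unlink(path);
	return (fd);
}

/*
 * Attempt a one-byte pread() at `offset` and return the resulting errno,
 * or 0 if the call did not fail (a short read or EOF is not a failure).
 */
static int
pread_errno(int fd, off_t offset)
{
	char c;

	assert(fd >= 0);
	errno = 0;
	if (pread(fd, &c, 1, offset) == -1)
		return (errno);
	return (0);
}
```

With a descriptor from open_scratch_file(), pread_errno(fd, -1) is expected to report EINVAL, while pread_errno(fd, 0) simply hits end-of-file on the empty file and reports no error.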
|
#
122352 |
|
09-Nov-2003 |
tanimura |
- Implement selwakeuppri() which allows raising the priority of a thread being woken up. The thread woken up can run at a priority as high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
|
#
120514 |
|
27-Sep-2003 |
phk |
Introduce no_poll() default method for device drivers. Have it do exactly the same as vop_nopoll() for consistency and put a comment in the two pointing at each other.
Retire seltrue() in favour of no_poll().
Create private default functions in kern_conf.c instead of public ones.
Change default strategy to return the bio with ENODEV instead of doing nothing which would lead the bio stranded.
Retire public nullopen() and nullclose() as well as the entire band of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump} functions, they are the default actions now.
Move the final two trivial functions from subr_xxx.c to kern_conf.c and retire the now empty subr_xxx.c
|
#
118290 |
|
01-Aug-2003 |
alc |
Remove Giant from writev(2). Eliminate trivial style differences between writev(2) and readv(2).
|
#
116550 |
|
18-Jun-2003 |
phk |
Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that rather than assume that only DTYPE_VNODE is seekable.
|
#
116182 |
|
10-Jun-2003 |
obrien |
Use __FBSDID().
|
#
114216 |
|
29-Apr-2003 |
kan |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h>
Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
107849 |
|
13-Dec-2002 |
alfred |
SCARGS removal take II.
|
#
107839 |
|
13-Dec-2002 |
alfred |
Backout removal SCARGS, the code freeze is only "selectively" over.
|
#
107838 |
|
13-Dec-2002 |
alfred |
Remove SCARGS.
Reviewed by: md5
|
#
104094 |
|
28-Sep-2002 |
phk |
Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too.
Inspired by: FlexeLint warning #512
|
#
103702 |
|
20-Sep-2002 |
phk |
We don't need the <sys/disklabel.h> include for alpha anymore.
Sponsored by: DARPA & NAI Labs.
|
#
103216 |
|
11-Sep-2002 |
julian |
Completely redo thread states.
Reviewed by: davidxu@freebsd.org
|
#
102779 |
|
01-Sep-2002 |
iedowse |
Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropriate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap.
Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
|
#
102333 |
|
23-Aug-2002 |
peter |
Move the TAILQ_INIT(&td->td_selq) before the retry: label. Otherwise in some circumstances when we get a select collision, we can end up with cases where we do not clear some sip->si_thread on the way out, leading to page faults in selwakeup(). This should solve the problem where postfix can crash the kernel during select collisions.
Reviewed by: alfred
|
#
102003 |
|
17-Aug-2002 |
rwatson |
In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation.
- Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101983 |
|
16-Aug-2002 |
rwatson |
Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential.
Trickle this change down into fo_stat/poll() implementations:
- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics.
- sopoll(): modify arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintain current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.
- fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here.
Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101941 |
|
15-Aug-2002 |
rwatson |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100586 |
|
23-Jul-2002 |
alfred |
Attempt to clarify comment in selrecord.
|
#
100507 |
|
22-Jul-2002 |
alfred |
remove caddr_t from fo_ioctl calls
|
#
100506 |
|
22-Jul-2002 |
alfred |
remove caddr_t
|
#
99072 |
|
29-Jun-2002 |
julian |
Part 1 of KSE-III
The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. To come: ia64 and power-pc patches, patches for gdb, test program (in tools)
Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands)
NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
|
#
98499 |
|
20-Jun-2002 |
alfred |
Implement SO_NOSIGPIPE option for sockets. This allows one to request that an EPIPE error return not generate SIGPIPE on sockets.
Submitted by: lioux Inspired by: Darwin
|
#
98423 |
|
19-Jun-2002 |
phk |
Remove the compat bits for the mis-aligned struct disklabel on alpha, people got three times longer than I promised.
Sponsored by: DARPA & NAI Labs.
|
#
98133 |
|
12-Jun-2002 |
kbyanc |
Make nselcol, the number of select collisions since boot, unsigned as negative collisions simply doesn't make sense.
PR: (one small part of) 19720 Approved by: alfred
|
#
97994 |
|
07-Jun-2002 |
jhb |
Catch up to changes in ktrace API.
|
#
96243 |
|
09-May-2002 |
alc |
o Correct an error made in revision 1.65: In readv(), if uap->iovcnt is out-of-range, drop the file reference before returning. (This error also exists in the RELENG_4 branch.)
o Eliminate the acquisition and release of Giant in readv() now that malloc() and free() are callable without Giant.
|
#
95958 |
|
02-May-2002 |
phk |
As promised make the hack for sizeof(struct disklabel) on alpha annoying.
Run make world (or recompile whatever program whines) to get rid of warning.
Compat bits will be removed entirely in about two weeks.
|
#
93818 |
|
04-Apr-2002 |
jhb |
Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
|
#
93810 |
|
04-Apr-2002 |
phk |
Delete the bogus d_boot[01] fields from struct disklabel.
This shrinks the size 4 bytes on alpha, down to the same 276 bytes as all other platforms.
Construct a hack to make old ioctls work on new kernels.
Once world is recompiled only the new and correct sysctls will be used.
This hack will become annoying around 1st of may to make people rebuild their worlds and it will be gone before 5.0.
|
#
92723 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92310 |
|
15-Mar-2002 |
alfred |
Giant pushdown for read/write/pread/pwrite syscalls.
kern/kern_descrip.c: Acquire Giant in fdrop_locked when file refcount hits zero, this removes the requirement for the caller to own Giant for the most part.
kern/kern_ktrace.c: Acquire Giant in ktrgenio, simplifies locking in upper read/write syscalls.
kern/vfs_bio.c: Acquire Giant in bwillwrite if needed.
kern/sys_generic.c Giant pushdown, remove Giant for: read, pread, write and pwrite. readv and writev aren't done yet because of the possible malloc calls for iov to uio processing.
kern/sys_socket.c Grab giant in the socket fo_read/write functions.
kern/vfs_vnops.c Grab giant in the vnode fo_read/write functions.
|
#
92252 |
|
13-Mar-2002 |
alfred |
Fixes to make select/poll mpsafe.
Problem: selwakeup required calling pfind which would cause lock order reversals with the allproc_lock and the per-process filedesc lock.
Solution: Instead of recording the pid of the select()'ing process into the selinfo structure, actually record a pointer to the thread. To avoid dereferencing a bad address all the selinfo structures that are in use by a thread are kept in a list hung off the thread (protected by sellock). When a selwakeup occurs the selinfo is removed from that thread's list, it is also removed on the way out of select or poll where the thread will traverse its list removing all the selinfos from its own list.
Problem: Previously the PROC_LOCK was used to provide the mutual exclusion needed to ensure proper locking, this couldn't work because there was a single condvar used for select and poll and condvars can only be used with a single mutex. Solution: Introduce a global mutex 'sellock' which is used to provide mutual exclusion when recording events to wait on as well as performing notification when an event occurs.
Interesting note: schedlock is required to manipulate the per-thread TDF_SELECT flag, however if given its own field it would not need schedlock, also because TDF_SELECT is only manipulated under sellock one doesn't actually use schedlock for synchronization, only to protect against corruption.
Proc locks are no longer used in select/poll.
Portions contributed by: davidc
|
#
91972 |
|
09-Mar-2002 |
alfred |
Remove __P
|
#
89996 |
|
30-Jan-2002 |
alfred |
Remove unused variables in select(2) from previous delta.
Pointed out by: bde
|
#
89969 |
|
29-Jan-2002 |
alfred |
Attempt to fixup select(2) and poll(2), this should fix some races with other threads as well as speed up the interfaces.
To fix the race and accomplish the speedup, remove selholddrop and pollholddrop. The entire concept is somewhat bogus because holding the individual struct file pointers offers us no guarantees that another thread context won't close it on us thereby removing our access to our own reference.
Selholddrop and pollholddrop also would do multiple locks and unlocks of mutexes _per-file_ in the fd arrays to be scanned, this needed to be sped up.
Instead of using selholddrop and pollholddrop, simply hold the filedesc lock over the selscan and pollscan functions. This should protect us against close(2)'s on the files as well as reduce the multiple lock/unlock pairs per fd into a single lock over the filedesc.
|
#
89696 |
|
23-Jan-2002 |
alfred |
forced commit, Previous revision also removed the holdfp() function from the kernel.
|
#
89695 |
|
23-Jan-2002 |
alfred |
make pread use fget_read instead of holdfp.
|
#
89523 |
|
18-Jan-2002 |
alfred |
undo a bit of the Giant pushdown.
fdrop isn't SMP safe as it may call into the file's close routine which definitely is not SMP safe right now, so we hold Giant over calls to fdrop now.
|
#
89435 |
|
16-Jan-2002 |
alfred |
Fix giant handling in pwrite(2), I forgot to release it when finishing the syscall.
|
#
89319 |
|
13-Jan-2002 |
alfred |
Replace ffind_* with fget calls.
Make fget MPsafe.
Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.
Push giant down in fpathconf().
|
#
89306 |
|
13-Jan-2002 |
alfred |
SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and adapting it for KSE.
Locks:
1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked.
1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp); /* increments reference count on a file */
struct file * fhold_locked(struct file *fp); /* like fhold but expects file to be locked */
struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */
struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
86341 |
|
14-Nov-2001 |
dillon |
remove holdfp()
Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate
introduce fget(), fget_read(), fget_write() - these functions will take a thread and file descriptor and return a file pointer with its ref count bumped.
introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will take a thread and file descriptor and return a vref()'d vnode.
*_read() requires that the file pointer be FREAD, *_write that it be FWRITE.
This continues the cleanup of struct filedesc and struct file access routines which, when we are all through with it, will allow us to then make the API calls MP safe and be able to move Giant down into the fo_* functions.
|
#
83799 |
|
21-Sep-2001 |
jhb |
The P_SELECT flag was moved from p->p_flag to td->td_flags, but p_flag was locked by the proc lock and td_flags is locked by the sched_lock. The places that read, set, and cleared TDF_SELECT weren't updated, so they read and modified td_flags w/o holding the sched_lock, meaning that they could corrupt the per-thread flags field. As an immediate band-aid, grab sched_lock while reading and manipulating td_flags in relation to TDF_SELECT. This will probably be cleaned up some later on.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2
Note: ALL MODULES MUST BE RECOMPILED
Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
82752 |
|
01-Sep-2001 |
dillon |
Giant Pushdown: read() pread() readv() write() pwrite() writev() ioctl() select() poll() openbsd_poll()
|
#
76618 |
|
15-May-2001 |
tanimura |
Back out scanning file descriptors while holding a process lock. selrecord() requires allproc sx in pfind(), resulting in lock order reversal between allproc and a process lock.
|
#
76564 |
|
14-May-2001 |
tanimura |
- Convert msleep(9) in select(2) and poll(2) to cv_*wait*(9).
- Since polling should not involve sleeping, keep holding a process lock upon scanning file descriptors.
- Hold a reference to every file descriptor prior to entering polling loop in order to avoid lock order reversal between lockmgr and p_mtx upon calling fdrop() in fo_poll(). (NOTE: this work has not been done for netncp and netsmb yet because a socket itself has no reference counts.)
Reviewed by: jhb
|
#
75893 |
|
23-Apr-2001 |
jhb |
Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning.
Reviewed by: -smp, dfr, jake
|
#
73929 |
|
07-Mar-2001 |
jhb |
Grab the process lock while calling psignal and before calling psignal.
|
#
73159 |
|
27-Feb-2001 |
jlemon |
Correctly declare variables as u_int rather than doing typecasts. Kill some register declarations while I'm here.
Submitted by: bde (1)
|
#
73116 |
|
26-Feb-2001 |
jlemon |
Cast nfds to u_int before range checking it in order to catch negative values.
PR: 25393
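A minimal sketch of the unsigned-cast range check described above. MAX_FDS and nfds_in_range() are hypothetical names chosen for illustration and are not taken from the kernel source. Converting a negative int to unsigned int yields a very large value, so a single comparison rejects both negative and oversized counts.

```c
#include <limits.h>

/* Hypothetical upper bound standing in for the real descriptor limit. */
#define MAX_FDS 1024u

/*
 * Return 1 if nfds lies in [0, MAX_FDS], 0 otherwise.
 * A negative int converts to a value above INT_MAX when cast to
 * unsigned int, so it fails the same comparison as an oversized count.
 */
int nfds_in_range(int nfds)
{
    return (unsigned int)nfds <= MAX_FDS;
}
```

Without the cast, a signed comparison such as nfds > MAX_FDS would need a separate nfds < 0 test to catch negative values.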
|
#
72203 |
|
09-Feb-2001 |
peter |
poll(2) array limits (take 2) - after some input from bde.
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
Similarly, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
72146 |
|
07-Feb-2001 |
peter |
The code I picked up from NetBSD in '97 had a nasty bug. It limited the index of the pollfd array to the number of fd's currently open, not the maximum number of fd's. ie: if you had 0,1,2 open, you could not use pollfd slots higher than 20. The specs say we only have to support OPEN_MAX [64] entries but we allow way more than that.
|
#
71566 |
|
24-Jan-2001 |
jhb |
- Catch up to proc flag changes.
- Add proc locking for selwakeup() and selrecord().
|
#
70834 |
|
09-Jan-2001 |
wollman |
select() DKI is now in <sys/selinfo.h>.
|
#
69733 |
|
07-Dec-2000 |
dillon |
Only call bwillwrite() for vnodes. Do not penalize devices or pipes.
|
#
69686 |
|
06-Dec-2000 |
dillon |
Add necessary bwillwrite() in writev() entry point.
Deal with excessive dirty buffers when msync() syncs non-contiguous dirty buffers by checking for the case in UFS *before* checking for clusterability.
|
#
69407 |
|
30-Nov-2000 |
alfred |
only call bwillwrite() to stall on IO when dealing with VNODEs otherwise we will stall on non-disk IO for things like fifos and sockets
|
#
69008 |
|
21-Nov-2000 |
jlemon |
Protect p_wchan with sched_lock in selwakeup().
|
#
68883 |
|
18-Nov-2000 |
dillon |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>
|
#
63974 |
|
28-Jul-2000 |
peter |
Fix a warning that has been annoying me for some time: "kern/sys_generic.c:358: warning: cast discards qualifiers from pointer target type" The idea for using the uintptr_t intermediate cast for de-constifying a pointer was hinted at by bde some time ago.
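A minimal sketch of the uintptr_t intermediate cast mentioned above. The helper name deconst() is hypothetical. Routing the pointer through an integer type means the compiler no longer sees a direct pointer-to-pointer cast that drops const, which is what -Wcast-qual complains about.

```c
#include <stdint.h>

/*
 * Strip the const qualifier from a pointer by going through uintptr_t.
 * The caller must know the underlying object is really writable;
 * writing through the result to a truly const object is undefined.
 */
void *deconst(const void *p)
{
    return (void *)(uintptr_t)p;
}
```

The cast only silences the diagnostic; it does not make the pointed-to object any more writable than it already was.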
|
#
63905 |
|
27-Jul-2000 |
green |
Distinguish between whether ktraceing was enabled before an IO operation or after it. If the ktrace operation was enabled while the process was blocked doing IO, the race would allow it to pass down invalid (uninitialized) data and panic later down the call stack.
|
#
63057 |
|
13-Jul-2000 |
jhb |
For infinite timeouts, set both the tv_sec and tv_usec fields to zero in poll() and select().
Noticed by: Wesley Morgan <morganw@chemicals.tacorp.com>
|
#
63049 |
|
12-Jul-2000 |
jhb |
Fix a very obscure bug in select() and poll() where the timeout would never expire if poll() or select() was called before the system had been in multiuser for 1 second. This was caused by only checking to see if tv_sec was zero rather than checking both tv_sec and tv_usec.
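A minimal sketch of the corrected test, assuming (as the message above implies) that an all-zero struct timeval is the internal marker for "no timeout". The function name timeout_is_infinite() is hypothetical. Checking tv_sec alone would misclassify any deadline that falls within the first second after the reference point.

```c
#include <sys/time.h>

/*
 * Return 1 only when both fields are zero.  A deadline such as
 * { 0 s, 500000 us } has tv_sec == 0 but is still a real timeout,
 * so tv_usec has to be part of the test.
 */
int timeout_is_infinite(const struct timeval *tv)
{
    return tv->tv_sec == 0 && tv->tv_usec == 0;
}
```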
|
#
62792 |
|
07-Jul-2000 |
green |
Remove two micro-pessimizations I made. Bruce is teaching me well :) KTRPOINT(p, KTR_GENIO) is more uncommon than error == 0, so it should be first in the && statement.
|
#
62378 |
|
02-Jul-2000 |
green |
Modify ktrace's general I/O tracing, ktrgenio(), to use a struct uio * instead of a struct iovec * array and int len. Get rid of stupidly trying to allocate all of the memory and copyin()ing the entire iovec[], and instead just do the proper VOP_WRITE() in ktrwrite() using a copy of the struct uio that the syscall originally used.
This solves the DoS which could easily be performed; to work around the DoS, one could also remove "options KTRACE" from the kernel. This is a very strong MFC candidate for 4.1.
Found by: art@OpenBSD.org
|
#
61591 |
|
12-Jun-2000 |
alfred |
unstatic getfp() so that other subsystems can use it.
make sendfile() use it.
Approved by: dg
|
#
60269 |
|
09-May-2000 |
dillon |
Some ioctl routines assume that the ioctl buffer is aligned, but a char[] declaration makes no such guarantee. A union is used to force alignment of the char buffer.
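A minimal sketch of forcing alignment with a union. The names ioctl_buf, IOC_BUF_SIZE and ioctl_buf_is_aligned() are hypothetical, and long is assumed here as the alignment member; the actual kernel code may use a different size and member type.

```c
#include <stdint.h>

/* Hypothetical buffer size for the illustration. */
#define IOC_BUF_SIZE 128

/*
 * A union is aligned for its most strictly aligned member, so the
 * char array inherits the alignment of long.
 */
union ioctl_buf {
    char bytes[IOC_BUF_SIZE];
    long align;     /* never read; present only to raise alignment */
};

/* Return 1 if a stack-allocated buffer starts on a long boundary. */
int ioctl_buf_is_aligned(void)
{
    union ioctl_buf buf;

    return ((uintptr_t)buf.bytes % _Alignof(long)) == 0;
}
```

A bare char bytes[IOC_BUF_SIZE] on the stack is only guaranteed byte alignment, so casting it to a structure pointer could fault on strict-alignment machines.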
|
#
57357 |
|
20-Feb-2000 |
peter |
Fix select(2) for the Alpha. (!!) It was never returning true for fd's in the range of 32-63, 96-127 etc. The first problem was the FD_*() macros were shifting a 32 bit integer "1" left by more than 32 bits. The same problem happened in selscan(). ffs() also takes an int argument and causes failure. For cases where int == long (ie: the usual case for x86, but not always as gcc can have long being a 64 bit quantity) ffs() could be used.
Reported by: Marian Stagarescu <marian@bile.skycache.com>
Reviewed by: dfr, gallatin (sys/types.h only)
Approved by: jkh
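A minimal sketch of the shift problem described above, using hypothetical names (mask_word, bitset_set, bitset_isset) rather than the real FD_*() macros. When the mask word is a long wider than int, the constant 1 has to be widened before shifting; shifting a 32-bit int by 32 or more is undefined in C.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned long mask_word;

#define MASK_BITS  (sizeof(mask_word) * CHAR_BIT)  /* bits per word */
#define MASK_WORDS 4                               /* hypothetical set size */

struct bitset {
    mask_word bits[MASK_WORDS];
};

/* Set bit fd; the cast widens 1 to the word type before the shift. */
void bitset_set(struct bitset *s, unsigned int fd)
{
    assert(fd < MASK_BITS * MASK_WORDS);
    s->bits[fd / MASK_BITS] |= (mask_word)1 << (fd % MASK_BITS);
}

/* Test bit fd using the same widened shift. */
int bitset_isset(const struct bitset *s, unsigned int fd)
{
    assert(fd < MASK_BITS * MASK_WORDS);
    return (s->bits[fd / MASK_BITS] & ((mask_word)1 << (fd % MASK_BITS))) != 0;
}
```

With a plain 1 << (fd % MASK_BITS) on a 64-bit long platform, bit positions 32 to 63 of each word would not be reliably set or tested, which matches the symptom reported for descriptors 32-63 and 96-127.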
|
#
55943 |
|
14-Jan-2000 |
jasone |
Add aio_waitcomplete(). Make aio work correctly for socket descriptors. Make gratuitous style(9) fixes (me, not the submitter) to make the aio code more readable.
PR: kern/12053
Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>
|
#
55478 |
|
05-Jan-2000 |
peter |
Export the nselcoll counter via the kern.nselcoll sysctl so we can see just how bad it gets in various situations.
Reminded by: adrian
|
#
52234 |
|
14-Oct-1999 |
green |
Missed the second argument of fdrop().
Submitted by: jhay
|
#
52227 |
|
14-Oct-1999 |
green |
Fix a race condition with shared fd tables and writev(). It's still not safe to consider file table sharing secure.
Submitted by: Ville-Pertti Keinonen <will@iki.fi>
|
#
52128 |
|
11-Oct-1999 |
peter |
Trim unused options (or #ifdef for undoc options).
Submitted by: phk
|
#
51418 |
|
19-Sep-1999 |
green |
This is what was "fdfix2.patch," a fix for fd sharing. It's pretty far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well.
Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :)
Reviewed by: peter
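A minimal userland sketch of the hold/drop reference-counting pattern described above. The names obj, obj_new, obj_hold and obj_drop are hypothetical and this is not the kernel's fhold()/fdrop() code; in particular it has no locking, so it is only valid for single-threaded use.

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
    int refs;   /* number of outstanding references */
};

/* Allocate an object that starts with one reference held by the caller. */
struct obj *obj_new(void)
{
    struct obj *o = malloc(sizeof(*o));

    if (o != NULL)
        o->refs = 1;
    return o;
}

/* Take an additional reference and return the same pointer. */
struct obj *obj_hold(struct obj *o)
{
    assert(o->refs > 0);
    o->refs++;
    return o;
}

/* Release one reference; free the object and return 1 on the last drop. */
int obj_drop(struct obj *o)
{
    assert(o->refs > 0);
    if (--o->refs > 0)
        return 0;
    free(o);
    return 1;
}
```

A wrapper that calls obj_hold() on entry and obj_drop() on exit keeps the object alive for the duration of the call even if another holder releases its reference in the meantime.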
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
45311 |
|
04-Apr-1999 |
dt |
Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible.
Add parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that the offset is set in the struct uio.)
Factor out some common code from read/pread/write/pwrite syscalls.
|
#
45065 |
|
27-Mar-1999 |
alc |
Added pread and pwrite. These functions are defined by the X/Open Threads Extension. (Note: We use the same syscall numbers as NetBSD.)
Submitted by: John Plevyak <jplevyak@inktomi.com>
|
#
43384 |
|
29-Jan-1999 |
bde |
Removed a bogus cast to c_caddr_t. This is part of terminating c_caddr_t with extreme prejudice. Here the point of the original cast to caddr_t was to break the warning about the const mismatch between write(2)'s `const void *buf' and `struct uio's `char *iov_base' (previous bitrot gave a gratuitous dependency on caddr_t being char *). Compiling with -Wcast-qual made the cast a full no-op.
This change has no effect on the warning for discarding `const' on assignment to iov_base. The warning should not be fixed by splitting `struct iovec' into a non-const version for read() and a const version for write(), since correct const poisoning would affect all pointers to i/o addresses. Const'ness should probably be forgotten by not declaring it in syscalls.master.
|
#
43301 |
|
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
41632 |
|
09-Dec-1998 |
jkh |
poll(2) sets POLLNVAL for descriptors passed in that are less than 0. This makes it difficult to do efficient manipulation of the struct pollfd since you can't leave a slot empty.
PR: 8599
Submitted-by: Marc Slemko <marcs@znep.com>
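A minimal userland sketch of the "empty slot" usage that motivates this change, assuming a system whose poll(2) follows the POSIX rule that an entry with a negative fd is ignored and gets revents set to 0. The function name poll_skips_negative_fd() is hypothetical.

```c
#include <poll.h>

/*
 * Poll a single entry whose fd is -1 with a zero timeout.
 * Returns 1 if the entry was skipped: no events reported and
 * revents cleared, in particular no POLLNVAL.
 */
int poll_skips_negative_fd(void)
{
    struct pollfd pfd;
    int n;

    pfd.fd = -1;                /* marks the slot as unused */
    pfd.events = POLLIN;
    pfd.revents = POLLNVAL;     /* poison value, expected to be overwritten */

    n = poll(&pfd, 1, 0);
    return n == 0 && pfd.revents == 0;
}
```

This lets a caller retire a descriptor by writing -1 into its slot instead of compacting the whole pollfd array.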
|
#
41086 |
|
11-Nov-1998 |
truckman |
Installed the second patch attached to kern/7899 with some changes suggested by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments.
This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR.
Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.
PR: kern/7899
Reviewed by: bde, elvind
|
#
38864 |
|
05-Sep-1998 |
bde |
Fixed bogotification of pseudocode for syscall args by rev.1.53 of syscalls.master.
|
#
38517 |
|
24-Aug-1998 |
dfr |
Change various syscalls to use size_t arguments instead of u_int.
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptible.
Reviewed by: Bruce Evans <bde@zeta.org.au>
|
#
36846 |
|
10-Jun-1998 |
dfr |
64bit fixes: use u_long not int for ioctl command.
|
#
36119 |
|
17-May-1998 |
phk |
s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g
Reviewed by: bde
|
#
35041 |
|
05-Apr-1998 |
ache |
Remove unused atv.tv_usec = 0; from select/poll code
|
#
35029 |
|
04-Apr-1998 |
phk |
Time changes mark 2:
* Figure out UTC relative to boottime. Four new functions provide time relative to boottime.
* move "runtime" into struct proc. This helps fix the calcru() problem in SMP.
* kill mono_time.
* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)
* nanosleep, select & poll take long sleeps one day at a time
Reviewed by: bde
Tested by: ache and others
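A minimal sketch of what a timespec addition helper has to do, written as a function with the hypothetical name ts_add() rather than as the macros this commit added to time.h. It assumes both inputs are already normalized (0 <= tv_nsec < 1000000000).

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Add two normalized timespecs.  The nanosecond sum is below
 * 2 * NSEC_PER_SEC, so at most one carry into tv_sec is needed.
 */
struct timespec ts_add(struct timespec a, struct timespec b)
{
    struct timespec r;

    r.tv_sec = a.tv_sec + b.tv_sec;
    r.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (r.tv_nsec >= NSEC_PER_SEC) {
        r.tv_sec++;
        r.tv_nsec -= NSEC_PER_SEC;
    }
    return r;
}
```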
|
#
34999 |
|
02-Apr-1998 |
phk |
Try to fix poll & select after I broke them.
|
#
34961 |
|
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime().
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeared time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
|
#
31364 |
|
23-Nov-1997 |
bde |
Fixed some style bugs in the poll() code.
Removed dead code to "Avoid inadvertently sleeping forever". hzto() never returns 0.
|
#
30994 |
|
06-Nov-1997 |
phk |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead.
This fixes a boatload of compiler warnings, and removes a lot of cruft from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need to be recompiled.
|
#
30354 |
|
12-Oct-1997 |
phk |
Last major round (unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them.
A couple of finer points by: bde
|
#
30309 |
|
11-Oct-1997 |
phk |
Distribute and staticize a lot of the malloc M_* types.
Substantial input from: bde
|
#
29351 |
|
14-Sep-1997 |
peter |
Implement poll(2). This is mostly taken from the NetBSD implementation (from some time ago) but with a few tweaks along the way.
Obtained from: NetBSD
|
#
29041 |
|
02-Sep-1997 |
bde |
Removed unused #includes.
|
#
26671 |
|
15-Jun-1997 |
dyson |
Modifications to existing files to support the initial AIO/LIO and kernel based threading support.
|
#
24206 |
|
24-Mar-1997 |
bde |
Don't include <sys/ioctl.h> in the kernel. Stage 4: include <sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h> in miscellaneous files. Most of these files have nothing to do with ttys but need to include <sys/ttycom.h> to get the definitions of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into ioctls.
|
#
24131 |
|
23-Mar-1997 |
bde |
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
|
#
24102 |
|
22-Mar-1997 |
bde |
Removed `volatile' from declaration of `time', and removed the resulting null casts. `time' is nonvolatile for accesses within a region locked by splclock()/splx(). Accesses outside such a region are invalid, and splx() must have the side effect of potentially changing all global variables (since there are hundreds of sort of volatile variables like `time'), so declaring `time' as volatile didn't have any real benefits.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22945 |
|
20-Feb-1997 |
bde |
Improved select():
- avoid malloc() if the number of fds is small.
- pack the bits better so that `small' is quite large.
- don't waste time generating zero bits for null fd_set pointers or scanning these bits.
Possibly improved select():
- free malloc()ed storage before returning. This is simpler and I think huge select()s aren't worth optimizing since they are rare, relative gain would be small and there would be tiny costs for all selects().
Reviewed by: ache (first version by him too)
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
17713 |
|
20-Aug-1996 |
smpatel |
Fix a minor style error in my code.
|
#
17702 |
|
20-Aug-1996 |
smpatel |
Remove the kernel FD_SETSIZE limit for select(). Make select()'s first argument 'int' not 'u_int'.
Reviewed by: bde
|
#
13203 |
|
03-Jan-1996 |
wollman |
Converted two options over to the new scheme: USER_LDT and KTRACE.
|
#
12819 |
|
14-Dec-1995 |
phk |
A Major staticize sweep. Generates a couple of warnings that I'll deal with later. A number of unused vars removed. A number of unused procs removed or #ifdefed.
|
#
12221 |
|
12-Nov-1995 |
bde |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
#
12208 |
|
11-Nov-1995 |
bde |
Fixed the type of readv(). An args struct member name conflicted with the machine-generated one in <sys/sysproto.h>.
|
#
11400 |
|
10-Oct-1995 |
swallace |
Remove the ugly COMPAT_IBCS2 hack to hide a return value through magic numbers. The new socksys support does not need this hack.
I am against any magic practicing.
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
7804 |
|
13-Apr-1995 |
dg |
Backed out previous change - it reduces performance. (oops).
|
#
7801 |
|
13-Apr-1995 |
dg |
Slight optimization to select().
|
#
3570 |
|
13-Oct-1994 |
sos |
Damn, checked in the wrong version, fixed.
|
#
3568 |
|
13-Oct-1994 |
sos |
Made it possible for ioctl to return a value. Ifdef by COMPAT_IBCS2 (used by the socksys system).
Submitted by: Mostyn Lewis (mostyn@mrl.com)
|
#
3485 |
|
09-Oct-1994 |
phk |
Cosmetics related to getting prototypes into view.
|
#
3308 |
|
02-Oct-1994 |
phk |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
3098 |
|
25-Sep-1994 |
phk |
While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if your kernel breaks.
|
#
2462 |
|
02-Sep-1994 |
dg |
Whoops, accidentally left out some pieces of the munmapfd patch.
|
#
2461 |
|
02-Sep-1994 |
dg |
Make sure that uio_resid isn't negative in read().
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|