#
357706 |
|
09-Feb-2020 |
kevans |
MFC O_SEARCH: r357412, r357461, r357580, r357584, r357636, r357671, r357688
r357412: Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value.
This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH.
This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei.
[0] https://pubs.opengroup.org/onlinepubs/9699919799/
r357461: namei: preserve errors from fget_cap_locked
Most notably, we want to make sure we don't clobber any capabilities-related errors. This is a regression from r357412 (O_SEARCH) that was picked up by the capsicum tests.
r357580: O_SEARCH test: drop O_SEARCH|O_RDWR local diff
In FreeBSD's O_SEARCH implementation, O_SEARCH in conjunction with O_RDWR or O_WRONLY is explicitly rejected. In this case, O_RDWR was not necessary anyways as the file will get created with or without it.
This was submitted upstream as misc/54940 and committed in rev 1.8 of the file.
r357584: Record-only MFV of r357583: netbsd-tests: import upstreamed changes
The changes in question originated in FreeBSD/head; no further action is required.
r357636: MFV r357635: imnport v1.9 of the O_SEARCH tests
The RCSID data was wrong, so this is effectively a record-only merge with correction of said data. No further changes should be needed in this area, as we've now upstreamed our local changes to this specific test.
r357671: O_SEARCH test: mark revokex an expected fail on NFS
The revokex test does not work when the scratch directory is created on NFS. Given the nature of NFS, it likely can never work without looking like a security hole since O_SEARCH would rely on the server knowing that the directory did have +x at the time of open and that it's OK for it to have been revoked based on POSIX specification for O_SEARCH.
This does mean that O_SEARCH is only partially functional on NFS in general, but I suspect the execute bit getting revoked in the process is likely not common.
r357688: MFV r357687: Import NFS fix for O_SEARCH tests
The version that ended upstream was ultimately slightly different than the version committed here; notably, statvfs() is used but it's redefined appropriately to statfs() on FreeBSD since we don't provide the fstypename for the former interface.
|
#
357660 |
|
07-Feb-2020 |
kevans |
MFC r355248: tty: implement TIOCNOTTY
Generally, it's preferred that an application fork/setsid if it doesn't want to keep its controlling TTY, but it could be that a debugger is trying to steal it instead -- so it would hook in, drop the controlling TTY, then do some magic to set things up again. In this case, TIOCNOTTY is quite handy and still respected by at least OpenBSD, NetBSD, and Linux as far as I can tell.
I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as long as it doesn't impose a major burden.
|
#
353722 |
|
18-Oct-2019 |
kib |
MFC r353447: devfs_vptocnp(): correct the component name when node is not at top.
|
#
328298 |
|
23-Jan-2018 |
jhb |
MFC 320900,323882,324224,324226,324228,326986,326988,326989,326990,326993, 326994,326995,327004: Various fixes for pathconf(2).
The original change to use vop_stdpathconf() more widely was motivated by a panic due to recent AIO-related changes. However, bde@ reported that vop_stdpathconf() contained too many settings that were not filesystem-independent. The end result of this set of patches is to fix the AIO-related panic via use of a trimmed-down vop_stdpathconf() while also adding support for missing pathconf variables in various filesystems (and removing a few settings incorrectly reported as supported).
320900: Consistently use vop_stdpathconf() for default pathconf values.
Update filesystems not currently using vop_stdpathconf() in pathconf VOPs to use vop_stdpathconf() for any configuration variables that do not have filesystem-specific values. vop_stdpathconf() is used for variables that have system-wide settings as well as providing default values for some values based on system limits. Filesystems can still explicitly override individual settings.
323882: Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.
Move handling of these three pathconf() variables out of vop_stdpathconf() and into devfs_pathconf() as TTY devices can only be devfs files. In addition, only return settings for these three variables for devfs devices whose device switch has the D_TTY flag set.
324224: Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX pathconf() requests in cd9660.
cd9660 only supports symlinks with Rock Ridge extensions, so _PC_SYMLINK_MAX is conditional on Rock Ridge.
324226: Return 64 for pathconf(_PC_FILESIZEBITS) on tmpfs.
324228: Flesh out pathconf() on UDF.
- Return 64 bits for _PC_FILESIZEBITS. - Handle _PC_SYMLINK_MAX. - Defer _PC_PATH_MAX to vop_stdpathconf().
326986: Add a custom VOP_PATHCONF method for fdescfs.
The method handles NAME_MAX and LINK_MAX explicitly. For all other pathconf variables, the method passes the request down to the underlying file descriptor. This requires splitting a kern_fpathconf() syscallsubr routine out of sys_fpathconf(). Also, to avoid lock order reversals with vnode locks, the fdescfs vnode is unlocked around the call to kern_fpathconf(), but with the usecount of the vnode bumped.
326988: Add a custom VOP_PATHCONF method for fuse.
This method handles _PC_FILESIZEBITS, _PC_SYMLINK_MAX, and _PC_NO_TRUNC. For other values it defers to vop_stdpathconf().
326989: Support _PC_FILESIZEBITS in msdosfs' VOP_PATHCONF().
326990: Handle _PC_FILESIZEBITS and _PC_NO_TRUNC for smbfs' VOP_PATHCONF().
326993: Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().
Having all filesystems fall through to default values isn't always correct and these values can vary for different filesystem implementations. Most of these changes just use the existing default values with a few exceptions: - Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact permissions check this claims for chown(). - Use NANDFS_NAME_LEN for NAME_MAX for nandfs. - Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to indicate hard links aren't supported.
326994: Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().
326995: Use FUSE_LINK_MAX for LINK_MAX in fuse' VOP_PATHCONF().
Should have included this in r326993.
327004: Rework pathconf handling for FIFOs.
On the one hand, FIFOs should respect other variables not supported by the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.). These values are fs-specific and must come from a fs-specific method. On the other hand, filesystems that support FIFOs are required to support _PC_PIPE_BUF on directory vnodes that can contain FIFOs. Given this latter requirement, once the fs-specific VOP_PATHCONF method supports _PC_PIPE_BUF for directories, it is also suitable for FIFOs permitting a single VOP_PATHCONF method to be used for both FIFOs and non-FIFOs.
To that end, retire all of the FIFO-specific pathconf methods from filesystems and change FIFO-specific vnode operation switches to use the existing fs-specific VOP_PATHCONF method. For fifofs, set it's VOP_PATHCONF to VOP_PANIC since it should no longer be used.
While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that only filesystems supporting FIFOs will report a value. In addition, only report a valid _PC_PIPE_BUF for directories and FIFOs.
PR: 219851 Sponsored by: Chelsio Communications
|
#
327059 |
|
21-Dec-2017 |
kib |
MFC r326851: In devfs_lookupx() dotdot lookup case, avoid dereferencing dvp->v_mount after dvp is unlocked.
|
#
316366 |
|
01-Apr-2017 |
trasz |
MFC r313994:
Simplify devfs_fsync() by removing it. This might also be a minor optimization, as vn_isdisk() needs to lock a global mutex.
Sponsored by: DARPA, AFRL
|
#
316365 |
|
01-Apr-2017 |
trasz |
MFC r313994:
Change the "devfs_fsync: vop_stdfsync failed" from panic to a printf. It's not a proper fix, but should be better than what we have now. Since it got broken some six months ago it results in an incredibly annoying and trivially reproducible panic every time eg an USB disk gets disconnected.
Sponsored by: DARPA, AFRL
|
#
314298 |
|
26-Feb-2017 |
kib |
MFC r313967: Apply noexec mount option for mmap(PROT_EXEC).
PR: 217062
|
#
304843 |
|
26-Aug-2016 |
kib |
MFC r303382: Provide the getboottime(9) and getboottimebin(9) KPI.
MFC r303387: Prevent parallel tc_windup() calls. Keep boottime in timehands, and adjust it from tc_windup().
MFC notes:
The boottime and boottimebin globals are still exported from the kernel dyn symbol table in stable/11, but their declarations are removed from sys/time.h. This preserves KBI but not KPI, while all in-tree consumers are converted to getboottime().
The variables are updated after tc_setclock_mtx is dropped, which gives approximately same unlocked bugs as before.
The boottime and boottimebin locals in several sys/kern_tc.c functions were renamed by adding the '_x' suffix to avoid name conficts.
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
301928 |
|
15-Jun-2016 |
kib |
Another follow-up to r291460. Only access vp->v_rdev for VCHR vnodes in devfs_reclaim().
Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week
|
#
298732 |
|
28-Apr-2016 |
pfg |
sys/devfs: unsign an index to prevent signed integer overflow.
cdp_maxdirent in struct:cdev_priv is of type u_int. Use the same type for the corresponding index in devfs_revoke().
MFC after: 1 week
|
#
294204 |
|
17-Jan-2016 |
kib |
Assert that the linkage between struct cdev_privdata and and struct file is consistent.
Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
293827 |
|
13-Jan-2016 |
kib |
Make devfs_fpdrop() static. It was not a public KPI, and it has no reason to remain exported for some time.
Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
293059 |
|
02-Jan-2016 |
kib |
Hide transient EBADF errors caused by the parallel revoke(2) or forced unmount of devfs mounts, by restarting the failed syscall.
When restarted, failing syscalls eventually either stop finding the node and returning ENOENT, or the vnode op vectors finally transition to the deadfs vop. The later return EIO or other error, more appropriate for the operation.
Submitted by: bde Tested by: pho MFC after: 3 weeks
|
#
293042 |
|
01-Jan-2016 |
kib |
Minor style cleanup.
Submitted by: bde MFC after: 1 week
|
#
292624 |
|
22-Dec-2015 |
kib |
Make it possible for the cdevsw d_close() driver method to detect last close and close due to revoke(2)-like operation.
A new FLASTCLOSE flag indicates that this is last close. FREVOKE is set for revokes, and FNONBLOCK is also set, same as is already done for VOP_CLOSE() call from vgonel().
The flags reuse user open(2) flags which are never stored in f_flag, to not consume bit space in the ABI visible way. Assert this with the static check.
Requested and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
292621 |
|
22-Dec-2015 |
kib |
Keep devfs mount locked for the whole duration of the devfs_setattr(), and ensure that our dirent is instantiated.
Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
291653 |
|
02-Dec-2015 |
jhb |
The cdevpriv_dtr_t typedef was not able to be used in a function prototype like the various d_*_t typedefs since it declared a function pointer rather than a function. Add a new d_priv_dtor_t typedef that declares the function and can be used as a function prototype. The previous typedef wasn't useful outside of the cdevpriv implementation, so retire it.
The name d_priv_dtor_t was chosen to be more consistent with cdev methods since it is commonly used in place of d_close_t even though it is not a direct pointer in struct cdevsw.
Reviewed by: kib, imp MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D4340
|
#
287033 |
|
23-Aug-2015 |
trasz |
After r286237 it should be fine to call vgone(9) on a busy GEOM vnode; remove KASSERT that would prevent forced devfs unmount from working.
MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
286371 |
|
06-Aug-2015 |
jhb |
The changes that introduced fo_mmap() treated all character device mappings as if MAP_SHARED was always present since in general MAP_PRIVATE is not permitted for character devices. However, there is one exception in that MAP_PRIVATE mappings are permitted for /dev/zero.
Only require a writable file descriptor (FWRITE) for shared, writable mappings of character devices. vm_mmap_cdev() will reject any private mappings for other devices.
Reviewed by: kib Reported by: sbruno (broke qemu cross-builds), peter Differential Revision: https://reviews.freebsd.org/D3316
|
#
283998 |
|
04-Jun-2015 |
jhb |
Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping.
The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory.
The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping.
The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()).
While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct.
Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio
|
#
280949 |
|
01-Apr-2015 |
kib |
Refine r280308. Do not completely disable timestamping of devfs nodes on reads or writes, the time marks are used to display idle time by w(1) [1]. Instead, use vfs.devfs.dotimes as the selector of default precision vs. using time_second. The later gives seconds precision, which is good enough for the purpose.
Note that timestamp updates are unlocked and the updates itself, as well as the check in devfs_timestamp, are non-atomic.
Noted by: truckman [1] Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
280308 |
|
20-Mar-2015 |
delphij |
Disable timestamping on devfs read/write operations by default.
Currently we update timestamps unconditionally when doing read or write operations. This may slow things down on hardware where reading timestamps is expensive (e.g. HPET, because of the default vfs.timestamp_precision setting is nanosecond now) with limited benefit.
A new sysctl variable, vfs.devfs.dotimes is added, which can be set to non-zero value when the old behavior is desirable.
Differential Revision: https://reviews.freebsd.org/D2104 Reported by: Mike Tancsa <mike sentex net> Reviewed by: kib Relnotes: yes Sponsored by: iXsystems, Inc. MFC after: 2 weeks
|
#
279362 |
|
27-Feb-2015 |
kib |
The VNASSERT in vflush() FORCECLOSE case is trying to panic early to prevent errors from yanking devices out from under filesystems. Only care about special vnodes on devfs, special nodes on other kinds of filesystems do not have special properties.
Sponsored by: EMC / Isilon Storage Division Submitted by: Conrad Meyer MFC after: 1 week
|
#
277390 |
|
19-Jan-2015 |
kib |
Ignore devfs directory entries for devices either being destroyed or delisted. The check is racy.
Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
274000 |
|
03-Nov-2014 |
mjg |
Fix up some session-related races in devfs.
One was introduced with r272596, the rest was there to begin with.
Noted by: jhb
|
#
273131 |
|
15-Oct-2014 |
kib |
When vnode bypass cannot be performed on the cdev file descriptor for read/write/poll/ioctl, call standard vnode filedescriptor fop. This restores the special handling for terminals by calling the deadfs VOP, instead of always returning ENXIO for destroyed devices or revoked terminals.
Since destroyed (and not revoked) device would use devfs_specops VOP vector, make dead_read/write/poll non-static and fill VOP table with pointers to the functions, to instead of VOP_PANIC.
Noted and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
272600 |
|
06-Oct-2014 |
mjg |
devfs: tidy up after 272596
This moves a var to an if statement, no functional changes.
MFC after: 1 week
|
#
272596 |
|
06-Oct-2014 |
mjg |
devfs: don't take proctree_lock unconditionally in devfs_close
MFC after: 1 week
|
#
271976 |
|
22-Sep-2014 |
jhb |
Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations.
Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)
|
#
267564 |
|
17-Jun-2014 |
kib |
In msdosfs_setattr(), add a check for result of the utimes(2) permissions test, forgotten in r164033.
Refactor the permission checks for utimes(2) into vnode helper function vn_utimes_perm(9), and simplify its code comparing with the UFS origin, by writing the call to VOP_ACCESSX only once. Use the helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).
Reported by: bde Reviewed by: bde, trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
256502 |
|
15-Oct-2013 |
kib |
Similar to debug.iosize_max_clamp sysctl, introduce devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files.
Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week
|
#
256501 |
|
15-Oct-2013 |
kib |
Remove two instances of ARGSUSED comment, and wrap lines nearby the code that is to be changed.
Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
254602 |
|
21-Aug-2013 |
kib |
Make the seek a method of the struct fileops.
Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
254415 |
|
16-Aug-2013 |
kib |
Restore the previous sendfile(2) behaviour on the block devices. Provide valid .fo_sendfile method for several missed struct fileops.
Reviewed by: glebius Sponsored by: The FreeBSD Foundation
|
#
246472 |
|
07-Feb-2013 |
kib |
Stop translating the ERESTART error from the open(2) into EINTR. Posix requires that open(2) is restartable for SA_RESTART.
For non-posix objects, in particular, devfs nodes, still disable automatic restart of the opens. The open call to a driver could have significant side effects for the hardware.
Noted and reviewed by: jilles Discussed with: bde MFC after: 2 weeks
|
#
244643 |
|
23-Dec-2012 |
kib |
Do not force a writer to the devfs file to drain the buffer writes.
Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org> MFC after: 2 weeks
|
#
239303 |
|
15-Aug-2012 |
hselasky |
Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from "d_close" callbacks, hence for example read, write, ioctl and so on might be sleeping at the time of "d_close" being called and then then freed private data can still be accessed. Examples: dtrace, linux_compat, ksyms (all fixed by this patch)
2) In sys/dev/drm* there are some cases in which memory will be freed twice, if open fails, first by code in the open routine, secondly by the cdevpriv destructor. Move registration of the cdevpriv to the end of the drm open routines.
3) devfs_clear_cdevpriv() is not called if the "d_open" callback registered cdevpriv data and the "d_open" callback function returned an error. Fix this.
Discussed with: phk MFC after: 2 weeks
|
#
238029 |
|
02-Jul-2012 |
kib |
Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries().
Ensure that on 32bit architectures f_offset, which is 64bit quantity, always read and written under the mtxpool protection. This fixes apparently easy to trigger race when parallel lseek()s or lseek() and read/write could destroy file offset.
The already broken ABI emulations, including iBCS and SysV, are not converted (yet).
Tested by: pho No objections from: jhb MFC after: 3 weeks
|
#
235922 |
|
24-May-2012 |
mav |
Revert devfs part of r235911. I was unaware about old but unfinished discussion between kib@ and gibbs@ about it.
|
#
235911 |
|
24-May-2012 |
mav |
MFprojects/zfsd: Revamp the CAM enclosure services driver. This updated driver uses an in-kernel daemon to track state changes and publishes physical path location information\for disk elements into the CAM device database.
Sponsored by: Spectra Logic Corporation Sponsored by: iXsystems, Inc. Submitted by: gibbs, will, mav
|
#
231949 |
|
20-Feb-2012 |
kib |
Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode.
Discussed with: bde, das (previous versions) MFC after: 1 month
|
#
228361 |
|
09-Dec-2011 |
jhb |
Explicitly use curthread while manipulating td_fpop during last close of a devfs file descriptor in devfs_close_f(). The passed in td argument may be NULL if the close was invoked by garbage collection of open file descriptors in pending control messages in the socket buffer of a UNIX domain socket after it was closed.
PR: kern/151758 Submitted by: Andrey Shidakov andrey shidakov ru Submitted by: Ruben van Staveren ruben verweg com Reviewed by: kib MFC after: 2 weeks
|
#
227697 |
|
19-Nov-2011 |
kib |
Existing VOP_VPTOCNP() interface has a fatal flow that is critical for nullfs. The problem is that resulting vnode is only required to be held on return from the successfull call to vop, instead of being referenced.
Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like.
Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix.
Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)
|
#
227069 |
|
04-Nov-2011 |
jhb |
Move the cleanup of f_cdevpriv when the reference count of a devfs file descriptor drops to zero out of _fdrop() and into devfs_close_f() as it is only relevant for devfs file descriptors.
Reviewed by: kib MFC after: 1 week
|
#
227062 |
|
03-Nov-2011 |
kib |
Fix kernel panic when d_fdopen csw method is called for NULL fp. This may happen when kernel consumer calls VOP_OPEN().
Reported by: Tavis Ormandy <taviso cmpxchg8b com> through delphij MFC after: 3 days
|
#
224914 |
|
16-Aug-2011 |
kib |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores.
Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
#
223988 |
|
13-Jul-2011 |
kib |
While fixing the looping of a thread while devfs vnode is reclaimed, r179247 introduced a possibility of devfs_allocv() returning spurious ENOENT. If the vnode is selected by vnlru daemon for reclamation, then devfs_allocv() can get ENOENT from vget() due to devfs_close() dropping vnode lock around the call to cdevsw d_close method.
Use LK_RETRY in the vget() call, and do some part of the devfs_reclaim() work in devfs_allocv(), clearing vp->v_data and de->de_vnode. Retry the allocation of the vnode, now with de->de_vnode == NULL.
The check vp->v_data == NULL at the start of devfs_close() cannot be affected by the change, since vnode lock must be held while VI_DOOMED is set, and only dropped after the check.
Reported and tested by: Kohji Okuno <okuno.kohji jp panasonic com> Reviewed by: attilio MFC after: 3 weeks
|
#
216462 |
|
15-Dec-2010 |
jh |
Don't allow user created symbolic links to cover another entries marked with DE_USER. If a devfs rule hid such entry, it was possible to create infinite number of symbolic links with the same name.
Reviewed by: kib
|
#
216461 |
|
15-Dec-2010 |
jh |
- Assert that dm_lock is exclusively held in devfs_rules_apply() and in devfs_vmkdir() while adding the entry to de_list of the parent. - Apply devfs rules to newly created directories and symbolic links.
PR: kern/125034 Submitted by: Mateusz Guzik (original version)
|
#
213215 |
|
27-Sep-2010 |
jh |
Add reference counting for devfs paths containing user created symbolic links. The reference counting is needed to be able to determine if a specific devfs path exists. For true device file paths we can traverse the cdevp_list but a separate directory list is needed for user created symbolic links.
Add a new directory entry flag DE_USER to mark entries which should unreference their parent directory on deletion.
A new function to traverse cdevp_list and the directory list will be introduced in a separate commit.
Idea from: kib Reviewed by: kib
|
#
212966 |
|
21-Sep-2010 |
jh |
Modify devfs_fqpn() for future use in devfs path reference counting code:
- Accept devfs_mount and devfs_dirent as the arguments instead of a vnode. This generalizes the function so that it can be used from contexts where vnode references are not available. - Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is provided. - Make the function global and add its prototype to devfs.h.
Reviewed by: kib
|
#
212660 |
|
15-Sep-2010 |
jh |
Remove empty devfs directories automatically.
devfs_delete() now recursively removes empty parent directories unless the DEVFS_DEL_NORECURSE flag is specified. devfs_delete() can't be called anymore with a parent directory vnode lock held because the possible parent directory deletion needs to lock the vnode. Thus we unlock the parent directory vnode in devfs_remove() before calling devfs_delete().
Call devfs_populate_vp() from devfs_symlink() and devfs_vptocnp() as now directories can get removed.
Add a check for DE_DOOMED flag to devfs_populate_vp() because devfs_delete() drops dm_lock before the VI_DOOMED vnode flag gets set. This ensures that devfs_populate_vp() returns an error for directories which are in progress of deletion.
Reviewed by: kib Discussed on: freebsd-current (mostly silence)
|
#
211847 |
|
26-Aug-2010 |
jh |
Set de_dir for user created symbolic links. This will be needed to be able to resolve their parent directories.
|
#
211816 |
|
25-Aug-2010 |
jh |
Call devfs_populate_vp() from devfs_getattr(). It was possible that fstat(2) returned stale information through an open file descriptor.
|
#
211628 |
|
22-Aug-2010 |
jh |
Introduce and use devfs_populate_vp() to unlock a vnode before calling devfs_populate(). This is a prerequisite for the automatic removal of empty directories which will be committed in the future.
Reviewed by: kib (previous version)
|
#
211531 |
|
20-Aug-2010 |
jhb |
Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier.
Reviewed by: kib MFC after: 3 days
|
#
211513 |
|
19-Aug-2010 |
jh |
Call dev_rel() in error paths.
Reported by: kib Reviewed by: kib MFC after: 2 weeks
|
#
211226 |
|
12-Aug-2010 |
jh |
Allow user created symbolic links to cover device files and directories if the device file appears during or after the link creation.
User created symbolic links are now inserted at the head of the directory entry list after the "." and ".." entries. A new directory entry flag DE_COVERED indicates that an entry is covered by a symbolic link.
PR: kern/114057 Reviewed by: kib Idea from: kib Discussed on: freebsd-current (mostly silence)
|
#
210923 |
|
06-Aug-2010 |
kib |
Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes.
In collaboration with: pho MFC after: 1 month
|
#
210921 |
|
06-Aug-2010 |
kib |
Enable shared locks for the devfs vnodes. Honor the locking mode requested by lookup(). This should be a nop at the moment.
In collaboration with: pho MFC after: 1 month
|
#
210918 |
|
06-Aug-2010 |
kib |
Initialize VV_ISTTY vnode flag on the devfs vnode creation instead of doing it on each open.
In collaboration with: pho MFC after: 1 month
|
#
208951 |
|
09-Jun-2010 |
jh |
Add a new function devfs_parent_dirent() for resolving devfs parent directory entry. Use the new function in devfs_fqpn(), devfs_lookupx() and devfs_vptocnp() instead of manually resolving the parent entry.
Reviewed by: kib
|
#
208717 |
|
01-Jun-2010 |
jh |
Don't try to call cdevsw d_close() method when devfs_close() is called because of insmntque1() failure.
Found with: stress2 Suggested and reviewed by: kib
|
#
200732 |
|
19-Dec-2009 |
ed |
Let access overriding to TTYs depend on the cdev_priv, not the vnode.
Basically this commit changes two things, which improves access to TTYs in exceptional conditions. Basically the problem was that when you ran jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if you want to attach to screens quickly, use ssh(1), etc.
The fixes:
- Cache the cdev_priv of the controlling TTY in struct session. Change devfs_access() to compare against the cdev_priv instead of the vnode. This allows you to bypass UNIX permissions, even across different mounts of devfs.
- Extend devfs_prison_check() to unconditionally expose the device node of the controlling TTY, even if normal prison nesting rules normally don't allow this. This actually allows you to interact with this device node.
To be honest, I'm not really happy with this solution. We now have to store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp). In an ideal world, we should just get rid of the latter two and only use s_ttyp, but this makes certian pieces of code very impractical (e.g. devfs, kern_exit.c).
Reported by: Many people
|
#
194532 |
|
20-Jun-2009 |
ed |
Improve nested jail awareness of devfs by handling credentials.
Now that we start to use credentials on character devices more often (because of MPSAFE TTY), move the prison-checks that are in place in the TTY code into devfs.
Instead of strictly comparing the prisons, use the more common prison_check() function to compare credentials. This means that pseudo-terminals are only visible in devfs by processes within the same jail and parent jails.
Even though regular users in parent jails can now interact with pseudo-terminals from child jails, this seems to be the right approach. These processes are also capable of interacting with the jailed processes anyway, through signals for example.
Reviewed by: kib, rwatson (older version)
|
#
193919 |
|
10-Jun-2009 |
kib |
VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may be NULL or derefenced memory may become free at arbitrary moment.
Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL to prevent reclaim; check whether the vnode was already reclaimed after the lock is granted.
Reported by: georg at dts su Reviewed by: des (pseudofs) MFC after: 2 weeks
|
#
193511 |
|
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
#
192151 |
|
15-May-2009 |
kib |
Devfs replaces file ops vector with devfs-specific one in devfs_open(), before the struct file is fully initialized in vn_open(), in particular, fp->f_vnode is NULL. Other thread calling file operation before f_vnode is set results in NULL pointer dereference in devvn_refthread().
Initialize f_vnode before calling d_fdopen() cdevsw method, that might set file ops too.
Reported and tested by: Chris Timmons <cwt networks cwu edu> (RELENG_7 version) MFC after: 3 days
|
#
191990 |
|
11-May-2009 |
attilio |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
|
#
190888 |
|
10-Apr-2009 |
rwatson |
Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces.
Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.
Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
|
#
189693 |
|
11-Mar-2009 |
kib |
Enable advisory file locking for devfs vnodes.
Reported by: Timothy Redaelli <timothy redaelli eu> MFC after: 1 week
|
#
189450 |
|
06-Mar-2009 |
kib |
Extract the no_poll() and vop_nopoll() code into the common routine poll_no_poll(). Return a poll_no_poll() result from devfs_poll_f() when filedescriptor does not reference the live cdev, instead of ENXIO.
Noted and tested by: hps MFC after: 1 week
|
#
187959 |
|
31-Jan-2009 |
bz |
Remove unused local variables.
Submitted by: Christoph Mallon christoph.mallon@gmx.de Reviewed by: kib MFC after: 2 weeks
|
#
186911 |
|
08-Jan-2009 |
trasz |
Don't panic with "vinvalbuf: dirty bufs" when the mounted device that was being written to goes away.
Reviewed by: kib, scottl Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation
|
#
185980 |
|
12-Dec-2008 |
kib |
Do not leak defs_de_interlock on error.
Another pointy hat for my collection.
|
#
185959 |
|
11-Dec-2008 |
marcus |
Implement VOP_VPTOCNP for devfs. Directory and character device vnodes are properly translated to their component names.
Reviewed by: arch Approved by: kib
|
#
184413 |
|
28-Oct-2008 |
trasz |
Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
|
#
183383 |
|
26-Sep-2008 |
kib |
Save previous content of the td_fpop before storing the current filedescriptor into it. Make sure that td_fpop is NULL when calling d_mmap from dev_pager_getpages().
Change guards against td_fpop field being non-NULL with private state for another device, and against sudden clearing the td_fpop. This could occur when either a driver method calls another driver through the filedescriptor operation, or a page fault happen while driver is writing to a memory backed by another driver.
Noted by: rwatson Tested by: rnoland MFC after: 3 days
|
#
183215 |
|
20-Sep-2008 |
kib |
fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(), vattr_null() or by zeroing it. Remove these to allow preinitialization of fields work in vn_stat(). This is needed to get birthtime initialized correctly.
Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month
|
#
182371 |
|
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
181905 |
|
20-Aug-2008 |
ed |
Integrate the new MPSAFE TTY layer to the FreeBSD operating system.
The last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to make it friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers.
If a PPP implementation would be built on top of the new TTY layer (still needs a hooks layer, though), it would allow the PPP implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTY's from the system. This implementation has a two-step destructing design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means posix_openpt() will now return PTY's that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected to improve scalability when compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is both used on TTY device nodes and PTY masters.
Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/... Approved by: philip (ex-mentor) Discussed: on the lists, at BSDCan, at the DevSummit Sponsored by: Snow B.V., the Netherlands dcons(4) fixed by: kan
|
#
181635 |
|
12-Aug-2008 |
kib |
Remove unnecessary locking around pointer fetch.
Requested by: jhb
|
#
179828 |
|
16-Jun-2008 |
kib |
Struct cdev is always the member of the struct cdev_priv. When devfs needed to promote cdev to cdev_priv, the si_priv pointer was followed.
Use member2struct() to calculate address of the wrapping cdev_priv. Rename si_priv to __si_reserved.
Tested by: pho Reviewed by: ed MFC after: 2 weeks
|
#
179554 |
|
05-Jun-2008 |
kib |
When devfs_allocv() committed to create new vnode, since de_vnode is NULL, the dm_lock is held while the newly allocated vnode is locked. Since no other threads may try to lock the new vnode yet, the LOR there cannot result in the deadlock.
Shut down the witness warning to note this fact.
Tested by: pho Prodded by: attilio
|
#
179475 |
|
01-Jun-2008 |
ed |
Revert the changes I made to devfs_setattr() in r179457.
As discussed with Robert Watson and John Baldwin, it would be better if PTY's are created with proper permissions, turning grantpt() into a no-op.
Bypassing security frameworks like MAC by passing NOCRED to VOP_SETATTR() will only make things more complex.
Approved by: philip (mentor)
|
#
179457 |
|
31-May-2008 |
ed |
Merge back devfs changes from the mpsafetty branch.
In the mpsafetty branch, PTY's are allocated through the posix_openpt() system call. The controller side of a PTY now uses its own file descriptor type (just like sockets, vnodes, pipes, etc).
To remain compatible with existing FreeBSD and Linux C libraries, we can still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes implement d_fdopen(). Devfs has been slightly changed here, to allow finit() to be called from d_fdopen().
The routine grantpt() has also been moved into the kernel. This routine is a little odd, because it needs to bypass standard UNIX permissions. It needs to change the owner/group/mode of the slave device node, which may often not be possible. The old implementation solved this by spawning a setuid utility.
When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to allow changes when a->a_cred is set to NOCRED.
Approved by: philip (mentor)
|
#
179247 |
|
23-May-2008 |
kib |
When vget() fails (because the vnode has been reclaimed), there is no sense to loop trying to vget() the vnode again.
PR: 122977 Submitted by: Arthur Hartwig <arthur.hartwig nokia com> Tested by: pho Reviewed by: jhb MFC after: 1 week
|
#
179175 |
|
21-May-2008 |
kib |
Implement the per-open file data for the cdev.
The patch does not change the cdevsw KBI. Management of the data is provided by the functions int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr); int devfs_get_cdevpriv(void **datap); void devfs_clear_cdevpriv(void); All of the functions are supposed to be called from the cdevsw method contexts.
- devfs_set_cdevpriv assigns the priv as private data for the file descriptor which is used to initiate currently performed driver operation. dtr is the function that will be called when either the last refernce to the file goes away, the device is destroyed or devfs_clear_cdevpriv is called. - devfs_get_cdevpriv is the obvious accessor. - devfs_clear_cdevpriv allows to clear the private data for the still open file.
Implementation keeps the driver-supplied pointers in the struct cdev_privdata, that is referenced both from the struct file and struct cdev, and cannot outlive any of the referee.
Man pages will be provided after the KPI stabilizes.
Reviewed by: jhb Useful suggestions from: jeff, antoine Debugging help and tested by: pho MFC after: 1 month
|
#
178834 |
|
07-May-2008 |
jhb |
Don't explicitly drop Giant around d_open/d_fdopen/d_close for MPSAFE drivers. Since devfs is already marked MPSAFE it shouldn't be held anyway.
MFC after: 2 weeks Discussed with: phk
|
#
176559 |
|
25-Feb-2008 |
attilio |
Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
|
#
175294 |
|
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
#
175202 |
|
09-Jan-2008 |
attilio |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
#
175151 |
|
08-Jan-2008 |
jhb |
Lock the vnode interlock while reading v_usecount to update si_usecount in a cdev in devfs_reclaim().
MFC after: 3 days Reviewed by: jeff (a while ago)
|
#
175140 |
|
07-Jan-2008 |
jhb |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
|
#
174988 |
|
29-Dec-2007 |
jeff |
Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them.
Tested by: kris, pho
|
#
172930 |
|
24-Oct-2007 |
rwatson |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
mac_<object>_<method/action> mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
#
171599 |
|
26-Jul-2007 |
pjd |
When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more...
Discussed with: kib, ups Approved by: re (rwatson)
|
#
171181 |
|
03-Jul-2007 |
kib |
Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls destroy_dev() from d_close() cdev method would self-deadlock. devfs_close() bump device thread reference counter, and destroy_dev() sleeps, waiting for si_threadcount to reach zero for cdev without d_purge method.
destroy_dev_sched() could be used instead from d_close(), to schedule execution of destroy_dev() in another context. The destroy_dev_sched_drain() function can be used to drain the scheduled calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains the events clone to make sure no lingering devices are left after dev_clone event handler deregistered.
make_dev_credf(MAKEDEV_REF) function should be used from dev_clone event handlers instead of make_dev()/make_dev_cred() to ensure that created device has reference counter bumped before cdev mutex is dropped inside make_dev().
Reviewed by: tegge (early versions), njl (programming interface) Debugging help and testing by: Peter Holm Approved by: re (kensmith)
|
#
170587 |
|
11-Jun-2007 |
rwatson |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp Obtained from: TrustedBSD Project
|
#
170152 |
|
31-May-2007 |
kib |
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
|
#
168977 |
|
23-Apr-2007 |
rwatson |
Rename mac*devfsdirent*() to mac*devfs*() to synchronize with SEDarwin, where similar data structures exist to support devfs and the MAC Framework, but are named differently.
Obtained from: TrustedBSD Project Sponsored by: SPARTA, Inc.
|
#
168884 |
|
19-Apr-2007 |
trhodes |
In some cases, like whenever devfs file times are zero, the fix(aa) will not be applied to dev entries. This leaves us with file times like "Jan 1 1970." Work around this problem by replacing the tv_sec == 0 check with a <= 3600 check. It's doubtful anyone will be booting within an hour of the Epoch, let alone care about a few seconds worth of nonzero timestamps. It's a hackish work around, but it does work and I have not experienced any negatives in my testing.
Discussed with: bde "Ok with me: phk
|
#
168355 |
|
04-Apr-2007 |
rwatson |
Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio).
Tested by: kris Discussed with: jhb, kris, attilio, jeff
|
#
167916 |
|
26-Mar-2007 |
kris |
Annotate that this giant acqusition is dependent on tty locking.
|
#
167497 |
|
12-Mar-2007 |
tegge |
Make insmntque() externally visibile and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled.
Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure.
Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE.
Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility.
Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup.
Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
|
#
164033 |
|
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
163606 |
|
22-Oct-2006 |
rwatson |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
#
163530 |
|
20-Oct-2006 |
kib |
Update the access and modification times for dev while still holding thread reference on it.
Reviewed by: tegge Approved by: pjd (mentor)
|
#
163529 |
|
20-Oct-2006 |
kib |
Fix the race between devfs_fp_check and devfs_reclaim. Derefence the vnode' v_rdev and increment the dev threadcount , as well as clear it (in devfs_reclaim) under the dev_lock().
Reviewed by: tegge Approved by: pjd (mentor)
|
#
163481 |
|
18-Oct-2006 |
kib |
Properly lock the vnode around vgone() calls.
Unlock the vnode in devfs_close() while calling into the driver d_close() routine.
devfs_revoke() changes by: ups Reviewed and bugfixes by: tegge Tested by: mbr, Peter Holm Approved by: pjd (mentor) MFC after: 1 week
|
#
162443 |
|
19-Sep-2006 |
kib |
Fix the bug in rev. 1.134. In devfs_allocv_drop_refs(), when not_found == 2 and drop_dm_lock is true, no unlocking shall be attempted. The lock is already dropped and memory is freed.
Found with: Coverity Prevent(tm) CID: 1536 Approved by: pjd (mentor)
|
#
162398 |
|
18-Sep-2006 |
kib |
Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and vnode lock in devfs_allocv. Do this by temporary dropping dm_lock around vnode locking.
For safe operation, add hold counters for both devfs_mount and devfs_dirent, and DE_DOOMED flag for devfs_dirent. The facilities allow to continue after dropping of the dm_lock, by making sure that referenced memory does not disappear.
Reviewed by: tegge Tested by: kris Approved by: kan (mentor) PR: kern/102335
|
#
160425 |
|
17-Jul-2006 |
phk |
Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exists in DEVFS.
Remove the opt_devfs.h file now that it is empty.
|
#
160310 |
|
12-Jul-2006 |
ups |
Add vnode interlocking to devfs. This prevents race conditions that can cause pagefaults or devfs to use arbitrary vnodes.
MFC after: 1 week
|
#
160132 |
|
06-Jul-2006 |
rwatson |
Use #include "", not #include <> for opt_foo.h.
MFC after: 3 days
|
#
157342 |
|
31-Mar-2006 |
jeff |
- Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this the vnode is never recycled. It is bogus because the reference really should be associated with the devfs dirent.
|
#
155034 |
|
30-Jan-2006 |
jeff |
- Remove a stale comment. This function was rewritten to be SMP safe some time ago.
Sponsored by: Isilon Systems, Inc.
|
#
152254 |
|
09-Nov-2005 |
dwhite |
This is a workaround for a complicated issue involving VFS cookies and devfs. The PR and patch have the details. The ultimate fix requires architectural changes and clarifications to the VFS API, but this will prevent the system from panicking when someone does "ls /dev" while running in a shell under the linuxulator.
This issue affects HEAD and RELENG_6 only.
PR: 88249 Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com> MFC after: 3 days
|
#
151453 |
|
18-Oct-2005 |
phk |
Use correct cirteria for determining which directory entries we can purge right away and which we merely can hide.
Beaten into my skull by: kris
|
#
150342 |
|
19-Sep-2005 |
phk |
Rewamp DEVFS internals pretty severely [1].
Give DEVFS a proper inode called struct cdev_priv. It is important to keep in mind that this "inode" is shared between all DEVFS mountpoints, therefore it is protected by the global device mutex.
Link the cdev_priv's into a list, protected by the global device mutex. Keep track of each cdev_priv's state with a flag bit and of references from mountpoints with a dedicated usecount.
Reap the benefits of much improved kernel memory allocator and the generally better defined device driver APIs to get rid of the tables of pointers + serial numbers, their overflow tables, the atomics to muck about in them and all the trouble that resulted in.
This makes RAM the only limit on how many devices we can have.
The cdev_priv is actually a super struct containing the normal cdev as the "public" part, and therefore allocation and freeing has moved to devfs_devs.c from kern_conf.c.
The overall responsibility is (to be) split such that kern/kern_conf.c is the stuff that deals with drivers and struct cdev and fs/devfs handles filesystems and struct cdev_priv and their private liason exposed only in devfs_int.h.
Move the inode number from cdev to cdev_priv and allocate inode numbers properly with unr. Local dirents in the mountpoints (directories, symlinks) allocate inodes from the same pool to guarantee against overlaps.
Various other fields are going to migrate from cdev to cdev_priv in the future in order to hide them. A few fields may migrate from devfs_dirent to cdev_priv as well.
Protect the DEVFS mountpoint with an sx lock instead of lockmgr, this lock also protects the directory tree of the mountpoint.
Give each mountpoint a unique integer index, allocated with unr. Use it into an array of devfs_dirent pointers in each cdev_priv. Initially the array points to a single element also inside cdev_priv, but as more devfs instances are mounted, the array is extended with malloc(9) as necessary when the filesystem populates its directory tree.
Retire the cdev alias lists, the cdev_priv now know about all the relevant devfs_dirents (and their vnodes) and devfs_revoke() will pick them up from there. We still spelunk into other mountpoints and fondle their data without 100% good locking. It may make better sense to vector the revoke event into the tty code and there do a destroy_dev/make_dev on the tty's devices, but that's for further study.
Lots of shuffling of stuff and churn of bits for no good reason[2].
XXX: There is still nothing preventing the dev_clone EVENTHANDLER from being invoked at the same time in two devfs mountpoints. It is not obvious what the best course of action is here.
XXX: comment out an if statement that lost its body, until I can find out what should go there so it doesn't do damage in the meantime.
XXX: Leave in a few extra malloc types and KASSERTS to help track down any remaining issues.
Much testing provided by: Kris Much confusion caused by (races in): md(4)
[1] You are not supposed to understand anything past this point.
[2] This line should simplify life for the peanut gallery.
|
#
150200 |
|
15-Sep-2005 |
phk |
Don't attempt to recurse lockmgr, it doesn't like it.
|
#
150151 |
|
15-Sep-2005 |
phk |
Various minor polishing.
|
#
150149 |
|
15-Sep-2005 |
phk |
Absolve devfs_rule.c from locking responsibility and call it with all necessary locking held.
|
#
150019 |
|
12-Sep-2005 |
phk |
Clean up prototypes.
|
#
149573 |
|
29-Aug-2005 |
phk |
Add a missing dev_relthread() call.
Remove unused variable.
Spotted by: Hans Petter Selasky <hselasky@c2i.net>
|
#
149177 |
|
17-Aug-2005 |
phk |
Handle device drivers with D_NEEDGIANT in a way which does not penalize the 'good' drivers: Allocate a shadow cdevsw and populate it with wrapper functions which grab Giant
|
#
149146 |
|
16-Aug-2005 |
phk |
Collect the devfs related sysctls in one place
|
#
149107 |
|
15-Aug-2005 |
phk |
Eliminate effectively unused dm_basedir field from devfs_mount.
|
#
148868 |
|
08-Aug-2005 |
rwatson |
Merge the dev_clone and dev_clone_cred event handlers into a single event handler, dev_clone, which accepts a credential argument. Implementors of the event can ignore it if they're not interested, and most do. This avoids having multiple event handler types and fall-back/precedence logic in devfs.
This changes the kernel API for /dev cloning, and may affect third party packages containg cloning kernel modules.
Requested by: phk MFC after: 3 days
|
#
148182 |
|
20-Jul-2005 |
simon |
Correct devfs ruleset bypass.
Submitted by: csjp Reviewed by: phk Security: FreeBSD-SA-05:17.devfs Approved by: cperciva
|
#
147982 |
|
14-Jul-2005 |
rwatson |
When devfs cloning takes place, provide access to the credential of the process that caused the clone event to take place for the device driver creating the device. This allows cloned device drivers to adapt the device node based on security aspects of the process, such as the uid, gid, and MAC label.
- Add a cred reference to struct cdev, so that when a device node is instantiated as a vnode, the cloning credential can be exposed to MAC.
- Add make_dev_cred(), a version of make_dev() that additionally accepts the credential to stick in the struct cdev. Implement it and make_dev() in terms of a back-end make_dev_credv().
- Add a new event handler, dev_clone_cred, which can be registered to receive the credential instead of dev_clone, if desired.
- Modify the MAC entry point mac_create_devfs_device() to accept an optional credential pointer (may be NULL), so that MAC policies can inspect and act on the label or other elements of the credential when initializing the skeleton device protections.
- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(), so that the pty clone credential is exposed to the MAC Framework.
While currently primarily focussed on MAC policies, this change is also a prerequisite for changes to allow ptys to be instantiated with the UID of the process looking up the pty. This requires further changes to the pty driver -- in particular, to immediately recycle pty nodes on last close so that the credential-related state can be recreated on next lookup.
Submitted by: Andrew Reisse <andrew.reisse@sparta.com> Obtained from: TrustedBSD Project Sponsored by: SPAWAR, SPARTA MFC after: 1 week MFC note: Merge to 6.x, but not 5.x for ABI reasons
|
#
146823 |
|
31-May-2005 |
rodrigc |
Do not declare a struct as extern, and then implement it as static in the same file. This is not legal C, and GCC 4.0 will issue an error.
Reviewed by: phk Approved by: das (mentor)
|
#
145730 |
|
30-Apr-2005 |
jeff |
- In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT. We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set because vfs is still sometime entered with Giant held.
|
#
145006 |
|
13-Apr-2005 |
jeff |
- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details.
Sponsored by: Isilon Systems, Inc.
|
#
144389 |
|
31-Mar-2005 |
phk |
Explicitly hold a reference to the cdev we have just cloned. This closes the race where the cdev was reclaimed before it ever made it back to devfs lookup.
|
#
144384 |
|
31-Mar-2005 |
phk |
Rename dev_ref() to dev_refl()
|
#
144208 |
|
28-Mar-2005 |
jeff |
- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by: Isilon Systems, Inc.
|
#
143510 |
|
13-Mar-2005 |
jeff |
- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK.
Sponsored by: Isilon Systems, Inc.
|
#
143383 |
|
10-Mar-2005 |
phk |
One more bit of the major/minor patch to make ttyname happy as well.
|
#
143381 |
|
10-Mar-2005 |
phk |
Try to fix the mess I made of devname, with the minimal subset of the larger minor/major patch which was posted for testing.
|
#
142250 |
|
22-Feb-2005 |
phk |
We may not have an actual cdev at this point.
|
#
142242 |
|
22-Feb-2005 |
phk |
Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is bargain. In addition it makes it harder for people to try to do stypid things like "finding the vnode from cdev".
Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim().
rename idestroy_dev() to destroy_devl() for consistency.
Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim().
Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone();
Let devfs_reclaim() call dev_rel() instead of vgonel().
Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode.
Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)
|
#
142232 |
|
22-Feb-2005 |
phk |
Make dev_ref() require the dev_lock() to be held and use it from devfs instead of directly frobbing the si_refcount.
|
#
142011 |
|
17-Feb-2005 |
phk |
Introduce vx_wait{l}() and use it instead of home-rolled versions.
|
#
141617 |
|
10-Feb-2005 |
phk |
Statize devfs_ops_f
|
#
140939 |
|
28-Jan-2005 |
phk |
Make filesystems get rid of their own vnodes vnode_pager object in VOP_RECLAIM().
|
#
140196 |
|
13-Jan-2005 |
phk |
Whitespace in vop_vector{} initializations.
|
#
139776 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
139189 |
|
22-Dec-2004 |
phk |
Be consistent about flag values passed to device drivers read/write methods:
Read can see O_NONBLOCK and O_DIRECT.
Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.
In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards compatibility.
|
#
139188 |
|
22-Dec-2004 |
phk |
Shuffle numeric values of the IO_* flags to match the O_* flags from fcntl.h.
This is in preparation for making the flags passed to device drivers be consistently from fcntl.h for all entrypoints.
Today open, close and ioctl uses fcntl.h flags, while read and write uses vnode.h flags.
|
#
139085 |
|
20-Dec-2004 |
phk |
We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already.
Make devfs_specops private to devfs.
|
#
139083 |
|
20-Dec-2004 |
phk |
Add a couple of KASSERTS to try to diagnose a problem reported.
|
#
138841 |
|
14-Dec-2004 |
phk |
Be a bit more assertive about vnode bypass.
|
#
138791 |
|
13-Dec-2004 |
phk |
Another FNONBLOCK -> O_NONBLOCK.
Don't unconditionally set IO_UNIT to device drivers in write: nobody checks it, and since it was always set it did not carry information anyway.
|
#
138790 |
|
13-Dec-2004 |
phk |
Use O_NONBLOCK instead of FNONBLOCK alias.
|
#
138788 |
|
13-Dec-2004 |
phk |
Explicit panic in vop_read/vop_write for devices
|
#
138509 |
|
07-Dec-2004 |
phk |
The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go:
cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS()
nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility.
ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS()
Remove vfs_omount() method, all filesystems are now converted.
Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now.
Change rootmounting to use DEVFS trampoline:
vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem.
Remove now unnecessary getdiskbyname().
kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c.
Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer.
Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).
|
#
138290 |
|
01-Dec-2004 |
phk |
Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for all vop_()s.
Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use from the compiler.
Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
|
#
138270 |
|
01-Dec-2004 |
phk |
Mechanically change prototypes for vnode operations to use the new typedefs.
|
#
137800 |
|
17-Nov-2004 |
phk |
Make vnode bypass for devices mandatory.
|
#
137755 |
|
15-Nov-2004 |
phk |
Make vnode bypass the default for devices.
Can be disabled in case of problems with vfs.devfs.fops=0 in loader.conf
|
#
137679 |
|
13-Nov-2004 |
phk |
Integrate most of vop_revoke() into devfs_revoke() where it belongs.
|
#
137678 |
|
13-Nov-2004 |
phk |
Add the devfs_fp_check() function which helps us get from a struct file to a cdev and a devsw, doing all the relevant checks along the way.
Add the check to see if fp->f_vnode->v_rdev differs from our cached fp->f_data copy of our cdev. If it does the device was revoked and we return ENXIO.
|
#
137382 |
|
08-Nov-2004 |
phk |
Add optional device vnode bypass to DEVFS.
The tunable vfs.devfs.fops controls this feature and defaults to off.
When enabled (vfs.devfs.fops=1 in loader), device vnodes opened through a filedescriptor gets a special fops vector which instead of the detour through the vnode layer goes directly to DEVFS.
Amongst other things this allows us to run Giant free read/write to device drivers which have been weaned off D_NEEDGIANT.
Currently this means /dev/null, /dev/zero, disks, (and maybe the random stuff ?)
On a 700MHz K7 machine this doubles the speed of dd if=/dev/zero of=/dev/null bs=1 count=1000000
This roughly translates to shaving 2usec of each read/write syscall.
The poll/kqfilter paths need more work before they are giant free, this work is ongoing in p4::phk_bufwork
Please test this and report any problems, LORs etc.
|
#
137308 |
|
06-Nov-2004 |
phk |
Properly implement a default version of VOP_GETWRITEMOUNT.
Remove improper access to vop_stdgetwritemount() which should and will instead rely on the VOP default path.
|
#
137195 |
|
04-Nov-2004 |
phk |
Add back securelevel check for disks.
XXX: This should live in geom_dev.c but we don't have access to the cred there. XXX: XXX: This may not matter anymore since filesystems use geom_vfs.
|
#
137047 |
|
29-Oct-2004 |
phk |
Don't give disks special treatment, they don't come this way anymore.
|
#
137043 |
|
29-Oct-2004 |
phk |
Remove VOP_SPECSTRATEGY() from the system.
|
#
137029 |
|
29-Oct-2004 |
phk |
Give dev_strategy() an explict cdev argument in preparation for removing buf->b-dev.
Put a bio between the buf passed to dev_strategy() and the device driver strategy routine in order to not clobber fields in the buf.
Assert copyright on vfs_bio.c and update copyright message to canonical text. There is no legal difference between John Dysons two-clause abbreviated BSD license and the canonical text.
|
#
137006 |
|
28-Oct-2004 |
phk |
What can I say: don't allow people to mount DEVFS with option "nodev".
|
#
136966 |
|
26-Oct-2004 |
phk |
Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.
|
#
136770 |
|
22-Oct-2004 |
phk |
Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar?
Move the skeleton of specfs into devfs where it now belongs and bury the rest.
|
#
132653 |
|
26-Jul-2004 |
cperciva |
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags.
The old name is still defined, but will be removed in a few days (unless I hear any complaints...)
Discussed with: rwatson, scottl Requested by: jhb
|
#
132547 |
|
22-Jul-2004 |
rwatson |
In devfs_allocv(), rather than assigning 'td = curthread', assert that the caller passes in a td that is curthread, and consistently pass 'td' into vget(). Remove some bogus logic that passed in td or curthread conditional on td being non-NULL, which seems redundant in the face of the earlier assignment of td to curthread if td is NULL.
In devfs_symlink(), cache the passed thread in 'td' so we don't have to keep retrieving it from the 'ap' structure, and assert that td is curthread (since we dereference it to get thread-local td_ucred). Use 'td' in preference to curthread for later lockmgr calls, since they are equal.
|
#
130640 |
|
17-Jun-2004 |
phk |
Second half of the dev_t cleanup.
The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel space struct cdev etc.
|
#
130585 |
|
16-Jun-2004 |
phk |
Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
|
#
126019 |
|
19-Feb-2004 |
phk |
Report the correct length for symlink entries.
|
#
124081 |
|
02-Jan-2004 |
phk |
Improve on POLA by populating DEVFS before doing devfs(8) rule ioctls.
PR: 60687 Spotted by: Colin Percival <cperciva@daemonology.net>
|
#
121281 |
|
20-Oct-2003 |
phk |
Remember to check the DE_WHITEOUT flag in the case where a cloned device is hidden by a devfs(8) rule.
Spotted by: Adam Nowacki <ptnowak@bsk.vectranet.pl>
|
#
121270 |
|
20-Oct-2003 |
phk |
When a driver successfully created a device on demand, we can directly pick up the DEVFS inode number from the dev_t and find our directory entry from that, we don't need to scan the directory to find it.
This also solves an issue with on-demand devices in subdirectories.
Submitted by: cognet
|
#
115511 |
|
31-May-2003 |
phk |
Remove unused variable.
Found by: FlexeLint
|
#
111841 |
|
03-Mar-2003 |
njl |
Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead defering to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.
|
#
111742 |
|
02-Mar-2003 |
des |
Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.
|
#
111741 |
|
02-Mar-2003 |
des |
uiomove-related caddr_t -> void * (just the low-hanging fruit)
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
110063 |
|
29-Jan-2003 |
phk |
NODEVFS cleanup: remove #ifdefs.
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
109202 |
|
13-Jan-2003 |
phk |
Even if the permissions deny it, a process should be allowed to access its controlling terminal.
In essense, history dictates that any process is allowed to open /dev/tty for RW, irrespective of credential, because by definition it is it's own controlling terminal.
Before DEVFS we relied on a hacky half-device thing (kern/tty_tty.c) which did the magic deep down at device level, which at best was disgusting from an architectural point of view.
My first shot at this was to use the cloning mechanism to simply give people the right tty when they ask for /dev/tty, that's why you get this, slightly counter intuitive result:
syv# ls -l /dev/tty `tty` crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/tty crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/ttyp0
Trouble is, when user u1 su(1)'s to user u2, he cannot open /dev/ttyp0 anymore because he doesn't have permission to do so.
The above fix allows him to do that.
The interesting side effect is that one was previously only able to access the controlling tty by indirection: date > /dev/tty but not by name: date > `tty`
This is now possible, and that feels a lot more like DTRT.
PR: 46635 MFC candidate: could be.
|
#
108648 |
|
04-Jan-2003 |
phk |
Since Jeffr made the std* functions the default in rev 1.63 of kern/vfs_defaults.c it is wrong for the individual filesystems to use the std* functions as that prevents override of the default.
Found by: src/tools/tools/vop_table
|
#
108341 |
|
28-Dec-2002 |
rwatson |
Trim left-over and unused vop_refreshlabel() bits from devfs.
Reported by: bde
|
#
107698 |
|
09-Dec-2002 |
rwatson |
Remove dm_root entry from struct devfs_mount. It's never set, and is unused. Replace it with a dm_mount back-pointer to the struct mount that the devfs_mount is associated with. Export that pointer to MAC Framework entry points, where all current policies don't use the pointer. This permits the SEBSD port of SELinux's FLASK/TE to compile out-of-the-box on 5.0-CURRENT with full file system labeling support.
Approved by: re (murray) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
105988 |
|
26-Oct-2002 |
rwatson |
Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects sematics for shared vnode locks, which were not previously present in the system. This chances the cache coherrency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly.
Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
105585 |
|
20-Oct-2002 |
rwatson |
Missed a case of _POSIX_MAC_PRESENT -> _PC_MAC_PRESENT rename.
Pointed out by: phk
|
#
105212 |
|
16-Oct-2002 |
phk |
Fix comments and one resulting code confusion about the type of the "command" argument to VOP_IOCTL.
Spotted by: FlexeLint.
|
#
104908 |
|
11-Oct-2002 |
mike |
Change iov_base's type from `char *' to the standard `void *'. All uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.
|
#
104533 |
|
05-Oct-2002 |
rwatson |
Integrate a devfs/MAC fix from the MAC tree: avoid a race condition during devfs VOP symlink creation by introducing a new entry point to determine the label of the devfs_dirent prior to allocation of a vnode for the symlink.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
104278 |
|
01-Oct-2002 |
phk |
Move the vop-vector declaration into devfs_vnops.c where it belongs.
|
#
104099 |
|
28-Sep-2002 |
phk |
Fix mis-indent.
|
#
103559 |
|
18-Sep-2002 |
njl |
Remove any VOP_PRINT that redundantly prints the tag. Move lockmgr_printinfo() into vprint() for everyone's benefit.
Suggested by: bde
|
#
103314 |
|
14-Sep-2002 |
njl |
Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging.
Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.
Suggested by: phk Reviewed by: bde, rwatson (earlier version)
|
#
101308 |
|
04-Aug-2002 |
jeff |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking.
Idea stolen from: BSD/OS
|
#
101195 |
|
02-Aug-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Teach devfs how to respond to pathconf() _POSIX_MAC_PRESENT queries, allowing it to indicate to user processes that individual vnode labels are available.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101191 |
|
01-Aug-2002 |
rwatson |
Hook up devfs_pathconf() for specfs devfs nodes, not just regular devfs nodes.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101069 |
|
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Instrument devfs to support per-dirent MAC labels. In particular, invoke MAC framework when devfs directory entries are instantiated due to make_dev() and related calls, and invoke the MAC framework when vnodes are instantiated from these directory entries. Implement vop_setlabel() for devfs, which pushes the label update into the devfs directory entry for semi-persistant store. This permits the MAC framework to assign labels to devices and directories as they are instantiated, and export access control information via devfs vnodes.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100206 |
|
16-Jul-2002 |
dd |
Introduce the DEVFS "rule" subsystem. DEVFS rules permit the administrator to define certain properties of new devfs nodes before they become visible to the userland. Both static (e.g., /dev/speaker) and dynamic (e.g., /dev/bpf*, some removable devices) nodes are supported. Each DEVFS mount may have a different ruleset assigned to it, permitting different policies to be implemented for things like jails.
Approved by: phk
|
#
97702 |
|
01-Jun-2002 |
semenu |
Make devfs to give honour to PDIRUNLOCK flag.
Reviewed by: jeff MFC after: 1 week
|
#
96356 |
|
10-May-2002 |
mux |
Fix several bugs in devfs_lookupx(). When we check the nameiop to make sure it's a correct operation for devfs, do it only in the ISLASTCN case. If we don't, we are assuming that the final file will be in devfs, which is not true if another partition is mounted on top of devfs or with special filenames (like /dev/net/../../foo).
Reviewed by: phk
|
#
95750 |
|
29-Apr-2002 |
rwatson |
Use vnode locking with devfs; permit VFS locking assertions to make sense for devfs vnodes, and reduce/remove potential races in the devfs code.
Submitted by: iadowse Approved by: phk
|
#
93886 |
|
05-Apr-2002 |
bde |
Fixed assorted bugs in setting of timestamps in devfs_setattr().
Setting of timestamps on devices had no effect visible to userland because timestamps for devices were set in places that are never used. This broke: - update of file change time after a change of an attribute - setting of file access and modification times.
The VA_UTIMES_NULL case did not work. Revs 1.31-1.32 were supposed to fix this by copying correct bits from ufs, but had little or no effect because the old checks were not removed.
|
#
93593 |
|
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
#
92727 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92270 |
|
14-Mar-2002 |
maxim |
Be consistent with UFS in a way how devfs_setattr() checks credentials for chmod(2), chown(2) and utimes(2) with respect to jail(2).
Reviewed by: rwatson, ru Not objected by: phk Approved by: ru
|
#
86892 |
|
25-Nov-2001 |
dd |
Address two minor issues: implement the _PC_NAME_MAX and _PC_PATH_MAX pathconf() variables for directories, and set st_size and st_blocks (of struct stat) for directories as appropriate. Note that st_size is always set to DEV_BSIZE, since the size of the directories is not currently kept.
Reviewed by: phk, bde
|
#
86040 |
|
04-Nov-2001 |
phk |
Fix "echo > /dev/null" for non-root users which broke in previous commit.
|
#
85980 |
|
03-Nov-2001 |
phk |
Use vfs_timestamp() instead of getnanotime().
Add magic stuff copied from ufs_setattr().
Instructed by: bde
|
#
84873 |
|
13-Oct-2001 |
bde |
Backed out vestiges of the quick fixes for the transient breakage of <sys/mount.h> in rev.1.106 of the latter (don't include <sys/socket.h> just to work around bugs in <sys/mount.h>).
|
#
84156 |
|
30-Sep-2001 |
phk |
The behaviour of whiteout'ing symlinks were too confusing, instead remove them when asked to.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
81620 |
|
14-Aug-2001 |
phk |
linux ls fails on DEVFS /dev because linux_getdents fails because linux_getdents uses VOP_READDIR( ..., &ncookies, &cookies ) instead of VOP_READDIR( ..., NULL, NULL ) because it seems to need the offsets for linux_dirent and sizeof(dirent) != sizeof(linux_dirent)...
PR: 29467 Submitted by: Michael Reifenberger <root@nihil.plaut.de> Reviewed by: phk
|
#
77589 |
|
01-Jun-2001 |
brian |
Support /dev/tun cloning. Ansify if_tun.c while I'm there.
Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit is a short.
It's now possible to open /dev/tun and get a handle back for an available tun device (use devname to find out what you got).
The implementation uses rman by popular demand (and against my judgement) to track opened devices and uses the new dev_depends() to ensure that all make_dev()d devices go away before the module is unloaded.
Reviewed by: phk
|
#
77243 |
|
26-May-2001 |
phk |
Don't copy the trailing zero in readlink, it confuses namei().
PR: 27656
|
#
77050 |
|
23-May-2001 |
phk |
Change the way deletes are managed in DEVFS.
This fixes a number of warnings relating to removed cloned devices.
It also makes it possible to recreate deleted devices with mknod(2). The major/minor arguments are ignored.
|
#
76571 |
|
14-May-2001 |
phk |
After a successfull poll of the cloning functions, match on the returned dev_t rather than the original name.
This allows cloning from one name to another which is useful for /dev/tty and later for the pty's.
|
#
76554 |
|
13-May-2001 |
phk |
Convert DEVFS from an "opt-in" to an "opt-out" option.
If for some reason DEVFS is undesired, the "NODEVFS" option is needed now.
Pending any significant issues, DEVFS will be made mandatory in -current on july 1st so that we can start reaping the full benefits of having it.
|
#
76320 |
|
06-May-2001 |
phk |
Remove unneeded devfs_badop()
Noticed by: rwatson
|
#
76166 |
|
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
#
76131 |
|
29-Apr-2001 |
phk |
Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
|
#
75874 |
|
23-Apr-2001 |
mjacob |
add this ridiculous include foo so it will compile again
|
#
72637 |
|
18-Feb-2001 |
phk |
Remove a debug printf.
|
#
71945 |
|
02-Feb-2001 |
phk |
At the point in time where most devices are created, we don't know what time it is because boottime is not yet initialized. Finagle the relevant fields when we get the chance.
|
#
71936 |
|
02-Feb-2001 |
phk |
Only superuser can create symlinks. Give symlinks mode 755 by default to avoid triggering alert eyes. (the mode isn't use on symlinks)
|
#
71822 |
|
30-Jan-2001 |
phk |
Fix two minor nits.
Existences revealed, but no details offered by: bp
|
#
69767 |
|
08-Dec-2000 |
phk |
staticize.
|
#
67893 |
|
29-Oct-2000 |
phk |
Move suser() and suser_xxx() prototypes and a related #define from <sys/proc.h> to <sys/systm.h>.
Correctly document the #includes needed in the manpage.
Add one now needed #include of <sys/systm.h>. Remove the consequent 48 unused #includes of <sys/proc.h>.
|
#
66877 |
|
09-Oct-2000 |
phk |
Don't hold an extra reference to vnodes. Devfs vnodes are sufficiently cheap to setup that it doesn't really matter that we recycle device vnodes at kleenex speed.
Implement first cut try at killing cloned devices when they are not needed anymore. For now only the bpf driver is involved in this experiment. Cloned devices can set the SI_CHEAPCLONE flag which allows us to destroy_dev() it when the vcount() drops to zero and the vnode is reclaimed. For now it's a requirement that the driver doesn't keep persistent state from close to (re)open.
Some whitespace changes.
|
#
66028 |
|
18-Sep-2000 |
phk |
Ignore attempts to set flags to zero. This quenches a syslog warning from login(1).
|
#
65920 |
|
16-Sep-2000 |
phk |
Add canonical checks to devfs_setattr().
|
#
65515 |
|
06-Sep-2000 |
phk |
Add refcounts to the "global" DEVFS inode slots, this allows us to recycle inodes after a destroy_dev() but not until all mounts have picked up the change.
Add support for an overflow table for DEVFS inodes. The static table defaults to 1024 inodes, if that fills, an overflow table of 32k inodes is allocated. Both numbers can be changed at compile time, the size of the overflow table also with the sysctl vfs.devfs.noverflow.
Use atomic instructions to barrier between make_dev()/destroy_dev() and the mounts.
Add lockmgr() locking of directories for operations accessing or modifying the directory TAILQs.
Various nitpicking here and there.
|
#
65374 |
|
02-Sep-2000 |
phk |
Avoid the modules madness I inadvertently introduced by making the cloning infrastructure standard in kern_conf. Modules are now the same with or without devfs support.
If you need to detect if devfs is present, in modules or elsewhere, check the integer variable "devfs_present".
This happily removes an ugly hack from kern/vfs_conf.c.
This forces a rename of the eventhandler and the standard clone helper function.
Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include like <sys/queue.h>
Remove all #includes of opt_devfs.h they no longer matter.
|
#
65200 |
|
29-Aug-2000 |
rwatson |
o Restructure vaccess() so as to check for DAC permission to modify the object before falling back on privilege. Make vaccess() accept an additional optional argument, privused, to determine whether privilege was required for vaccess() to return 0. Add commented out capability checks for reference. Rename some variables to make it more clear which modes/uids/etc are associated with the object, and which with the access mode. o Update file system use of vaccess() to pass NULL as the optional privused argument. Once additional patches are applied, suser() will no longer set ASU, so privused will permit passing of privilege information up the stack to the caller.
Reviewed by: bde, green, phk, -security, others Obtained from: TrustedBSD Project
|
#
65132 |
|
27-Aug-2000 |
phk |
Reorder vop's alphabetically. Smarter use of devfs_allocv() (from bp@) Introduce devfs_find() ".." fixes to devfs_lookup (from bp@)
|
#
65118 |
|
26-Aug-2000 |
phk |
Minor cleanups tp devfs_readdir(); Add devfs_read() for directories. (inspired by bp@)
|
#
65051 |
|
24-Aug-2000 |
phk |
Fix panic when removing open device (found by bp@) Implement subdirs. Build the full "devicename" for cloning functions. Fix panic when deleted device goes away. Collaps devfs_dir and devfs_dirent structures. Add proper cloning to the /dev/fd* "device-"driver. Fix a bug in make_dev_alias() handling which made aliases appear multiple times. Use devfs_clone to implement getdiskbyname() Make specfs maintain the stat(2) timestamps per dev_t
|
#
64895 |
|
21-Aug-2000 |
phk |
Fix devfs_access() bug on directories.
Remove unused #includes.
Bug spotted by: markm
|
#
64880 |
|
20-Aug-2000 |
phk |
Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)
Remove old DEVFS support fields from dev_t.
Make uid, gid & mode members of dev_t and set them in make_dev().
Use correct uid, gid & mode in make_dev in disk minilayer.
Add support for registering alias names for a dev_t using the new function make_dev_alias(). These will show up as symlinks in DEVFS.
Use makedev() rather than make_dev() for MFSs magic devices to prevent DEVFS from noticing this abuse.
Add a field for DEVFS inode number in dev_t.
Add new DEVFS in fs/devfs.
Add devfs cloning to: disk minilayer (ie: ad(4), sd(4), cd(4) etc etc) md(4), tun(4), bpf(4), fd(4)
If DEVFS add -d flag to /sbin/inits args to make it mount devfs.
Add commented out DEVFS to GENERIC
|