History log of /freebsd-current/sys/fs/devfs/devfs_vnops.c
Revision Date Author Comments
# 6d79564f 07-May-2024 Konstantin Belousov <kib@FreeBSD.org>

devfs_allocv(): style

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# d3efbe01 21-Mar-2024 Konstantin Belousov <kib@FreeBSD.org>

cdevpriv(9): add iterator

Reviewed by: christos
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44469


# 5c41d888 19-Jan-2024 Konstantin Belousov <kib@FreeBSD.org>

kcmp(2): implement for devfs files

Compare not vnodes, which are different between mount points, but
actual cdev referenced by the devfs node.

Reviewed by: brooks, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43518


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 67864268 19-Sep-2023 Jason A. Harmening <jah@FreeBSD.org>

devfs: add integrity asserts for cdevp_list

It's possible for misuse of cdev KPIs or for bugs in devfs itself to
result in e.g. a cdev object's container being freed while still on the
global list used to populate each devfs mount; see PR 273418 for a
recent example.

Since a node may be marked inactive well before it is reaped from the
list, add a new flag solely to track list membership, and employ it in
some basic list integrity assertions to catch bad actors.

Discussed with: kib, mjg
MFC after: 1 week


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 829f0bcb 19-Dec-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add the concept of vnode state transitions

To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>

As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).

Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).

The new field lands in a 1-byte hole, thus it does not grow the struct.

Bump __FreeBSD_version to 1400077

Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759


# 5b5b7e2c 17-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: always retain path buffer after lookup

This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542


# 497240de 19-Aug-2022 Mateusz Guzik <mjg@FreeBSD.org>

Retire clone_drain_lock

It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36268


# 8ea3ceda 06-Feb-2022 Gordon Bergling <gbe@FreeBSD.org>

fs: fix a few common typos in source code comments

- s/quadradically/quadratically/
- s/persistant/persistent/

Obtained from: NetBSD
MFC after: 3 days


# 66c5fbca 27-Jan-2022 Konstantin Belousov <kib@FreeBSD.org>

insmntque1(): remove useless arguments

Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071


# 2a7e4cf8 27-Jan-2022 Mateusz Guzik <mjg@FreeBSD.org>

Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1")

I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.

Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.

Noted by: kib
Reported by: pho


# 3af3e99c 26-Jan-2022 Mateusz Guzik <mjg@FreeBSD.org>

devfs: stop using insmntque1

It adds nothing of value over insmntque.


# 3ffcfa59 26-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add vop_stdadd_writecount_nomsync

This avoids needing to inspect the mount point every time.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D33125


# 2b68eb8e 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove thread argument from VOP_STAT

and fo_stat.


# b4a58fbf 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove cn_thread

It is always curthread.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32453


# 243b324f 02-May-2021 Mark Johnston <markj@FreeBSD.org>

devfs: Avoid comparison with an uninitialized var in devfs_fp_check()

devvn_refthread() will initialize *devp only if it succeeds, so check for
success before comparing with fp->f_data. Other devvn_refthread()
callers are careful to do this.

Reported by: KMSAN
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30068


# 3bc17248 08-Feb-2021 Mateusz Guzik <mjg@FreeBSD.org>

devfs: fix use count leak when using TIOCSCTTY

by matching devfs_ctty_ref

Fixes: 3b44443626603f65 ("devfs: rework si_usecount to track opens")


# 3b444436 11-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

devfs: rework si_usecount to track opens

This removes a lot of special casing from the VFS layer.

Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D25612


# 7b19bdda 10-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

devfs: save on spurious relocking for devfs_populate

Tested by: pho


# f8935a96 10-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

devfs: use cheaper lockmgr entry points

Tested by: pho


# f9c13ab8 10-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

devfs: use vget_prep/vget_finish

Tested by: pho


# d292b194 05-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the obsolete privused argument from vaccess

This brings argument count down to 6, which is passable without the
stack on amd64.


# 11c345b1 04-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

devfs: fix a vnode use-after-free in devfs_ioctl

The vnode to be replaced was read with a shared lock, meaning 2 racing threads
can find the same one.

While here clean it up a little bit.


# f2706588 21-Jun-2020 Thomas Munro <tmunro@FreeBSD.org>

vfs: track sequential reads and writes separately

For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by: mjg, kib, tmunro
Approved by: mjg (mentor)
Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision: https://reviews.freebsd.org/D25024


# f1fa1ba3 03-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Fix up various vnode-related asserts which did not dump the used vnode


# 6a5abb1e 02-Feb-2020 Kyle Evans <kevans@FreeBSD.org>

Provide O_SEARCH

O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23247


# 45757984 01-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: consistently use size_t for buflen around VOP_VPTOCNP


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# 6fa079fc 15-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: flatten vop vectors

This eliminates the following loop from all VOP calls:

while(vop != NULL && \
vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
vop = vop->vop_default;

Reviewed by: jeff
Tesetd by: pho
Differential Revision: https://reviews.freebsd.org/D22738


# abd80ddb 08-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce v_irflag and make v_type smaller

The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715


# 1b50b999 30-Nov-2019 Kyle Evans <kevans@FreeBSD.org>

tty: implement TIOCNOTTY

Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.

I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.

Reviewed by: bcr (manpages), kib
Differential Revision: https://reviews.freebsd.org/D22572


# 1fccb43c 19-Nov-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: change si_usecount management to count used vnodes

Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.

There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).

Reviewed by: kib, jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22202


# e3fdd051 11-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

devfs_vptocnp(): correct the component name when node is not at top.

Node' cdp.si_name is the full path as provided by make_dev(9), it
should not be returned by VOP_VPTOCNP() when only the last component
is requested. Use the dirent entry instead.

With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.

Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# dfa8dae4 05-Oct-2019 Mateusz Guzik <mjg@FreeBSD.org>

devfs: plug redundant bwillwrite avoidance

vn_write already checks for vnode type to see if bwillwrite should be called.

This effectively reverts r244643.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21905


# 6470c8d3 29-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Rework v_object lifecycle for vnodes.

Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it prepared to handle the
vm object destruction for live vnode. Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM(). This makes v_object stable: either the object is NULL,
or it is valid vm object till the vnode reclamation. Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21412


# 91898857 29-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Avoid relying on header pollution from sys/refcount.h.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# e2e050c8 19-May-2019 Conrad Meyer <cem@FreeBSD.org>

Extract eventfilter declarations to sys/_eventfilter.h

This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.


# 1c4ca778 14-Nov-2018 Konstantin Belousov <kib@FreeBSD.org>

Add d_off support for multiple filesystems.

The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature. Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs. At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by: Jack Halford <jack@gandi.net>
Reviewed by: mckusick, pfg, rmacklem (previous versions)
Sponsored by: Gandi.net
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D17917


# ed34a7fc 26-Oct-2018 Brooks Davis <brooks@FreeBSD.org>

Move 32-bit compat support for FIODGNAME to the right place.

ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17475


# 9bc603bd 04-Oct-2018 Brooks Davis <brooks@FreeBSD.org>

Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.

A case was missed in this commit which breaks sshing into a 32-bit sshd
on a 64-bit system.

Approved by: re (gjb)


# 23f2e228 03-Oct-2018 Brooks Davis <brooks@FreeBSD.org>

Move 32-bit compat support for FIODGNAME to the right place.

ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Reviewed by: kib
Approved by: re (rgrimes, gjb)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Review: https://reviews.freebsd.org/D17388


# f6e25ec7 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Report INT_MAX for LINK_MAX for devfs' VOP_PATHCONF().

devfs uses int's for link counts internally and already reports the the
full link count via stat() post ino64.

Sponsored by: Chelsio Communications


# 418e6212 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().

MFC after: 1 month
Sponsored by: Chelsio Communications


# 599afe53 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().

Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations. Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to
indicate hard links aren't supported.

Requested by: bde (though perhaps not this exact implementation)
Reviewed by: kib (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications


# 83342922 14-Dec-2017 Konstantin Belousov <kib@FreeBSD.org>

In devfs_lookupx() dotdot lookup case, avoid dereferencing
dvp->v_mount after dvp is unlocked.

The vnode might be reclaimed after unlock, so v_mount becomes NULL.
Cache the struct mount pointer before the unlock, the struct is
type-stable.

Note that devfs_allocv() reads mp->mnt_data but does not operate on it
further when dirent is doomed. The unmount cannot proceed until all
dirents are reclaimed.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# d63027b6 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/fs: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# e1d15b89 21-Sep-2017 John Baldwin <jhb@FreeBSD.org>

Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.

Move handling of these three pathconf() variables out of vop_stdpathconf()
and into devfs_pathconf() as TTY devices can only be devfs files. In
addition, only return settings for these three variables for devfs devices
whose device switch has the D_TTY flag set.

Discussed with: bde, kib
Sponsored by: Chelsio Communications


# 69921123 23-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439


# 1a2dbd1a 20-Feb-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Simplify devfs_fsync() by removing it. This might also be a minor
optimization, as vn_isdisk() needs to lock a global mutex.

Reviewed by: imp
Tested by: pho
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9628


# ecc6c515 19-Feb-2017 Konstantin Belousov <kib@FreeBSD.org>

Apply noexec mount option for mmap(PROT_EXEC).

Right now the noexec mount option disallows image activators to try
execve the files on the mount point. Also, after r127187, noexec
also limits max_prot map entries permissions for mappings of files
from such mounts, but not the actual mapping permissions.

As result, the API behaviour is inconsistent. The files from noexec
mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution
permission, it cannot be re-enabled later. Make this consistent
logically and aligned with behaviour of other systems, by disallowing
PROT_EXEC for mmap(2).

Note that this change only ensures aligned results from mmap(2) and
mprotect(2), it does not prevent actual code execution from files
coming from noexec mount. Such files can always be read into
anonymous executable memory and executed from there.

Reported by: shamaz.mazum@gmail.com
PR: 217062
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# a3b4ab91 15-Feb-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Change the "devfs_fsync: vop_stdfsync failed" from panic to a printf.
It's not a proper fix, but should be better than what we have now.
Since it got broken some six months ago it results in an incredibly
annoying and trivially reproducible panic every time eg an USB disk
gets disconnected.

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 584b675e 27-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI. The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure. As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance. Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by: pho (as part of the bigger patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302


# af326ace 25-Jul-2016 Conrad Meyer <cem@FreeBSD.org>

devfs: Move most ioctl logic down to vnode layer

Devfs' file layer ioctl is now just a thin shim around the vnode layer.

Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7286


# 2d5bba3a 15-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Another follow-up to r291460. Only access vp->v_rdev for VCHR vnodes
in devfs_reclaim().

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week


# b114da42 27-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/devfs: unsign an index to prevent signed integer overflow.

cdp_maxdirent in struct:cdev_priv is of type u_int. Use the same
type for the corresponding index in devfs_revoke().

MFC after: 1 week


# aeace3c3 17-Jan-2016 Konstantin Belousov <kib@FreeBSD.org>

Assert that the linkage between struct cdev_privdata and and struct
file is consistent.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# a53b7c69 13-Jan-2016 Konstantin Belousov <kib@FreeBSD.org>

Make devfs_fpdrop() static. It was not a public KPI, and it has no
reason to remain exported for some time.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# fb57d63e 02-Jan-2016 Konstantin Belousov <kib@FreeBSD.org>

Hide transient EBADF errors caused by the parallel revoke(2) or forced
unmount of devfs mounts, by restarting the failed syscall.

When restarted, failing syscalls eventually either stop finding the
node and returning ENOENT, or the vnode op vectors finally transition
to the deadfs vop. The later return EIO or other error, more
appropriate for the operation.

Submitted by: bde
Tested by: pho
MFC after: 3 weeks


# d52aff3c 01-Jan-2016 Konstantin Belousov <kib@FreeBSD.org>

Minor style cleanup.

Submitted by: bde
MFC after: 1 week


# cccac8a1 22-Dec-2015 Konstantin Belousov <kib@FreeBSD.org>

Make it possible for the cdevsw d_close() driver method to detect last
close and close due to revoke(2)-like operation.

A new FLASTCLOSE flag indicates that this is last close. FREVOKE is
set for revokes, and FNONBLOCK is also set, same as is already done
for VOP_CLOSE() call from vgonel().

The flags reuse user open(2) flags which are never stored in f_flag,
to not consume bit space in the ABI visible way. Assert this with the
static check.

Requested and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# b63d070a 22-Dec-2015 Konstantin Belousov <kib@FreeBSD.org>

Keep devfs mount locked for the whole duration of the devfs_setattr(),
and ensure that our dirent is instantiated.

Reported and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 8d7e0f58 02-Dec-2015 John Baldwin <jhb@FreeBSD.org>

The cdevpriv_dtr_t typedef was not able to be used in a function prototype
like the various d_*_t typedefs since it declared a function pointer rather
than a function. Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype. The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.

The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.

Reviewed by: kib, imp
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D4340


# 6e572e08 23-Aug-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.

MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# fada4adf 06-Aug-2015 John Baldwin <jhb@FreeBSD.org>

The changes that introduced fo_mmap() treated all character device
mappings as if MAP_SHARED was always present since in general MAP_PRIVATE
is not permitted for character devices. However, there is one exception
in that MAP_PRIVATE mappings are permitted for /dev/zero.

Only require a writable file descriptor (FWRITE) for shared, writable
mappings of character devices. vm_mmap_cdev() will reject any private
mappings for other devices.

Reviewed by: kib
Reported by: sbruno (broke qemu cross-builds), peter
Differential Revision: https://reviews.freebsd.org/D3316


# 7077c426 04-Jun-2015 John Baldwin <jhb@FreeBSD.org>

Add a new file operations hook for mmap operations. File type-specific
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.

The vm_mmap() function is now split up into two functions. A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.

The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings. For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset. The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.

The fo_mmap() hook is optional. If it is not set, then fo_mmap() will
fail with ENODEV. A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).

While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead. While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.

Differential Revision: https://reviews.freebsd.org/D2658
Reviewed by: alc (glanced over), kib
MFC after: 1 month
Sponsored by: Chelsio


# bda2eb9a 01-Apr-2015 Konstantin Belousov <kib@FreeBSD.org>

Refine r280308. Do not completely disable timestamping of devfs nodes
on reads or writes, the time marks are used to display idle time by
w(1) [1]. Instead, use vfs.devfs.dotimes as the selector of default
precision vs. using time_second. The later gives seconds precision,
which is good enough for the purpose.

Note that timestamp updates are unlocked and the updates itself, as
well as the check in devfs_timestamp, are non-atomic.

Noted by: truckman [1]
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4f9343fc 20-Mar-2015 Xin LI <delphij@FreeBSD.org>

Disable timestamping on devfs read/write operations by default.

Currently we update timestamps unconditionally when doing read or
write operations. This may slow things down on hardware where
reading timestamps is expensive (e.g. HPET, because of the default
vfs.timestamp_precision setting is nanosecond now) with limited
benefit.

A new sysctl variable, vfs.devfs.dotimes is added, which can be
set to non-zero value when the old behavior is desirable.

Differential Revision: https://reviews.freebsd.org/D2104
Reported by: Mike Tancsa <mike sentex net>
Reviewed by: kib
Relnotes: yes
Sponsored by: iXsystems, Inc.
MFC after: 2 weeks


# 08189ed6 27-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems. Only
care about special vnodes on devfs, special nodes on other kinds of
filesystems do not have special properties.

Sponsored by: EMC / Isilon Storage Division
Submitted by: Conrad Meyer
MFC after: 1 week


# a57a934a 19-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Ignore devfs directory entries for devices either being destroyed or
delisted. The check is racy.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4dba07b2 02-Nov-2014 Mateusz Guzik <mjg@FreeBSD.org>

Fix up some session-related races in devfs.

One was introduced with r272596, the rest was there to begin with.

Noted by: jhb


# f12aa60c 15-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

When vnode bypass cannot be performed on the cdev file descriptor for
read/write/poll/ioctl, call standard vnode filedescriptor fop. This
restores the special handling for terminals by calling the deadfs VOP,
instead of always returning ENXIO for destroyed devices or revoked
terminals.

Since destroyed (and not revoked) device would use devfs_specops VOP
vector, make dead_read/write/poll non-static and fill VOP table with
pointers to the functions, to instead of VOP_PANIC.

Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 3a222fe0 06-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

devfs: tidy up after 272596

This moves a var to an if statement, no functional changes.

MFC after: 1 week


# 1d1b55fb 06-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

devfs: don't take proctree_lock unconditionally in devfs_close

MFC after: 1 week


# 9696feeb 22-Sep-2014 John Baldwin <jhb@FreeBSD.org>

Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
for each file and then convert that to a kinfo_ofile structure rather than
keeping a second, different set of code that directly manipulates
type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision: https://reviews.freebsd.org/D775
Reviewed by: kib, glebius (earlier version)


# 7b81a399 17-Jun-2014 Konstantin Belousov <kib@FreeBSD.org>

In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once. Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by: bde
Reviewed by: bde, trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# bf3e483b 15-Oct-2013 Konstantin Belousov <kib@FreeBSD.org>

Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Sponsored by: The FreeBSD Foundation
Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com>
MFC after: 1 week


# 64548150 15-Oct-2013 Konstantin Belousov <kib@FreeBSD.org>

Remove two instances of ARGSUSED comment, and wrap lines nearby the
code that is to be changed.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# c0a46535 21-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Make the seek a method of the struct fileops.

Tested by: pho
Sponsored by: The FreeBSD Foundation


# b1dd38f4 16-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by: glebius
Sponsored by: The FreeBSD Foundation


# 2ca49983 07-Feb-2013 Konstantin Belousov <kib@FreeBSD.org>

Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks


# ad9789f6 23-Dec-2012 Konstantin Belousov <kib@FreeBSD.org>

Do not force a writer to the devfs file to drain the buffer writes.

Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after: 2 weeks


# 07da61a6 15-Aug-2012 Hans Petter Selasky <hselasky@FreeBSD.org>

Streamline use of cdevpriv and correct some corner cases.

1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, hence for example read, write, ioctl and
so on might be sleeping at the time of "d_close" being called
and then then freed private data can still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with: phk
MFC after: 2 weeks


# c5c1199c 02-Jul-2012 Konstantin Belousov <kib@FreeBSD.org>

Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by: pho
No objections from: jhb
MFC after: 3 weeks


# d499701b 24-May-2012 Alexander Motin <mav@FreeBSD.org>

Revert devfs part of r235911. I was unaware about old but unfinished
discussion between kib@ and gibbs@ about it.


# f6ad3f23 24-May-2012 Alexander Motin <mav@FreeBSD.org>

MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information\for disk elements into the
CAM device database.

Sponsored by: Spectra Logic Corporation
Sponsored by: iXsystems, Inc.
Submitted by: gibbs, will, mav


# 526d0bd5 20-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


# e517e6f1 09-Dec-2011 John Baldwin <jhb@FreeBSD.org>

Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f(). The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR: kern/151758
Submitted by: Andrey Shidakov andrey shidakov ru
Submitted by: Ruben van Staveren ruben verweg com
Reviewed by: kib
MFC after: 2 weeks


# f82360ac 19-Nov-2011 Konstantin Belousov <kib@FreeBSD.org>

Existing VOP_VPTOCNP() interface has a fatal flow that is critical for
nullfs. The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)


# dccc45e4 03-Nov-2011 John Baldwin <jhb@FreeBSD.org>

Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by: kib
MFC after: 1 week


# 1fef78c3 03-Nov-2011 Konstantin Belousov <kib@FreeBSD.org>

Fix kernel panic when d_fdopen csw method is called for NULL fp.
This may happen when kernel consumer calls VOP_OPEN().

Reported by: Tavis Ormandy <taviso cmpxchg8b com> through delphij
MFC after: 3 days


# 9c00bb91 16-Aug-2011 Konstantin Belousov <kib@FreeBSD.org>

Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by: glebius
Reviewed by: rwatson
Approved by: re (bz)


# 7cc13fcf 15-Aug-2011 Justin T. Gibbs <gibbs@FreeBSD.org>

Correct the rendering of device aliases that reside in subdirectories
of a devfs.

devfs/devfs_vnops.c:
In devfs_readlink(), convert the devfs root relative path
of an alias's parent, that is recorded in the alias, into a
fully qualified path that includes the root of the containing
devfs. This avoids the ugliness of generating a relative path
by prepending "../"'s. For a non-jailed process, the "symlink
root" is the devfs's mount point. For a jailed process, we
must remove any jail prefix in the mount point so that our
response matches the user process's world view.

Sponsored by: Spectra Logic Corporation


# 724ce55b 13-Jul-2011 Konstantin Belousov <kib@FreeBSD.org>

While fixing the looping of a thread while devfs vnode is reclaimed,
r179247 introduced a possibility of devfs_allocv() returning spurious
ENOENT. If the vnode is selected by vnlru daemon for reclamation, then
devfs_allocv() can get ENOENT from vget() due to devfs_close() dropping
vnode lock around the call to cdevsw d_close method.

Use LK_RETRY in the vget() call, and do some part of the devfs_reclaim()
work in devfs_allocv(), clearing vp->v_data and de->de_vnode. Retry the
allocation of the vnode, now with de->de_vnode == NULL.

The check vp->v_data == NULL at the start of devfs_close() cannot be
affected by the change, since vnode lock must be held while VI_DOOMED
is set, and only dropped after the check.

Reported and tested by: Kohji Okuno <okuno.kohji jp panasonic com>
Reviewed by: attilio
MFC after: 3 weeks


# 2d843e7d 15-Dec-2010 Jaakko Heinonen <jh@FreeBSD.org>

Don't allow user created symbolic links to cover another entries marked
with DE_USER. If a devfs rule hid such entry, it was possible to create
infinite number of symbolic links with the same name.

Reviewed by: kib


# ef456eec 15-Dec-2010 Jaakko Heinonen <jh@FreeBSD.org>

- Assert that dm_lock is exclusively held in devfs_rules_apply() and
in devfs_vmkdir() while adding the entry to de_list of the parent.
- Apply devfs rules to newly created directories and symbolic links.

PR: kern/125034
Submitted by: Mateusz Guzik (original version)


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# d318c565 27-Sep-2010 Jaakko Heinonen <jh@FreeBSD.org>

Add reference counting for devfs paths containing user created symbolic
links. The reference counting is needed to be able to determine if a
specific devfs path exists. For true device file paths we can traverse
the cdevp_list but a separate directory list is needed for user created
symbolic links.

Add a new directory entry flag DE_USER to mark entries which should
unreference their parent directory on deletion.

A new function to traverse cdevp_list and the directory list will be
introduced in a separate commit.

Idea from: kib
Reviewed by: kib


# 6adc5230 21-Sep-2010 Jaakko Heinonen <jh@FreeBSD.org>

Modify devfs_fqpn() for future use in devfs path reference counting
code:

- Accept devfs_mount and devfs_dirent as the arguments instead of a
vnode. This generalizes the function so that it can be used from
contexts where vnode references are not available.
- Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is
provided.
- Make the function global and add its prototype to devfs.h.

Reviewed by: kib


# 89d10571 15-Sep-2010 Jaakko Heinonen <jh@FreeBSD.org>

Remove empty devfs directories automatically.

devfs_delete() now recursively removes empty parent directories unless
the DEVFS_DEL_NORECURSE flag is specified. devfs_delete() can't be
called anymore with a parent directory vnode lock held because the
possible parent directory deletion needs to lock the vnode. Thus we
unlock the parent directory vnode in devfs_remove() before calling
devfs_delete().

Call devfs_populate_vp() from devfs_symlink() and devfs_vptocnp() as now
directories can get removed.

Add a check for DE_DOOMED flag to devfs_populate_vp() because
devfs_delete() drops dm_lock before the VI_DOOMED vnode flag gets set.
This ensures that devfs_populate_vp() returns an error for directories
which are in progress of deletion.

Reviewed by: kib
Discussed on: freebsd-current (mostly silence)


# 4136388a 26-Aug-2010 Jaakko Heinonen <jh@FreeBSD.org>

Set de_dir for user created symbolic links. This will be needed to be
able to resolve their parent directories.


# f5efcd64 25-Aug-2010 Jaakko Heinonen <jh@FreeBSD.org>

Call devfs_populate_vp() from devfs_getattr(). It was possible that
fstat(2) returned stale information through an open file descriptor.


# 0f6bb099 22-Aug-2010 Jaakko Heinonen <jh@FreeBSD.org>

Introduce and use devfs_populate_vp() to unlock a vnode before calling
devfs_populate(). This is a prerequisite for the automatic removal of
empty directories which will be committed in the future.

Reviewed by: kib (previous version)


# 3634d5b2 20-Aug-2010 John Baldwin <jhb@FreeBSD.org>

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by: kib
MFC after: 3 days


# 96835d61 19-Aug-2010 Jaakko Heinonen <jh@FreeBSD.org>

Call dev_rel() in error paths.

Reported by: kib
Reviewed by: kib
MFC after: 2 weeks


# 64040d39 12-Aug-2010 Jaakko Heinonen <jh@FreeBSD.org>

Allow user created symbolic links to cover device files and directories
if the device file appears during or after the link creation.

User created symbolic links are now inserted at the head of the
directory entry list after the "." and ".." entries. A new directory
entry flag DE_COVERED indicates that an entry is covered by a symbolic
link.

PR: kern/114057
Reviewed by: kib
Idea from: kib
Discussed on: freebsd-current (mostly silence)


# 3979450b 06-Aug-2010 Konstantin Belousov <kib@FreeBSD.org>

Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with: pho
MFC after: 1 month


# 9968a426 06-Aug-2010 Konstantin Belousov <kib@FreeBSD.org>

Enable shared locks for the devfs vnodes. Honor the locking mode
requested by lookup(). This should be a nop at the moment.

In collaboration with: pho
MFC after: 1 month


# 3a6fc63c 06-Aug-2010 Konstantin Belousov <kib@FreeBSD.org>

Initialize VV_ISTTY vnode flag on the devfs vnode creation instead of
doing it on each open.

In collaboration with: pho
MFC after: 1 month


# f40645c8 09-Jun-2010 Jaakko Heinonen <jh@FreeBSD.org>

Add a new function devfs_parent_dirent() for resolving devfs parent
directory entry. Use the new function in devfs_fqpn(), devfs_lookupx()
and devfs_vptocnp() instead of manually resolving the parent entry.

Reviewed by: kib


# 59e0452e 01-Jun-2010 Jaakko Heinonen <jh@FreeBSD.org>

Don't try to call cdevsw d_close() method when devfs_close() is called
because of insmntque1() failure.

Found with: stress2
Suggested and reviewed by: kib


# 8dc9b4cf 19-Dec-2009 Ed Schouten <ed@FreeBSD.org>

Let access overriding to TTYs depend on the cdev_priv, not the vnode.

Basically this commit changes two things, which improves access to TTYs
in exceptional conditions. Basically the problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
devfs_access() to compare against the cdev_priv instead of the vnode.
This allows you to bypass UNIX permissions, even across different
mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
of the controlling TTY, even if normal prison nesting rules normally
don't allow this. This actually allows you to interact with this
device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certian pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by: Many people


# f8f61460 20-Jun-2009 Ed Schouten <ed@FreeBSD.org>

Improve nested jail awareness of devfs by handling credentials.

Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by: kib, rwatson (older version)


# c4df27d5 10-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may
be NULL or derefenced memory may become free at arbitrary moment.

Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.

Reported by: georg at dts su
Reviewed by: des (pseudofs)
MFC after: 2 weeks


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 0e9bd89d 15-May-2009 Konstantin Belousov <kib@FreeBSD.org>

Devfs replaces file ops vector with devfs-specific one in devfs_open(),
before the struct file is fully initialized in vn_open(), in particular,
fp->f_vnode is NULL. Other thread calling file operation before f_vnode
is set results in NULL pointer dereference in devvn_refthread().

Initialize f_vnode before calling d_fdopen() cdevsw method, that might
set file ops too.

Reported and tested by: Chris Timmons <cwt networks cwu edu>
(RELENG_7 version)
MFC after: 3 days


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


# 885868cd 10-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# 52dc9305 10-Mar-2009 Konstantin Belousov <kib@FreeBSD.org>

Enable advisory file locking for devfs vnodes.

Reported by: Timothy Redaelli <timothy redaelli eu>
MFC after: 1 week


# 125dcf8c 06-Mar-2009 Konstantin Belousov <kib@FreeBSD.org>

Extract the no_poll() and vop_nopoll() code into the common routine
poll_no_poll().
Return a poll_no_poll() result from devfs_poll_f() when
filedescriptor does not reference the live cdev, instead of ENXIO.

Noted and tested by: hps
MFC after: 1 week


# 7956d34b 31-Jan-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove unused local variables.

Submitted by: Christoph Mallon christoph.mallon@gmx.de
Reviewed by: kib
MFC after: 2 weeks


# 71624181 08-Jan-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Don't panic with "vinvalbuf: dirty bufs" when the mounted device that was
being written to goes away.

Reviewed by: kib, scottl
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


# c7c7520a 12-Dec-2008 Konstantin Belousov <kib@FreeBSD.org>

Do not leak defs_de_interlock on error.

Another pointy hat for my collection.


# 4c44fd37 11-Dec-2008 Joe Marcus Clarke <marcus@FreeBSD.org>

Implement VOP_VPTOCNP for devfs. Directory and character device vnodes are
properly translated to their component names.

Reviewed by: arch
Approved by: kib


# 15bc6b2b 28-Oct-2008 Edward Tomasz Napierala <trasz@FreeBSD.org>

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 7818e0a5 26-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

Save previous content of the td_fpop before storing the current
filedescriptor into it. Make sure that td_fpop is NULL when calling
d_mmap from dev_pager_getpages().

Change guards against td_fpop field being non-NULL with private state
for another device, and against sudden clearing the td_fpop. This
could occur when either a driver method calls another driver through
the filedescriptor operation, or a page fault happen while driver is
writing to a memory backed by another driver.

Noted by: rwatson
Tested by: rnoland
MFC after: 3 days


# caf8aec8 20-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month


# 0359a12e 28-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# bc093719 20-Aug-2008 Ed Schouten <ed@FreeBSD.org>

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


# f35db5f7 12-Aug-2008 Konstantin Belousov <kib@FreeBSD.org>

Remove unnecessary locking around pointer fetch.

Requested by: jhb


# 05427aaf 16-Jun-2008 Konstantin Belousov <kib@FreeBSD.org>

Struct cdev is always the member of the struct cdev_priv. When devfs
needed to promote cdev to cdev_priv, the si_priv pointer was followed.

Use member2struct() to calculate address of the wrapping cdev_priv.
Rename si_priv to __si_reserved.

Tested by: pho
Reviewed by: ed
MFC after: 2 weeks


# 9e40a5f8 05-Jun-2008 Konstantin Belousov <kib@FreeBSD.org>

When devfs_allocv() committed to create new vnode, since de_vnode is NULL,
the dm_lock is held while the newly allocated vnode is locked. Since no
other threads may try to lock the new vnode yet, the LOR there cannot
result in the deadlock.

Shut down the witness warning to note this fact.

Tested by: pho
Prodded by: attilio


# 16151645 01-Jun-2008 Ed Schouten <ed@FreeBSD.org>

Revert the changes I made to devfs_setattr() in r179457.

As discussed with Robert Watson and John Baldwin, it would be better if
PTY's are created with proper permissions, turning grantpt() into a
no-op.

Bypassing security frameworks like MAC by passing NOCRED to
VOP_SETATTR() will only make things more complex.

Approved by: philip (mentor)


# 34d1dcf0 31-May-2008 Ed Schouten <ed@FreeBSD.org>

Merge back devfs changes from the mpsafetty branch.

In the mpsafetty branch, PTY's are allocated through the posix_openpt()
system call. The controller side of a PTY now uses its own file
descriptor type (just like sockets, vnodes, pipes, etc).

To remain compatible with existing FreeBSD and Linux C libraries, we can
still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes
implement d_fdopen(). Devfs has been slightly changed here, to allow
finit() to be called from d_fdopen().

The routine grantpt() has also been moved into the kernel. This routine
is a little odd, because it needs to bypass standard UNIX permissions.
It needs to change the owner/group/mode of the slave device node, which
may often not be possible. The old implementation solved this by
spawning a setuid utility.

When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences
ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to
allow changes when a->a_cred is set to NOCRED.

Approved by: philip (mentor)


# 772e2453 23-May-2008 Konstantin Belousov <kib@FreeBSD.org>

When vget() fails (because the vnode has been reclaimed), there is no
sense to loop trying to vget() the vnode again.

PR: 122977
Submitted by: Arthur Hartwig <arthur.hartwig nokia com>
Tested by: pho
Reviewed by: jhb
MFC after: 1 week


# 82f4d640 21-May-2008 Konstantin Belousov <kib@FreeBSD.org>

Implement the per-open file data for the cdev.

The patch does not change the cdevsw KBI. Management of the data is
provided by the functions
int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr);
int devfs_get_cdevpriv(void **datap);
void devfs_clear_cdevpriv(void);
All of the functions are supposed to be called from the cdevsw method
contexts.

- devfs_set_cdevpriv assigns the priv as private data for the file
descriptor which is used to initiate currently performed driver
operation. dtr is the function that will be called when either the
last refernce to the file goes away, the device is destroyed or
devfs_clear_cdevpriv is called.
- devfs_get_cdevpriv is the obvious accessor.
- devfs_clear_cdevpriv allows to clear the private data for the still
open file.

Implementation keeps the driver-supplied pointers in the struct
cdev_privdata, that is referenced both from the struct file and struct
cdev, and cannot outlive any of the referee.

Man pages will be provided after the KPI stabilizes.

Reviewed by: jhb
Useful suggestions from: jeff, antoine
Debugging help and tested by: pho
MFC after: 1 month


# 06d0d0e2 07-May-2008 John Baldwin <jhb@FreeBSD.org>

Don't explicitly drop Giant around d_open/d_fdopen/d_close for MPSAFE
drivers. Since devfs is already marked MPSAFE it shouldn't be held
anyway.

MFC after: 2 weeks
Discussed with: phk


# 81c794f9 25-Feb-2008 Attilio Rao <attilio@FreeBSD.org>

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# cb05b60a 09-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 314464f4 07-Jan-2008 John Baldwin <jhb@FreeBSD.org>

Lock the vnode interlock while reading v_usecount to update si_usecount
in a cdev in devfs_reclaim().

MFC after: 3 days
Reviewed by: jeff (a while ago)


# e4650294 07-Jan-2008 John Baldwin <jhb@FreeBSD.org>

Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by: rwatson


# 397c19d1 29-Dec-2007 Jeff Roberson <jeff@FreeBSD.org>

Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by: kris, pho


# 30d239bc 24-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 57fd3d55 26-Jul-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with: kib, ups
Approved by: re (rwatson)


# de10ffa5 03-Jul-2007 Konstantin Belousov <kib@FreeBSD.org>

Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bump device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the events clone to make sure no lingering devices are left after
dev_clone event handler deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().

Reviewed by: tegge (early versions), njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 9e223287 31-May-2007 Konstantin Belousov <kib@FreeBSD.org>

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# 30575990 23-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

Rename mac*devfsdirent*() to mac*devfs*() to synchronize with SEDarwin,
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.


# 164554de 19-Apr-2007 Tom Rhodes <trhodes@FreeBSD.org>

In some cases, like whenever devfs file times are zero, the fix(aa) will not
be applied to dev entries. This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check. It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps. It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.

Discussed with: bde
"Ok with me: phk


# 5e3f7694 04-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
acquisisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by: kris
Discussed with: jhb, kris, attilio, jeff


# 6455de00 26-Mar-2007 Kris Kennaway <kris@FreeBSD.org>

Annotate that this giant acqusition is dependent on tty locking.


# 61b9d89f 12-Mar-2007 Tor Egge <tegge@FreeBSD.org>

Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount). On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 16f50bcd 20-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

Update the access and modification times for dev while still holding
thread reference on it.

Reviewed by: tegge
Approved by: pjd (mentor)


# 1663075c 20-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix the race between devfs_fp_check and devfs_reclaim. Derefence the
vnode' v_rdev and increment the dev threadcount , as well as clear it
(in devfs_reclaim) under the dev_lock().

Reviewed by: tegge
Approved by: pjd (mentor)


# 828d6d12 18-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

Properly lock the vnode around vgone() calls.

Unlock the vnode in devfs_close() while calling into the driver d_close()
routine.

devfs_revoke() changes by: ups
Reviewed and bugfixes by: tegge
Tested by: mbr, Peter Holm
Approved by: pjd (mentor)
MFC after: 1 week


# af72db71 19-Sep-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix the bug in rev. 1.134. In devfs_allocv_drop_refs(), when not_found == 2
and drop_dm_lock is true, no unlocking shall be attempted. The lock is
already dropped and memory is freed.

Found with: Coverity Prevent(tm)
CID: 1536
Approved by: pjd (mentor)


# e7f9b744 18-Sep-2006 Konstantin Belousov <kib@FreeBSD.org>

Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and
vnode lock in devfs_allocv. Do this by temporary dropping dm_lock around
vnode locking.

For safe operation, add hold counters for both devfs_mount and devfs_dirent,
and DE_DOOMED flag for devfs_dirent. The facilities allow to continue after
dropping of the dm_lock, by making sure that referenced memory does not
disappear.

Reviewed by: tegge
Tested by: kris
Approved by: kan (mentor)
PR: kern/102335


# 9c499ad9 17-Jul-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exists in
DEVFS.

Remove the opt_devfs.h file now that it is empty.


# 56eeb277 12-Jul-2006 Stephan Uphoff <ups@FreeBSD.org>

Add vnode interlocking to devfs.
This prevents race conditions that can cause pagefaults or devfs
to use arbitrary vnodes.

MFC after: 1 week


# 83ff52a7 06-Jul-2006 Robert Watson <rwatson@FreeBSD.org>

Use #include "", not #include <> for opt_foo.h.

MFC after: 3 days


# 23b77994 31-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this
the vnode is never recycled. It is bogus because the reference really
should be associated with the devfs dirent.


# 3b77d80c 30-Jan-2006 Jeff Roberson <jeff@FreeBSD.org>

- Remove a stale comment. This function was rewritten to be SMP safe some
time ago.

Sponsored by: Isilon Systems, Inc.


# 16e35dcc 09-Nov-2005 Doug White <dwhite@FreeBSD.org>

This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR: 88249
Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
MFC after: 3 days


# 3b72f38b 18-Oct-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Use correct cirteria for determining which directory entries we can
purge right away and which we merely can hide.

Beaten into my skull by: kris


# e606a3c6 19-Sep-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Rewamp DEVFS internals pretty severely [1].

Give DEVFS a proper inode called struct cdev_priv. It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex. Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables, the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liason
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr. Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them. A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now know about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there. We still spelunk into other mountpoints
and fondle their data without 100% good locking. It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints. It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by: Kris
Much confusion caused by (races in): md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.


# 59307b0d 15-Sep-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Don't attempt to recurse lockmgr, it doesn't like it.


# 214c8ff0 15-Sep-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Various minor polishing.


# ab32e952 15-Sep-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.


# 21806f30 12-Sep-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Clean up prototypes.


# 80447bf7 29-Aug-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add a missing dev_relthread() call.

Remove unused variable.

Spotted by: Hans Petter Selasky <hselasky@c2i.net>


# 516ad423 17-Aug-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers: Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant


# 31cc57cd 16-Aug-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Collect the devfs related sysctls in one place


# d785dfef 15-Aug-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate effectively unused dm_basedir field from devfs_mount.


# 6a113b3d 08-Aug-2005 Robert Watson <rwatson@FreeBSD.org>

Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by: phk
MFC after: 3 days


# 02a4be3f 20-Jul-2005 Simon L. B. Nielsen <simon@FreeBSD.org>

Correct devfs ruleset bypass.

Submitted by: csjp
Reviewed by: phk
Security: FreeBSD-SA-05:17.devfs
Approved by: cperciva


# d26dd2d9 14-Jul-2005 Robert Watson <rwatson@FreeBSD.org>

When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device. This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
instantiated as a vnode, the cloning credential can be exposed to
MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
accepts the credential to stick in the struct cdev. Implement it and
make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
optional credential pointer (may be NULL), so that MAC policies can
inspect and act on the label or other elements of the credential
when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty. This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by: Andrew Reisse <andrew.reisse@sparta.com>
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
MFC after: 1 week
MFC note: Merge to 6.x, but not 5.x for ABI reasons


# fd225fe4 31-May-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Do not declare a struct as extern, and then implement
it as static in the same file. This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by: phk
Approved by: das (mentor)


# 7b6b7657 30-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT.
We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set
because vfs is still sometime entered with Giant held.


# 4585e3ac 13-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


# f4f6abcb 30-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.


# eb151cb9 30-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Rename dev_ref() to dev_refl()


# eddcb03d 28-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.

Sponsored by: Isilon Systems, Inc.


# c0f681c2 12-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem. Check that rather than VI_XLOCK.

Sponsored by: Isilon Systems, Inc.


# 26474078 10-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

One more bit of the major/minor patch to make ttyname happy as well.


# b43ab0e3 10-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.


# 0454a53d 22-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

We may not have an actual cdev at this point.


# aa2f6ddc 22-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Reap more benefits from DEVFS:

List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent. There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

rename idestroy_dev() to destroy_devl() for consistency.

Add LIST_ENTRY de_alias to struct devfs_dirent.
Remove v_specnext from struct vnode.
Change si_hlist to si_alist in struct cdev.
String new devfs vnodes' devfs_dirent on si_alist when
we create them and take them off in devfs_reclaim().

Fix devfs_revoke() accordingly. Also don't clear fields
devfs_reclaim() will clear when called from vgone();

Let devfs_reclaim() call dev_rel() instead of vgonel().

Move the usecount tracking from dev_rel() to devfs_reclaim(),
and let dev_rel() take a struct cdev argument instead of vnode.

Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
devfs_reclaim()) when they are no longer used. (This
should maybe happen in devfs_close() instead.)


# 1a1457d4 22-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.


# 4d8ac58b 17-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce vx_wait{l}() and use it instead of home-rolled versions.


# df32e67c 09-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Statize devfs_ops_f


# a369f34d 28-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().


# 83c64397 13-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Whitespace in vop_vector{} initializations.


# d167cf6f 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 50a36c11 22-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent about flag values passed to device drivers read/write
methods:

Read can see O_NONBLOCK and O_DIRECT.

Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.

In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards
compatibility.


# 10eee285 22-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Shuffle numeric values of the IO_* flags to match the O_* flags from
fcntl.h.

This is in preparation for making the flags passed to device drivers be
consistently from fcntl.h for all entrypoints.

Today open, close and ioctl uses fcntl.h flags, while read and write
uses vnode.h flags.


# e87047b4 20-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.


# 2c022012 20-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add a couple of KASSERTS to try to diagnose a problem reported.


# 2a9e0c32 14-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Be a bit more assertive about vnode bypass.


# 1dc4727e 13-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Another FNONBLOCK -> O_NONBLOCK.

Don't unconditionally set IO_UNIT to device drivers in write: nobody
checks it, and since it was always set it did not carry information anyway.


# ab9caf9d 13-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Use O_NONBLOCK instead of FNONBLOCK alias.


# f0d5cba9 13-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Explicit panic in vop_read/vop_write for devices


# 20a92a18 07-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway. Correct information is provided by
statfs(/).


# aec0fb7b 01-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.

Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

Give coda correct prototypes and function definitions for
all vop_()s.

Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.

Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.

Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.

Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.

Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.

Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)


# 6fde64c7 30-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Mechanically change prototypes for vnode operations to use the new typedefs.


# ea566ae2 17-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make vnode bypass for devices mandatory.


# 8352b192 15-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make vnode bypass the default for devices.

Can be disabled in case of problems with
vfs.devfs.fops=0
in loader.conf


# 49b7607e 13-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Integrate most of vop_revoke() into devfs_revoke() where it belongs.


# aac5167c 13-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add the devfs_fp_check() function which helps us get from a struct file
to a cdev and a devsw, doing all the relevant checks along the way.

Add the check to see if fp->f_vnode->v_rdev differs from our cached
fp->f_data copy of our cdev. If it does the device was revoked and
we return ENXIO.


# 56dd3a61 08-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add optional device vnode bypass to DEVFS.

The tunable vfs.devfs.fops controls this feature and defaults to off.

When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a filedescriptor gets a special fops vector which instead
of the detour through the vnode layer goes directly to DEVFS.

Amongst other things this allows us to run Giant free read/write to
device drivers which have been weaned off D_NEEDGIANT.

Currently this means /dev/null, /dev/zero, disks, (and maybe the
random stuff ?)

On a 700MHz K7 machine this doubles the speed of
dd if=/dev/zero of=/dev/null bs=1 count=1000000

This roughly translates to shaving 2usec of each read/write syscall.

The poll/kqfilter paths need more work before they are giant free,
this work is ongoing in p4::phk_bufwork

Please test this and report any problems, LORs etc.


# 5349c79d 06-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Properly implement a default version of VOP_GETWRITEMOUNT.

Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.


# ecc14aae 04-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add back securelevel check for disks.

XXX: This should live in geom_dev.c but we don't have access to the
cred there.
XXX: XXX: This may not matter anymore since filesystems use geom_vfs.


# 4cea3289 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Don't give disks special treatment, they don't come this way anymore.


# c108bb74 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove VOP_SPECSTRATEGY() from the system.


# 6afb3b1c 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Give dev_strategy() an explict cdev argument in preparation for removing
buf->b-dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.


# 45628dd3 28-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

What can I say: don't allow people to mount DEVFS with option "nodev".


# 5d9d81e7 26-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Put the I/O block size in bufobj->bo_bsize.

We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open(). The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.


# ff7c5a48 22-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it. Here were those hacks that I have curs'd I know not how
oft. Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.


# 56f21b9d 26-Jul-2004 Colin Percival <cperciva@FreeBSD.org>

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# de592112 22-Jul-2004 Robert Watson <rwatson@FreeBSD.org>

In devfs_allocv(), rather than assigning 'td = curthread', assert that
the caller passes in a td that is curthread, and consistently pass 'td'
into vget(). Remove some bogus logic that passed in td or curthread
conditional on td being non-NULL, which seems redundant in the face of
the earlier assignment of td to curthread if td is NULL.

In devfs_symlink(), cache the passed thread in 'td' so we don't have
to keep retrieving it from the 'ap' structure, and assert that td is
curthread (since we dereference it to get thread-local td_ucred). Use
'td' in preference to curthread for later lockmgr calls, since they are
equal.


# f3732fd1 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# bc553559 19-Feb-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Report the correct length for symlink entries.


# 49e9fc0a 02-Jan-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Improve on POLA by populating DEVFS before doing devfs(8) rule ioctls.

PR: 60687
Spotted by: Colin Percival <cperciva@daemonology.net>


# 8b285b90 20-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remember to check the DE_WHITEOUT flag in the case where a cloned
device is hidden by a devfs(8) rule.

Spotted by: Adam Nowacki <ptnowak@bsk.vectranet.pl>


# 7e8766a9 20-Oct-2003 Poul-Henning Kamp <phk@FreeBSD.org>

When a driver successfully created a device on demand, we can directly
pick up the DEVFS inode number from the dev_t and find our directory
entry from that, we don't need to scan the directory to find it.

This also solves an issue with on-demand devices in subdirectories.

Submitted by: cognet


# 2f613363 31-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused variable.

Found by: FlexeLint


# 99648386 03-Mar-2003 Nate Lawson <njl@FreeBSD.org>

Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.


# 8994a245 02-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


# c9524588 02-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

uiomove-related caddr_t -> void * (just the low-hanging fruit)


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 6718b083 29-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

NODEVFS cleanup: remove #ifdefs.


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 9c71cb4b 13-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Even if the permissions deny it, a process should be allowed to
access its controlling terminal.

In essense, history dictates that any process is allowed to open
/dev/tty for RW, irrespective of credential, because by definition
it is it's own controlling terminal.

Before DEVFS we relied on a hacky half-device thing (kern/tty_tty.c)
which did the magic deep down at device level, which at best was
disgusting from an architectural point of view.

My first shot at this was to use the cloning mechanism to simply
give people the right tty when they ask for /dev/tty, that's why
you get this, slightly counter intuitive result:

syv# ls -l /dev/tty `tty`
crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/tty
crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/ttyp0

Trouble is, when user u1 su(1)'s to user u2, he cannot open
/dev/ttyp0 anymore because he doesn't have permission to do so.

The above fix allows him to do that.

The interesting side effect is that one was previously only able
to access the controlling tty by indirection:
date > /dev/tty
but not by name:
date > `tty`

This is now possible, and that feels a lot more like DTRT.

PR: 46635
MFC candidate: could be.


# c6e3ae99 04-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by: src/tools/tools/vop_table


# adfad1eb 27-Dec-2002 Robert Watson <rwatson@FreeBSD.org>

Trim left-over and unused vop_refreshlabel() bits from devfs.

Reported by: bde


# 990b4b2d 08-Dec-2002 Robert Watson <rwatson@FreeBSD.org>

Remove dm_root entry from struct devfs_mount. It's never set, and is
unused. Replace it with a dm_mount back-pointer to the struct mount
that the devfs_mount is associated with. Export that pointer to MAC
Framework entry points, where all current policies don't use the
pointer. This permits the SEBSD port of SELinux's FLASK/TE to compile
out-of-the-box on 5.0-CURRENT with full file system labeling support.

Approved by: re (murray)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 763bbd2f 26-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception. For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system. With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance. This
also corrects sematics for shared vnode locks, which were not
previously present in the system. This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form. With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception. We'll introduce a work around for this shortly.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# e6a5564e 20-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Missed a case of _POSIX_MAC_PRESENT -> _PC_MAC_PRESENT rename.

Pointed out by: phk


# bc9d8a9a 16-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix comments and one resulting code confusion about the type of the
"command" argument to VOP_IOCTL.

Spotted by: FlexeLint.


# 2b7f24d2 11-Oct-2002 Mike Barcroft <mike@FreeBSD.org>

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.


# 74e62b1b 05-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Integrate a devfs/MAC fix from the MAC tree: avoid a race condition during
devfs VOP symlink creation by introducing a new entry point to determine
the label of the devfs_dirent prior to allocation of a vnode for the
symlink.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 2c5e1d1e 01-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Move the vop-vector declaration into devfs_vnops.c where it belongs.


# 562c8668 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix mis-indent.


# 86ed6d45 18-Sep-2002 Nate Lawson <njl@FreeBSD.org>

Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde


# 06be2aaa 14-Sep-2002 Nate Lawson <njl@FreeBSD.org>

Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by: phk
Reviewed by: bde, rwatson (earlier version)


# e6e370a7 04-Aug-2002 Jeff Roberson <jeff@FreeBSD.org>

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


# 67d722ed 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Teach devfs how to respond to pathconf() _POSIX_MAC_PRESENT queries,
allowing it to indicate to user processes that individual vnode labels
are available.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# bdc2cd13 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Hook up devfs_pathconf() for specfs devfs nodes, not just regular
devfs nodes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 6742f328 31-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument devfs to support per-dirent MAC labels. In particular,
invoke MAC framework when devfs directory entries are instantiated
due to make_dev() and related calls, and invoke the MAC framework
when vnodes are instantiated from these directory entries. Implement
vop_setlabel() for devfs, which pushes the label update into the
devfs directory entry for semi-persistant store. This permits the MAC
framework to assign labels to devices and directories as they are
instantiated, and export access control information via devfs vnodes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# a1dc2096 16-Jul-2002 Dima Dorfman <dd@FreeBSD.org>

Introduce the DEVFS "rule" subsystem. DEVFS rules permit the
administrator to define certain properties of new devfs nodes before
they become visible to the userland. Both static (e.g., /dev/speaker)
and dynamic (e.g., /dev/bpf*, some removable devices) nodes are
supported. Each DEVFS mount may have a different ruleset assigned to
it, permitting different policies to be implemented for things like
jails.

Approved by: phk


# c83fca1f 01-Jun-2002 Semen Ustimenko <semenu@FreeBSD.org>

Make devfs to give honour to PDIRUNLOCK flag.

Reviewed by: jeff
MFC after: 1 week


# 8eee3a3d 10-May-2002 Maxime Henrion <mux@FreeBSD.org>

Fix several bugs in devfs_lookupx(). When we check the nameiop to
make sure it's a correct operation for devfs, do it only in the
ISLASTCN case. If we don't, we are assuming that the final file will
be in devfs, which is not true if another partition is mounted on top
of devfs or with special filenames (like /dev/net/../../foo).

Reviewed by: phk


# a12cfddc 29-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Use vnode locking with devfs; permit VFS locking assertions to make
sense for devfs vnodes, and reduce/remove potential races in the devfs
code.

Submitted by: iadowse
Approved by: phk


# 11257f4d 05-Apr-2002 Bruce Evans <bde@FreeBSD.org>

Fixed assorted bugs in setting of timestamps in devfs_setattr().

Setting of timestamps on devices had no effect visible to userland
because timestamps for devices were set in places that are never used.
This broke:
- update of file change time after a change of an attribute
- setting of file access and modification times.

The VA_UTIMES_NULL case did not work. Revs 1.31-1.32 were supposed to
fix this by copying correct bits from ufs, but had little or no effect
because the old checks were not removed.


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 11caded3 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# e9fc9230 14-Mar-2002 Maxim Konovalov <maxim@FreeBSD.org>

Be consistent with UFS in a way how devfs_setattr() checks credentials
for chmod(2), chown(2) and utimes(2) with respect to jail(2).

Reviewed by: rwatson, ru
Not objected by: phk
Approved by: ru


# 2ab80ed8 25-Nov-2001 Dima Dorfman <dd@FreeBSD.org>

Address two minor issues: implement the _PC_NAME_MAX and _PC_PATH_MAX
pathconf() variables for directories, and set st_size and st_blocks
(of struct stat) for directories as appropriate. Note that st_size is
always set to DEV_BSIZE, since the size of the directories is not
currently kept.

Reviewed by: phk, bde


# d018a84c 04-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Fix "echo > /dev/null" for non-root users which broke in previous commit.


# 96070273 03-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Use vfs_timestamp() instead of getnanotime().

Add magic stuff copied from ufs_setattr().

Instructed by: bde


# c95b982a 13-Oct-2001 Bruce Evans <bde@FreeBSD.org>

Backed out vestiges of the quick fixes for the transient breakage of
<sys/mount.h> in rev.1.106 of the latter (don't include <sys/socket.h>
just to work around bugs in <sys/mount.h>).


# 40739c02 30-Sep-2001 Poul-Henning Kamp <phk@FreeBSD.org>

The behaviour of whiteout'ing symlinks were too confusing, instead
remove them when asked to.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 12d1aec2 14-Aug-2001 Poul-Henning Kamp <phk@FreeBSD.org>

linux ls fails on DEVFS /dev because linux_getdents fails because
linux_getdents uses VOP_READDIR( ..., &ncookies, &cookies ) instead of
VOP_READDIR( ..., NULL, NULL ) because it seems to need the offsets for
linux_dirent and sizeof(dirent) != sizeof(linux_dirent)...

PR: 29467
Submitted by: Michael Reifenberger <root@nihil.plaut.de>
Reviewed by: phk


# 51716196 01-Jun-2001 Brian Somers <brian@FreeBSD.org>

Support /dev/tun cloning. Ansify if_tun.c while I'm there.

Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.

It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).

The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.

Reviewed by: phk


# e33457d7 26-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Don't copy the trailing zero in readlink, it confuses namei().

PR: 27656


# 5a9300c4 23-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Change the way deletes are managed in DEVFS.

This fixes a number of warnings relating to removed cloned devices.

It also makes it possible to recreate deleted devices with
mknod(2). The major/minor arguments are ignored.


# f73cbde4 14-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

After a successfull poll of the cloning functions, match on the
returned dev_t rather than the original name.

This allows cloning from one name to another which is useful for
/dev/tty and later for the pty's.


# ab9f3b29 13-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Convert DEVFS from an "opt-in" to an "opt-out" option.

If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.

Pending any significant issues, DEVFS will be made mandatory in
-current on july 1st so that we can start reaping the full
benefits of having it.


# 6bd2ea83 06-May-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded devfs_badop()

Noticed by: rwatson


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# b7ebffbc 29-Apr-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Add a vop_stdbmap(), and make it part of the default vop vector.

Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.


# 3be6e0c2 23-Apr-2001 Matt Jacob <mjacob@FreeBSD.org>

add this ridiculous include foo so it will compile again


# 3e8bea96 18-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a debug printf.


# 4b1c62b3 02-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

At the point in time where most devices are created, we don't know what
time it is because boottime is not yet initialized. Finagle the relevant
fields when we get the chance.


# ecde9a6d 02-Feb-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Only superuser can create symlinks.
Give symlinks mode 755 by default to avoid triggering alert eyes.
(the mode isn't use on symlinks)


# aadf2655 30-Jan-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Fix two minor nits.

Existences revealed, but no details offered by: bp


# 2a043678 08-Dec-2000 Poul-Henning Kamp <phk@FreeBSD.org>

staticize.


# cf9fa8e7 29-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


# 8abea41d 09-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Don't hold an extra reference to vnodes. Devfs vnodes are sufficiently
cheap to setup that it doesn't really matter that we recycle device
vnodes at kleenex speed.

Implement first cut try at killing cloned devices when they are
not needed anymore. For now only the bpf driver is involved in
this experiment. Cloned devices can set the SI_CHEAPCLONE flag
which allows us to destroy_dev() it when the vcount() drops to zero
and the vnode is reclaimed. For now it's a requirement that the
driver doesn't keep persistent state from close to (re)open.

Some whitespace changes.


# eb7ba7f9 18-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Ignore attempts to set flags to zero. This quenches a syslog warning
from login(1).


# c80d2913 15-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Add canonical checks to devfs_setattr().


# 93bcdfe2 06-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Add refcounts to the "global" DEVFS inode slots, this allows us
to recycle inodes after a destroy_dev() but not until all mounts
have picked up the change.

Add support for an overflow table for DEVFS inodes. The static
table defaults to 1024 inodes, if that fills, an overflow table
of 32k inodes is allocated. Both numbers can be changed at
compile time, the size of the overflow table also with the
sysctl vfs.devfs.noverflow.

Use atomic instructions to barrier between make_dev()/destroy_dev()
and the mounts.

Add lockmgr() locking of directories for operations accessing or
modifying the directory TAILQs.

Various nitpicking here and there.


# db901281 02-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


# 012c643d 29-Aug-2000 Robert Watson <rwatson@FreeBSD.org>

o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege. Make vaccess() accept an
additional optional argument, privused, to determine whether
privilege was required for vaccess() to return 0. Add commented
out capability checks for reference. Rename some variables to make
it more clear which modes/uids/etc are associated with the object,
and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
privused argument. Once additional patches are applied, suser()
will no longer set ASU, so privused will permit passing of
privilege information up the stack to the caller.

Reviewed by: bde, green, phk, -security, others
Obtained from: TrustedBSD Project


# c32d0a1d 27-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Reorder vop's alphabetically.
Smarter use of devfs_allocv() (from bp@)
Introduce devfs_find()
".." fixes to devfs_lookup (from bp@)


# 749e1537 26-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Minor cleanups tp devfs_readdir();
Add devfs_read() for directories. (inspired by bp@)


# a481b90b 24-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Fix panic when removing open device (found by bp@)
Implement subdirs.
Build the full "devicename" for cloning functions.
Fix panic when deleted device goes away.
Collaps devfs_dir and devfs_dirent structures.
Add proper cloning to the /dev/fd* "device-"driver.
Fix a bug in make_dev_alias() handling which made aliases appear
multiple times.
Use devfs_clone to implement getdiskbyname()
Make specfs maintain the stat(2) timestamps per dev_t


# fcc9b84c 21-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Fix devfs_access() bug on directories.

Remove unused #includes.

Bug spotted by: markm


# 3f54a085 20-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFSs magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

Add commented out DEVFS to GENERIC