History log of /freebsd-10-stable/sys/fs/pseudofs/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
312286 16-Jan-2017 kib

MFC r311815:
Forcibly remove the cached items from pseudofs vncache on module unload.

303907 10-Aug-2016 kib

MFC r303704:
Some style changes. Fix a typo in comment.

MFC r303705:
Remove Giant asserts. Update comment.

297267 25-Mar-2016 kib

MFC r296652:
Do not perform unneccessary shared recursion on the allproc_lock in
pfs_visible().

293595 09-Jan-2016 dchagin

MFC r283495:

Hide vfs.pfs.trace variable if it is not used.

259506 17-Dec-2013 kib

MFC r258088:
Add check for buflen overflow by comparing the buflen with both offset
and resid.

MFC r258397:
Redo r258088 to avoid relying on signed arithmetic overflow.

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


242833 09-Nov-2012 attilio

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


232541 05-Mar-2012 kib

Apply inlined vn_vget_ino() algorithm for ".." lookup in pseudofs.

Reported and tested by: pho
MFC after: 2 weeks


232278 29-Feb-2012 mm

Add procfs to jail-mountable filesystems.

Reviewed by: jamie
MFC after: 1 week


231949 21-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


230249 17-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks


230132 15-Jan-2012 uqs

Convert files to UTF-8


229694 06-Jan-2012 jh

r222004 changed sbuf_finish() to not clear the buffer error status. As a
consequence sbuf_len() will return -1 for buffers which had the error
status set prior to sbuf_finish() call. This causes a problem in
pfs_read() which purposely uses a fixed size sbuf to discard bytes which
are not needed to fulfill the read request.

Work around the problem by using the full buffer length when
sbuf_finish() indicates an overflow. An overflowed sbuf with fixed size
is always full.

PR: kern/163076
Approved by: des
MFC after: 2 weeks


229692 06-Jan-2012 jh

Check the return value of sbuf_finish() in pfs_readlink() and return
ENAMETOOLONG if the buffer overflowed.

Approved by: des
MFC after: 2 weeks


227697 19-Nov-2011 kib

Existing VOP_VPTOCNP() interface has a fatal flow that is critical for
nullfs. The problem is that resulting vnode is only required to be
held on return from the successfull call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)


227576 16-Nov-2011 kib

Fix build, use %d for int value formatting.


227550 16-Nov-2011 pho

Handle invalid large values for getdirentries(2) data buffer size.

In collaboration with: kib
Reviewed by: des
Reported by: The iknowthis syscall fuzzer.
MFC after: 1 week


227527 15-Nov-2011 pho

Removed extra PRELE() call.

MFC after: 1 week


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


211531 20-Aug-2010 jhb

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by: kib
MFC after: 3 days


206894 20-Apr-2010 kib

The cache_enter(9) function shall not be called for doomed dvp.
Assert this.

In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporary unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.

Verify that dvp is not reclaimed before calling cache_enter().

Reported and tested by: pho
Reviewed by: kan
MFC after: 2 weeks


202783 22-Jan-2010 jh

Truncate read request rather than returning EIO if the request is
larger than MAXPHYS + 1. This fixes a problem with cat(1) when it
uses a large I/O buffer.

Reported by: Fernando ApesteguĂ­a
Suggested by: jilles
Reviewed by: des
Approved by: trasz (mentor)


196921 07-Sep-2009 kib

If a race is detected, pfs_vncache_alloc() may reclaim a vnode that had
never been inserted into the pfs_vncache list. Since pfs_vncache_free()
does not anticipate this case, it decrements pfs_vncache_entries
unconditionally; if the vnode was not in the list, pfs_vncache_entries
will no longer reflect the actual number of list entries. This may cause
size of the cache to exceed the configured maximum. It may also trigger
a panic during module unload or system shutdown.

Do not decrement pfs_vncache_entries for the vnode that was not in the
list.

Submitted by: tegge
Reviewed by: des
MFC after: 1 week


196920 07-Sep-2009 kib

insmntque_stddtr() clears vp->v_data and resets vp->v_op to
dead_vnodeops before calling vgone(). Revert r189706 and corresponding
part of the r186560.

Noted and reviewed by: tegge
Approved by: des (pseudofs part)
MFC after: 3 days


196689 31-Aug-2009 kib

Remove spurious pfs_unlock().

PR: kern/137310
Reviewed by: des
MFC after: 3 days


194990 25-Jun-2009 kib

Change the type of uio_resid member of struct uio from int to ssize_t.
Note that this does not actually enable full-range i/o requests for
64 architectures, and is done now to update KBI only.

Tested by: pho
Reviewed by: jhb, bde (as part of the review of the bigger patch)


193919 10-Jun-2009 kib

VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may
be NULL or derefenced memory may become free at arbitrary moment.

Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.

Reported by: georg at dts su
Reviewed by: des (pseudofs)
MFC after: 2 weeks


193556 06-Jun-2009 des

Drop Giant.

MFC after: 1 week


193176 31-May-2009 kib

Unlock the pseudofs vnode before calling fill method for pfs_readlink().
The fill code may need to lock another vnode, e.g. procfs file
implementation.

Reviewed by: des
Tested by: pho
MFC after: 2 weeks


192973 28-May-2009 des

Use a temporary variable to avoid a duplicate strlen().

Submitted by: kib
MFC after: 1 week


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


190839 08-Apr-2009 des

Remove spurious locking in pfs_write().

Reported by: Andrew Brampton <me@bramp.net>
MFC after: 1 week


190806 07-Apr-2009 des

Fix an inverted KASSERT. Add similar assertions in other similar places.

Reported by: Andrew Brampton <me@bramp.net>
MFC after: 1 week


188677 16-Feb-2009 des

Fix a logic bug that caused the pfs_attr method to be called only for
PFS_PROCDEP nodes.

Submitted by: Andrew Brampton <brampton@gmail.com>
MFC after: 2 weeks


186981 09-Jan-2009 marcus

Fix a deadlock which can occur due to a pseudofs vnode not getting unlocked.

Reported by: Richard Todd <rmtodd@ichotolot.servalan.com>
Reviewed by: kib
Approved by: kib


186617 30-Dec-2008 marcus

Add a VOP_VPTOCNP implementation for pseudofs which covers file systems
such as procfs and linprocfs.

This implementation's locking was enhanced by kib.

Reviewed by: kib
des
Approved by: des
kib
Tested by: pho


186565 29-Dec-2008 kib

When the insmntque() in the pfs_vncache_alloc() fails, vop_reclaim calls
pfs_vncache_free() that removes pvd from the list, while it is not yet
put on the list.

Prevent the invalid removal from the list by clearing pvd_next and
pvd_prev for the newly allocated pvd, and only move pfs_vncache list
head when the pvd was at the head.

Suggested and approved by: des
MFC after: 2 weeks


186561 29-Dec-2008 kib

Drop the pseudofs vnode lock around call to pfs_read handler. The handler
may need to lock arbitrary vnodes, causing either lock order reversal or
recursive vnode lock acquisition.

Tested by: pho
Approved by: des
MFC after: 2 weeks


186560 29-Dec-2008 kib

After the pfs_vncache_mutex is dropped, another thread may attempt to
do pfs_vncache_alloc() for the same pfs_node and pid. In this case, we
could end up with two vnodes for the pair. Recheck the cache under the
locked pfs_vncache_mutex after all sleeping operations are done [1].

This case mostly cannot happen now because pseudofs uses exclusive vnode
locking for lookup. But it does drop the vnode lock for dotdot lookups,
and Marcus' pseudofs_vptocnp implementation is vulnerable too.

Do not call free() on the struct pfs_vdata after insmntque() failure,
because vp->v_data points to the structure, and pseudofs_reclaim()
frees it by the call to pfs_vncache_free().

Tested by: pho [1]
Approved by: des
MFC after: 2 weeks


184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


183215 20-Sep-2008 kib

fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month


182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


176519 24-Feb-2008 attilio

Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
than spreading all around bogus stubs:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


172697 16-Oct-2007 alfred

Get rid of qaddr_t.

Requested by: bde


172453 05-Oct-2007 jhb

Use the correct pid when checking to see whether or not the /proc/<pid>
directory itself (rather than any of its contents) is visible to the
current thread.

MFC after: 1 week
PR: kern/90063
Submitted by: john of 8192.net
Approved by: re (kensmith)


170401 07-Jun-2007 bmah

Fix off-by-one error (introduced in r1.60) that had the effect of
disallowing a read of exactly MAXPHYS bytes.

Reviewed by: des, rdivacky
MFC after: 1 week
Sponsored by: nCircle Network Security


168985 23-Apr-2007 des

Fix old locking bugs which were revealed when pseudofs was made MPSAFE.

Submitted by: tegge


168768 15-Apr-2007 des

Avoid "unused variable" warning when building without PSEUDOFS_TRACE.


168764 15-Apr-2007 des

Make pseudofs (and consequently procfs, linprocfs and linsysfs) MPSAFE.


168720 14-Apr-2007 des

Further pseudofs improvements:

The pfs_info mutex is only needed to lock pi_unrhdr. Everything else
in struct pfs_info is modified only while Giant is held (during
vfs_init() / vfs_uninit()); add assertions to that effect.

Simplify pfs_destroy somewhat.

Remove superfluous arguments from pfs_fileno_{alloc,free}(), and the
assertions which were added in the previous commit to ensure they were
consistent.

Assert that Giant is held while the vnode cache is initialized and
destroyed. Also assert that the cache is empty when it is destroyed.

Rename the vnode cache mutex for consistency.

Fix a long-standing bug in pfs_getattr(): it would uncritically return
the node's pn_fileno as st_ino. This would result in st_ino being 0
if the node had not previously been visited by readdir(), and also in
an incorrect st_ino for process directories and any files contained
therein. Correct this by abstracting the fileno manipulations
previously done in pfs_readdir() into a new function, pfs_fileno(),
which is used by both pfs_getattr() and pfs_readdir().


168637 11-Apr-2007 des

Add a flag to struct pfs_vdata to mark the vnode as dead (e.g. process-
specific nodes when the process exits)

Move the vnode-cache-walking loop which was duplicated in pfs_exit() and
pfs_disable() into its own function, pfs_purge(), which looks for vnodes
marked as dead and / or belonging to the specified pfs_node and reclaims
them. Note that this loop is still extremely inefficient.

Add a comment in pfs_vncache_alloc() explaining why we have to purge the
vnode from the vnode cache before returning, in case anyone should be
tempted to remove the call to cache_purge().

Move the special handling for pfstype_root nodes into pfs_fileno_alloc()
and pfs_fileno_free() (the root node's fileno must always be 2). This
also fixes a bug where pfs_fileno_free() would reclaim the root node's
fileno, triggering a panic in the unr code, as that fileno was never
allocated from unr to begin with.

When destroying a pfs_node, release its fileno and purge it from the
vnode cache. I wish we could put off the call to pfs_purge() until
after the entire tree had been destroyed, but then we'd have vnodes
referencing freed pfs nodes. This probably doesn't matter while we're
still under Giant, but might become an issue later.

When destroying a pseudofs instance, destroy the tree before tearing
down the fileno allocator.

In pfs_mount(), acquire the mountpoint interlock when required.

MFC after: 3 weeks


168387 05-Apr-2007 des

Whitespace nits.


167497 13-Mar-2007 tegge

Make insmntque() externally visibile and allow it to fail (e.g. during
late stages of unmount). On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnode belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib


167482 12-Mar-2007 des

Add a pn_destroy field to pfs_node. This field points to a destructor
function which is called from pfs_destroy() before the node is reclaimed.

Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor
function in addition to the usual attr / fill / vis pointers.

This breaks both the programming and binary interfaces between pseudofs
and its consumers. It is believed that there are no pseudofs consumers
outside the source tree, so that the impact of this change is minimal.

Submitted by: Aniruddha Bohra <bohra@cs.rutgers.edu>


165737 02-Jan-2007 jhb

Use the vnode interlock to close a race where pfs_vncache_alloc() could
attempt to vn_lock() a destroyed vnode resulting in a hang.

MFC after: 1 week
Submitted by: ups
Reviewed by: des


159996 27-Jun-2006 netchild

Correctly calculate a buffer length. It was off by one so a read() returned
one byte less than needed.

This is a RELENG_x_y candidate, since it fixes a problem with Oracle 10.

Noticed by: Dmitry Ganenko <dima@apk-inform.com>
Testcase by: Dmitry Ganenko <dima@apk-inform.com>
Reviewed by: des
Submitted by: rdivacky
Sponsored by: Google SoC 2006
MFC after: 1 week


158611 15-May-2006 kbyanc

Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

* Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
linprocfs). This (mostly) restores the ability to mount these
filesystems using the old mount(2) system call (see below for the
rest of the fix).

* Remove not-NULL check for the data argument from the mount(2) entry
point. Per the mount(2) man page, it is up to the individual
filesystem being mounted to verify data. Or, in the case of procfs,
etc. the filesystem is free to ignore the data parameter if it does
not use it. Enforcing data to be not-NULL in the mount(2) system call
entry point prevented passing NULL to filesystems which ignored the
data pointer value. Apparently, passing NULL was common practice
in such cases, as even our own mount_std(8) used to do it in the
pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change. One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.


155922 22-Feb-2006 jhb

Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.

Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week


155920 22-Feb-2006 jhb

Change pfs_visible() to optionally return a pointer to the process
associated with the passed in pfs_node. If it does return a pointer, it
keeps the process locked. This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again. This will become more important with future changes to fix races
between procfs/ptrace and exit(2). Also, removed a duplicate pfs_visible()
call in pfs_getextattr().

Reviewed by: des
MFC after: 1 week


148984 12-Aug-2005 des

Eliminate an unnecessary bcopy().


147809 07-Jul-2005 jeff

- Since we don't hold a usecount in pfs_exit we have to get a holdcnt
prior to calling vgone() to prevent any races.

Sponsored by: Isilon Systems, Inc.
Approved by: re (vfs blanket)


145714 30-Apr-2005 des

Fix an old pasto.


145006 13-Apr-2005 jeff

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


144208 28-Mar-2005 jeff

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.

Sponsored by: Isilon Systems, Inc.


144058 24-Mar-2005 jeff

- Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.

Sponsored by: Isilon Systems, Inc.


143841 19-Mar-2005 phk

Use subr_unit


143597 14-Mar-2005 des

Hook pfs_lookup() up to vfs_cachedlookup_desc instead of vfs_lookup_desc,
as suggested by Matt's comment. Also fix some style and paranoia issues.

The entire function could benefit from review by a VFS guru.

MFC after: 6 weeks


143596 14-Mar-2005 des

Fix two long-standing bugs in pfs_readdir():

Since we used an sbuf of size resid to accumulate dirents, we would end
up returning one byte short when we had enough dirents to fill or exceed
the size of the sbuf (the last byte being lost to bogus NUL termination)
causing the next call to return EINVAL due to an unaligned offset. This
went undetected for a long time because I did most of my testing in
single-user mode, where there are rarely enough processes to fill the
4096-byte buffer ls(1) uses. The most common symptom of this bug is that
tab completion of /proc or /compat/linux/proc does not work properly when
many processes are running.

Also, a check near the top would return EINVAL if resid was smaller than
PFS_DELEN, even if it was 0, which is frequently the case and perfectly
allowable. Change the test so that it returns 0 if resid is 0.

MFC after: 2 weeks


143595 14-Mar-2005 des

If PSEUDOFS_TRACE is defined, create a sysctl knob to enable / disable
pseudofs call tracing.


143592 14-Mar-2005 des

fbsdidize.


143513 13-Mar-2005 jeff

- The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().

Sponsored by: Isilon Systems, Inc.


142907 01-Mar-2005 phk

Avoid a couple of mutex operations in the process exit path for the
common case where procfs have never been mounted.

OK'ed by: des


140196 13-Jan-2005 phk

Whitespace in vop_vector{} initializations.


139896 08-Jan-2005 rwatson

Annotate that pfs_exit() always acquires and releases two mutexes for
every process exist, even if procfs isn't mounted. And one of those
mutexes is Giant. No immediate thoughts on fixing this.


138495 06-Dec-2004 phk

Use vfs_mountedfrom().

Since VFS_STATFS() always calls the filesystem with mp->mnt_stat now, the
vfs_statfs method is now a no-op. Explain this in a comment.


138485 06-Dec-2004 kan

Fix a typo in PFS_TRACE.

PR: kern/74461
Submitted by: Craig Rodrigues <rodrigc at crodrigues.org>


138290 01-Dec-2004 phk

Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.

Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

Give coda correct prototypes and function definitions for
all vop_()s.

Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.

Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.

Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.

Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.

Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.

Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)


134647 02-Sep-2004 rwatson

Back out pseudo_vnops.c:1.45, which was a workaround for pfind()
returning incompletely initialized processes. This problem was
eliminated by kern_proc.c:1.215, which causes pfind() not to
return processes in the PRS_NEW state.


133776 15-Aug-2004 des

Release the vnode cache mutex when calling vgone(), since vgone() may
sleep. This makes pfs_exit() even less efficient than before, but on
the bright side, the vnode cache mutex no longer needs to be recursive.


133668 13-Aug-2004 rwatson

Commit a work-around for a more general bug involving process state:
check whether p_ucred is NULL or not in pfs_getattr() before
dereferencing the credential, and return ENOENT if there wasn't one.

This is a symptom of a larger problem, wherein pfind() can return
references to incompletely initialized processes, and we instead ought
to not return them, or check the process state before acting on the
process.

Reported by: kris
Discussed with: tjr, others


132902 30-Jul-2004 phk

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


132199 15-Jul-2004 phk

Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".


132023 12-Jul-2004 alfred

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


131871 09-Jul-2004 des

Accumulate directory entries in a fixed-length sbuf, and uiomove them in
one go before returning. This avoids calling uiomove() while holding
allproc_lock.

Don't adjust uio->uio_offset manually, uiomove() does that for us.

Don't drop allproc_lock before calling panic().

Suggested by: alfred


126975 14-Mar-2004 green

When taking event callbacks (like process_exit) out from under Giant, those
which do not lock Giant themselves will be exposed. Unbreak pfs_exit().


125671 10-Feb-2004 nectar

Fix a panic in pseudofs(9) that could occur when doing an I/O
operation with a large request or large offset.

Reported by: Joel Ray Holveck <joelh@piquan.org>
Submitted by: des


123248 07-Dec-2003 des

Constify, and add an API function to find a named node in a directory.


120775 05-Oct-2003 jeff

- Don't cache_purge() in *_reclaim routines. vclean() does it for us so
this is redundant.


120665 02-Oct-2003 nectar

Introduce a uiomove_frombuf helper routine that handles computing and
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).

Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.

Add additional sanity checks when computing a memory allocation size
in pfs_read.

Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)


119122 19-Aug-2003 des

Add pfs_visible() checks to pfs_getattr() and pfs_getextattr(). This
also fixes pfs_access() since it relies on VOP_GETATTR() which will call
pfs_getattr(). This prevents jailed processes from discovering the
existence, start time and ownership of processes outside the jail.

PR: kern/48156


119091 18-Aug-2003 jhb

Spell the name of the lock right in addition to getting the type right.

Submitted by: Kim Culhan <kimc@w8hd.org>


119089 18-Aug-2003 jhb

The allproc lock is a sx lock, not a mutex, so fix the assertion. This
asserts that the sx lock is held, but does not specify if the lock is held
shared or exclusive, thus either type of lock satisfies the assertion.


119069 18-Aug-2003 des

Rework pfs_iterate() a bit to eliminate a bug related to process
directories. Previously, pfs_iterate() would return -1 when it
reached the end of the process list while processing a process
directory node, even if the parent directory contained further nodes
(which is the case for the linprocfs root directory, where the process
directory node is actually first in the list). With this patch,
pfs_iterate() will continue to traverse the parent directory's node
list after exhausting the process list (as was the intention all
along). The code should hopefully be easier to read as well.

While I'm here, have pfs_iterate() assert that the allproc lock is
held.


116639 20-Jun-2003 jmg

fix grammar in comment


116271 12-Jun-2003 phk

Initialize struct vfsops C99-sparsely.

Submitted by: hmp
Reviewed by: phk


115609 01-Jun-2003 truckman

Don't unlock the parent directory vnode twice if the ISDOTDOT flag
is set.


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


112564 24-Mar-2003 jhb

Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers. This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by: phk, jake, almost complete silence on arch@


112119 11-Mar-2003 kan

Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by: jeff


111769 02-Mar-2003 des

Get rid of caddr_t.


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


109969 28-Jan-2003 tjr

Do not allow a cached vnode to be shared among multiple mounts of the same
kind of pseudofs-based filesystem. Fixes (at least) one problem where
when procfs is mounted mupltiple times, trying to unmount one will often
cause the wrong one to get unmounted, and other problem where mounting
one procfs on top of another caused the kernel to lock up.

Reviewed by: des


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109608 21-Jan-2003 rwatson

GC an unused reference to vop_refreshlabel_desc; reference to
opt_mac.h was removed previously so it was never compiled in.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


108648 04-Jan-2003 phk

Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by: src/tools/tools/vop_table


105988 26-Oct-2002 rwatson

Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception. For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system. With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance. This
also corrects sematics for shared vnode locks, which were not
previously present in the system. This chances the cache
coherrency properties WRT out-of-band access to label data, but in
an acceptable form. With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception. We'll introduce a work around for this shortly.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105561 20-Oct-2002 phk

'&' not used for pointers to functions.

Spotted by: FlexeLint


105165 15-Oct-2002 phk

Plug an infrequent (I think) memory leak.

Spotted by: FlexeLint


105077 14-Oct-2002 mckusick

Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by: DARPA & NAI Labs.


103936 25-Sep-2002 jeff

- Use vrefcnt() where it is safe to do so instead of doing direct and
unlocked accesses to v_usecount.
- Lock access to the buf lists in the various sync routines. interlock
locking could be avoided almost entirely in leaf filesystems if the
fsync function had a generic helper.


103314 14-Sep-2002 njl

Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by: phk
Reviewed by: bde, rwatson (earlier version)


101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


101130 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Modify pseudofs so that it can support synthetic file systems with
the multilabel flag set. In particular, implement vop_refreshlabel()
as pn_refreshlabel(). Implement pfs_refreshlabel() to invoke this,
and have it fall back to the mount label if the file system does
not implement pn_refreshlabel() for the node. Otherwise, permit
the file system to determine how the service is provided.

Approved by: des
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


99566 08-Jul-2002 jeff

Lock down pseudofs:
- Initialize lock structure in vncache_alloc
- Return locked vnodes from vncache_alloc
- Setup vnode op vectors to use default lock, unlock, and islocked
- Implement simple locking scheme required for lookup


97940 06-Jun-2002 des

Gratuitous whitespace cleanup.


96886 19-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


95984 03-May-2002 des

s/pfs_badop/vop_eopnotsupp/

Submitted by: phk


95953 02-May-2002 mux

Convert the pseudofs framework to nmount (thus procfs and linprocfs).

Reviewed by: des (some time ago), phk


94637 14-Apr-2002 jhb

Remove stale XXX comment.


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


90448 10-Feb-2002 rwatson

Part I: Update extended attribute API and ABI:

o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
as not to use the scatter gather API (which appeared not to be used
by any consumers, and be less portable), rather, accepts 'data'
and 'nbytes' in the style of other simple read/write interfaces.
This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
a size_t. When performing a read, the number of bytes read will
be returned, unless the data pointer is NULL, in which case the
number of bytes of data are returned. This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
argument so as to return the size, if desirable. If set to NULL,
the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable. More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


90206 04-Feb-2002 rwatson

Change EPERM to EOPNOTSUPP when failing pseudofs_setattr() arbitrarily.

Quoth the alfred: The latter would be better.


90205 04-Feb-2002 rwatson

Return EPERM instead of 0 in the un-implemented pseudofs_setattr().
Conceivably, it should even return EOPNOTSUPP.


89071 08-Jan-2002 msmith

Staticise pfs_vncache, it's not used anywhere else.

Reviewed by: des


88868 04-Jan-2002 tanimura

Do not derefer null.

Reviewed by: des


88234 19-Dec-2001 dillon

Pseudofs was leaking VFS cache entries badly due to its cache and use of
the wrong VOP descriptor. This misuse caused VFS-cached vnodes to be
re-cached, resulting in the leak. This commit is an interim fix until DES
has a chance to rework the code involved.


87670 11-Dec-2001 green

Add VOP_GETEXTATTR(9) passthrough support to pseudofs.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


87541 09-Dec-2001 des

Fix an incorrect PFS_TRACE. Also, use __func__ instead of __FUNCTION__.


86969 27-Nov-2001 des

Add support for a last-close handler.
Revert the module version bumps; they're quite pointless as long as the
only pseudofs consumer is linprocfs, which is in the tree.


85940 03-Nov-2001 des

Reduce the number of #include dependencies by declaring some of the structs
used in pseudofs.h as opaque structs.


85561 26-Oct-2001 des

Add VOP_IOCTL support, and fix a bug that would cause a panic if a file or
symlink lacked a filler function.


85180 19-Oct-2001 des

Argh! I updated the version number in the MODULE_DEPEND() thingamagook but
not in the actual MODULE_VERSION(). Pass me the pointy hat.


85128 19-Oct-2001 des

Switch to dynamic rather than static initialization.
This makes it possible (in theory) for nodes to be added and / or removed
from pseudofs filesystems at runtime.


84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


84386 02-Oct-2001 des

Add a PFS_DISABLED flag; pfs_visible() automatically returns 0 if it is set
on the node in question. Also add two API functions for setting and clearing
this flag; setting it also reclaims all vnodes associated with the node.


84383 02-Oct-2001 des

Only print "XXX (un)registered" message if bootverbose.


84247 01-Oct-2001 des

[the previous commit to pseudofs_vncache.c got the wrong log message]

YA pseudofs megacommit, part 2:

- Merge the pfs_vnode and pfs_vdata structures, and make the vnode cache
a doubly-linked list. This eliminates the need to walk the list in
pfs_vncache_free().

- Add an exit callout which revokes vnodes associated with the process
that just exited. Since it needs to lock the cache when it does this,
pfs_vncache_mutex needs MTX_RECURSE.


84246 01-Oct-2001 des

YA pseudofs megacommit, part 1:

- Add a third callback to the pfs_node structure. This one simply returns
non-zero if the specified requesting process is allowed to access the
specified node for the specified target process. This is used in
addition to the usual permission checks, e.g. when certain files don't
make sense for certain (system) processes.

- Make sure that pfs_lookup() and pfs_readdir() don't yap about files
which aren't pfs_visible(). Also check pfs_visible() before performing
reads and writes, to prevent the kind of races reported in SA-00:77 and
SA-01:55 (fork a child, open /proc/child/ctl, have that child fork a
setuid binary, and assume control of it).

- Add some more trace points.


84187 30-Sep-2001 des

pseudofs.h:

- Rearrange the flag constants a little to simplify specifying and testing
for readability and writeability.

pseudofs_vnops.c:

- Track the aforementioned change.

- Add checks to pfs_open() to prevent opening read-only files for writing
or vice versa (pfs_{read,write} would block the actual reads and writes,
but it's still a bug to allow the open() to succeed). Also, return
EOPNOTSUPP if the caller attempts to lock the file.

- Add more trace points.


84098 29-Sep-2001 des

Pseudofs take 2:

- Remove hardcoded uid, gid, mode from struct pfs_node; make pfs_getattr()
smart enough to get it right most of the time, and allow for callbacks
to handle the remaining cases. Rework the definition macros to match.

- Add lots of (conditional) debugging output.

- Fix a long-standing bug inherited from procfs: don't pretend to be a
read-only file system. Instead, return EOPNOTSUPP for operations we
truly can't support and allow others to fail silently. In particular,
pfs_lookup() now treats CREATE as LOOKUP. This may need more work.

- In pfs_lookup(), if the parent node is process-dependent, check that
the process in question still exists.

- Implement pfs_open() - its only current function is to check that the
process opening the file can see the process it belongs to.

- Finish adding support for writeable nodes.

- Bump module version number.

- Introduce lots of new bugs.


84082 28-Sep-2001 des

The previous commit introduced some references to "curproc" which should have
been references to "curthread". Correct this.


83927 25-Sep-2001 des

Clean up my source tree to avoid getting hit too badly by the next KSE or
whatever mega-commit. This goes some way towards adding support for
writeable files (needed by procfs).


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


78274 15-Jun-2001 des

#if 0 out pfs_null() to silence the warning about it not being referenced.


78073 11-Jun-2001 des

For some reason, though the module builds just fine without <sys/lock.h>,
LINT fails to build without it.


78018 10-Jun-2001 des

Bail out if the fill function failed.


78017 10-Jun-2001 des

Whoops, some of my test code snuck in here.


78003 10-Jun-2001 des

Argh. Fix braino in previous commit.


78001 10-Jun-2001 des

Add a 'flags' argument to the PFS_PROCDIR macro.


77998 10-Jun-2001 des

Add support for process-dependent directories. This means that save for
the lack of a man page, pseudofs is mostly complete now.


77967 10-Jun-2001 des

Blah, not my day. This file needs <sys/mutex.h> now.


77966 10-Jun-2001 des

Remember to unlock the process pfind() returns.


77965 10-Jun-2001 des

Add missing #include of <sys/mutex.h>.


77964 10-Jun-2001 des

Catch up with the change in sbuf_new's prototype.


75295 07-Apr-2001 des

Let pseudofs into the warmth of the FreeBSD CVS repo.

It's not finished yet (I still have to find a way to implement process-
dependent nodes without consuming too much memory, and the permission
system needs tightening up), but it's becoming hard to work on without
a repo (I've accidentally almost nuked it once already), and it works
(except for the lack of process-dependent nodes, that is).

I was supposed to commit this a week ago, but timed out waiting for jkh
to reply to some questions I had. Pass him a spoonful of bad karma :)