History log of /freebsd-11-stable/sys/kern/vfs_mount.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 369560 06-Apr-2021 git2svn

mount: Disallow mounting over a jail root

Discussed with: jamie
Approved by: so
Security: CVE-2020-25584
Security: FreeBSD-SA-21:10.jail_mount

Git Hash: 6f7815083ad66c34bad0dfa08c7033ff670b3be1
Git Author: markj@FreeBSD.org


# 362375 19-Jun-2020 freqlabs

MFC r362252:

Apply default security flavor in vfs_export

Reported by: npn
Reviewed by: rmacklem
Approved by: mav (mentor)
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25300


# 361797 04-Jun-2020 freqlabs

MFC r361699, r361711:

Assign default security flavor when converting old export args

vfs_export requires security flavors be explicitly listed when
exporting as of r360900.

Use the default AUTH_SYS flavor when converting old export args to
ensure compatibility with the legacy mount syscall.

Reported by: rmacklem
Reviewed by: rmacklem
Approved by: mav (mentor)
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25045


# 357032 23-Jan-2020 mckusick

MFC of 356763

Remove call to VFS_SYNC() to avoid unmount livelock


# 353109 04-Oct-2019 emaste

MFC r352796: Check the vfs option length is valid before accessing through

When a VFS option passed to nmount is present but NULL the kernel will
place an empty option in its internal list. This will have a NULL
pointer and a length of 0. When we come to read one of these the kernel
will try to load from the last address of virtual memory. This is
normally invalid so will fault resulting in a kernel panic.

Fix this by checking if the length is valid before dereferencing.


# 342821 06-Jan-2019 mckusick

MFC of 342135 and 342290

Properly respond to error from VFS_ROOT() during mount.

Sponsored by: Netflix


# 332753 19-Apr-2018 avg

MFC r331616: vfs_donmount: in certain cases try r/o mount if r/w mount fails

If the operation is not an update, if neither r/w nor r/o mode is
explicitly requested, if the error code hints at the possibility of the
media being read-only, and if the fallback is allowed, then we can try
to automatically downgrade to the readonly mode.

This is especially useful for auto-mounting of removable media that
sometimes can happen to be write-protected.

The fallback to r/o is not enabled by default. It can be requested on a
per-mount basis with a new mount option, 'autoro'. Or it can be
globally allowed by setting vfs.default_autoro.

Relnotes: yes


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 331643 27-Mar-2018 dim

MFC r314568 (by emaste):

kern_sig.c: ANSIfy and remove archaic register keyword

Sponsored by: The FreeBSD Foundation

MFC r318389 (by emaste):

Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 324293 05-Oct-2017 avg

MFC r323578,r323769: dounmount: do not release the mount point's reference
on the covered vnode

As long as mnt_ref is not zero there can be a consumer that might try
to access mnt_vnodecovered. For this reason the covered vnode must not
be freed until mnt_ref goes to zero.
So, move the release of the covered vnode to vfs_mount_destroy.


# 311957 12-Jan-2017 kib

MFC r311452:
Do not allocate struct statfs on kernel stack.


# 310959 31-Dec-2016 mjg

MFC r305378,r305379,r305386,r305684,r306224,r306608,r306803,r307650,r307685,
r308407,r308665,r308667,r309067:

cache: put all negative entry management code into dedicated functions

==
cache: manage negative entry list with a dedicated lock

Since negative entries are managed with a LRU list, a hit requires a
modificaton.

Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.

Provide a dedicated lock for use when the cache is only shared-locked.

==

cache: defer freeing entries until after the global lock is dropped

This also defers vdrop for held vnodes.

==

cache: improve scalability by introducing bucket locks

An array of bucket locks is added.

All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.

This is an intermediate step towards removal of the global lock.

==

cache: get rid of the global lock

Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.

Lookups still require the relevant bucket to be locked.

==

cache: ignore purgevfs requests for filesystems with few vnodes

purgevfs is purely optional and induces lock contention in workloads
which frequently mount and unmount filesystems.

In particular, poudriere will do this for filesystems with 4 vnodes or
less. Full cache scan is clearly wasteful.

Since there is no explicit counter for namecache entries, the number of
vnodes used by the target fs is checked.

The default limit is the number of bucket locks.

== (by kib)

Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

==

cache: split negative entry LRU into multiple lists

This splits the ncneg_mtx lock while preserving the hit ratio at least
during buildworld.

Create N dedicated lists for new negative entries.

Entries with at least one hit get promoted to the hot list, where they
get requeued every M hits.

Shrinking demotes one hot entry and performs a round-robin shrinking of
regular lists.

==

cache: fix up a corner case in r307650

If no negative entry is found on the last list, the ncp pointer will be
left uninitialized and a non-null value will make the function assume an
entry was found.

Fix the problem by initializing to NULL on entry.

== (by kib)

vn_fullpath1() checked VV_ROOT and then unreferenced
vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race.
Lock vnode after we noticed the VV_ROOT flag. See comments for
explanation why unlocked check for the flag is considered safe.

==

cache: fix a race between entry removal and demotion

The negative list shrinker can demote an entry with only hotlist + neglist
locks held. On the other hand entry removal possibly sets the NCF_DVDROP
without aformentioned locks held prior to detaching it from the respective
netlist., which can lose the update made by the shrinker.

==

cache: plug a write-only variable in cache_negative_zap_one

==

cache: ensure that the number of bucket locks does not exceed hash size

The size can be changed by side effect of modifying kern.maxvnodes.

Since numbucketlocks was not modified, setting a sufficiently low value
would give more locks than actual buckets, which would then lead to
corruption.

Force the number of buckets to be not smaller.

Note this should not matter for real world cases.


# 309207 27-Nov-2016 kib

MFC r308618:
Provide simple mutual exclusion between mount point update and unmount.
In the update path in ffs_mount(), drop vfs_busy() reference around namei().


# 308876 20-Nov-2016 kib

MFC r308617:
Move common cleanup code into helper.


# 308252 03-Nov-2016 trasz

MFC r306071:

Fix bug introduced with r302388, which could cause processes accessing
automounted shares to hang with "vfs_busy" wchan.

(As a workaround one can run 'automount -u' from cron.)


# 304983 29-Aug-2016 kib

MFC r303924 (by trasz):
Eliminate vprint().


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302388 07-Jul-2016 trasz

Add new unmount(2) flag, MNT_NONBUSY, to check whether there are
any open vnodes before proceeding. Make autounmound(8) use this flag.
Without it, even an unsuccessfull unmount causes filesystem flush,
which interferes with normal operation.

Reviewed by: kib@
Approved by: re (gjb@)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7047


# 301929 15-Jun-2016 kib

Do not assume that we own the use reference on the covered vnode until
we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which
wins race with us could dereference the covered vnode, and we are
left with the locked freed memory.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week


# 299913 16-May-2016 avg

dounmount: do not call mountcheckdirs() for mounts with MNT_IGNORE

This is a bit hackish, but the flag is currently set only for ZFS
snapshots mounted under .zfs. mountcheckdirs() can change cdir/rdir
references to a covered vnode. But for the said snapshots the covered
vnode is really ephemeral and it must never be accessed (except
for a few specific cases).

To do: consider removing mountcheckdirs() entirely

MFC after: 5 days


# 295265 04-Feb-2016 kib

Do not copy by field when converting struct oexport_args to struct
export_args on mount update, bzero() is consistent with
vfs_oexport_conv().
Make the code structure more explicit by using switch.
Return EINVAL if export option layout (deduced from size) is unknown.

Based on the submission by: bde
Sponsored by: The FreeBSD Foundation


# 287107 24-Aug-2015 trasz

Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467


# 285039 02-Jul-2015 kib

Vnode is not referenced by the vfs_domount() at the point where
asserts are made. Remove them, since we might dereference freed
memory. Leaked locks are asserted by the syscall return code anyway.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 283602 27-May-2015 kib

Right now, dounmount() is called with unreferenced mount point.
Nothing stops a parallel unmount to suceed before the given call to
dounmount() checks and locks the covered vnode. Prevent dounmount()
from acting on the freed (although type-stable) memory by changing the
interface to require the mount point to be referenced. dounmount()
consumes the reference on return, regardless of the sucessfull or
erronous result.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 278523 10-Feb-2015 kib

Mountd iterating over the mount points may race with the parallel
unmount, which causes error from nmount(2) call when performing
MNT_DELEXPORT over the directory which ceased to be a mount point.

The race is legitimate and innocent, but results in the chatty mountd.
Silence it by providing an distinguished error code for the situation,
and ignoring the error in mountd loop.

Based on the patch by: Andreas Longwitz <longwitz@incore.de>
Prodded and tested by: bdrewery
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 275638 09-Dec-2014 kib

Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.

Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all. E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish. On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.

Adjust msdosfs tp sync in unmount for forced call, to accomodate the
new behaviour. Note that it is still racy, since writes are not
stopped.

Discussed with: avg, bjk, mckusick
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 270096 17-Aug-2014 trasz

Bring in the new automounter, similar to what's provided in most other
UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.

There are still a few outstanding problems; they will be fixed shortly.

Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric: D523
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation


# 269457 03-Aug-2014 kib

Remove Giant acquisition from the mount and unmount pathes.

It could be claimed that two things were reasonable protected by
Giant. One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it has under
the Giant, the unload of filesystem modules can happen while the
module is still in use.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 264385 12-Apr-2014 bdrewery

Use proper MFSNAMELEN for fs type.

MFC after: 2 weeks
Reviewed by: rodrigc
Also spotted by:ambrisko


# 256032 03-Oct-2013 sbruno

Change len checks for fstypelen and fspathlen to be against absolute len
not strlen as they are *not* strings.

Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his
fuse.glusterfs port to FreeBSD.

Final patch from mckusick@

Submitted by: mckusick@
Approved by: re (hrs)
MFC after: 2 weeks


# 255136 01-Sep-2013 rmacklem

Forced dismounts of NFS mounts can fail when thread(s) are stuck
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.

Reported by: sbruno
Reviewed by: kib
MFC after: 2 weeks


# 253158 10-Jul-2013 marcel

Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

This change differs from r253224 in the following way:
o The vfs_mounted handler is called before mountcheckdirs() and with
newdp locked. vp is unlocked.
o The event handlers are declared in <sys/eventhandler.h> and not in
<sys/mount.h>. The <sys/mount.h> header is used in user land code
that pretends to be kernel code and as such creates a very convoluted
environment. It's hard to untangle.

Submitted by: stevek@juniper.net
Discussed with: pjd@
Obtained from: Juniper Networks, Inc.


# 251604 10-Jun-2013 marcel

Revert r251590. It unexpectedly broke the build and there were some
questions on locking. As part of commit-bit grooming, I'd like Steve
to handle this, but can't leave things broken in the mean time.


# 251590 09-Jun-2013 marcel

Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

Submitted by: stevek@juniper.net
Obtained from: Juniper Networks, Inc


# 249582 17-Apr-2013 gabor

- Correct mispellings of the word occurrence

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


# 248562 20-Mar-2013 kib

When the journaled FFS volume is suspended due to the journal space
becoming too low, the softdep flush thread processes the workitems,
which frees the space in journal, and then unsuspends the fs. The
softdep_flush() and other workitem processing functions busy the
filesystem before iterating over the worklist, to prevent the parallel
unmount from freeing the mount data. The vfs_busy() is called with
MBF_NOWAIT flag.

Now, if the unmount is already started and the filesystem is suspended
due to low journal space, the journal is never flushed and filesystem
is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed
for the unmounting fs, and softdep_flush() does not process the
workitems. Unmount needs to write metadata, where it hangs in the
"suspfs" state.

Move the vn_start_write() call in the dounmount() before setting the
MNTK_UNMOUNT flag. This practically ensures that softdep_flush()
processed the pending journal writes by making dounmount() wait for
the lift of the suspension.

Sponsored by: The FreeBSD Foundation
Reported and tested by: pho
MFC after: 2 weeks


# 245036 04-Jan-2013 davidxu

Revert revision 244760 because strncpy pads trailing space with zero,
this prevents kernel data from being leaked.

Noticed by: Joerg Sonnenberger &lt; joerg at britannica dot bec dot de &gt;


# 245001 03-Jan-2013 kib

Remove the deprecated MNT_VNODE_FOREACH interface. Use the
MNT_VNODE_FOREACH_ALL instead.


# 244760 28-Dec-2012 davidxu

Use strlcpy to NULL-terminate error message even if user provided a short
buffer.


# 244534 21-Dec-2012 attilio

Fixup r218424: uio_yield() was scaling directly to userland priority.
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There are no evidences that consumers could bear with such change in
semantic and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.

Tested by: pho
Reviewed by: kib, mdf
MFC after: 1 week


# 244053 09-Dec-2012 kib

Fix typo.

MFC after: 3 days


# 243719 30-Nov-2012 pjd

IFp4 @208450:

Remove redundant call to AUDIT_ARG_UPATH1().
Path will be remembered by the following NDINIT(AUDITVNODE1) call.

Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks


# 241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 240284 09-Sep-2012 kib

Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation. Typical use is for the bypass mounts like
nullfs to get a notification about lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with: pho
MFC after: 3 weeks


# 234482 20-Apr-2012 mckusick

This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


# 234386 17-Apr-2012 mckusick

Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


# 233999 07-Apr-2012 gleb

Add vfs_getopt_size. Support human readable file system options in tmpfs.

Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with: delphij
MFC after: 2 weeks


# 232709 08-Mar-2012 kib

Decomission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which
allows a filesystem to request VFS to not allow MNTK_ASYNC.

MFC after: 1 week


# 231012 05-Feb-2012 mm

Analogous to r230407 a separate path buffer in vfs_mount.c is required
for r230129. Fixes a out of bounds write to fspath.

MFC after: 10 days


# 230249 16-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks


# 230129 15-Jan-2012 mm

Introduce vn_path_to_global_path()

This function updates path string to vnode's full global path and checks
the size of the new path string against the pathlen argument.

In vfs_domount(), sys_unmount() and kern_jail_set() this new function
is used to update the supplied path argument to the respective global path.

Unbreaks jailed zfs(8) with enforce_statfs set to 1.

Reviewed by: kib
MFC after: 1 month


# 227333 08-Nov-2011 attilio

Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows to mount non-MPSAFE filesystem. Without it, the
kernel will refuse to mount a non-MPSAFE filesytem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by: gianni
Reviewed by: kib


# 227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


# 226265 11-Oct-2011 mckusick

When unmounting a filesystem always wait for the vfs_busy lock to clear
so that if no vnodes in the filesystem are actively in use the unmount
will succeed rather than failing with EBUSY.

Reported by: Garrett Cooper
Reviewed by: Attilio Rao and Kostik Belousov
Tested by: Garrett Cooper
PR: kern/161016
MFC after: 3 weeks


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 224712 08-Aug-2011 mm

Revert r224655 and r224614 because vn_fullpath* does not always work
on nullfs mounts.

Change shall be reconsidered after 9.0 is released.

Requested by: re (kib)
Approved by: re (kib)


# 224655 05-Aug-2011 mm

The change in r224615 didn't take into account that vn_fullpath_global()
doesn't operate on locked vnode. This could cause a panic.

Fix by unlocking vnode, re-locking afterwards and verifying that it wasn't
renamed or deleted. To improve readability and reduce code size, move code
to a new static function vfs_verify_global_path().

In addition, fix missing giant unlock in unmount().

Reported by: David Wolfskill <david@catwhisker.org>
Reviewed by: kib
Approved by: re (bz)
MFC after: 2 weeks


# 224614 02-Aug-2011 mm

For mount, discover f_mntonname from supplied path argument
using vn_fullpath_global(). This fixes f_mntonname if mounting
inside chroot, jail or with relative path as argument.

For unmount in jail, use vn_fullpath_global() to discover
global path from supplied path argument. This fixes unmount in jail.

Reviewed by: pjd, kib
Approved by: re (kib)
MFC after: 2 weeks


# 224290 24-Jul-2011 mckusick

This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field. The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)


# 223919 11-Jul-2011 ae

Include sys/sbuf.h directly.


# 221829 13-May-2011 mdf

Use a name instead of a magic number for kern_yield(9) when the priority
should not change. Fetch the td_user_pri under the thread lock. This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >


# 220937 22-Apr-2011 jh

Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because
vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is
no more need to add "noro" in vfs_donmount() to cancel "ro".

This also fixes a problem of canceling options beginning with "no".
For example, "noatime" didn't cancel "nonoatime". Thus it was possible
that both "noatime" and "nonoatime" were active at the same time.

Reviewed by: bde


# 220040 26-Mar-2011 jh

Fix some style issues in r219925.

Reported by: bde
MFC after: 1 month


# 219925 23-Mar-2011 jh

Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in
vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant
occurrences of these options. It was possible that for example both "ro"
and "rw" options became active concurrently.

PR: kern/133614
Discussed on: freebsd-hackers
MFC after: 1 month


# 218852 19-Feb-2011 jh

Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but
vfs_export() fails. Restoring old options and flags after successful
VFS_MOUNT(9) call may cause the file system internal state to become
inconsistent with mount options and flags. Specifically the FFS super
block fs_ronly field and the MNT_RDONLY flag may get out of sync.

PR: kern/133614
Discussed on: freebsd-hackers


# 218424 07-Feb-2011 mdf

Based on discussions on the svn-src mailing list, rework r218195:

- entirely eliminate some calls to uio_yeild() as being unnecessary,
such as in a sysctl handler.

- move should_yield() and maybe_yield() to kern_synch.c and move the
prototypes from sys/uio.h to sys/proc.h

- add a slightly more generic kern_yield() that can replace the
functionality of uio_yield().

- replace source uses of uio_yield() with the functional equivalent,
or in some cases do not change the thread priority when switching.

- fix a logic inversion bug in vlrureclaim(), pointed out by bde@.

- instead of using the per-cpu last switched ticks, use a per thread
variable for should_yield(). With PREEMPTION, the only reasonable
use of this is to determine if a lock has been held a long time and
relinquish it. Without PREEMPTION, this is essentially the same as
the per-cpu variable.


# 218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


# 217792 24-Jan-2011 jh

Replace spaces with tabs.


# 215747 23-Nov-2010 pluknet

Update MNT_ROOTFS comments after changes in the root mount logic.

Reported by: arundel
Suggested by: marcel (phrasing)
Approved by: kib (mentor)


# 214005 18-Oct-2010 marcel

In vfs_filteropt(), only print the errmsg when there's no errmsg
mount option. Otherwise errors tend to get printed multiple times.


# 213664 10-Oct-2010 kib

The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by: bde
MFC after: 2 weeks


# 213365 02-Oct-2010 marcel

Split the root mount logic from the (generic) mount code and move
it (the root mount code) into a new file called vfs_mountroot.c

The split is almost trivial, as the code is almost perfectly
non-intertwined. The only adjustment needed was to move the UMA
zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and
into vfs_mount.c, where it had to be done as a SYSINIT [see
vfs_mount_init()].

There are no functional changes with this commit.


# 212466 11-Sep-2010 kib

Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by: jh (previous version of the patch)
Tested by: pho
MFC after: 3 weeks


# 212356 09-Sep-2010 pjd

Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.


# 212341 08-Sep-2010 pjd

Doing first mount and updating mount points are both handled by the same
syscall and the same function, but are very different and share almost no code.
To make it easier to read and analyze, split vfs_domount() into
vfs_domount_first() and vfs_domount_update().

Reviewed by: kib


# 212340 08-Sep-2010 pjd

- Log all the problems in devfs_fixup().

- Correct error paths. The system will be useless on devfs_fixup() failure, so
why bother? Maybe for the same reason why a dead body is washed and dressed
in a nice suit before it is put into a coffin? Maybe system's last will is to
panic without any locks held?

Reviewed by: kib


# 211930 28-Aug-2010 pjd

There is a bug in vfs_allocate_syncvnode() failure handling in mount code.
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn free us from handling its failures
in the mount code.

Reviewed by: kib
MFC after: 1 month


# 204066 18-Feb-2010 pjd

- Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be
locked.
- Remove code duplication.


# 201145 28-Dec-2009 antoine

(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month


# 199227 12-Nov-2009 attilio

Add the possibility for vfs.root.mountfrom tunable to accept a list of
items rather than a single one. The list is a space separated collection
of items defined as the current one accepted.

While there fix also a nit in a comment.

Obtained from: Sandvine Incorporated
Reviewed by: emaste
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
Sponsored by: Sandvine Incorporated
MFC: 2 weeks


# 199041 08-Nov-2009 trasz

Add suggestion for zfs root.


# 195995 31-Jul-2009 jhb

Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
sanitizing stdin/out/err file descriptors during execve().

Submitted by: kib
Approved by: re (rwatson)
MFC after: 1 week


# 195939 29-Jul-2009 rwatson

Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records. This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by: re (kensmith)
Obtained from: TrustedBSD Project
MFC after: 1 month


# 195247 01-Jul-2009 rwatson

When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.

Capture flags to unmount(2) via an argument token.

Approved by: re (audit argument blanket)
MFC after: 3 days


# 195104 27-Jun-2009 rwatson

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 193192 31-May-2009 rodrigc

sys/boot/common.c
=================
Extend the loader to parse the root file system mount options in /etc/fstab,
and set a new loader variable vfs.root.mountfrom.options with these options.
The root mount options must be a comma-delimited string, as specified in
/etc/fstab.
Only set the vfs.root.mountfrom.options variable if it has not been
set in the environment.

sys/kern/vfs_mount.c
====================
When mounting the root file system, pass the mount options
specified in vfs.root.mountfrom.options, but filter out "rw" and "noro",
since the initial mount of the root file system must be done as "ro".
While we are here, try to add a few hints to the mountroot prompt
to give users and idea what might of gone wrong during mounting
of the root file system.

Reviewed by: jhb (an earlier patch)


# 192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


# 191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


# 190878 10-Apr-2009 thompsa

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


# 190676 03-Apr-2009 thompsa

Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.


# 190540 30-Mar-2009 thompsa

Further rate limit the root wait status, it will be printed once per
root_mount_rel() wakeup.


# 190460 27-Mar-2009 thompsa

Skip the allocation of the root hold token if the mount already happened.


# 189290 02-Mar-2009 jamie

Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by: bz (mentor)


# 188150 05-Feb-2009 attilio

Add more KTR_VFS logging point in order to have a more effective tracing.

Reviewed by: brueffer, kib
Tested by: Gianni Trematerra <giovanni D trematerra A gmail D com>


# 186197 16-Dec-2008 attilio

1) Fix a deadlock in the VFS:
- threadA runs vfs_rel(mp1)
- threadB does unmount the mp1 fs, sets MNTK_UNMOUNT and drop MNT_ILOCK()
- threadA runs vfs_busy(mp1) and, as long as, MNTK_UNMOUNT is set, sleeps
waiting for threadB to complete the unmount
- threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting
for the refcount to expire.

Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals the
unmounter is waiting for mnt_ref to expire.
The vfs_busy contenders got awake, fails, and if they retry the
MNTK_REFEXPIRE won't allow them to sleep again.

2) Simplify significantly the code of vfs_mount_destroy() trimming
unnecessary codes:
- as long as any reference exited, it is no-more possible to have
write-op (primarty and secondary) in progress.
- it is no needed to drop and reacquire the mount lock.
- filling the structures with dummy values is unuseful as long as
it is going to be freed.

Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
Discussed with: kib


# 185504 01-Dec-2008 attilio

Fix an inverted check introduced in r184554.

Submitted by: tegge
Pointy hat to: me


# 184599 03-Nov-2008 attilio

Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.
Really, the concept of holdcnt in the struct mount is rappresented by
the mnt_ref (which prevents the type-stable structure from being
"recycled) handled through vfs_ref() and vfs_rel().
On this optic, switch the holdcnt acquisition into an emulated vfs_ref()
(and subsequent release into vfs_rel()).

Discussed with: kib
Tested by: pho


# 184588 03-Nov-2008 dfr

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


# 184554 02-Nov-2008 attilio

Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
mnt_lock and using instead a refcount in order to keep track of lock
requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
Change so its KPI by removing the interlock argument and defining 2 new
flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
old version (which was unlinked from the lockmgr alredy) and
MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
mnt_ref if running for more than 3 seconds, make it totally useless.
Remove it as it was thought to work into older versions.
If a problem of "refcount held never going away" should appear, we will
need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
leaving the MNTK_MWAIT flag on even if it the waiters were actually
woken up. Just a place in vfs_mount_destroy() is left because it is
going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with: kib
Tested by: pho


# 183754 10-Oct-2008 attilio

Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 183188 19-Sep-2008 obrien

Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)


# 182740 03-Sep-2008 simon

- Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by: kib [08:07], csjp [08:08], bz [08:09]
Reviewed by: peter [08:07], jhb [08:07]
Reviewed by: jinmei [08:09], rwatson [08:09]
Approved by: re (SA blanket)
Approved by: so (simon)
Security: FreeBSD-SA-08:07.amd64
Security: FreeBSD-SA-08:08.nmount
Security: FreeBSD-SA-08:09.icmp6


# 182542 31-Aug-2008 attilio

Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.

Manpages are updated accordingly.

Tested by: Diego Sardina <siarodx at gmail dot com>


# 182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 182025 22-Aug-2008 rodrigc

In nmount(), when we see the "force" option,
set the MNT_FORCE flag, but do not persist "force"
in the options list, since it is a command, not a persistent property
of a mount.

Similarly, when we see "reload", set MNT_RELOAD,
but delete "reload" from the options list.

MFC after: 1 week


# 181528 10-Aug-2008 kib

Revert r181345.
Move the NULL pointer check to the vfs_deleteopt() function.

Discussed with: rodrigc
MFC after: 3 days


# 181463 09-Aug-2008 des

Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from: Varnish
MFC after: 2 weeks


# 180484 12-Jul-2008 rodrigc

In nmount(), if we see "update" in the mount options,
set MNT_UPDATE in fsflags, and delete the
"update" option from the global mount options.

MNT_UPDATE is a command, and not a property of a mount
that should persist after the command is executed.

We need to do similar things for MNT_FORCE and MNT_RELOAD.

All mount flags are prefixed by MNT_..... it would
be nice if flags which were commands were named differently
from flags which are persistent properties of a mount.
This was not such a big deal in the pre-nmount() days,
but with nmount() it is more important.

Requested by: yar
MFC after: 2 weeks


# 179670 09-Jun-2008 kib

Provide the mutual exclusion between the nfs export list modifications
and nfs requests processing. Lockmgr lock provides the shared locking for
nfs requests, while exclusive mode is used for modifications. The writer
starvation is handled by lockmgr too.

Reported by: kris, pho, many
Based on the submission by: mohan
Tested by: pho
MFC after: 2 weeks


# 179658 08-Jun-2008 wkoszek

Remove checks against DDB, which isn't used in this file.

My intention is to bring no functional change.

Discussion on: IRC
Reviewed by: ed, kan, rink,


# 179268 23-May-2008 rodrigc

Do not convert the "snapshot" string to the MNT_SNAPSHOT flag here, since
we do it further down in ffs_vfsops.c

MFC after: 1 month


# 178677 29-Apr-2008 rdivacky

Lock filedesc exclusively when modifying fd_[cr]dir.
This is probably harmless but it's better to lock it
correctly.

Approved by: kib (mentor)


# 178429 22-Apr-2008 phk

Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.


# 178016 08-Apr-2008 sam

o add a mountroot event handler that fires when / is mounted; this information
was lost when root started being mounted by init
o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful

MFC after: 2 weeks


# 177785 31-Mar-2008 kib

Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho


# 177528 23-Mar-2008 kib

Yield the cpu in the kernel while iterating the list of the
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.

It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.

Discussed: on stable@, with bde
Reviewed by: tegge, jeff [1]
MFC after: 2 weeks


# 176394 18-Feb-2008 yar

Undo the damage I did in sys/kern/vfs_mount.c #1.274 and
sbin/mount_nfs/mount_nfs.c #1.76. Let the dragons sleep.

Requested by: rodrigc, des
PR: kern/120319 (welcome the bug back)


# 176383 18-Feb-2008 yar

Add a remark on a questionable property of vfs_mergeopts().


# 176283 14-Feb-2008 yar

In the new order of things dictated by nmount(2), a read-only mount
is to be requested via a "ro" option. At the same time, MNT_RDONLY
is gradually becoming an indicator of the current state of the FS
instead of a command flag. Today passing MNT_RDONLY alone to the
kernel's mount machinery will lead to various glitches. (See the
PRs for examples.)

Therefore mount the root FS with a "ro" option instead of the
MNT_RDONLY flag. (Note that MNT_RDONLY still is added to the mount
flags internally, by vfs_donmount(), if "ro" was specified.)

To be able to pass "ro" cleanly to kernel_vmount(), teach the latter
function to accept options with NULL values.

Also correct the comment explaining how mount_arg() handles length
of -1.

PR: bin/106636 kern/120319
Submitted by: Jaakko Heinonen <see PR kern/120319 for email> (originally)


# 175635 24-Jan-2008 attilio

Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is bogus really and no currently used.
Hopefully this will be soonly replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
present

Addictionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

KPI results heavilly broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 175024 31-Dec-2007 rodrigc

In vfs_scanopt(), make sure that the mount option value is not NULL
before calling vsscanf().

PR: 118531
Submitted by: Jaakko Heinonen <jh saunalahti fi>
MFC after: 3 days


# 174937 27-Dec-2007 imp

A partial solution to some of the 'pull the umass device with a
mounted FS' problems. These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices. We
now close the barn door after the horse has gotten lose and has been
hit by a truck, as it were. The barn no longer catches fire in this
case, but the horse is still dead :-).

The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO. In that case, the buffer
is treated like any other clean buffer that's being retured. ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.

The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system. If the device is gone, retrying later won't help and we'll
never be able to unmount the device.

These two are part of a larger patch set submitted by the author. The
other patches will be forth coming. I added comments to these two
patches.

Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)


# 174282 05-Dec-2007 rodrigc

In nmount(), internally convert the mount option: "rdonly" to "ro".
This makes updates mounts such as:
"mount -u -o rdonly" work more like, "mount -u -o ro".

References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.

Requested by: rwatson
MFC after: 1 week


# 173064 27-Oct-2007 rodrigc

In nmount(), if MNT_ROOT is in the mount flags, filter it
out instead of returning an error.
(1) This makes the behavior consistent with mount(2).
(2) This makes update mounts on the root file system work properly.
(3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
and src/usr.sbin/mountd/mountd.c which were put in to
eliminate errors during update mounts on the root file system
can be removed.

The only place were MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().

Reviewed by: phk
MFC after: 3 days


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 172151 12-Sep-2007 kib

When restoring the mount after umount failed, the MNTK_UNMOUNT flag
prevents insmntque() from placing reallocated syncer vnode on mount
list, that causes panic in vfs_allocate_syncvnode().

Introduce MNTK_NOINSMNTQ flag, that marks the period when instmntque is
not allowed to success, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on umount error
path, where it is cleaned just before the syncer vnode is going to be
allocated.

Reported by: Peter Jeremy <peterjeremy optushome com au>
Suggested by: tegge
Approved by: re (rwatson)


# 171852 15-Aug-2007 jhb

On 6.x this works:

% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior for on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc


# 171598 26-Jul-2007 pjd

The v_mountedhere field is protected by the vnode lock, not vnode's internal
lock.

Approved by: re (rwatson)


# 171452 14-Jul-2007 rodrigc

Revert previous commits which I committed by mistake.

Approved by: re (implicit)
Pointy hat to: me


# 171450 14-Jul-2007 rodrigc

The last entry in the ext2_opts array must be NULL,
otherwise the kernel with crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by: re (kensmith)


# 170587 11-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 169054 26-Apr-2007 kib

Allow the dounmount() to proceed even for doomed coveredvp.

In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successfull. In the second case, the unmount shall be allowed.

Submitted by: sobomax
MFC after: 2 weeks


# 168823 17-Apr-2007 pjd

Export vfs_mount_alloc() as it is used in ZFS.


# 168699 13-Apr-2007 pjd

Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
for jailed root from different jails, etc.

OK'ed by: rwatson


# 168552 09-Apr-2007 njl

Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost. We're still single-threaded at this point, but just
be safe for the future.


# 168547 09-Apr-2007 njl

Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.


# 168506 08-Apr-2007 pjd

Add root_mounted() function that returns true if the root file system is
already mounted.


# 168396 05-Apr-2007 pjd

Add security.jail.mount_allowed sysctl, which allows to mount and
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by: rwatson


# 168355 04-Apr-2007 rwatson

Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
acquisisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by: kris
Discussed with: jhb, kris, attilio, jeff


# 168300 03-Apr-2007 pjd

Add root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf, that have to access file system.


# 168207 01-Apr-2007 pjd

I think the code I'm removing here is completely bogus.
vfs_flags field is used for VFCF_* flags which are given at file system
driver creation time (via VFS_SET(9)) macro.

What this code did was bascially this:

If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).

If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).

The latter will be quite dangerous, but those flags are reset later in
vfs_domount().

MFC after: 1 month


# 168185 31-Mar-2007 pjd

Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them.


# 167672 18-Mar-2007 pjd

Don't deny unmounting file systems for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

This change is simlar to the change of PRIV_VFS_MOUNT in previous revision.


# 167553 14-Mar-2007 pjd

Don't deny mounting for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is suppose not to change current (security) behaviour
in any way.

Reviewed by: rwatson


# 167551 14-Mar-2007 pjd

White space nits.


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 166681 12-Feb-2007 cognet

Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that
consumers don't have to check for both error and the return value (some of
them actually don't do it).

MFC After: 1 week


# 165288 16-Dec-2006 rodrigc

Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 162983 03-Oct-2006 kib

Fix the remaining race in the revs. 1.232, 1,233 that could occur during
unmount when mp structure is reused while waiting for coveredvp lock.
Introduce struct mount generation count, increment it on each reuse and
compare the generations before and after obtaining the coveredvp lock.

Reviewed by: tegge, pjd
Approved by: pjd (mentor)
MFC after: 2 weeks


# 162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


# 162653 26-Sep-2006 tegge

Reduce fluctuations of mnt_flag to allow unlocked readers to get a
slightly more consistent view.


# 162651 26-Sep-2006 tegge

Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with
MNT_UPDATE flag, closing a race between nmount() and quotactl().


# 162649 26-Sep-2006 tegge

Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.


# 162648 26-Sep-2006 tegge

Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race
with dounmount(), causing loss of MNTK_UNMOUNT flag.


# 162647 26-Sep-2006 tegge

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


# 162444 19-Sep-2006 kib

Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be
unlocked only if it really exists.

Found with: Coverity Prevent(tm)
CID: 1535
Approved by: pjd (mentor)


# 162407 18-Sep-2006 kib

Fix the race while waiting for coveredvp lock during unmount. The vnode may
be recycled during the sleep, wrap the vn_lock with vhold/vdrop.
Check that coveredvp still points to the same mp after sleep (needed
because sleep dropped Giant).
Move check for user rights for unmount after coveredvp lock is obtained.

Tested by: Peter Holm
Reviewed by: tegge
Approved by: kan (mentor)
MFC after: 2 weeks


# 161644 26-Aug-2006 marius

Fix another bug introduced with rev. 1.204; in vfs_donmount() if
the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)'
call fails, 'errmsg' is left uninitialized, making the later tests
against NULL meaningless, and the uses bogus. Thus initialize
'errmsg' to NULL beforehand. [1]
While at it, remove the superfluous assignment of 0 to 'errmsg_len'
if the above mentioned call fails as it's already initialized to 0.

Submitted by: Michael Plass [1]


# 161619 25-Aug-2006 pjd

Fix comment.


# 161584 24-Aug-2006 marius

Fix a bug introduced with rev. 1.204; in vfs_donmount() use
copyout(9) instead of copystr(9) for copying the errmsg from
kernel- to user-space. This fixes a panic on sparc64 when
using the nmount(2)-converted mountd(8).
While at it, use bcopy(3) instead of strncpy(3) in the kernel-
to kernel-space case for consistency with vfs_buildopts() and
between kernel- to user-space and kernel- to kernel-space case.


# 159982 27-Jun-2006 jhb

- Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away. mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
(or rather, to handle concurrent unmount races) from going away.
umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.


# 159274 05-Jun-2006 rwatson

Audit some arguments to nmount(), mount(), umount().

Submitted by: wsalamon
Obtained from: TrustedBSD Project


# 159181 02-Jun-2006 pjd

Fix a problem introduced in revision 1.220. On mount(2) failure, don't
forget to unbusy file system before its destruction.

This fixes the following warning on mount failure:

Mount point <X> had 1 dangling refs

Tested by: wkoszek


# 158930 26-May-2006 rodrigc

Add "update" mount option to global_opts array,
for use with vfs_filteropt().


# 158924 25-May-2006 rodrigc

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.


# 158611 15-May-2006 kbyanc

Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

* Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
linprocfs). This (mostly) restores the ability to mount these
filesystems using the old mount(2) system call (see below for the
rest of the fix).

* Remove not-NULL check for the data argument from the mount(2) entry
point. Per the mount(2) man page, it is up to the individual
filesystem being mounted to verify data. Or, in the case of procfs,
etc. the filesystem is free to ignore the data parameter if it does
not use it. Enforcing data to be not-NULL in the mount(2) system call
entry point prevented passing NULL to filesystems which ignored the
data pointer value. Apparently, passing NULL was common practice
in such cases, as even our own mount_std(8) used to do it in the
pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change. One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.


# 158546 13-May-2006 rodrigc

For nmount(), if "rw" is specified as a mount option,
add "noro" to the list of mount options. This allows
a read-only mount to be converted to read-write via:
mount -u -o rw

Requested by: kris


# 157343 31-Mar-2006 jeff

- When there are dangling vnodes at unmount print them before we panic.

Sponsored by: Isilon Systems, Inc.


# 157322 31-Mar-2006 jeff

- Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent
mount memory from being reclaimed. This resolves a number of race
conditions described in vfs_default.c and introduced with the
VFS_LOCK_GIANT macros.
- Let the mtx and lock remain valid after the mount structure has been
freed by using init and fini calls. Technically fini will never be
called but is included for completeness.
- Consistently use lockmgr directly rather than lockmgr to lock and
vfs_unbusy to unlock.

Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.


# 156685 13-Mar-2006 ru

The mount(8) manpage says: "In case of conflicting options being
specified, the rightmost option takes effect." Fix code to obey
this. This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.


# 156451 08-Mar-2006 tegge

Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.


# 155902 22-Feb-2006 jeff

- We can not hold a vnode lock while we do a lookup. Search for and load
modules prior to looking up the directory which we will cover to avoid
this problem in mount.
- We must hold the coveredvp locked before we can busy the mountpoint to
prevent a lock order reversal with the vfs_busy() in lookup which holds
the directory lock prior to doing a vfs_busy(). The directory lock is
required to safely clear the v_mountedhere field on the directory.

MFC After: 1 week


# 155386 06-Feb-2006 jeff

- Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0. We don't print an
error if we are rebooting as the root mount always retains some refernces
by init proc.
- Acquire a mnt ref for every vnode allocated to a mount point. Drop this
ref only once vdestroy() has been called and the mount has been freed.
- No longer NULL the v_mount pointer in delmntque() so that we may release
the ref after vgone() has been called. This allows us to guarantee
that the mount point structure will be valid until the last vnode has
lost its last ref.
- Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by: Isilon Systems, Inc.
MFC After: 1 week


# 154962 28-Jan-2006 ssouhlal

Don't try to load KLDs if we're mounting the root. We'd otherwise panic.

Tested by: kris
MFC after: 3 days


# 154403 15-Jan-2006 csjp

vfs_busy can only return something useful if MNTK_UNMOUNT has been set.
Since we are using vfs_busy() on a freshly allocated mount structure, use
(void) to show that we do not care about the return value.

Found with: Coverity Prevent (tm)
MFC after: 2 weeks


# 154402 15-Jan-2006 rwatson

Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the
return value is intentional: this is simply an attempt to pre-cache the
statfs state.

Found with: Coverity Prevent (tm)
MFC after: 3 days


# 154373 14-Jan-2006 ru

AMD64 also supports disk slices.


# 154152 09-Jan-2006 tegge

Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by: truckman


# 153546 19-Dec-2005 pjd

vfs_mount_alloc() always returns 0, but what we really want is newly
allocated 'struct mount *' pointer, so simplify code a bit and return
the pointer directly.

Reviewed by: ssouhlal


# 153541 19-Dec-2005 pjd

Use 'td' instead of 'curthread'.


# 153226 08-Dec-2005 rodrigc

In devfs_first(), set mp->mnt_opt to a valid empty list of mount options
instead of leaving it NULL. This eliminates a kernel panic
when trying to do a mount -o update of /dev.

Noticed by: cjsp
Reviewed by: phk


# 153225 08-Dec-2005 rodrigc

Add "errmsg" to list of global mount options.


# 153051 03-Dec-2005 rodrigc

Add "rdonly" to global_opts, and parse it in vfs_donmount().

Requested by: rwatson


# 153034 02-Dec-2005 rodrigc

- Add "rw" mount option to global_opts.
- In vfs_donmount(), parse "ro", "noro", and "rw", in order to set or
unset the MNT_RDONLY filesystem flag.


# 152735 23-Nov-2005 rodrigc

In nmount() and vfs_donmount(), do not strcmp() the options in the iovec
directly. We need to copyin() the strings in the iovec before
we can strcmp() them. Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.

Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found. This allows us to locate an option in
the iovec without actually manipulating the iovec members. directly via
strcmp().

Noticed by: kris on sparc64


# 152620 19-Nov-2005 marcel

Fix bug introduced in revision 1.186:
When all file systems have a time stamp of zero, which is the case
for example when the root file system is on a read-only medium, we
ended up not calling inittodr() at all. A potential uncleanliness
existed as well. If multiple file systems had a non-zero time stamp,
we would call inittodr() multiple times. While this should not be
harmful, it's definitely not ideal.
Fix both issues by iterating over the mounted file systems to find
the largest time stamp and call inittodr() exactly once with that
time stamp. This could of course be a zero time stamp if none of the
mounted file systems have a non-zero time stamp. In that case the
annoying errors mentioned in the commit log for revision 1.186 still
haven't been avoided. The bottom line is that inittodr() should not
complain when it gets a time base of zero. At the time of this
commit only alpha seems to have that problem.

Reported by: Dario Freni (saturnero at freesbie dot org)
MFC after: 1 week


# 152619 19-Nov-2005 rodrigc

Parse more mount options in vfs_donmount(), before vfs_domount()
is called. It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on. The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.

Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.


# 152561 17-Nov-2005 rodrigc

In vfs_nmount(), check to see if "update" mount option was passed
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set. This is very important
when we want to do an update mount of the root filesystem, using nmount().


# 152332 12-Nov-2005 rodrigc

style(9) cleanups.

Spotted by: njl, bde


# 152217 09-Nov-2005 rodrigc

For nmount(), allow a text string error message to be propagated back
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.

Discussed with: phk, pjd


# 152176 08-Nov-2005 rodrigc

Add utility function to propagate mount errors as text string messages.

Discussed with: phk


# 149712 02-Sep-2005 ssouhlal

Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed
and unbusied in devfs_fixup(), which assumes that the devfs mount is
still locked.

Granced at by: phk
MFC after: 3 days


# 146354 18-May-2005 pjd

devfs_first() return value isn't used, remove it.


# 146125 11-May-2005 pjd

We don't use 'mp' variable, but we do want to mount devfs, ehh.


# 146119 11-May-2005 pjd

Remove unised variable introduced by accident in rev 1.168.

Found by: Coverity Prevent analysis tool


# 146116 11-May-2005 pjd

Plug memory leaks.

Found by: Coverity Prevent analysis tool


# 145733 30-Apr-2005 jeff

- Remove an old splcam hack.


# 145304 19-Apr-2005 pjd

Call g_waitidle() before every check the list of holds is empty.

Suggested by: phk


# 145259 19-Apr-2005 phk

Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.


# 145250 18-Apr-2005 phk

Add a named reference-count KPI to hold off mounting of the root filesystem.

While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.


# 145249 18-Apr-2005 phk

Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.


# 144367 31-Mar-2005 jeff

- LK_NOPAUSE is a nop now.

Sponsored by: Isilon Systems, Inc.


# 144089 24-Mar-2005 marcel

Fix inittodr() invocation. Now that devfs is mounted before the
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.


# 144055 24-Mar-2005 jeff

- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by: Isilon Systems, Inc.


# 143690 16-Mar-2005 phk

Fix a memoryleak in case of failed root filesystem mount.

Spotted by: Coverity via sam


# 143683 16-Mar-2005 jmg

MFp4: print a more useful error when we don't have a /dev to mount devfs
on..


# 143680 16-Mar-2005 phk

Add mnt_hashseed to struct mount and initialize it witn PRNG bits, use
it to get better hashing in vfs_hash.

In case of an insert collision in vfs_hash_insert(), put the loosing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.

Drop the VI_HASHED flag.


# 142153 20-Feb-2005 das

Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with: mckusick
Reviewed by: phk


# 141634 10-Feb-2005 phk

Make various mountpoint related functions static.


# 141206 03-Feb-2005 pjd

- Move gets() function to libkern (I want to use it outside vfs_mount.c).
- Add buffer size limitations (overflow will not be possible anymore).
- Add 'visible' option, which will allow for passphrase reading in the
future.
- Remove special treatment of '@' and '#', those two are only confusing.

Discussed with: rwatson
MFC after: 2 weeks


# 140715 24-Jan-2005 jeff

- Protect mnt_kern_flag with the mountpoint's mutex. This is required
to make the suspend related functions mpsafe.

Sponsored By: Isilon Systems, Inc.


# 140220 14-Jan-2005 phk

Eliminate unused and unnecessary "cred" argument from vinvalbuf()


# 140048 11-Jan-2005 phk

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 139337 27-Dec-2004 kan

Do not vput(9) unlocked vnode and do not VREF it with the sole purpose
of vputting it back immediately.

Complained by: DEBUG_VFS_LOCKS


# 139095 20-Dec-2004 phk

Hide/remove various printfs, now that root mounting doesn't seem to explode
on people.


# 138835 14-Dec-2004 phk

Move the checkdirs() function from vfs_mount.c to kern_descrip.c and
call it mountcheckdirs().


# 138696 11-Dec-2004 phk

Copy the entire stats structure. Let compiler decide how.


# 138691 11-Dec-2004 phk

Fix whitespace.

Spotted by: njl


# 138679 11-Dec-2004 phk

Remove the /dev/dev -> / symlink after we are done with it.


# 138509 07-Dec-2004 phk

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway. Correct information is provided by
statfs(/).


# 138507 07-Dec-2004 phk

Instead of complaining about it, just silently filter out MNT_ROOTFS.

This fixes the "fsck /" problem various people have reported overnight.


# 138480 06-Dec-2004 phk

Always call VFS_STATFS() on mp->mnt_stat when we have mounted a filesystem,
this way individual filesystems don't have to do it.


# 138467 06-Dec-2004 phk

Add more functions for handling mount arguments in VFS_MOUNT():

vfs_flagopt() for binary/boolean options.
vfs_getopts() for string options
vfs_filteropt() to check for unknown options.
vfs_scanopt() for scanf() like processing of options.

Also add function for setting the stat.f_mntfromname field.


# 138461 06-Dec-2004 phk

Change the first argument of vfs_cmount() to a handy struct mntarg* and
call it accordingly.

(No filesystems implement vfs_cmount() yet, so this is a no-op commit)


# 138448 06-Dec-2004 phk

Add a few convenient functions in the mount_arg() family and collect the
entire family at the end of the source file.


# 138446 06-Dec-2004 phk

Collapse two almost identical license copies, preserving the rights of
all listed authors, rightholders and contributors.


# 138445 06-Dec-2004 phk

Remove the kern.rootdev sysctl.

Root filessytems (like NFS) don't have an associated disk device,
and even if they had, the exact semantics would be filesystem
dependent and should be implemented there.


# 138444 06-Dec-2004 phk

Make struct vfsopt{list} private to vfs_mount.c


# 138412 05-Dec-2004 phk

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
doesn't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.


# 138363 03-Dec-2004 phk

Implement a function, mount_arg() for accumulating a list of mount parameters
to nmount.

Make kernel_mount() accept the output from mount_arg() and know how to
free the malloc'ed space.

Make kernel_vmount() use the new function.


# 138360 03-Dec-2004 phk

When omount() is called, check if the filesystem have a cmount method
and if so call it.

The cmount method will gather and interpret omount() style arguments,
and issue a kern_[v]mount() call to execute the corresponding nmount
operation.


# 138357 03-Dec-2004 phk

Add early checks for MNT_ROOTFS since we need to allow it later on in
the code path.


# 138356 03-Dec-2004 phk

Retire unused vfs_mount() function in the name of nmount migration.


# 138350 03-Dec-2004 phk

Introduce vfs_byname_kld() which will try to load the filesystem
as a module if possible.

Use it so we don't have linker magic in the middle of the already
complex mount code.


# 138150 28-Nov-2004 phk

Use FILEDESC_LOCK_FAST in checkdirs()


# 138121 26-Nov-2004 phk

Eliminate MNT_NODEV usage, it doesn't have any meaning any more.

Keep a #define MNT_NODEV 0 around to avoid dealing with contrib
userland like mount_smbfs.


# 138091 25-Nov-2004 phk

Allow a filesystem to have both old and new mount methods at the same
time. This will be necessary for transitioning.


# 138087 25-Nov-2004 phk

Assert Giant held in vfs_domount() and vfs_dounmount()

Explicitly grab Giant before calling these.


# 138077 25-Nov-2004 phk

Integrate the relevant bits of vfs_rootmountalloc() where it matters.


# 137863 18-Nov-2004 phk

Pass path to filesystem when mounting root


# 137523 10-Nov-2004 phk

remove unused variable


# 137509 10-Nov-2004 phk

Remove hack which mounts the root filesystem R/W if the device is
named 'md<something>'. While convenient, it does not belong here,
if anywhere at all.


# 137484 09-Nov-2004 phk

Make getdiskbyname() static to vfs_mount.c.

Eliminate use of vn_todev() while here.


# 136835 23-Oct-2004 phk

Drop Giant around the call to g_waitidle().
This is necessary to allow any geom events which need it to pick up Giant.


# 136144 05-Oct-2004 pjd

Back out changes which were introduced to delay mounting root file system.
Those changes were made on gmirror needs, but now gmirror handles this
by itself.


# 135728 24-Sep-2004 pjd

Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits
a bit better to our current naming scheme.

Discussed with: ru


# 135720 24-Sep-2004 phk

Eliminate devsw() call, we are not dereferencing the pointer.


# 135612 23-Sep-2004 pjd

Introduce new /boot/loader.conf variable: root_mount_delay.
It can be used to delay mounting root partition to give a chance to GEOM
providers to show up.
Now, when there is no needed provider, vfs_rootmount() function will look
for it every second and if it can't be find in defined time, it'll ask
for root device name (before this change it was done immediately).

This will allow to boot from gmirror device in degraded mode.


# 134827 05-Sep-2004 alfred

It's too easy to panic the machine when INVARIANTS are turned on
and you botch a call to nmount(2).

This is because there is an INVARIANTS check that asserts that
opt->len must be zero if opt->val is not NULL. The problem is that
the code does not actually follow this invariant if there is an
error while processing mount options.

Fix the code to honor the INVARIANT.

Silence on: fs@


# 132902 30-Jul-2004 phk

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


# 132710 27-Jul-2004 phk

Convert the vfsconf list to a TAILQ.

Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.


# 132117 13-Jul-2004 phk

Give kldunload a -f(orce) argument.

Add a MOD_QUIESCE event for modules. This should return error (EBUSY)
of the module is in use.

MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module. Valid reasons are memory references
into the module which cannot be tracked down and eliminated.

When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.

For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.

Document that modules should return EOPNOTSUPP for unknown events.


# 132023 12-Jul-2004 alfred

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


# 131897 10-Jul-2004 phk

Clean up and wash struct iovec and struct uio handling.

Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy. Caller frees.

Use them throughout.


# 131696 06-Jul-2004 alfred

Use vfs_suser() where appropriate.


# 131691 06-Jul-2004 alfred

NFS mobility PHASE I, II & III (phase VI, and V pending):

Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.


# 131562 04-Jul-2004 alfred

Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.


# 131551 04-Jul-2004 phk

When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

MNT_ILOCK(mp);
loop:
for (vp = TAILQ_FIRST(&mp...);
(vp = nvp) != NULL;
nvp = TAILQ_NEXT(vp,...)) {
if (vp->v_mount != mp)
goto loop;
MNT_IUNLOCK(mp);
...
MNT_ILOCK(mp);
}
MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.


# 130795 20-Jun-2004 tmm

Initialize ni_cnd.cn_cred before calling lookup() (this is normally done
by namei(), which cannot easily be used here however). This fixes boot
time crashes on sparc64 and probably other platforms.

Reviewed by: phk


# 130651 17-Jun-2004 phk

Reduce the thaumaturgical level of root filesystem mounts: Instead of using
an otherwise redundant clone routine in geom_disk.c, mount a temporary
DEVFS and do a proper lookup.

Submitted by: thomas


# 130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 127911 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 127476 27-Mar-2004 pjd

- Add a description for vfs.usermount sysctl.
- Add the vfs_equalopts() function for mount options comparsion.
Now it looks much more clear.
- Style fixed.

In co-operation with: bde


# 127473 27-Mar-2004 pjd

- Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts.
- Style fixed.

Submitted by: bde


# 127464 26-Mar-2004 pjd

We probably shouldn't allow users to mount file systems with MNT_SUIDDIR.
There should be not shell access when SUIDDIR is compiled in, but
better be sure.

Reviewed by: rwatson


# 127058 16-Mar-2004 tjr

Make vfs_nmount() public. The Linux emulator needs this in order to mount
linprocfs filesystems.


# 126852 11-Mar-2004 phk

Remove unused mnt_reservedvnlist field.


# 125957 18-Feb-2004 cperciva

Don't ignore errors from vfs_allocate_syncvnode.

PR: kern/18503
Submitted by: Anatoly Vorobey <mellon@pobox.com>
Approved by: rwatson (mentor)


# 125340 02-Feb-2004 pjd

Fix many issues related to mount/unmount:

1. Root from inside a jail was able to unmount any file system
(except /).
2. Unprivileged root was able to unmount file systems mounted by
privileged root (execpt /).
3. User from inside a jail was able to mount file system when
sysctl vfs.usermount was set to 1.
4. User was able to mount file system when vfs.usermount was set to 1
(that's ok) and unmount it even if vfs.usermount was equal to 0
(that's not correct).

Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl>

Only a part of this fix will be MFC'ed (if approved).

PR: kern/60149
Reviewed by: rwatson
Approved by: scottl (mentor)
MFC after: 3 days


# 123075 30-Nov-2003 iedowse

In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.

The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.

Reported by: Erez Zadok <ezk@cs.sunysb.edu>
Approved by: re (scottl)


# 122965 23-Nov-2003 kan

Do not attempt to destroy NULL vfs options list.

Approved by: re (scottl)
Reported by: Christian Laursen <xi atborderworlds dot dk>


# 122640 14-Nov-2003 kan

Fix a number of style(9) bugs introduced in r1.113 by me.

Suggested by: bde


# 122567 12-Nov-2003 peter

MNAMELEN is back to an int again after Kirk's statfs commit

kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4)
*** Error code 1


# 122523 12-Nov-2003 kan

1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.

2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.

3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.

4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.

Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by: rwatson


# 122091 05-Nov-2003 kan

Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff


# 120462 26-Sep-2003 phk

Update the list of CDROM device names to try for booting with RB_CDROM
flag set.


# 119885 08-Sep-2003 iedowse

In the !MNT_BYFSID case, return EINVAL from unmount(2) when the
specified directory is not found in the mount list. Before the
MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent
path and EINVAL for a non-mountpoint, but we can no longer distinguish
between these cases. Of the two error codes, EINVAL was more likely
to occur in practice, and it was the only one of the two that was
documented.

Update the manual page to match the current behaviour.

Suggested by: tjr
Reviewed by: tjr


# 117132 01-Jul-2003 iedowse

Add a new mount flag MNT_BYFSID that can be used to unmount a file
system by specifying the file system ID instead of a path. Use this
by default in umount(8). This avoids the need to perform any vnode
operations to look up the mount point, so it makes it possible to
unmount a file system whose root vnode cannot be looked up (e.g.
due to a dead NFS server, or a file system that has become detached
from the hierarchy because an underlying file system was unmounted).
It also provides an unambiguous way to specify which file system is
to be unmunted.

Since the ability to unmount using a path name is retained only for
compatibility, that case now just uses a simple string comparison
of the supplied path against f_mntonname of each mounted file system.

Discussed on: freebsd-arch
mdoc help from: ru


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 115960 07-Jun-2003 phk

Improve the root-dev prompt facility for printing devices which could
possibly be a root filesystem.


# 113958 24-Apr-2003 tjr

Free mount credentials (mnt_cred) when freeing the mount struct
in failure cases to avoid leaking struct ucreds, and ultimately
leaking struct uidinfo references.


# 113887 23-Apr-2003 obrien

Add /dev to the Alpha manual mount root example.


# 112693 26-Mar-2003 tegge

Adjust the number of vnodes scanned by vlrureclaim() according to the
size of the vnode list.


# 111242 22-Feb-2003 rwatson

Export the name of the device used to mount the root file system as
kern.rootdev. If rootdev is undefined (NFS mount, etc), export an
empty string.

Desired by: peter


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 110906 15-Feb-2003 alfred

Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a
barrier between free'ing filedesc structures. Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.


# 110863 14-Feb-2003 des

Style nit.


# 110862 14-Feb-2003 alfred

KASSERT format string does not need newline termination


# 110861 14-Feb-2003 alfred

Add kasserts to catch bad API usage.

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 108524 31-Dec-2002 alfred

When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.


# 107855 14-Dec-2002 alfred

unwrap lines made short enough by SCARGS removal


# 107850 14-Dec-2002 alfred

remove syscallarg().

Suggested by: peter


# 107849 13-Dec-2002 alfred

SCARGS removal take II.


# 107839 13-Dec-2002 alfred

Backout removal SCARGS, the code freeze is only "selectively" over.


# 107838 13-Dec-2002 alfred

Remove SCARGS.

Reviewed by: md5


# 106576 07-Nov-2002 mux

- Use a better definition for MNAMELEN which doesn't require
to have one #ifdef per architecture.
- Change a space to a tab after a nearby #define.

Obtained from: bde


# 105948 25-Oct-2002 phk

#include <geom/geom.h> to get proper prototypes. Contrary to my fears we
seem to have all the prerequisites already.

Call g_waitidle() as the first thing in vfs_mountroot() so that we have
it out of the way before we even decide if we should call .._ask() or
.._try().

Call the g_dev_print() function to provide better guidance for the
root-mount prompt.


# 105893 24-Oct-2002 phk

Make sure GEOM has stopped rattling the disks before we try to mount
the root filesystem, this may be implicated in the PC98 issue.


# 105668 21-Oct-2002 mckusick

This update removes a race between unmount and lookup. The lookup
locks the mount point directory while waiting for vfs_busy to clear.
Meanwhile the unmount which holds the vfs_busy lock tried to lock
the mount point vnode. The fix is to observe that it is safe for the
unmount to remove the vnode from the mount point without locking it.
The lookup will wait for the unmount to complete, then recheck the
mount point when the vfs_busy lock clears.

Sponsored by: DARPA & NAI Labs.


# 105648 21-Oct-2002 phk

GEOM does not (and shall not) propagate flags like D_MEMDISK, so we will
revert to checking the name to determine if our root device is a ramdisk,
md(4) specifically to determine if we should attempt the root-mount RW

Sponsored by: DARPA & NAI Labs.


# 103928 24-Sep-2002 jeff

- Don't protect mountedhere with the vn interlock.
- Protect mountedhere with the vn lock.


# 103650 19-Sep-2002 mux

Switch to using strlcpy() in several places. It seems there
were cases where we could get unterminated strings before.


# 102088 19-Aug-2002 phk

Keep a copy of the credential used to mount filesystems around so
we can check and use it later on.

Change the pieces of code which relied on mount->mnt_stat.f_owner
to check which user mounted the filesystem.

This became needed as the EA code needs to be able to allocate
blocks for "system" EA users like ACLs.

There seems to be some half-baked (probably only quarter- actually)
notion that the superuser for a given filesystem is the user who
mounted it, but this has far from been carried through. It is
unclear if it should be.

Sponsored by: DARPA & NAI Labs.


# 101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


# 101241 02-Aug-2002 mux

Make the consumers of the linker_load_file() function use
linker_load_module() instead.

This fixes a bug where the kernel was unable to properly locate and
load a kernel module in vfs_mount() (and probably in the netgraph
code as well since it was using the same function). This is because
the linker_load_file() does not properly search the module path.

Problem found by: peter
Reviewed by: peter
Thanks to: peter


# 101173 01-Aug-2002 rwatson

Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by: bde


# 101004 30-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on
mount structures. In particular, invoke entry points for
intialization and destruction in various scenarios (root,
non-root). Also introduce an entry point in the boot procedure
following the mount of the root file system, but prior to the
start of the userland init process to permit policies to
perform further initialization.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 100863 29-Jul-2002 jeff

- Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here
were hiding the real problem of the missing unlock in sync_inactive.
- Add the missing unlock in sync_inactive.

Submitted by: iedowse


# 100631 24-Jul-2002 mux

Fix a stupid bug where I wasn't initializing the names
of 0-length mount options.


# 100363 19-Jul-2002 mux

- Merge the mount options at MNT_UPDATE time with vfs_mergeopts().
- Sanity check the mount options list (remove duplicates) with
vfs_sanitizeopts().
- Fix some malloc(0)/free(NULL) bugs.

Reviewed by: rwatson (some time ago)


# 99690 09-Jul-2002 jeff

- Use standard locking functions in syncer's opv
- vput instead of vrele syncer vnodes in vfs_mount
- Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP


# 99602 08-Jul-2002 mux

Add a VFS_START() call in vfs_mountroot_try() for the sake
of being correct. None of the root mountable filesystems
do something at VFS_START().

Shorten a comment to fix a style bug while I'm here.

PR: kern/18505


# 99423 05-Jul-2002 jeff

Include systm.h before vnode.h so Debugger() and printf() are available when
full vnode lock debugging is enabled.


# 99338 03-Jul-2002 mux

Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot()
which was #if 0'd and is not likely to be used now.


# 99336 03-Jul-2002 mux

Remove an unused argument in vfs_mountroot().


# 99266 02-Jul-2002 mux

I didn't pay enough attention when copy/pasting disclaimers.
The disclaimer in vfs_conf.c was slightly different. Fix this.


# 99264 02-Jul-2002 mux

Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by: rwatson
Repo-copy by: peter


# 94936 17-Apr-2002 mux

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


# 93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


# 91859 08-Mar-2002 phk

Move the mount of the root filesystem to happen in the init process before
the exec if /sbin/init.

This allows the scheduler to get started and kthreads a chance to run
before we start filesystem operations.


# 91690 05-Mar-2002 eivind

Document all functions, global and static variables, and sysctls.
Includes some minor whitespace changes, and re-ordering to be able to document
properly (e.g, grouping of variables and the SYSCTL macro calls for them, where
the documentation has been added.)

Reviewed by: phk (but all errors are mine)


# 86842 23-Nov-2001 obrien

Remove the use of _PATH_DEV in the example.

The kernel certainly doesn't use _PATH_DEV or even /dev/ to find the device.
It cannot, since "/" has not been mounted. Maybe the only affect of using
/dev/ is that it gets put in the mounted-from name for "/", so that mount(8),
etc., display an absolute path before "/" has been remounted. Many have
never bothered typing the full path, and code that constructs a path in
rootdevnames[] never bothered to construct a full path, so the example
shouldn't have it.

Submitted by: bde


# 86704 20-Nov-2001 obrien

We only have slices on i386 and IA-64.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


# 75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


# 73286 01-Mar-2001 adrian

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 69793 09-Dec-2000 obrien

Add `_PATH_DEVZERO'.
Use _PATH_* where where possible.


# 67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


# 66615 03-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


# 65374 02-Sep-2000 phk

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


# 65051 24-Aug-2000 phk

Fix panic when removing open device (found by bp@)
Implement subdirs.
Build the full "devicename" for cloning functions.
Fix panic when deleted device goes away.
Collaps devfs_dir and devfs_dirent structures.
Add proper cloning to the /dev/fd* "device-"driver.
Fix a bug in make_dev_alias() handling which made aliases appear
multiple times.
Use devfs_clone to implement getdiskbyname()
Make specfs maintain the stat(2) timestamps per dev_t


# 60807 22-May-2000 msmith

Make a trip to Pointy-Hats-R-Us and actually include the header that
defines ROOTDEVNAME.

Submitted by: "Jeffrey S. Sharp" <jss@subatomix.com>


# 58389 20-Mar-2000 green

Split the logic of
static int setrootbyname(char *name);
out into
dev_t getdiskbyname(char *name);

This makes it easy to create a new DDB command, which is the big reason
for the change. You can now do the following in DDB:

Example rc.conf entry:
dumpdev="/dev/ad0s1b" # Device name to crashdump to (if enabled).

db> show disk/ad0s1b
dev_t = 0xc0b7ea00
db> p *dumpdev
c0b7ea00


# 57296 17-Feb-2000 msmith

Change the mountroot prompt to something that doesn't look at all like a
firmware prompt. Several sleepy folk mistook the '>>>' for the SRM
prompt, which was never the desired idea.

Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh


# 54498 12-Dec-1999 peter

Put on asbestos suit and put a splcam() around the 'Mounting root from..'
message to stop it splitting. Every single scsi machine I've seen seems
to reliably collide with this and it's rather annoying.


# 54298 08-Dec-1999 phk

Scan cdevs for potential root devices, rather than bdevs.


# 53888 29-Nov-1999 dillon

Make BOOTP work again.

Submitted by: Doug Ambrisko <ambrisko@whistle.com>


# 53858 28-Nov-1999 msmith

Use the correct mounted-from path when allocating the root mount, if we know
what it is.

Be more correct in unbusying the mountpoint (especially before freeing it).

Remove support for mounting 'r' devices as root. You don't mount 'r'
devices anywhere else, and they're going away anyway.

Submitted by: bde


# 53722 26-Nov-1999 phk

Retire MFS_ROOT and MFS_ROOT_SIZE options from the MFS implementation.

Add MD_ROOT and MD_ROOT_SIZE options to the md driver.

Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility.

Add md driver to GENERIC, PCCARD and LINT.

This is a cleanup which removes the need for some of the worse hacks in
MFS: We really want to have a rootvnode but MFS on a preloaded image
doesn't really have one. md is a true device, so it is less trouble.

This has been tested with make release, and if people remember to add
the "md" pseudo-device to their kernels, PicoBSD should be just fine
as well. If people have no other use for MFS, it can be removed from
the kernel.


# 53493 21-Nov-1999 msmith

If vfs_mountroot_try() isn't given a path to try mounting, return a silent
error rather than complaining about it verbosely. No path is not really
a failure, but the diagnostic was confusing and unuseful.


# 53452 20-Nov-1999 phk

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967


# 53014 08-Nov-1999 phk

Ignore leading 'r' in base of root device name.


# 52916 06-Nov-1999 msmith

Clean up a couple of initialisations in order to suppress a correct
but un-useful warning.


# 52906 05-Nov-1999 msmith

Guard against freeing NULL if vfs_mountroot_try is called with NULL
as an argument (this is legal to make other code simpler).


# 52886 05-Nov-1999 msmith

Expand the sscanf buffer to 32 bytes to make room for the expanded
pattern, with some space left over to avoid this mistake next time it's
improved.

Submitted by: luoqi


# 52881 04-Nov-1999 msmith

Allow vfs names to include the digits 0-9 as well as the letters a-z.
This should let 'cd9660' filesystems be allowed.

Submitted by: ghelmer


# 52854 03-Nov-1999 msmith

Re-implement the handing of RB_CDROM in a machine-independant fashion.
We currently only search SCSI and IDE CDROMs; if there's felt to be a
need for supporting the very old and rare soundcard etc. drives for this
application they can be trivially added.


# 52836 03-Nov-1999 msmith

Make MFS work with the new root filesystem search process.

In order to achieve this, root filesystem mount is moved from
SI_ORDER_FIRST to SI_ORDER_SECOND in the SI_SUB_MOUNT_ROOT sysinit
group. Now, modules which wish to usurp the default root mount
can use SI_ORDER_FIRST.

A compiled-in or preloaded MFS filesystem will become the root
filesystem unless the vfs.root.mountfrom environment variable refers
to a valid bootable device. This will normally only be the case when
the kernel and MFS image have been loaded from a disk which has a
valid /etc/fstab file. In this case, the variable should be manually
overridden in the loader, or the kernel booted with -a. In either
case "mfs:" should be supplied as the new value.

Also fix a typo in one DFLTROOT case that would not have compiled.


# 52778 01-Nov-1999 msmith

This is a complete rewrite of vfs_conf.c, which changes the way the root
filesystem is discovered. Preference is given to using the kernel
environment variable vfs.root.mountfrom, which is set by the loader
according to the contents of /etc/fstab. Changes in the MD code
provide fallback mechanisms for systems not using the loader.

A more robust fallback path is also provided, with the last recourse
being to prompt on the console for a root device.

These changes drastically simplify the machine-dependant parts of
the root configuration process. In addition, support for CDROM root
devices has been removed; it was a nasty hack and didn't work.


# 51388 19-Sep-1999 dillon

Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse
addaliasu() into addalias() (no operational change) and clarify comments
relating to a trick that vclean() uses.

The fix to BOOTP is yet another hack. Actually, rootfsid handling
is already a major hack. The whole thing needs to be cleaned up.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 48521 03-Jul-1999 peter

Fix warnings in last commit (dev_t is not an int, and not even int
compatable in arg lists on the Alpha)


# 48512 03-Jul-1999 phk

Be more informative and try to ask the user in some instances if we can't
figure out the root device.


# 48249 26-Jun-1999 peter

I'm tired of having a 'hanging root device'.. This isn't a "fix", just
a workaround for a specific case where cam interrupts right in the middle
of this printf.


# 47447 23-May-1999 jb

Back out my previous change (phk didn't like it) in favour of setting
rootdev in the mfs initialisation code iff MFS_ROOT (which Bruce doesn't
like). Damned if I do - damned if I don't.


# 47423 23-May-1999 jb

Make MFS_ROOT work again. MFS_ROOT means that rootdev is not set.

Broken by: phk
Problem ignored by: phk


# 39187 14-Sep-1998 sos

Remove the SLICE code.
This clearly needs alot more thought, and we dont need this to hunt
us down in 3.0-RELEASE.


# 36809 09-Jun-1998 bde

Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available. This is useful after
booting with -s. If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.


# 35323 20-Apr-1998 julian

Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)


# 35319 19-Apr-1998 julian

Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.


# 34479 10-Mar-1998 msmith

If the root mount fails from a device that is not the compatability slice
of a disk, because that slice does not exist, try again mounting from the
compatability slice.

This handles the case where a disk has been initialised by 'disklabel
auto', which places a bogus and invalid slice entry on the disk.
The bootstrap is not smart enough to reject this slice, and pretends to
boot from it. Believing the the bootstrap at this point is unwise.

Booting from non-'wd' disks thus prepared is still broken, as
'disklabel -rwB xdN auto' does not initialise the disk type field, and
the bootstrap mistakenly claims that the disk is handled by 'wd'.

Behaviour is now consistent with DEVFS expected characteristics.


# 33181 09-Feb-1998 eivind

Staticize.


# 32358 09-Jan-1998 eivind

Make the BOOTP family new-style options (in opt_bootp.h)


# 31476 01-Dec-1997 julian

Cleanup my last patch here
Reviewed by: sef@kthrup.com and phk@freebsd.org


# 31403 25-Nov-1997 julian

Shift a few SYSINT() calls around.
this results in a few functions becoming static, and
the SYSINITs being close to the code they are related to.
setting up the dump device is with dumpsys() and
kicking off the scheduler is with the scheduler.
Mounting root is with the code that does it.

Reviewed by: phk


# 31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 30468 16-Oct-1997 julian

We are mounting the root.
mount it at the HEAD of the queue, DEVFS might already be there..


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 12569 02-Dec-1995 bde

Finished (?) cleaning up sysinit stuff.


# 11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


# 10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 10427 29-Aug-1995 bde

Fix benign type mismatch in a sysinit function arg.


# 10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 4370 11-Nov-1994 phk

Make a kernel sans FFS possible.


# 2946 21-Sep-1994 wollman

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


# 2893 19-Sep-1994 dfr

Added msdosfs.

Obtained from: NetBSD


# 2152 20-Aug-1994 dg

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


# 1817 02-Aug-1994 dg

Added $Id$


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources