Cross Reference: /freebsd-11-stable/sys/kern/vfs

History log of /freebsd-11-stable/sys/kern/vfs_mount.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 369560	06-Apr-2021	git2svn	mount: Disallow mounting over a jail root Discussed with: jamie Approved by: so Security: CVE-2020-25584 Security: FreeBSD-SA-21:10.jail_mount Git Hash: 6f7815083ad66c34bad0dfa08c7033ff670b3be1 Git Author: markj@FreeBSD.org
# 362375	19-Jun-2020	freqlabs	MFC r362252: Apply default security flavor in vfs_export Reported by: npn Reviewed by: rmacklem Approved by: mav (mentor) Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25300
# 361797	04-Jun-2020	freqlabs	MFC r361699, r361711: Assign default security flavor when converting old export args vfs_export requires security flavors be explicitly listed when exporting as of r360900. Use the default AUTH_SYS flavor when converting old export args to ensure compatibility with the legacy mount syscall. Reported by: rmacklem Reviewed by: rmacklem Approved by: mav (mentor) Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25045
# 357032	23-Jan-2020	mckusick	MFC of 356763 Remove call to VFS_SYNC() to avoid unmount livelock
# 353109	04-Oct-2019	emaste	MFC r352796: Check the vfs option length is valid before accessing through When a VFS option passed to nmount is present but NULL the kernel will place an empty option in its internal list. This will have a NULL pointer and a length of 0. When we come to read one of these the kernel will try to load from the last address of virtual memory. This is normally invalid so will fault resulting in a kernel panic. Fix this by checking if the length is valid before dereferencing.
# 342821	06-Jan-2019	mckusick	MFC of 342135 and 342290 Properly respond to error from VFS_ROOT() during mount. Sponsored by: Netflix
# 332753	19-Apr-2018	avg	MFC r331616: vfs_donmount: in certain cases try r/o mount if r/w mount fails If the operation is not an update, if neither r/w nor r/o mode is explicitly requested, if the error code hints at the possibility of the media being read-only, and if the fallback is allowed, then we can try to automatically downgrade to the readonly mode. This is especially useful for auto-mounting of removable media that sometimes can happen to be write-protected. The fallback to r/o is not enabled by default. It can be requested on a per-mount basis with a new mount option, 'autoro'. Or it can be globally allowed by setting vfs.default_autoro. Relnotes: yes
# 331722	29-Mar-2018	eadler	Revert r330897: This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code. Revert with prejudice. This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes. Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes. Requested by: gjb (re)
# 331643	27-Mar-2018	dim	MFC r314568 (by emaste): kern_sig.c: ANSIfy and remove archaic register keyword Sponsored by: The FreeBSD Foundation MFC r318389 (by emaste): Remove register keyword from sys/ and ANSIfy prototypes A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today. ANSIfy related prototypes while here. Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193
# 330897	14-Mar-2018	eadler	Partial merge of the SPDX changes These changes are incomplete but are making it difficult to determine what other changes can/should be merged. No objections from: pfg
# 324293	05-Oct-2017	avg	MFC r323578,r323769: dounmount: do not release the mount point's reference on the covered vnode As long as mnt_ref is not zero there can be a consumer that might try to access mnt_vnodecovered. For this reason the covered vnode must not be freed until mnt_ref goes to zero. So, move the release of the covered vnode to vfs_mount_destroy.
# 311957	12-Jan-2017	kib	MFC r311452: Do not allocate struct statfs on kernel stack.
# 310959	31-Dec-2016	mjg	MFC r305378,r305379,r305386,r305684,r306224,r306608,r306803,r307650,r307685, r308407,r308665,r308667,r309067: cache: put all negative entry management code into dedicated functions == cache: manage negative entry list with a dedicated lock Since negative entries are managed with a LRU list, a hit requires a modificaton. Currently the code tries to upgrade the global lock if needed and is forced to retry the lookup if it fails. Provide a dedicated lock for use when the cache is only shared-locked. == cache: defer freeing entries until after the global lock is dropped This also defers vdrop for held vnodes. == cache: improve scalability by introducing bucket locks An array of bucket locks is added. All modifications still require the global cache_lock to be held for writing. However, most readers only need the relevant bucket lock and in effect can run concurrently to the writer as long as they use a different lock. See the added comment for more details. This is an intermediate step towards removal of the global lock. == cache: get rid of the global lock Add a table of vnode locks and use them along with bucketlocks to provide concurrent modification support. The approach taken is to preserve the current behaviour of the namecache and just lock all relevant parts before any changes are made. Lookups still require the relevant bucket to be locked. == cache: ignore purgevfs requests for filesystems with few vnodes purgevfs is purely optional and induces lock contention in workloads which frequently mount and unmount filesystems. In particular, poudriere will do this for filesystems with 4 vnodes or less. Full cache scan is clearly wasteful. Since there is no explicit counter for namecache entries, the number of vnodes used by the target fs is checked. The default limit is the number of bucket locks. == (by kib) Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. == cache: split negative entry LRU into multiple lists This splits the ncneg_mtx lock while preserving the hit ratio at least during buildworld. Create N dedicated lists for new negative entries. Entries with at least one hit get promoted to the hot list, where they get requeued every M hits. Shrinking demotes one hot entry and performs a round-robin shrinking of regular lists. == cache: fix up a corner case in r307650 If no negative entry is found on the last list, the ncp pointer will be left uninitialized and a non-null value will make the function assume an entry was found. Fix the problem by initializing to NULL on entry. == (by kib) vn_fullpath1() checked VV_ROOT and then unreferenced vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race. Lock vnode after we noticed the VV_ROOT flag. See comments for explanation why unlocked check for the flag is considered safe. == cache: fix a race between entry removal and demotion The negative list shrinker can demote an entry with only hotlist + neglist locks held. On the other hand entry removal possibly sets the NCF_DVDROP without aformentioned locks held prior to detaching it from the respective netlist., which can lose the update made by the shrinker. == cache: plug a write-only variable in cache_negative_zap_one == cache: ensure that the number of bucket locks does not exceed hash size The size can be changed by side effect of modifying kern.maxvnodes. Since numbucketlocks was not modified, setting a sufficiently low value would give more locks than actual buckets, which would then lead to corruption. Force the number of buckets to be not smaller. Note this should not matter for real world cases.
# 309207	27-Nov-2016	kib	MFC r308618: Provide simple mutual exclusion between mount point update and unmount. In the update path in ffs_mount(), drop vfs_busy() reference around namei().
# 308876	20-Nov-2016	kib	MFC r308617: Move common cleanup code into helper.
# 308252	03-Nov-2016	trasz	MFC r306071: Fix bug introduced with r302388, which could cause processes accessing automounted shares to hang with "vfs_busy" wchan. (As a workaround one can run 'automount -u' from cron.)
# 304983	29-Aug-2016	kib	MFC r303924 (by trasz): Eliminate vprint().
# 302408	07-Jul-2016	gjb	Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here. Additional commits post-branch will follow. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-11-stable/MAINTAINERS /freebsd-11-stable/cddl /freebsd-11-stable/cddl/contrib/opensolaris /freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print /freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs /freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs /freebsd-11-stable/contrib/amd /freebsd-11-stable/contrib/apr /freebsd-11-stable/contrib/apr-util /freebsd-11-stable/contrib/atf /freebsd-11-stable/contrib/binutils /freebsd-11-stable/contrib/bmake /freebsd-11-stable/contrib/byacc /freebsd-11-stable/contrib/bzip2 /freebsd-11-stable/contrib/com_err /freebsd-11-stable/contrib/compiler-rt /freebsd-11-stable/contrib/dialog /freebsd-11-stable/contrib/dma /freebsd-11-stable/contrib/dtc /freebsd-11-stable/contrib/ee /freebsd-11-stable/contrib/elftoolchain /freebsd-11-stable/contrib/elftoolchain/ar /freebsd-11-stable/contrib/elftoolchain/brandelf /freebsd-11-stable/contrib/elftoolchain/elfdump /freebsd-11-stable/contrib/expat /freebsd-11-stable/contrib/file /freebsd-11-stable/contrib/gcc /freebsd-11-stable/contrib/gcclibs/libgomp /freebsd-11-stable/contrib/gdb /freebsd-11-stable/contrib/gdtoa /freebsd-11-stable/contrib/groff /freebsd-11-stable/contrib/ipfilter /freebsd-11-stable/contrib/ldns /freebsd-11-stable/contrib/ldns-host /freebsd-11-stable/contrib/less /freebsd-11-stable/contrib/libarchive /freebsd-11-stable/contrib/libarchive/cpio /freebsd-11-stable/contrib/libarchive/libarchive /freebsd-11-stable/contrib/libarchive/libarchive_fe /freebsd-11-stable/contrib/libarchive/tar /freebsd-11-stable/contrib/libc++ /freebsd-11-stable/contrib/libc-vis /freebsd-11-stable/contrib/libcxxrt /freebsd-11-stable/contrib/libexecinfo /freebsd-11-stable/contrib/libpcap /freebsd-11-stable/contrib/libstdc++ /freebsd-11-stable/contrib/libucl /freebsd-11-stable/contrib/libxo /freebsd-11-stable/contrib/llvm /freebsd-11-stable/contrib/llvm/projects/libunwind /freebsd-11-stable/contrib/llvm/tools/clang /freebsd-11-stable/contrib/llvm/tools/lldb /freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump /freebsd-11-stable/contrib/llvm/tools/llvm-lto /freebsd-11-stable/contrib/mdocml /freebsd-11-stable/contrib/mtree /freebsd-11-stable/contrib/ncurses /freebsd-11-stable/contrib/netcat /freebsd-11-stable/contrib/ntp /freebsd-11-stable/contrib/nvi /freebsd-11-stable/contrib/one-true-awk /freebsd-11-stable/contrib/openbsm /freebsd-11-stable/contrib/openpam /freebsd-11-stable/contrib/openresolv /freebsd-11-stable/contrib/pf /freebsd-11-stable/contrib/sendmail /freebsd-11-stable/contrib/serf /freebsd-11-stable/contrib/sqlite3 /freebsd-11-stable/contrib/subversion /freebsd-11-stable/contrib/tcpdump /freebsd-11-stable/contrib/tcsh /freebsd-11-stable/contrib/tnftp /freebsd-11-stable/contrib/top /freebsd-11-stable/contrib/top/install-sh /freebsd-11-stable/contrib/tzcode/stdtime /freebsd-11-stable/contrib/tzcode/zic /freebsd-11-stable/contrib/tzdata /freebsd-11-stable/contrib/unbound /freebsd-11-stable/contrib/vis /freebsd-11-stable/contrib/wpa /freebsd-11-stable/contrib/xz /freebsd-11-stable/crypto/heimdal /freebsd-11-stable/crypto/openssh /freebsd-11-stable/crypto/openssl /freebsd-11-stable/gnu/lib /freebsd-11-stable/gnu/usr.bin/binutils /freebsd-11-stable/gnu/usr.bin/cc/cc_tools /freebsd-11-stable/gnu/usr.bin/gdb /freebsd-11-stable/lib/libc/locale/ascii.c /freebsd-11-stable/sys/cddl/contrib/opensolaris /freebsd-11-stable/sys/contrib/dev/acpica /freebsd-11-stable/sys/contrib/ipfilter /freebsd-11-stable/sys/contrib/libfdt /freebsd-11-stable/sys/contrib/octeon-sdk /freebsd-11-stable/sys/contrib/x86emu /freebsd-11-stable/sys/contrib/xz-embedded /freebsd-11-stable/usr.sbin/bhyve/atkbdc.h /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h /freebsd-11-stable/usr.sbin/bhyve/console.c /freebsd-11-stable/usr.sbin/bhyve/console.h /freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h /freebsd-11-stable/usr.sbin/bhyve/rfb.c /freebsd-11-stable/usr.sbin/bhyve/rfb.h /freebsd-11-stable/usr.sbin/bhyve/sockstream.c /freebsd-11-stable/usr.sbin/bhyve/sockstream.h /freebsd-11-stable/usr.sbin/bhyve/usb_emul.c /freebsd-11-stable/usr.sbin/bhyve/usb_emul.h /freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c /freebsd-11-stable/usr.sbin/bhyve/vga.c /freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302388	07-Jul-2016	trasz	Add new unmount(2) flag, MNT_NONBUSY, to check whether there are any open vnodes before proceeding. Make autounmound(8) use this flag. Without it, even an unsuccessfull unmount causes filesystem flush, which interferes with normal operation. Reviewed by: kib@ Approved by: re (gjb@) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7047
# 301929	15-Jun-2016	kib	Do not assume that we own the use reference on the covered vnode until we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which wins race with us could dereference the covered vnode, and we are left with the locked freed memory. Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week
# 299913	16-May-2016	avg	dounmount: do not call mountcheckdirs() for mounts with MNT_IGNORE This is a bit hackish, but the flag is currently set only for ZFS snapshots mounted under .zfs. mountcheckdirs() can change cdir/rdir references to a covered vnode. But for the said snapshots the covered vnode is really ephemeral and it must never be accessed (except for a few specific cases). To do: consider removing mountcheckdirs() entirely MFC after: 5 days
# 295265	04-Feb-2016	kib	Do not copy by field when converting struct oexport_args to struct export_args on mount update, bzero() is consistent with vfs_oexport_conv(). Make the code structure more explicit by using switch. Return EINVAL if export option layout (deduced from size) is unknown. Based on the submission by: bde Sponsored by: The FreeBSD Foundation
# 287107	24-Aug-2015	trasz	Make vfs_unmountall() unmount /dev after /, not before. The only reason this didn't result in an unclean shutdown is that devfs ignores MNT_FORCE flag. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3467
# 285039	02-Jul-2015	kib	Vnode is not referenced by the vfs_domount() at the point where asserts are made. Remove them, since we might dereference freed memory. Leaked locks are asserted by the syscall return code anyway. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 283602	27-May-2015	kib	Right now, dounmount() is called with unreferenced mount point. Nothing stops a parallel unmount to suceed before the given call to dounmount() checks and locks the covered vnode. Prevent dounmount() from acting on the freed (although type-stable) memory by changing the interface to require the mount point to be referenced. dounmount() consumes the reference on return, regardless of the sucessfull or erronous result. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 278523	10-Feb-2015	kib	Mountd iterating over the mount points may race with the parallel unmount, which causes error from nmount(2) call when performing MNT_DELEXPORT over the directory which ceased to be a mount point. The race is legitimate and innocent, but results in the chatty mountd. Silence it by providing an distinguished error code for the situation, and ignoring the error in mountd loop. Based on the patch by: Andreas Longwitz <longwitz@incore.de> Prodded and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 275638	09-Dec-2014	kib	Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount. Since VFS does not/cannot stop writes, sync might run indefinitely, or be a wrong thing to do at all. E. g. NFS ignores VFS_SYNC() for forced unmounts, since non-responding server does not allow sync to finish. On the other hand, filesystems can and do stop writes using fs-specific facilities, and should already fully flush caches in VFS_UNMOUNT() due to the race. Adjust msdosfs tp sync in unmount for forced call, to accomodate the new behaviour. Note that it is still racy, since writes are not stopped. Discussed with: avg, bjk, mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
# 270096	17-Aug-2014	trasz	Bring in the new automounter, similar to what's provided in most other UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format, has proper kernel support, and LDAP integration. There are still a few outstanding problems; they will be fixed shortly. Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions) Phabric: D523 MFC after: 2 weeks Relnotes: yes Sponsored by: The FreeBSD Foundation
# 269457	03-Aug-2014	kib	Remove Giant acquisition from the mount and unmount pathes. It could be claimed that two things were reasonable protected by Giant. One is vfsconf list links, which is converted to the new dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is now updated with atomics. Note that vfc_refcount still has the same races now as it has under the Giant, the unload of filesystem modules can happen while the module is still in use. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 264385	12-Apr-2014	bdrewery	Use proper MFSNAMELEN for fs type. MFC after: 2 weeks Reviewed by: rodrigc Also spotted by:ambrisko
# 256032	03-Oct-2013	sbruno	Change len checks for fstypelen and fspathlen to be against absolute len not strlen as they are not strings. Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his fuse.glusterfs port to FreeBSD. Final patch from mckusick@ Submitted by: mckusick@ Approved by: re (hrs) MFC after: 2 weeks
# 255136	01-Sep-2013	rmacklem	Forced dismounts of NFS mounts can fail when thread(s) are stuck waiting for an RPC reply from the server while holding the mount point busy (mnt_lockref incremented). This happens because dounmount() msleep()s waiting for mnt_lockref to become 0, before calling VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(), which the NFS client implements as purging RPCs in progress. Making this call before checking mnt_lockref fixes the problem, by ensuring that the VOP_xxx() calls will fail and unbusy the mount point. Reported by: sbruno Reviewed by: kib MFC after: 2 weeks
# 253158	10-Jul-2013	marcel	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. This change differs from r253224 in the following way: o The vfs_mounted handler is called before mountcheckdirs() and with newdp locked. vp is unlocked. o The event handlers are declared in <sys/eventhandler.h> and not in <sys/mount.h>. The <sys/mount.h> header is used in user land code that pretends to be kernel code and as such creates a very convoluted environment. It's hard to untangle. Submitted by: stevek@juniper.net Discussed with: pjd@ Obtained from: Juniper Networks, Inc.
# 251604	10-Jun-2013	marcel	Revert r251590. It unexpectedly broke the build and there were some questions on locking. As part of commit-bit grooming, I'd like Steve to handle this, but can't leave things broken in the mean time.
# 251590	09-Jun-2013	marcel	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. Submitted by: stevek@juniper.net Obtained from: Juniper Networks, Inc
# 249582	17-Apr-2013	gabor	- Correct mispellings of the word occurrence Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
# 248562	20-Mar-2013	kib	When the journaled FFS volume is suspended due to the journal space becoming too low, the softdep flush thread processes the workitems, which frees the space in journal, and then unsuspends the fs. The softdep_flush() and other workitem processing functions busy the filesystem before iterating over the worklist, to prevent the parallel unmount from freeing the mount data. The vfs_busy() is called with MBF_NOWAIT flag. Now, if the unmount is already started and the filesystem is suspended due to low journal space, the journal is never flushed and filesystem is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed for the unmounting fs, and softdep_flush() does not process the workitems. Unmount needs to write metadata, where it hangs in the "suspfs" state. Move the vn_start_write() call in the dounmount() before setting the MNTK_UNMOUNT flag. This practically ensures that softdep_flush() processed the pending journal writes by making dounmount() wait for the lift of the suspension. Sponsored by: The FreeBSD Foundation Reported and tested by: pho MFC after: 2 weeks
# 245036	04-Jan-2013	davidxu	Revert revision 244760 because strncpy pads trailing space with zero, this prevents kernel data from being leaked. Noticed by: Joerg Sonnenberger < joerg at britannica dot bec dot de >
# 245001	03-Jan-2013	kib	Remove the deprecated MNT_VNODE_FOREACH interface. Use the MNT_VNODE_FOREACH_ALL instead.
# 244760	28-Dec-2012	davidxu	Use strlcpy to NULL-terminate error message even if user provided a short buffer.
# 244534	21-Dec-2012	attilio	Fixup r218424: uio_yield() was scaling directly to userland priority. When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There are no evidences that consumers could bear with such change in semantic and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved. Tested by: pho Reviewed by: kib, mdf MFC after: 1 week
# 244053	09-Dec-2012	kib	Fix typo. MFC after: 3 days
# 243719	30-Nov-2012	pjd	IFp4 @208450: Remove redundant call to AUDIT_ARG_UPATH1(). Path will be remembered by the following NDINIT(AUDITVNODE1) call. Sponsored by: FreeBSD Foundation (auditdistd) MFC after: 2 weeks
# 241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 240284	09-Sep-2012	kib	Add a facility for vgone() to inform the set of subscribed mounts about vnode reclamation. Typical use is for the bypass mounts like nullfs to get a notification about lower vnode going away. Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument lowervp which is reclaimed. It is possible to register several reclamation event listeners, to correctly handle the case of several nullfs mounts over the same directory. For the filesystem not having nullfs mounts over it, the overhead added is a single mount interlock lock/unlock in the vnode reclamation path. In collaboration with: pho MFC after: 3 weeks
# 234482	20-Apr-2012	mckusick	This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# 234386	17-Apr-2012	mckusick	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# 233999	07-Apr-2012	gleb	Add vfs_getopt_size. Support human readable file system options in tmpfs. Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs. Discussed with: delphij MFC after: 2 weeks
# 232709	08-Mar-2012	kib	Decomission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which allows a filesystem to request VFS to not allow MNTK_ASYNC. MFC after: 1 week
# 231012	05-Feb-2012	mm	Analogous to r230407 a separate path buffer in vfs_mount.c is required for r230129. Fixes a out of bounds write to fspath. MFC after: 10 days
# 230249	16-Jan-2012	mckusick	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks
# 230129	15-Jan-2012	mm	Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month
# 227333	08-Nov-2011	attilio	Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on all the architectures. The option allows to mount non-MPSAFE filesystem. Without it, the kernel will refuse to mount a non-MPSAFE filesytem. This patch is part of the effort of killing non-MPSAFE filesystems from the tree. No MFC is expected for this patch. Tested by: gianni Reviewed by: kib
# 227293	07-Nov-2011	ed	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.
# 226265	11-Oct-2011	mckusick	When unmounting a filesystem always wait for the vfs_busy lock to clear so that if no vnodes in the filesystem are actively in use the unmount will succeed rather than failing with EBUSY. Reported by: Garrett Cooper Reviewed by: Attilio Rao and Kostik Belousov Tested by: Garrett Cooper PR: kern/161016 MFC after: 3 weeks
# 225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 224712	08-Aug-2011	mm	Revert r224655 and r224614 because vn_fullpath* does not always work on nullfs mounts. Change shall be reconsidered after 9.0 is released. Requested by: re (kib) Approved by: re (kib)
# 224655	05-Aug-2011	mm	The change in r224615 didn't take into account that vn_fullpath_global() doesn't operate on locked vnode. This could cause a panic. Fix by unlocking vnode, re-locking afterwards and verifying that it wasn't renamed or deleted. To improve readability and reduce code size, move code to a new static function vfs_verify_global_path(). In addition, fix missing giant unlock in unmount(). Reported by: David Wolfskill <david@catwhisker.org> Reviewed by: kib Approved by: re (bz) MFC after: 2 weeks
# 224614	02-Aug-2011	mm	For mount, discover f_mntonname from supplied path argument using vn_fullpath_global(). This fixes f_mntonname if mounting inside chroot, jail or with relative path as argument. For unmount in jail, use vn_fullpath_global() to discover global path from supplied path argument. This fixes unmount in jail. Reviewed by: pjd, kib Approved by: re (kib) MFC after: 2 weeks
# 224290	24-Jul-2011	mckusick	This update changes the mnt_flag field in the mount structure from 32 bits to 64 bits and eliminates the unused mnt_xflag field. The existing mnt_flag field is completely out of bits, so this update gives us room to expand. Note that the f_flags field in the statfs structure is already 64 bits, so the expanded mnt_flag field can be exported without having to make any changes in the statfs structure. Approved by: re (bz)
# 223919	11-Jul-2011	ae	Include sys/sbuf.h directly.
# 221829	13-May-2011	mdf	Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here. Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
# 220937	22-Apr-2011	jh	Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is no more need to add "noro" in vfs_donmount() to cancel "ro". This also fixes a problem of canceling options beginning with "no". For example, "noatime" didn't cancel "nonoatime". Thus it was possible that both "noatime" and "nonoatime" were active at the same time. Reviewed by: bde
# 220040	26-Mar-2011	jh	Fix some style issues in r219925. Reported by: bde MFC after: 1 month
# 219925	23-Mar-2011	jh	Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant occurrences of these options. It was possible that for example both "ro" and "rw" options became active concurrently. PR: kern/133614 Discussed on: freebsd-hackers MFC after: 1 month
# 218852	19-Feb-2011	jh	Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but vfs_export() fails. Restoring old options and flags after successful VFS_MOUNT(9) call may cause the file system internal state to become inconsistent with mount options and flags. Specifically the FFS super block fs_ronly field and the MNT_RDONLY flag may get out of sync. PR: kern/133614 Discussed on: freebsd-hackers
# 218424	07-Feb-2011	mdf	Based on discussions on the svn-src mailing list, rework r218195: - entirely eliminate some calls to uio_yeild() as being unnecessary, such as in a sysctl handler. - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h - add a slightly more generic kern_yield() that can replace the functionality of uio_yield(). - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching. - fix a logic inversion bug in vlrureclaim(), pointed out by bde@. - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.
# 218195	02-Feb-2011	mdf	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
# 217792	24-Jan-2011	jh	Replace spaces with tabs.
# 215747	23-Nov-2010	pluknet	Update MNT_ROOTFS comments after changes in the root mount logic. Reported by: arundel Suggested by: marcel (phrasing) Approved by: kib (mentor)
# 214005	18-Oct-2010	marcel	In vfs_filteropt(), only print the errmsg when there's no errmsg mount option. Otherwise errors tend to get printed multiple times.
# 213664	10-Oct-2010	kib	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks
# 213365	02-Oct-2010	marcel	Split the root mount logic from the (generic) mount code and move it (the root mount code) into a new file called vfs_mountroot.c The split is almost trivial, as the code is almost perfectly non-intertwined. The only adjustment needed was to move the UMA zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and into vfs_mount.c, where it had to be done as a SYSINIT [see vfs_mount_init()]. There are no functional changes with this commit.
# 212466	11-Sep-2010	kib	Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak when mount and update are executed in parallel. Encapsulate syncer vnode deallocation into the helper function vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c. Found and reviewed by: jh (previous version of the patch) Tested by: pho MFC after: 3 weeks
# 212356	09-Sep-2010	pjd	Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.
# 212341	08-Sep-2010	pjd	Doing first mount and updating mount points are both handled by the same syscall and the same function, but are very different and share almost no code. To make it easier to read and analyze, split vfs_domount() into vfs_domount_first() and vfs_domount_update(). Reviewed by: kib
# 212340	08-Sep-2010	pjd	- Log all the problems in devfs_fixup(). - Correct error paths. The system will be useless on devfs_fixup() failure, so why bother? Maybe for the same reason why a dead body is washed and dressed in a nice suit before it is put into a coffin? Maybe system's last will is to panic without any locks held? Reviewed by: kib
# 211930	28-Aug-2010	pjd	There is a bug in vfs_allocate_syncvnode() failure handling in mount code. Actually it is hard to properly handle such a failure, especially in MNT_UPDATE case. The only reason for the vfs_allocate_syncvnode() function to fail is getnewvnode() failure. Fortunately it is impossible for current implementation of getnewvnode() to fail, so we can assert this and make vfs_allocate_syncvnode() void. This in turn free us from handling its failures in the mount code. Reviewed by: kib MFC after: 1 month
# 204066	18-Feb-2010	pjd	- Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be locked. - Remove code duplication.
# 201145	28-Dec-2009	antoine	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month
# 199227	12-Nov-2009	attilio	Add the possibility for vfs.root.mountfrom tunable to accept a list of items rather than a single one. The list is a space separated collection of items defined as the current one accepted. While there fix also a nit in a comment. Obtained from: Sandvine Incorporated Reviewed by: emaste Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Sponsored by: Sandvine Incorporated MFC: 2 weeks
# 199041	08-Nov-2009	trasz	Add suggestion for zfs root.
# 195995	31-Jul-2009	jhb	Fix some LORs between vnode locks and filedescriptor table locks. - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week
# 195939	29-Jul-2009	rwatson	Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() to capture path information for audit records. This allows us to move the definitions of ARG_* out of the public audit header file, as they are an implementation detail of our current kernel-internal audit record, which may change. Approved by: re (kensmith) Obtained from: TrustedBSD Project MFC after: 1 month
# 195247	01-Jul-2009	rwatson	When auditing unmount(2), capture FSID arguments as regular text strings rather than as paths, which would lead to them being treated as relative pathnames and hence confusingly converted into absolute pathnames. Capture flags to unmount(2) via an argument token. Approved by: re (audit argument blanket) MFC after: 3 days
# 195104	27-Jun-2009	rwatson	Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
# 193511	05-Jun-2009	rwatson	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# 193192	31-May-2009	rodrigc	sys/boot/common.c ================= Extend the loader to parse the root file system mount options in /etc/fstab, and set a new loader variable vfs.root.mountfrom.options with these options. The root mount options must be a comma-delimited string, as specified in /etc/fstab. Only set the vfs.root.mountfrom.options variable if it has not been set in the environment. sys/kern/vfs_mount.c ==================== When mounting the root file system, pass the mount options specified in vfs.root.mountfrom.options, but filter out "rw" and "noro", since the initial mount of the root file system must be done as "ro". While we are here, try to add a few hints to the mountroot prompt to give users and idea what might of gone wrong during mounting of the root file system. Reviewed by: jhb (an earlier patch)
# 192895	27-May-2009	jamie	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
# 191990	11-May-2009	attilio	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
# 190878	10-Apr-2009	thompsa	Revert r190676,190677 The geom and CAM changes for root_hold are the wrong solution for USB design quirks. Requested by: scottl
# 190676	03-Apr-2009	thompsa	Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called in situations where sleeping isnt allowed.
# 190540	30-Mar-2009	thompsa	Further rate limit the root wait status, it will be printed once per root_mount_rel() wakeup.
# 190460	27-Mar-2009	thompsa	Skip the allocation of the root hold token if the mount already happened.
# 189290	02-Mar-2009	jamie	Extend the "vfsopt" mount options for more general use. Make struct vfsopt and the vfs_buildopts function public, and add some new fields to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and vfs_opterror. Further extend the interface to allow reading options from the kernel in addition to sending them to the kernel, with vfs_setopt and related functions. While this allows the "name=value" option interface to be used for more than just FS mounts (planned use is for jails), it retains the current "vfsopt" name and <sys/mount.h> requirement. Approved by: bz (mentor)
# 188150	05-Feb-2009	attilio	Add more KTR_VFS logging point in order to have a more effective tracing. Reviewed by: brueffer, kib Tested by: Gianni Trematerra <giovanni D trematerra A gmail D com>
# 186197	16-Dec-2008	attilio	1) Fix a deadlock in the VFS: - threadA runs vfs_rel(mp1) - threadB does unmount the mp1 fs, sets MNTK_UNMOUNT and drop MNT_ILOCK() - threadA runs vfs_busy(mp1) and, as long as, MNTK_UNMOUNT is set, sleeps waiting for threadB to complete the unmount - threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting for the refcount to expire. Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals the unmounter is waiting for mnt_ref to expire. The vfs_busy contenders got awake, fails, and if they retry the MNTK_REFEXPIRE won't allow them to sleep again. 2) Simplify significantly the code of vfs_mount_destroy() trimming unnecessary codes: - as long as any reference exited, it is no-more possible to have write-op (primarty and secondary) in progress. - it is no needed to drop and reacquire the mount lock. - filling the structures with dummy values is unuseful as long as it is going to be freed. Tested by: pho, Andrea Barberio <insomniac at slackware dot it> Discussed with: kib
# 185504	01-Dec-2008	attilio	Fix an inverted check introduced in r184554. Submitted by: tegge Pointy hat to: me
# 184599	03-Nov-2008	attilio	Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless. Really, the concept of holdcnt in the struct mount is rappresented by the mnt_ref (which prevents the type-stable structure from being "recycled) handled through vfs_ref() and vfs_rel(). On this optic, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()). Discussed with: kib Tested by: pho
# 184588	03-Nov-2008	dfr	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month
# 184554	02-Nov-2008	attilio	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
# 183754	10-Oct-2008	attilio	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# 183188	19-Sep-2008	obrien	Add freebsd32 compat shim for nmount(2). (and quiet some compiler warnings for vfs_donmount)
# 182740	03-Sep-2008	simon	- Fix amd64 local privilege escalation. [08:07] - Fix nmount(2) local privilege escalation. [08:08] - Fix IPv6 remote kernel panics. [08:09] Fix for [08:07] is merge of r181823. Submitted by: kib [08:07], csjp [08:08], bz [08:09] Reviewed by: peter [08:07], jhb [08:07] Reviewed by: jinmei [08:09], rwatson [08:09] Approved by: re (SA blanket) Approved by: so (simon) Security: FreeBSD-SA-08:07.amd64 Security: FreeBSD-SA-08:08.nmount Security: FreeBSD-SA-08:09.icmp6
# 182542	31-Aug-2008	attilio	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
# 182371	28-Aug-2008	attilio	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# 182025	22-Aug-2008	rodrigc	In nmount(), when we see the "force" option, set the MNT_FORCE flag, but do not persist "force" in the options list, since it is a command, not a persistent property of a mount. Similarly, when we see "reload", set MNT_RELOAD, but delete "reload" from the options list. MFC after: 1 week
# 181528	10-Aug-2008	kib	Revert r181345. Move the NULL pointer check to the vfs_deleteopt() function. Discussed with: rodrigc MFC after: 3 days
# 181463	09-Aug-2008	des	Add sbuf_new_auto as a shortcut for the very common case of creating a completely dynamic sbuf. Obtained from: Varnish MFC after: 2 weeks
# 180484	12-Jul-2008	rodrigc	In nmount(), if we see "update" in the mount options, set MNT_UPDATE in fsflags, and delete the "update" option from the global mount options. MNT_UPDATE is a command, and not a property of a mount that should persist after the command is executed. We need to do similar things for MNT_FORCE and MNT_RELOAD. All mount flags are prefixed by MNT_..... it would be nice if flags which were commands were named differently from flags which are persistent properties of a mount. This was not such a big deal in the pre-nmount() days, but with nmount() it is more important. Requested by: yar MFC after: 2 weeks
# 179670	09-Jun-2008	kib	Provide the mutual exclusion between the nfs export list modifications and nfs requests processing. Lockmgr lock provides the shared locking for nfs requests, while exclusive mode is used for modifications. The writer starvation is handled by lockmgr too. Reported by: kris, pho, many Based on the submission by: mohan Tested by: pho MFC after: 2 weeks
# 179658	08-Jun-2008	wkoszek	Remove checks against DDB, which isn't used in this file. My intention is to bring no functional change. Discussion on: IRC Reviewed by: ed, kan, rink,
# 179268	23-May-2008	rodrigc	Do not convert the "snapshot" string to the MNT_SNAPSHOT flag here, since we do it further down in ffs_vfsops.c MFC after: 1 month
# 178677	29-Apr-2008	rdivacky	Lock filedesc exclusively when modifying fd_[cr]dir. This is probably harmless but it's better to lock it correctly. Approved by: kib (mentor)
# 178429	22-Apr-2008	phk	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h> <sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.
# 178016	08-Apr-2008	sam	o add a mountroot event handler that fires when / is mounted; this information was lost when root started being mounted by init o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful MFC after: 2 weeks
# 177785	31-Mar-2008	kib	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
# 177528	23-Mar-2008	kib	Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks
# 176394	18-Feb-2008	yar	Undo the damage I did in sys/kern/vfs_mount.c #1.274 and sbin/mount_nfs/mount_nfs.c #1.76. Let the dragons sleep. Requested by: rodrigc, des PR: kern/120319 (welcome the bug back)
# 176383	18-Feb-2008	yar	Add a remark on a questionable property of vfs_mergeopts().
# 176283	14-Feb-2008	yar	In the new order of things dictated by nmount(2), a read-only mount is to be requested via a "ro" option. At the same time, MNT_RDONLY is gradually becoming an indicator of the current state of the FS instead of a command flag. Today passing MNT_RDONLY alone to the kernel's mount machinery will lead to various glitches. (See the PRs for examples.) Therefore mount the root FS with a "ro" option instead of the MNT_RDONLY flag. (Note that MNT_RDONLY still is added to the mount flags internally, by vfs_donmount(), if "ro" was specified.) To be able to pass "ro" cleanly to kernel_vmount(), teach the latter function to accept options with NULL values. Also correct the comment explaining how mount_arg() handles length of -1. PR: bin/106636 kern/120319 Submitted by: Jaakko Heinonen <see PR kern/120319 for email> (originally)
# 175635	24-Jan-2008	attilio	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
# 175294	13-Jan-2008	attilio	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# 175202	09-Jan-2008	attilio	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 175024	31-Dec-2007	rodrigc	In vfs_scanopt(), make sure that the mount option value is not NULL before calling vsscanf(). PR: 118531 Submitted by: Jaakko Heinonen <jh saunalahti fi> MFC after: 3 days
# 174937	27-Dec-2007	imp	A partial solution to some of the 'pull the umass device with a mounted FS' problems. These are more along the lines of 'avoiding an avoidable panic' than a complete solution to removable devices. We now close the barn door after the horse has gotten lose and has been hit by a truck, as it were. The barn no longer catches fire in this case, but the horse is still dead :-). The vfs_bio.c fix causes us not to put a failed write back into the dirty pool if the error returned was ENXIO. In that case, the buffer is treated like any other clean buffer that's being retured. ENXIO means the device isn't there anymore and will never be there again in the future, so retrying is futile. The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file system. If the device is gone, retrying later won't help and we'll never be able to unmount the device. These two are part of a larger patch set submitted by the author. The other patches will be forth coming. I added comments to these two patches. Submitted by: Henrik Gulbrandsen Reviewed by: phk@ PR: usb/46176 (partial)
# 174282	05-Dec-2007	rodrigc	In nmount(), internally convert the mount option: "rdonly" to "ro". This makes updates mounts such as: "mount -u -o rdonly" work more like, "mount -u -o ro". References to "-o rdonly" were changed to "-o ro" in revision 1.60 of the mount(8) man page, but some people still like to use "-o rdonly" since it was documented in earlier versions of FreeBSD. Requested by: rwatson MFC after: 1 week
# 173064	27-Oct-2007	rodrigc	In nmount(), if MNT_ROOT is in the mount flags, filter it out instead of returning an error. (1) This makes the behavior consistent with mount(2). (2) This makes update mounts on the root file system work properly. (3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c and src/usr.sbin/mountd/mountd.c which were put in to eliminate errors during update mounts on the root file system can be removed. The only place were MNT_ROOTFS can be validly set is inside the kernel, i.e. with vfs_mountroot_try(). Reviewed by: phk MFC after: 3 days
# 172930	24-Oct-2007	rwatson	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
# 172151	12-Sep-2007	kib	When restoring the mount after umount failed, the MNTK_UNMOUNT flag prevents insmntque() from placing reallocated syncer vnode on mount list, that causes panic in vfs_allocate_syncvnode(). Introduce MNTK_NOINSMNTQ flag, that marks the period when instmntque is not allowed to success, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is set and cleared simultaneously with MNTK_UNMOUNT, except on umount error path, where it is cleaned just before the syncer vnode is going to be allocated. Reported by: Peter Jeremy <peterjeremy optushome com au> Suggested by: tegge Approved by: re (rwatson)
# 171852	15-Aug-2007	jhb	On 6.x this works: % mount \| grep home /dev/ad4s1e on /home (ufs, local, noatime, soft-updates) % mount -u -o atime /home % mount \| grep home /dev/ad4s1e on /home (ufs, local, soft-updates) Restore this behavior for on 7.x for the following mount options: noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow In addition, on 7.x, the following are equivalent: mount -u -o atime /home mount -u -o nonoatime /home Ideally, when we introduce new mount options, we should avoid options starting with "no". :) Requested by: jhb Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com> Approved by: re (bmah) Proxy commit for: rodrigc
# 171598	26-Jul-2007	pjd	The v_mountedhere field is protected by the vnode lock, not vnode's internal lock. Approved by: re (rwatson)
# 171452	14-Jul-2007	rodrigc	Revert previous commits which I committed by mistake. Approved by: re (implicit) Pointy hat to: me
# 171450	14-Jul-2007	rodrigc	The last entry in the ext2_opts array must be NULL, otherwise the kernel with crash in vfs_filteropt() if an invalid mount option is passed to ext2fs. Approved by: re (kensmith)
# 170587	11-Jun-2007	rwatson	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
# 169054	26-Apr-2007	kib	Allow the dounmount() to proceed even for doomed coveredvp. In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp vnode may be VI_DOOMED due to one of the following: - other thread finished unmount and vput()ed it, and vnode was chosen for recycling, while vn_lock() slept; - forced unmount of the coveredvp->v_mount fs. In the first case, next check for changed v_mountedhere or mnt_gen counter would be successfull. In the second case, the unmount shall be allowed. Submitted by: sobomax MFC after: 2 weeks
# 168823	17-Apr-2007	pjd	Export vfs_mount_alloc() as it is used in ZFS.
# 168699	13-Apr-2007	pjd	Fix jails and jail-friendly file systems handling: - We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail. - Move security checks to vfs_suser() and deny unmounting and updating for jailed root from different jails, etc. OK'ed by: rwatson
# 168552	09-Apr-2007	njl	Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec if a race was lost. We're still single-threaded at this point, but just be safe for the future.
# 168547	09-Apr-2007	njl	Clean up the root mount and mount wait code. No mutexes are needed here since a spurious wakeup() is the only possible outcome and this is fine in the BSD programming model.
# 168506	08-Apr-2007	pjd	Add root_mounted() function that returns true if the root file system is already mounted.
# 168396	05-Apr-2007	pjd	Add security.jail.mount_allowed sysctl, which allows to mount and unmount jail-friendly file systems from within a jail. Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user. It is turned off by default. A jail-friendly file system is a file system which driver registers itself with VFCF_JAIL flag via VFS_SET(9) API. The lsvfs(1) command can be used to see which file systems are jail-friendly ones. There currently no jail-friendly file systems, ZFS will be the first one. In the future we may consider marking file systems like nullfs as jail-friendly. Reviewed by: rwatson
# 168355	04-Apr-2007	rwatson	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff
# 168300	03-Apr-2007	pjd	Add root_mount_wait() function which can be used to wait until the root file system is mounted. This is useful for kernel modules loaded from /boot/loader.conf, that have to access file system.
# 168207	01-Apr-2007	pjd	I think the code I'm removing here is completely bogus. vfs_flags field is used for VFCF_* flags which are given at file system driver creation time (via VFS_SET(9)) macro. What this code did was bascially this: If file system registers itself with VFCF_UNICODE flag (stores file names as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates). If file system registers itself with VFCF_LOOPBACK flag (aliases some other mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on dirs). The latter will be quite dangerous, but those flags are reset later in vfs_domount(). MFC after: 1 month
# 168185	31-Mar-2007	pjd	Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them.
# 167672	18-Mar-2007	pjd	Don't deny unmounting file systems for jailed processes immediately, allow prison_priv_check() to decide what to do. This change is suppose not to change current (security) behaviour in any way. This change is simlar to the change of PRIV_VFS_MOUNT in previous revision.
# 167553	14-Mar-2007	pjd	Don't deny mounting for jailed processes immediately, allow prison_priv_check() to decide what to do. This change is suppose not to change current (security) behaviour in any way. Reviewed by: rwatson
# 167551	14-Mar-2007	pjd	White space nits.
# 167232	05-Mar-2007	rwatson	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
# 166681	12-Feb-2007	cognet	Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that consumers don't have to check for both error and the return value (some of them actually don't do it). MFC After: 1 week
# 165288	16-Dec-2006	rodrigc	Add a function vfs_deleteopt() which searches through the vfsoptlist linked list of mount options by name, and deletes the option if it finds it.
# 164033	06-Nov-2006	rwatson	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
# 163606	22-Oct-2006	rwatson	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 162983	03-Oct-2006	kib	Fix the remaining race in the revs. 1.232, 1,233 that could occur during unmount when mp structure is reused while waiting for coveredvp lock. Introduce struct mount generation count, increment it on each reuse and compare the generations before and after obtaining the coveredvp lock. Reviewed by: tegge, pjd Approved by: pjd (mentor) MFC after: 2 weeks
# 162954	02-Oct-2006	phk	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & spamce-efficient BCD routines.
# 162653	26-Sep-2006	tegge	Reduce fluctuations of mnt_flag to allow unlocked readers to get a slightly more consistent view.
# 162651	26-Sep-2006	tegge	Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with MNT_UPDATE flag, closing a race between nmount() and quotactl().
# 162649	26-Sep-2006	tegge	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.
# 162648	26-Sep-2006	tegge	Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race with dounmount(), causing loss of MNTK_UNMOUNT flag.
# 162647	26-Sep-2006	tegge	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
# 162444	19-Sep-2006	kib	Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be unlocked only if it really exists. Found with: Coverity Prevent(tm) CID: 1535 Approved by: pjd (mentor)
# 162407	18-Sep-2006	kib	Fix the race while waiting for coveredvp lock during unmount. The vnode may be recycled during the sleep, wrap the vn_lock with vhold/vdrop. Check that coveredvp still points to the same mp after sleep (needed because sleep dropped Giant). Move check for user rights for unmount after coveredvp lock is obtained. Tested by: Peter Holm Reviewed by: tegge Approved by: kan (mentor) MFC after: 2 weeks
# 161644	26-Aug-2006	marius	Fix another bug introduced with rev. 1.204; in vfs_donmount() if the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)' call fails, 'errmsg' is left uninitialized, making the later tests against NULL meaningless, and the uses bogus. Thus initialize 'errmsg' to NULL beforehand. [1] While at it, remove the superfluous assignment of 0 to 'errmsg_len' if the above mentioned call fails as it's already initialized to 0. Submitted by: Michael Plass [1]
# 161619	25-Aug-2006	pjd	Fix comment.
# 161584	24-Aug-2006	marius	Fix a bug introduced with rev. 1.204; in vfs_donmount() use copyout(9) instead of copystr(9) for copying the errmsg from kernel- to user-space. This fixes a panic on sparc64 when using the nmount(2)-converted mountd(8). While at it, use bcopy(3) instead of strncpy(3) in the kernel- to kernel-space case for consistency with vfs_buildopts() and between kernel- to user-space and kernel- to kernel-space case.
# 159982	27-Jun-2006	jhb	- Expand the scope of Giant some in mount(2) to protect the vfsp structure from going away. mount(2) is now MPSAFE. - Expand the scope of Giant some in unmount(2) to protect the mp structure (or rather, to handle concurrent unmount races) from going away. umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount(). - nmount(2) and linux_mount() were already MPSAFE.
# 159274	05-Jun-2006	rwatson	Audit some arguments to nmount(), mount(), umount(). Submitted by: wsalamon Obtained from: TrustedBSD Project
# 159181	02-Jun-2006	pjd	Fix a problem introduced in revision 1.220. On mount(2) failure, don't forget to unbusy file system before its destruction. This fixes the following warning on mount failure: Mount point <X> had 1 dangling refs Tested by: wkoszek
# 158930	26-May-2006	rodrigc	Add "update" mount option to global_opts array, for use with vfs_filteropt().
# 158924	25-May-2006	rodrigc	Remove calls to vfs_export() for exporting a filesystem for NFS mounting from individual filesystems. Call it instead in vfs_mount.c, after we call VFS_MOUNT() for a specific filesystem.
# 158611	15-May-2006	kbyanc	Restore the ability to mount procfs and fdescfs filesystems via the mount(2) system call: * Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and linprocfs). This (mostly) restores the ability to mount these filesystems using the old mount(2) system call (see below for the rest of the fix). * Remove not-NULL check for the data argument from the mount(2) entry point. Per the mount(2) man page, it is up to the individual filesystem being mounted to verify data. Or, in the case of procfs, etc. the filesystem is free to ignore the data parameter if it does not use it. Enforcing data to be not-NULL in the mount(2) system call entry point prevented passing NULL to filesystems which ignored the data pointer value. Apparently, passing NULL was common practice in such cases, as even our own mount_std(8) used to do it in the pre-nmount(2) world. All userland programs in the tree were converted to nmount(2) long ago, but I've found at least one external program which broke due to this (presumably unintentional) mount(2) API change. One could argue that external programs should also be converted to nmount(2), but then there isn't much point in keeping the mount(2) interface for backward compatibility if it isn't backward compatible.
# 158546	13-May-2006	rodrigc	For nmount(), if "rw" is specified as a mount option, add "noro" to the list of mount options. This allows a read-only mount to be converted to read-write via: mount -u -o rw Requested by: kris
# 157343	31-Mar-2006	jeff	- When there are dangling vnodes at unmount print them before we panic. Sponsored by: Isilon Systems, Inc.
# 157322	31-Mar-2006	jeff	- Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent mount memory from being reclaimed. This resolves a number of race conditions described in vfs_default.c and introduced with the VFS_LOCK_GIANT macros. - Let the mtx and lock remain valid after the mount structure has been freed by using init and fini calls. Technically fini will never be called but is included for completeness. - Consistently use lockmgr directly rather than lockmgr to lock and vfs_unbusy to unlock. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
# 156685	13-Mar-2006	ru	The mount(8) manpage says: "In case of conflicting options being specified, the rightmost option takes effect." Fix code to obey this. This makes e.g. "mount -r /usr" or "mount -ar" actually mount file systems read-only.
# 156451	08-Mar-2006	tegge	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
# 155902	22-Feb-2006	jeff	- We can not hold a vnode lock while we do a lookup. Search for and load modules prior to looking up the directory which we will cover to avoid this problem in mount. - We must hold the coveredvp locked before we can busy the mountpoint to prevent a lock order reversal with the vfs_busy() in lookup which holds the directory lock prior to doing a vfs_busy(). The directory lock is required to safely clear the v_mountedhere field on the directory. MFC After: 1 week
# 155386	06-Feb-2006	jeff	- Add a ref count to the mount structure. Sleep for up to 3 seconds in vfs_mount_destroy waiting for this ref to hit 0. We don't print an error if we are rebooting as the root mount always retains some refernces by init proc. - Acquire a mnt ref for every vnode allocated to a mount point. Drop this ref only once vdestroy() has been called and the mount has been freed. - No longer NULL the v_mount pointer in delmntque() so that we may release the ref after vgone() has been called. This allows us to guarantee that the mount point structure will be valid until the last vnode has lost its last ref. - Fix a few places that rely on checking v_mount to detect recycling. Sponsored by: Isilon Systems, Inc. MFC After: 1 week
# 154962	28-Jan-2006	ssouhlal	Don't try to load KLDs if we're mounting the root. We'd otherwise panic. Tested by: kris MFC after: 3 days
# 154403	15-Jan-2006	csjp	vfs_busy can only return something useful if MNTK_UNMOUNT has been set. Since we are using vfs_busy() on a freshly allocated mount structure, use (void) to show that we do not care about the return value. Found with: Coverity Prevent (tm) MFC after: 2 weeks
# 154402	15-Jan-2006	rwatson	Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the return value is intentional: this is simply an attempt to pre-cache the statfs state. Found with: Coverity Prevent (tm) MFC after: 3 days
# 154373	14-Jan-2006	ru	AMD64 also supports disk slices.
# 154152	09-Jan-2006	tegge	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
# 153546	19-Dec-2005	pjd	vfs_mount_alloc() always returns 0, but what we really want is newly allocated 'struct mount *' pointer, so simplify code a bit and return the pointer directly. Reviewed by: ssouhlal
# 153541	19-Dec-2005	pjd	Use 'td' instead of 'curthread'.
# 153226	08-Dec-2005	rodrigc	In devfs_first(), set mp->mnt_opt to a valid empty list of mount options instead of leaving it NULL. This eliminates a kernel panic when trying to do a mount -o update of /dev. Noticed by: cjsp Reviewed by: phk
# 153225	08-Dec-2005	rodrigc	Add "errmsg" to list of global mount options.
# 153051	03-Dec-2005	rodrigc	Add "rdonly" to global_opts, and parse it in vfs_donmount(). Requested by: rwatson
# 153034	02-Dec-2005	rodrigc	- Add "rw" mount option to global_opts. - In vfs_donmount(), parse "ro", "noro", and "rw", in order to set or unset the MNT_RDONLY filesystem flag.
# 152735	23-Nov-2005	rodrigc	In nmount() and vfs_donmount(), do not strcmp() the options in the iovec directly. We need to copyin() the strings in the iovec before we can strcmp() them. Also, when we want to send the errmsg back to userspace, we need to copyout()/copystr() the string. Add a small helper function vfs_getopt_pos() which takes in the name of an option, and returns the array index of the name in the iovec, or -1 if not found. This allows us to locate an option in the iovec without actually manipulating the iovec members. directly via strcmp(). Noticed by: kris on sparc64
# 152620	19-Nov-2005	marcel	Fix bug introduced in revision 1.186: When all file systems have a time stamp of zero, which is the case for example when the root file system is on a read-only medium, we ended up not calling inittodr() at all. A potential uncleanliness existed as well. If multiple file systems had a non-zero time stamp, we would call inittodr() multiple times. While this should not be harmful, it's definitely not ideal. Fix both issues by iterating over the mounted file systems to find the largest time stamp and call inittodr() exactly once with that time stamp. This could of course be a zero time stamp if none of the mounted file systems have a non-zero time stamp. In that case the annoying errors mentioned in the commit log for revision 1.186 still haven't been avoided. The bottom line is that inittodr() should not complain when it gets a time base of zero. At the time of this commit only alpha seems to have that problem. Reported by: Dario Freni (saturnero at freesbie dot org) MFC after: 1 week
# 152619	19-Nov-2005	rodrigc	Parse more mount options in vfs_donmount(), before vfs_domount() is called. It looks like there are lots of different mount flags checked in vfs_domount(), so we need to do the parsing for these particular mount flags earlier on. The new flags parsed are: async, force, multilabel, noasync, noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union. Existing code which uses mount() to mount UFS filesystems is not affected, but new code which uses nmount() to mount UFS filesystems should behave better.
# 152561	17-Nov-2005	rodrigc	In vfs_nmount(), check to see if "update" mount option was passed in, and if so, set MNT_UPDATE filesystem flag. vfs_nmount() calls vfs_domount(), and there is special logic inside vfs_domount() if MNT_UPDATE is set. This is very important when we want to do an update mount of the root filesystem, using nmount().
# 152332	12-Nov-2005	rodrigc	style(9) cleanups. Spotted by: njl, bde
# 152217	09-Nov-2005	rodrigc	For nmount(), allow a text string error message to be propagated back to user-space if a parameter named "errmsg" is passed into the iovec. Used in conjunction with vfs_mount_error(), more useful error messages than errno can be passed back to userspace when mounting a filesystem fails. Discussed with: phk, pjd
# 152176	08-Nov-2005	rodrigc	Add utility function to propagate mount errors as text string messages. Discussed with: phk
# 149712	02-Sep-2005	ssouhlal	Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed and unbusied in devfs_fixup(), which assumes that the devfs mount is still locked. Granced at by: phk MFC after: 3 days
# 146354	18-May-2005	pjd	devfs_first() return value isn't used, remove it.
# 146125	11-May-2005	pjd	We don't use 'mp' variable, but we do want to mount devfs, ehh.
# 146119	11-May-2005	pjd	Remove unised variable introduced by accident in rev 1.168. Found by: Coverity Prevent analysis tool
# 146116	11-May-2005	pjd	Plug memory leaks. Found by: Coverity Prevent analysis tool
# 145733	30-Apr-2005	jeff	- Remove an old splcam hack.
# 145304	19-Apr-2005	pjd	Call g_waitidle() before every check the list of holds is empty. Suggested by: phk
# 145259	19-Apr-2005	phk	Call g_waitidle() instead of GEOM using the root_mount_hold() KPI. GEOM could (and will) get events as a result of drivers coming in late so a one-shot method is not good enough for GEOM.
# 145250	18-Apr-2005	phk	Add a named reference-count KPI to hold off mounting of the root filesystem. While we wait for holds to be released, print a list of who holds us back once per second. Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle(). Use the new KPI also from ata. With ATAmkIII's newbusification, ata could narrowly miss the window and ad0 would not exist when we tried to mount root.
# 145249	18-Apr-2005	phk	Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready earlier.
# 144367	31-Mar-2005	jeff	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.
# 144089	24-Mar-2005	marcel	Fix inittodr() invocation. Now that devfs is mounted before the actual root file system is mounted, the first entry on the mountlist is not the root file system and the timestamp for that entry is typically 0. Passing that to inittodr() caused annoying errors on alpha and ia64. So, call inittodr() for all file systems on mountlist, but only when the timestamp (mnt_time) is non-zero.
# 144055	24-Mar-2005	jeff	- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For now, all calls to VFS_ROOT() should still acquire exclusive locks. Sponsored by: Isilon Systems, Inc.
# 143690	16-Mar-2005	phk	Fix a memoryleak in case of failed root filesystem mount. Spotted by: Coverity via sam
# 143683	16-Mar-2005	jmg	MFp4: print a more useful error when we don't have a /dev to mount devfs on..
# 143680	16-Mar-2005	phk	Add mnt_hashseed to struct mount and initialize it witn PRNG bits, use it to get better hashing in vfs_hash. In case of an insert collision in vfs_hash_insert(), put the loosing vnode on a special list so that vfs_hash_remove() can just assume that it is on a list. Drop the VI_HASHED flag.
# 142153	20-Feb-2005	das	Remove VFS_START(). Its original purpose involved the mfs filesystem, which is long gone. Discussed with: mckusick Reviewed by: phk
# 141634	10-Feb-2005	phk	Make various mountpoint related functions static.
# 141206	03-Feb-2005	pjd	- Move gets() function to libkern (I want to use it outside vfs_mount.c). - Add buffer size limitations (overflow will not be possible anymore). - Add 'visible' option, which will allow for passphrase reading in the future. - Remove special treatment of '@' and '#', those two are only confusing. Discussed with: rwatson MFC after: 2 weeks
# 140715	24-Jan-2005	jeff	- Protect mnt_kern_flag with the mountpoint's mutex. This is required to make the suspend related functions mpsafe. Sponsored By: Isilon Systems, Inc.
# 140220	14-Jan-2005	phk	Eliminate unused and unnecessary "cred" argument from vinvalbuf()
# 140048	11-Jan-2005	phk	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
# 139804	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 139337	27-Dec-2004	kan	Do not vput(9) unlocked vnode and do not VREF it with the sole purpose of vputting it back immediately. Complained by: DEBUG_VFS_LOCKS
# 139095	20-Dec-2004	phk	Hide/remove various printfs, now that root mounting doesn't seem to explode on people.
# 138835	14-Dec-2004	phk	Move the checkdirs() function from vfs_mount.c to kern_descrip.c and call it mountcheckdirs().
# 138696	11-Dec-2004	phk	Copy the entire stats structure. Let compiler decide how.
# 138691	11-Dec-2004	phk	Fix whitespace. Spotted by: njl
# 138679	11-Dec-2004	phk	Remove the /dev/dev -> / symlink after we are done with it.
# 138509	07-Dec-2004	phk	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).
# 138507	07-Dec-2004	phk	Instead of complaining about it, just silently filter out MNT_ROOTFS. This fixes the "fsck /" problem various people have reported overnight.
# 138480	06-Dec-2004	phk	Always call VFS_STATFS() on mp->mnt_stat when we have mounted a filesystem, this way individual filesystems don't have to do it.
# 138467	06-Dec-2004	phk	Add more functions for handling mount arguments in VFS_MOUNT(): vfs_flagopt() for binary/boolean options. vfs_getopts() for string options vfs_filteropt() to check for unknown options. vfs_scanopt() for scanf() like processing of options. Also add function for setting the stat.f_mntfromname field.
# 138461	06-Dec-2004	phk	Change the first argument of vfs_cmount() to a handy struct mntarg* and call it accordingly. (No filesystems implement vfs_cmount() yet, so this is a no-op commit)
# 138448	06-Dec-2004	phk	Add a few convenient functions in the mount_arg() family and collect the entire family at the end of the source file.
# 138446	06-Dec-2004	phk	Collapse two almost identical license copies, preserving the rights of all listed authors, rightholders and contributors.
# 138445	06-Dec-2004	phk	Remove the kern.rootdev sysctl. Root filessytems (like NFS) don't have an associated disk device, and even if they had, the exact semantics would be filesystem dependent and should be implemented there.
# 138444	06-Dec-2004	phk	Make struct vfsopt{list} private to vfs_mount.c
# 138412	05-Dec-2004	phk	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases doesn't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.
# 138363	03-Dec-2004	phk	Implement a function, mount_arg() for accumulating a list of mount parameters to nmount. Make kernel_mount() accept the output from mount_arg() and know how to free the malloc'ed space. Make kernel_vmount() use the new function.
# 138360	03-Dec-2004	phk	When omount() is called, check if the filesystem have a cmount method and if so call it. The cmount method will gather and interpret omount() style arguments, and issue a kern_[v]mount() call to execute the corresponding nmount operation.
# 138357	03-Dec-2004	phk	Add early checks for MNT_ROOTFS since we need to allow it later on in the code path.
# 138356	03-Dec-2004	phk	Retire unused vfs_mount() function in the name of nmount migration.
# 138350	03-Dec-2004	phk	Introduce vfs_byname_kld() which will try to load the filesystem as a module if possible. Use it so we don't have linker magic in the middle of the already complex mount code.
# 138150	28-Nov-2004	phk	Use FILEDESC_LOCK_FAST in checkdirs()
# 138121	26-Nov-2004	phk	Eliminate MNT_NODEV usage, it doesn't have any meaning any more. Keep a #define MNT_NODEV 0 around to avoid dealing with contrib userland like mount_smbfs.
# 138091	25-Nov-2004	phk	Allow a filesystem to have both old and new mount methods at the same time. This will be necessary for transitioning.
# 138087	25-Nov-2004	phk	Assert Giant held in vfs_domount() and vfs_dounmount() Explicitly grab Giant before calling these.
# 138077	25-Nov-2004	phk	Integrate the relevant bits of vfs_rootmountalloc() where it matters.
# 137863	18-Nov-2004	phk	Pass path to filesystem when mounting root
# 137523	10-Nov-2004	phk	remove unused variable
# 137509	10-Nov-2004	phk	Remove hack which mounts the root filesystem R/W if the device is named 'md<something>'. While convenient, it does not belong here, if anywhere at all.
# 137484	09-Nov-2004	phk	Make getdiskbyname() static to vfs_mount.c. Eliminate use of vn_todev() while here.
# 136835	23-Oct-2004	phk	Drop Giant around the call to g_waitidle(). This is necessary to allow any geom events which need it to pick up Giant.
# 136144	05-Oct-2004	pjd	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.
# 135728	24-Sep-2004	pjd	Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits a bit better to our current naming scheme. Discussed with: ru
# 135720	24-Sep-2004	phk	Eliminate devsw() call, we are not dereferencing the pointer.
# 135612	23-Sep-2004	pjd	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting root partition to give a chance to GEOM providers to show up. Now, when there is no needed provider, vfs_rootmount() function will look for it every second and if it can't be find in defined time, it'll ask for root device name (before this change it was done immediately). This will allow to boot from gmirror device in degraded mode.
# 134827	05-Sep-2004	alfred	It's too easy to panic the machine when INVARIANTS are turned on and you botch a call to nmount(2). This is because there is an INVARIANTS check that asserts that opt->len must be zero if opt->val is not NULL. The problem is that the code does not actually follow this invariant if there is an error while processing mount options. Fix the code to honor the INVARIANT. Silence on: fs@
# 132902	30-Jul-2004	phk	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.
# 132710	27-Jul-2004	phk	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.
# 132117	13-Jul-2004	phk	Give kldunload a -f(orce) argument. Add a MOD_QUIESCE event for modules. This should return error (EBUSY) of the module is in use. MOD_UNLOAD should now only fail if it is impossible (as opposed to inconvenient) to unload the module. Valid reasons are memory references into the module which cannot be tracked down and eliminated. When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is not given, MOD_QUIESCE failing will also prevent the unload. For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as success. Document that modules should return EOPNOTSUPP for unknown events.
# 132023	12-Jul-2004	alfred	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
# 131897	10-Jul-2004	phk	Clean up and wash struct iovec and struct uio handling. Add copyiniov() which copies a struct iovec array in from userland into a malloc'ed struct iovec. Caller frees. Change uiofromiov() to malloc the uio (caller frees) and name it copyinuio() which is more appropriate. Add cloneuio() which returns a malloc'ed copy. Caller frees. Use them throughout.
# 131696	06-Jul-2004	alfred	Use vfs_suser() where appropriate.
# 131691	06-Jul-2004	alfred	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.
# 131562	04-Jul-2004	alfred	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.
# 131551	04-Jul-2004	phk	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.
# 130795	20-Jun-2004	tmm	Initialize ni_cnd.cn_cred before calling lookup() (this is normally done by namei(), which cannot easily be used here however). This fixes boot time crashes on sparc64 and probably other platforms. Reviewed by: phk
# 130651	17-Jun-2004	phk	Reduce the thaumaturgical level of root filesystem mounts: Instead of using an otherwise redundant clone routine in geom_disk.c, mount a temporary DEVFS and do a proper lookup. Submitted by: thomas
# 130640	17-Jun-2004	phk	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
# 130585	16-Jun-2004	phk	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
# 127911	05-Apr-2004	imp	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 127476	27-Mar-2004	pjd	- Add a description for vfs.usermount sysctl. - Add the vfs_equalopts() function for mount options comparsion. Now it looks much more clear. - Style fixed. In co-operation with: bde
# 127473	27-Mar-2004	pjd	- Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts. - Style fixed. Submitted by: bde
# 127464	26-Mar-2004	pjd	We probably shouldn't allow users to mount file systems with MNT_SUIDDIR. There should be not shell access when SUIDDIR is compiled in, but better be sure. Reviewed by: rwatson
# 127058	16-Mar-2004	tjr	Make vfs_nmount() public. The Linux emulator needs this in order to mount linprocfs filesystems.
# 126852	11-Mar-2004	phk	Remove unused mnt_reservedvnlist field.
# 125957	18-Feb-2004	cperciva	Don't ignore errors from vfs_allocate_syncvnode. PR: kern/18503 Submitted by: Anatoly Vorobey <mellon@pobox.com> Approved by: rwatson (mentor)
# 125340	02-Feb-2004	pjd	Fix many issues related to mount/unmount: 1. Root from inside a jail was able to unmount any file system (except /). 2. Unprivileged root was able to unmount file systems mounted by privileged root (execpt /). 3. User from inside a jail was able to mount file system when sysctl vfs.usermount was set to 1. 4. User was able to mount file system when vfs.usermount was set to 1 (that's ok) and unmount it even if vfs.usermount was equal to 0 (that's not correct). Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl> Only a part of this fix will be MFC'ed (if approved). PR: kern/60149 Reviewed by: rwatson Approved by: scottl (mentor) MFC after: 3 days
# 123075	30-Nov-2003	iedowse	In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the forced unmount case. Otherwise, a file system that is referenced only by process fd_cdir/fd_rdir references to the file system root vnode will be successfully unmounted without the MNT_FORCE flag. The previous behaviour was not compatible with the unmount semantics required by amd(8), so file systems could be unexpectedly unmounted while there were still references to the file system root directory. Reported by: Erez Zadok <ezk@cs.sunysb.edu> Approved by: re (scottl)
# 122965	23-Nov-2003	kan	Do not attempt to destroy NULL vfs options list. Approved by: re (scottl) Reported by: Christian Laursen <xi atborderworlds dot dk>
# 122640	14-Nov-2003	kan	Fix a number of style(9) bugs introduced in r1.113 by me. Suggested by: bde
# 122567	12-Nov-2003	peter	MNAMELEN is back to an int again after Kirk's statfs commit kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4) *** Error code 1
# 122523	12-Nov-2003	kan	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in coplexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson
# 122091	05-Nov-2003	kan	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
# 120462	26-Sep-2003	phk	Update the list of CDROM device names to try for booting with RB_CDROM flag set.
# 119885	08-Sep-2003	iedowse	In the !MNT_BYFSID case, return EINVAL from unmount(2) when the specified directory is not found in the mount list. Before the MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent path and EINVAL for a non-mountpoint, but we can no longer distinguish between these cases. Of the two error codes, EINVAL was more likely to occur in practice, and it was the only one of the two that was documented. Update the manual page to match the current behaviour. Suggested by: tjr Reviewed by: tjr
# 117132	01-Jul-2003	iedowse	Add a new mount flag MNT_BYFSID that can be used to unmount a file system by specifying the file system ID instead of a path. Use this by default in umount(8). This avoids the need to perform any vnode operations to look up the mount point, so it makes it possible to unmount a file system whose root vnode cannot be looked up (e.g. due to a dead NFS server, or a file system that has become detached from the hierarchy because an underlying file system was unmounted). It also provides an unambiguous way to specify which file system is to be unmunted. Since the ability to unmount using a path name is retained only for compatibility, that case now just uses a simple string comparison of the supplied path against f_mntonname of each mounted file system. Discussed on: freebsd-arch mdoc help from: ru
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 115960	07-Jun-2003	phk	Improve the root-dev prompt facility for printing devices which could possibly be a root filesystem.
# 113958	24-Apr-2003	tjr	Free mount credentials (mnt_cred) when freeing the mount struct in failure cases to avoid leaking struct ucreds, and ultimately leaking struct uidinfo references.
# 113887	23-Apr-2003	obrien	Add /dev to the Alpha manual mount root example.
# 112693	26-Mar-2003	tegge	Adjust the number of vnodes scanned by vlrureclaim() according to the size of the vnode list.
# 111242	22-Feb-2003	rwatson	Export the name of the device used to mount the root file system as kern.rootdev. If rootdev is undefined (NFS mount, etc), export an empty string. Desired by: peter
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 110906	15-Feb-2003	alfred	Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a barrier between free'ing filedesc structures. Basically if you want to access another process's filedesc, you want to hold this mutex over the entire operation.
# 110863	14-Feb-2003	des	Style nit.
# 110862	14-Feb-2003	alfred	KASSERT format string does not need newline termination
# 110861	14-Feb-2003	alfred	Add kasserts to catch bad API usage. Submitted by: Hiten Pandya <hiten@unixdaemons.com>
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 108524	31-Dec-2002	alfred	When compiling the kernel do not implicitly include filedesc.h from proc.h, this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to thier own header (sigio.h) which is included from proc.h for the time being.
# 107855	14-Dec-2002	alfred	unwrap lines made short enough by SCARGS removal
# 107850	14-Dec-2002	alfred	remove syscallarg(). Suggested by: peter
# 107849	13-Dec-2002	alfred	SCARGS removal take II.
# 107839	13-Dec-2002	alfred	Backout removal SCARGS, the code freeze is only "selectively" over.
# 107838	13-Dec-2002	alfred	Remove SCARGS. Reviewed by: md5
# 106576	07-Nov-2002	mux	- Use a better definition for MNAMELEN which doesn't require to have one #ifdef per architecture. - Change a space to a tab after a nearby #define. Obtained from: bde
# 105948	25-Oct-2002	phk	#include <geom/geom.h> to get proper prototypes. Contrary to my fears we seem to have all the prerequisites already. Call g_waitidle() as the first thing in vfs_mountroot() so that we have it out of the way before we even decide if we should call .._ask() or .._try(). Call the g_dev_print() function to provide better guidance for the root-mount prompt.
# 105893	24-Oct-2002	phk	Make sure GEOM has stopped rattling the disks before we try to mount the root filesystem, this may be implicated in the PC98 issue.
# 105668	21-Oct-2002	mckusick	This update removes a race between unmount and lookup. The lookup locks the mount point directory while waiting for vfs_busy to clear. Meanwhile the unmount which holds the vfs_busy lock tried to lock the mount point vnode. The fix is to observe that it is safe for the unmount to remove the vnode from the mount point without locking it. The lookup will wait for the unmount to complete, then recheck the mount point when the vfs_busy lock clears. Sponsored by: DARPA & NAI Labs.
# 105648	21-Oct-2002	phk	GEOM does not (and shall not) propagate flags like D_MEMDISK, so we will revert to checking the name to determine if our root device is a ramdisk, md(4) specifically to determine if we should attempt the root-mount RW Sponsored by: DARPA & NAI Labs.
# 103928	24-Sep-2002	jeff	- Don't protect mountedhere with the vn interlock. - Protect mountedhere with the vn lock.
# 103650	19-Sep-2002	mux	Switch to using strlcpy() in several places. It seems there were cases where we could get unterminated strings before.
# 102088	19-Aug-2002	phk	Keep a copy of the credential used to mount filesystems around so we can check and use it later on. Change the pieces of code which relied on mount->mnt_stat.f_owner to check which user mounted the filesystem. This became needed as the EA code needs to be able to allocate blocks for "system" EA users like ACLs. There seems to be some half-baked (probably only quarter- actually) notion that the superuser for a given filesystem is the user who mounted it, but this has far from been carried through. It is unclear if it should be. Sponsored by: DARPA & NAI Labs.
# 101308	04-Aug-2002	jeff	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
# 101241	02-Aug-2002	mux	Make the consumers of the linker_load_file() function use linker_load_module() instead. This fixes a bug where the kernel was unable to properly locate and load a kernel module in vfs_mount() (and probably in the netgraph code as well since it was using the same function). This is because the linker_load_file() does not properly search the module path. Problem found by: peter Reviewed by: peter Thanks to: peter
# 101173	01-Aug-2002	rwatson	Include file cleanup; mac.h and malloc.h at one point had ordering relationship requirements, and no longer do. Reminded by: bde
# 101004	30-Jul-2002	rwatson	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the necessary MAC entry points to maintain labels on mount structures. In particular, invoke entry points for intialization and destruction in various scenarios (root, non-root). Also introduce an entry point in the boot procedure following the mount of the root file system, but prior to the start of the userland init process to permit policies to perform further initialization. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 100863	29-Jul-2002	jeff	- Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here were hiding the real problem of the missing unlock in sync_inactive. - Add the missing unlock in sync_inactive. Submitted by: iedowse
# 100631	24-Jul-2002	mux	Fix a stupid bug where I wasn't initializing the names of 0-length mount options.
# 100363	19-Jul-2002	mux	- Merge the mount options at MNT_UPDATE time with vfs_mergeopts(). - Sanity check the mount options list (remove duplicates) with vfs_sanitizeopts(). - Fix some malloc(0)/free(NULL) bugs. Reviewed by: rwatson (some time ago)
# 99690	09-Jul-2002	jeff	- Use standard locking functions in syncer's opv - vput instead of vrele syncer vnodes in vfs_mount - Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP
# 99602	08-Jul-2002	mux	Add a VFS_START() call in vfs_mountroot_try() for the sake of being correct. None of the root mountable filesystems do something at VFS_START(). Shorten a comment to fix a style bug while I'm here. PR: kern/18505
# 99423	05-Jul-2002	jeff	Include systm.h before vnode.h so Debugger() and printf() are available when full vnode lock debugging is enabled.
# 99338	03-Jul-2002	mux	Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot() which was #if 0'd and is not likely to be used now.
# 99336	03-Jul-2002	mux	Remove an unused argument in vfs_mountroot().
# 99266	02-Jul-2002	mux	I didn't pay enough attention when copy/pasting disclaimers. The disclaimer in vfs_conf.c was slightly different. Fix this.
# 99264	02-Jul-2002	mux	Move every code related to mount(2) in a new file, vfs_mount.c. The file vfs_conf.c which was dealing with root mounting has been repo-copied into vfs_mount.c to preserve history. This makes nmount related development easier, and help reducing the size of vfs_syscalls.c, which is still an enormous file. Reviewed by: rwatson Repo-copy by: peter
# 94936	17-Apr-2002	mux	Rework the kernel environment subsystem. We now convert the static environment needed at boot time to a dynamic subsystem when VM is up. The dynamic kernel environment is protected by an sx lock. This adds some new functions to manipulate the kernel environment : freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be called after every getenv() when you have finished using the string. testenv() only tests if an environment variable is present, and doesn't require a freeenv() call. setenv() and unsetenv() are self explanatory. The kenv(2) syscall exports these new functionalities to userland, mainly for kenv(1). Reviewed by: peter
# 93467	31-Mar-2002	phk	Centralize the "bootdev" and "dumpdev" variables. They are still pretty bogus all things considered, but at least now they don't camouflage as being MD variables.
# 91859	08-Mar-2002	phk	Move the mount of the root filesystem to happen in the init process before the exec if /sbin/init. This allows the scheduler to get started and kthreads a chance to run before we start filesystem operations.
# 91690	05-Mar-2002	eivind	Document all functions, global and static variables, and sysctls. Includes some minor whitespace changes, and re-ordering to be able to document properly (e.g, grouping of variables and the SYSCTL macro calls for them, where the documentation has been added.) Reviewed by: phk (but all errors are mine)
# 86842	23-Nov-2001	obrien	Remove the use of _PATH_DEV in the example. The kernel certainly doesn't use _PATH_DEV or even /dev/ to find the device. It cannot, since "/" has not been mounted. Maybe the only affect of using /dev/ is that it gets put in the mounted-from name for "/", so that mount(8), etc., display an absolute path before "/" has been remounted. Many have never bothered typing the full path, and code that constructs a path in rootdevnames[] never bothered to construct a full path, so the example shouldn't have it. Submitted by: bde
# 86704	20-Nov-2001	obrien	We only have slices on i386 and IA-64.
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 76166	01-May-2001	markm	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# 76117	29-Apr-2001	grog	Revert consequences of changes to mount.h, part 2. Requested by: bde
# 75858	23-Apr-2001	grog	Correct #includes to work with fixed sys/mount.h.
# 73286	01-Mar-2001	adrian	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since its a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
# 72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 69793	09-Dec-2000	obrien	Add `_PATH_DEVZERO'. Use _PATH_* where where possible.
# 67882	29-Oct-2000	phk	Remove unneeded #include <sys/proc.h> lines.
# 66615	03-Oct-2000	jasone	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 65374	02-Sep-2000	phk	Avoid the modules madness I inadvertently introduced by making the cloning infrastructure standard in kern_conf. Modules are now the same with or without devfs support. If you need to detect if devfs is present, in modules or elsewhere, check the integer variable "devfs_present". This happily removes an ugly hack from kern/vfs_conf.c. This forces a rename of the eventhandler and the standard clone helper function. Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include like <sys/queue.h> Remove all #includes of opt_devfs.h they no longer matter.
# 65051	24-Aug-2000	phk	Fix panic when removing open device (found by bp@) Implement subdirs. Build the full "devicename" for cloning functions. Fix panic when deleted device goes away. Collaps devfs_dir and devfs_dirent structures. Add proper cloning to the /dev/fd* "device-"driver. Fix a bug in make_dev_alias() handling which made aliases appear multiple times. Use devfs_clone to implement getdiskbyname() Make specfs maintain the stat(2) timestamps per dev_t
# 60807	22-May-2000	msmith	Make a trip to Pointy-Hats-R-Us and actually include the header that defines ROOTDEVNAME. Submitted by: "Jeffrey S. Sharp" <jss@subatomix.com>
# 58389	20-Mar-2000	green	Split the logic of static int setrootbyname(char name); out into dev_t getdiskbyname(char name); This makes it easy to create a new DDB command, which is the big reason for the change. You can now do the following in DDB: Example rc.conf entry: dumpdev="/dev/ad0s1b" # Device name to crashdump to (if enabled). db> show disk/ad0s1b dev_t = 0xc0b7ea00 db> p *dumpdev c0b7ea00
# 57296	17-Feb-2000	msmith	Change the mountroot prompt to something that doesn't look at all like a firmware prompt. Several sleepy folk mistook the '>>>' for the SRM prompt, which was never the desired idea. Submitted by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh
# 54498	12-Dec-1999	peter	Put on asbestos suit and put a splcam() around the 'Mounting root from..' message to stop it splitting. Every single scsi machine I've seen seems to reliably collide with this and it's rather annoying.
# 54298	08-Dec-1999	phk	Scan cdevs for potential root devices, rather than bdevs.
# 53888	29-Nov-1999	dillon	Make BOOTP work again. Submitted by: Doug Ambrisko <ambrisko@whistle.com>
# 53858	28-Nov-1999	msmith	Use the correct mounted-from path when allocating the root mount, if we know what it is. Be more correct in unbusying the mountpoint (especially before freeing it). Remove support for mounting 'r' devices as root. You don't mount 'r' devices anywhere else, and they're going away anyway. Submitted by: bde
# 53722	26-Nov-1999	phk	Retire MFS_ROOT and MFS_ROOT_SIZE options from the MFS implementation. Add MD_ROOT and MD_ROOT_SIZE options to the md driver. Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility. Add md driver to GENERIC, PCCARD and LINT. This is a cleanup which removes the need for some of the worse hacks in MFS: We really want to have a rootvnode but MFS on a preloaded image doesn't really have one. md is a true device, so it is less trouble. This has been tested with make release, and if people remember to add the "md" pseudo-device to their kernels, PicoBSD should be just fine as well. If people have no other use for MFS, it can be removed from the kernel.
# 53493	21-Nov-1999	msmith	If vfs_mountroot_try() isn't given a path to try mounting, return a silent error rather than complaining about it verbosely. No path is not really a failure, but the diagnostic was confusing and unuseful.
# 53452	20-Nov-1999	phk	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967
# 53014	08-Nov-1999	phk	Ignore leading 'r' in base of root device name.
# 52916	06-Nov-1999	msmith	Clean up a couple of initialisations in order to suppress a correct but un-useful warning.
# 52906	05-Nov-1999	msmith	Guard against freeing NULL if vfs_mountroot_try is called with NULL as an argument (this is legal to make other code simpler).
# 52886	05-Nov-1999	msmith	Expand the sscanf buffer to 32 bytes to make room for the expanded pattern, with some space left over to avoid this mistake next time it's improved. Submitted by: luoqi
# 52881	04-Nov-1999	msmith	Allow vfs names to include the digits 0-9 as well as the letters a-z. This should let 'cd9660' filesystems be allowed. Submitted by: ghelmer
# 52854	03-Nov-1999	msmith	Re-implement the handing of RB_CDROM in a machine-independant fashion. We currently only search SCSI and IDE CDROMs; if there's felt to be a need for supporting the very old and rare soundcard etc. drives for this application they can be trivially added.
# 52836	03-Nov-1999	msmith	Make MFS work with the new root filesystem search process. In order to achieve this, root filesystem mount is moved from SI_ORDER_FIRST to SI_ORDER_SECOND in the SI_SUB_MOUNT_ROOT sysinit group. Now, modules which wish to usurp the default root mount can use SI_ORDER_FIRST. A compiled-in or preloaded MFS filesystem will become the root filesystem unless the vfs.root.mountfrom environment variable refers to a valid bootable device. This will normally only be the case when the kernel and MFS image have been loaded from a disk which has a valid /etc/fstab file. In this case, the variable should be manually overridden in the loader, or the kernel booted with -a. In either case "mfs:" should be supplied as the new value. Also fix a typo in one DFLTROOT case that would not have compiled.
# 52778	01-Nov-1999	msmith	This is a complete rewrite of vfs_conf.c, which changes the way the root filesystem is discovered. Preference is given to using the kernel environment variable vfs.root.mountfrom, which is set by the loader according to the contents of /etc/fstab. Changes in the MD code provide fallback mechanisms for systems not using the loader. A more robust fallback path is also provided, with the last recourse being to prompt on the console for a root device. These changes drastically simplify the machine-dependant parts of the root configuration process. In addition, support for CDROM root devices has been removed; it was a nasty hack and didn't work.
# 51388	19-Sep-1999	dillon	Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse addaliasu() into addalias() (no operational change) and clarify comments relating to a trick that vclean() uses. The fix to BOOTP is yet another hack. Actually, rootfsid handling is already a major hack. The whole thing needs to be cleaned up. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 48521	03-Jul-1999	peter	Fix warnings in last commit (dev_t is not an int, and not even int compatable in arg lists on the Alpha)
# 48512	03-Jul-1999	phk	Be more informative and try to ask the user in some instances if we can't figure out the root device.
# 48249	26-Jun-1999	peter	I'm tired of having a 'hanging root device'.. This isn't a "fix", just a workaround for a specific case where cam interrupts right in the middle of this printf.
# 47447	23-May-1999	jb	Back out my previous change (phk didn't like it) in favour of setting rootdev in the mfs initialisation code iff MFS_ROOT (which Bruce doesn't like). Damned if I do - damned if I don't.
# 47423	23-May-1999	jb	Make MFS_ROOT work again. MFS_ROOT means that rootdev is not set. Broken by: phk Problem ignored by: phk
# 39187	14-Sep-1998	sos	Remove the SLICE code. This clearly needs alot more thought, and we dont need this to hunt us down in 3.0-RELEASE.
# 36809	09-Jun-1998	bde	Pass lists of possible root devices and their names up to the machine-independent code and try mounting the devices in the lists instead of guessing alternative root devices in a machine- dependent way. autoconf.c: Reject preposterous slice numbers instead of silently converting them to COMPATIBILITY_SLICE. Don't forget to force slice = COMPATIBILITY_SLICE in the floppy device name. Eliminated most magic numbers and magic device names in setroot(). Fixed dozens of style bugs. vfs_conf.c: Put the actual root device name instead of "root_device" in the mount struct if the actual name is available. This is useful after booting with -s. If it were set in all cases then it could be used to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.
# 35323	20-Apr-1998	julian	Make the devfs SLICE option a standard type option. (hopefully it will go away eventually anyhow)
# 35319	19-Apr-1998	julian	Add changes and code to implement a functional DEVFS. This code will be turned on with the TWO options DEVFS and SLICE. (see LINT) Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes. /dev will be automatically mounted by init (thanks phk) on bootup. See /sys/dev/slice/slice.4 for more info. All code should act the same without these options enabled. Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others This code does not support the following: bad144 handling. Persistance. (My head is still hurting from the last time we discussed this) ATAPI flopies are not handled by the SLICE code yet. When this code is running, all major numbers are arbitrary and COULD be dynamically assigned. (this is not done, for POLA only) Minor numbers for disk slices ARE arbitray and dynamically assigned.
# 34479	10-Mar-1998	msmith	If the root mount fails from a device that is not the compatability slice of a disk, because that slice does not exist, try again mounting from the compatability slice. This handles the case where a disk has been initialised by 'disklabel auto', which places a bogus and invalid slice entry on the disk. The bootstrap is not smart enough to reject this slice, and pretends to boot from it. Believing the the bootstrap at this point is unwise. Booting from non-'wd' disks thus prepared is still broken, as 'disklabel -rwB xdN auto' does not initialise the disk type field, and the bootstrap mistakenly claims that the disk is handled by 'wd'. Behaviour is now consistent with DEVFS expected characteristics.
# 33181	09-Feb-1998	eivind	Staticize.
# 32358	09-Jan-1998	eivind	Make the BOOTP family new-style options (in opt_bootp.h)
# 31476	01-Dec-1997	julian	Cleanup my last patch here Reviewed by: sef@kthrup.com and phk@freebsd.org
# 31403	25-Nov-1997	julian	Shift a few SYSINT() calls around. this results in a few functions becoming static, and the SYSINITs being close to the code they are related to. setting up the dump device is with dumpsys() and kicking off the scheduler is with the scheduler. Mounting root is with the code that does it. Reviewed by: phk
# 31016	07-Nov-1997	phk	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
# 30468	16-Oct-1997	julian	We are mounting the root. mount it at the HEAD of the queue, DEVFS might already be there..
# 30354	12-Oct-1997	phk	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 12569	02-Dec-1995	bde	Finished (?) cleaning up sysinit stuff.
# 11921	29-Oct-1995	phk	Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
# 10653	09-Sep-1995	dg	Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
# 10427	29-Aug-1995	bde	Fix benign type mismatch in a sysinit function arg.
# 10358	28-Aug-1995	julian	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. they are: New low-level init code that supports loadbal modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular.. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases.. certainly mounting root of disk still works just fine.. mfs should work but is untested. (tomorrows task) The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
# 4370	11-Nov-1994	phk	Make a kernel sans FFS possible.
# 2946	21-Sep-1994	wollman	Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
# 2893	19-Sep-1994	dfr	Added msdosfs. Obtained from: NetBSD
# 2152	20-Aug-1994	dg	Implemented filesystem clean bit via: machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful. cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level. vfs_conf.c: Removed unused rootfs global. vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted. ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted. ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE(). vfs_syscalls.c: Disallow dismounting root FS via umount syscall.
# 1817	02-Aug-1994	dg	Added $Id$
# 1541	24-May-1994	rgrimes	BSD 4.4 Lite Kernel Sources