Cross Reference: /freebsd-11-stable/sys/kern/vfs

History log of /freebsd-11-stable/sys/kern/vfs_mountroot.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 353717	18-Oct-2019	kp	MFC r353443 mountroot: run statfs after mounting devfs The usual flow for mounting a file system is to VFS_MOUNT() and then immediately VFS_STATFS(). That's not done in vfs_mountroot_devfs(), which means the mp->mnt_stat.f_iosize field is not correctly populated, which in turn causes us to mark valid aio operations as unsafe (because the io size is set to 0), ultimately causing the aio_test:md_waitcomplete test to fail. Sponsored by: Axiado
# 331722	29-Mar-2018	eadler	Revert r330897: This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code. Revert with prejudice. This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes. Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes. Requested by: gjb (re)
# 331262	20-Mar-2018	ian	MFC r330745: Make root mount timeout logic work for filesystems other than ufs. The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5) file allow specifying a wait timeout for the device(s) hosting the root filesystem to become usable. The current mechanism for waiting for devices and detecting their availability can't be used for zfs-hosted filesystems. See the comment #20 in the PR for some expanded detail on these points. This change adds retry logic to the actual root filesystem mount. That is, insted of relying on device availability using device name lookups, it uses the kernel_mount() call itself to detect whether the filesystem can be mounted, and loops until it succeeds or the configured timeout is exceeded. These changes are based on the patch attached to the PR, but it's rewritten enough that all mistakes belong to me. PR: 208882
# 330897	14-Mar-2018	eadler	Partial merge of the SPDX changes These changes are incomplete but are making it difficult to determine what other changes can/should be merged. No objections from: pfg
# 324268	04-Oct-2017	trasz	MFC r323183: Make root_mount_rel(9) ignore NULL arguments, like it used to before r313351. It would be better to fix API consumers to not pass NULL there - most of them, such as gmirror, already contain the neccessary checks - but this is easier and much less error-prone. One known user-visible result is that it fixes panic on a failed "graid label".
# 315540	19-Mar-2017	trasz	MFC r313351: Make root_mount_hold() work after boot. This is important for two reasons. First is rerooting into USB-mounted device that happens to be not yet enumerated. The second is when mounting with (non-root) filesystem on USB device on a hub that's enumerated later than the root mount: the rc scripts explicitly mount for the root mount holds to be released, but each USB bus takes the hold asynchronously, and if that happens after root mount, it would just get ignored.
# 315539	19-Mar-2017	trasz	MFC r313350: In r290196 the root mount hold mechanism was changed to make it not wait for mount hold release if the root device already exists. So, unless your rootdev is not on USB - ie in the usual case - the root mount won't wait for USB. However, the old behaviour was sometimes used as "wait until USB is fully enumerated", and r290196 broke that. This commit adds vfs.root_mount_always_wait tunable, to force the kernel to always wait for root mount holds, even if the root is already there. Relnotes: yes
# 310959	31-Dec-2016	mjg	MFC r305378,r305379,r305386,r305684,r306224,r306608,r306803,r307650,r307685, r308407,r308665,r308667,r309067: cache: put all negative entry management code into dedicated functions == cache: manage negative entry list with a dedicated lock Since negative entries are managed with a LRU list, a hit requires a modificaton. Currently the code tries to upgrade the global lock if needed and is forced to retry the lookup if it fails. Provide a dedicated lock for use when the cache is only shared-locked. == cache: defer freeing entries until after the global lock is dropped This also defers vdrop for held vnodes. == cache: improve scalability by introducing bucket locks An array of bucket locks is added. All modifications still require the global cache_lock to be held for writing. However, most readers only need the relevant bucket lock and in effect can run concurrently to the writer as long as they use a different lock. See the added comment for more details. This is an intermediate step towards removal of the global lock. == cache: get rid of the global lock Add a table of vnode locks and use them along with bucketlocks to provide concurrent modification support. The approach taken is to preserve the current behaviour of the namecache and just lock all relevant parts before any changes are made. Lookups still require the relevant bucket to be locked. == cache: ignore purgevfs requests for filesystems with few vnodes purgevfs is purely optional and induces lock contention in workloads which frequently mount and unmount filesystems. In particular, poudriere will do this for filesystems with 4 vnodes or less. Full cache scan is clearly wasteful. Since there is no explicit counter for namecache entries, the number of vnodes used by the target fs is checked. The default limit is the number of bucket locks. == (by kib) Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. == cache: split negative entry LRU into multiple lists This splits the ncneg_mtx lock while preserving the hit ratio at least during buildworld. Create N dedicated lists for new negative entries. Entries with at least one hit get promoted to the hot list, where they get requeued every M hits. Shrinking demotes one hot entry and performs a round-robin shrinking of regular lists. == cache: fix up a corner case in r307650 If no negative entry is found on the last list, the ncp pointer will be left uninitialized and a non-null value will make the function assume an entry was found. Fix the problem by initializing to NULL on entry. == (by kib) vn_fullpath1() checked VV_ROOT and then unreferenced vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race. Lock vnode after we noticed the VV_ROOT flag. See comments for explanation why unlocked check for the flag is considered safe. == cache: fix a race between entry removal and demotion The negative list shrinker can demote an entry with only hotlist + neglist locks held. On the other hand entry removal possibly sets the NCF_DVDROP without aformentioned locks held prior to detaching it from the respective netlist., which can lose the update made by the shrinker. == cache: plug a write-only variable in cache_negative_zap_one == cache: ensure that the number of bucket locks does not exceed hash size The size can be changed by side effect of modifying kern.maxvnodes. Since numbucketlocks was not modified, setting a sufficiently low value would give more locks than actual buckets, which would then lead to corruption. Force the number of buckets to be not smaller. Note this should not matter for real world cases.
# 302408	07-Jul-2016	gjb	Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here. Additional commits post-branch will follow. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-11-stable/MAINTAINERS /freebsd-11-stable/cddl /freebsd-11-stable/cddl/contrib/opensolaris /freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print /freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs /freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs /freebsd-11-stable/contrib/amd /freebsd-11-stable/contrib/apr /freebsd-11-stable/contrib/apr-util /freebsd-11-stable/contrib/atf /freebsd-11-stable/contrib/binutils /freebsd-11-stable/contrib/bmake /freebsd-11-stable/contrib/byacc /freebsd-11-stable/contrib/bzip2 /freebsd-11-stable/contrib/com_err /freebsd-11-stable/contrib/compiler-rt /freebsd-11-stable/contrib/dialog /freebsd-11-stable/contrib/dma /freebsd-11-stable/contrib/dtc /freebsd-11-stable/contrib/ee /freebsd-11-stable/contrib/elftoolchain /freebsd-11-stable/contrib/elftoolchain/ar /freebsd-11-stable/contrib/elftoolchain/brandelf /freebsd-11-stable/contrib/elftoolchain/elfdump /freebsd-11-stable/contrib/expat /freebsd-11-stable/contrib/file /freebsd-11-stable/contrib/gcc /freebsd-11-stable/contrib/gcclibs/libgomp /freebsd-11-stable/contrib/gdb /freebsd-11-stable/contrib/gdtoa /freebsd-11-stable/contrib/groff /freebsd-11-stable/contrib/ipfilter /freebsd-11-stable/contrib/ldns /freebsd-11-stable/contrib/ldns-host /freebsd-11-stable/contrib/less /freebsd-11-stable/contrib/libarchive /freebsd-11-stable/contrib/libarchive/cpio /freebsd-11-stable/contrib/libarchive/libarchive /freebsd-11-stable/contrib/libarchive/libarchive_fe /freebsd-11-stable/contrib/libarchive/tar /freebsd-11-stable/contrib/libc++ /freebsd-11-stable/contrib/libc-vis /freebsd-11-stable/contrib/libcxxrt /freebsd-11-stable/contrib/libexecinfo /freebsd-11-stable/contrib/libpcap /freebsd-11-stable/contrib/libstdc++ /freebsd-11-stable/contrib/libucl /freebsd-11-stable/contrib/libxo /freebsd-11-stable/contrib/llvm /freebsd-11-stable/contrib/llvm/projects/libunwind /freebsd-11-stable/contrib/llvm/tools/clang /freebsd-11-stable/contrib/llvm/tools/lldb /freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump /freebsd-11-stable/contrib/llvm/tools/llvm-lto /freebsd-11-stable/contrib/mdocml /freebsd-11-stable/contrib/mtree /freebsd-11-stable/contrib/ncurses /freebsd-11-stable/contrib/netcat /freebsd-11-stable/contrib/ntp /freebsd-11-stable/contrib/nvi /freebsd-11-stable/contrib/one-true-awk /freebsd-11-stable/contrib/openbsm /freebsd-11-stable/contrib/openpam /freebsd-11-stable/contrib/openresolv /freebsd-11-stable/contrib/pf /freebsd-11-stable/contrib/sendmail /freebsd-11-stable/contrib/serf /freebsd-11-stable/contrib/sqlite3 /freebsd-11-stable/contrib/subversion /freebsd-11-stable/contrib/tcpdump /freebsd-11-stable/contrib/tcsh /freebsd-11-stable/contrib/tnftp /freebsd-11-stable/contrib/top /freebsd-11-stable/contrib/top/install-sh /freebsd-11-stable/contrib/tzcode/stdtime /freebsd-11-stable/contrib/tzcode/zic /freebsd-11-stable/contrib/tzdata /freebsd-11-stable/contrib/unbound /freebsd-11-stable/contrib/vis /freebsd-11-stable/contrib/wpa /freebsd-11-stable/contrib/xz /freebsd-11-stable/crypto/heimdal /freebsd-11-stable/crypto/openssh /freebsd-11-stable/crypto/openssl /freebsd-11-stable/gnu/lib /freebsd-11-stable/gnu/usr.bin/binutils /freebsd-11-stable/gnu/usr.bin/cc/cc_tools /freebsd-11-stable/gnu/usr.bin/gdb /freebsd-11-stable/lib/libc/locale/ascii.c /freebsd-11-stable/sys/cddl/contrib/opensolaris /freebsd-11-stable/sys/contrib/dev/acpica /freebsd-11-stable/sys/contrib/ipfilter /freebsd-11-stable/sys/contrib/libfdt /freebsd-11-stable/sys/contrib/octeon-sdk /freebsd-11-stable/sys/contrib/x86emu /freebsd-11-stable/sys/contrib/xz-embedded /freebsd-11-stable/usr.sbin/bhyve/atkbdc.h /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c /freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h /freebsd-11-stable/usr.sbin/bhyve/console.c /freebsd-11-stable/usr.sbin/bhyve/console.h /freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c /freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c /freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c /freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h /freebsd-11-stable/usr.sbin/bhyve/rfb.c /freebsd-11-stable/usr.sbin/bhyve/rfb.h /freebsd-11-stable/usr.sbin/bhyve/sockstream.c /freebsd-11-stable/usr.sbin/bhyve/sockstream.h /freebsd-11-stable/usr.sbin/bhyve/usb_emul.c /freebsd-11-stable/usr.sbin/bhyve/usb_emul.h /freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c /freebsd-11-stable/usr.sbin/bhyve/vga.c /freebsd-11-stable/usr.sbin/bhyve/vga.h
# 299523	12-May-2016	trasz	Stop hiding errors that result in failure to mount /dev. Otherwise, missing /dev directory makes one end up with a completely deaf (init without stdout/stderr) system with no hints on the console, unless you've booted up with bootverbose. MFC after: 1 month Sponsored by: The FreeBSD Foundation
# 298819	29-Apr-2016	pfg	sys/kern: spelling fixes in comments. No functional change.
# 297190	22-Mar-2016	trasz	Wait for root mount tokens before showing the root mount prompt. This restores the pre-r290196 behaviour, eliminating the need to manually press '.' a couple of times to get USB to finish probing. Note that there's still something wrong with the console (character echoing doesn't quite work), and there's also a reported problem with BHyVe, but those two don't seem related to the problem above. MFC after: 1 month Sponsored by: The FreeBSD Foundation
# 290197	30-Oct-2015	trasz	After r290196, the kernel won't wait for stuff like gmirror nodes if they are not required for mounting rootfs. However, it's possible that some setups try to mount them in mountcritlocal (ie from fstab). Export the list of current root mount holds using a new sysctl, vfs.root_mount_hold, and make mountcritlocal retry if "mount -a" fails and the list is not empty. MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3709
# 290196	30-Oct-2015	trasz	Make root mount wait mechanism smarter, by making it wait only if the root device doesn't yet exist. Reviewed by: kib@, marcel@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3709
# 289449	17-Oct-2015	ngie	Replace /dev/acd0 with /dev/cd1 atapicd(4) has been removed since r249083, and if a system has more than one optical drive, it will likely be /dev/cd1 Update mount.conf(8) to reflect the change in behavior MFC after: never Sponsored by: EMC / Isilon Storage Division
# 289064	09-Oct-2015	trasz	Remove root_mount_wait(). It's not used anywhere. Reviewed by: bapt@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3787
# 288091	22-Sep-2015	bdrewery	vfs_mountroot_shuffle() never returns non-zero.
# 287964	18-Sep-2015	trasz	Kernel part of reroot support - a way to change rootfs without reboot. Note that the mountlist manipulations are somewhat fragile, and not very pretty. The reason for this is to avoid changing vfs_mountroot(), which is (obviously) rather mission-critical, but not very well documented, and thus hard to test properly. It might be possible to rework it to use its own simple root mount mechanism instead of vfs_mountroot(). Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2698
# 287190	27-Aug-2015	marcel	An error of -1 from parse_mount() indicates that the specification was invalid. Don't trigger a mount failure (which by default means a panic), but instead just move on to the next directive in the configuration. This typically has us ask for the root mount. PR: 163245
# 287107	24-Aug-2015	trasz	Make vfs_unmountall() unmount /dev after /, not before. The only reason this didn't result in an unclean shutdown is that devfs ignores MNT_FORCE flag. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3467
# 274476	13-Nov-2014	kib	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 273174	16-Oct-2014	davide	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe
# 267351	11-Jun-2014	mav	Move root_mount_hold() functionality to separate mutex. It has nothing to share with mutex protecting list of mounted file systems.
# 259892	25-Dec-2013	dim	In sys/kern/vfs_mountroot.c, remove static function parse_isspace(), which is unused since r214006. MFC after: 3 days
# 255412	09-Sep-2013	delphij	In r243868, the error message buffer errmsg have been changed from an on-stack array to a pointer and therefore sizeof(errmsg) would become 4 or 8 bytes depending on the architecture. Fix this by using ERRMSGL in place of sizeof(). Submitted by: J David <j.david.lists@gmail.com> MFC after: 3 days Approved by: re (kib)
# 253910	03-Aug-2013	marcel	Add a tunable for the default timeout.
# 253847	31-Jul-2013	ian	Changes to allow using BOOTP_NFSROOT and mounting an nfs root filesystem other than the one specified by the BOOTP server. This configures NFS using the BOOTP protocol while also respecting other root-path options such as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT boot option. It allows you to override the root path provided by the server, or to supply a root path when the server provides IP configuration but no root path info. This maintains the historical BOOTP_NFSROOT behavior of panicking on a failure to mount the root path provided by the server, unless you've provided an alternative via the ROOTDEVNAME kernel option or by setting vfs.root.mountfrom. The behavior of panicking when given no other options is preserved because it amounts to a bit of a retry loop that could eventually recover from a transient network or server problem. The user can now override the root path from loader(8) even if the kernel is compiled with BOOTP_NFSROOT. If vfs.root.mountfrom is set in the environment it is used unconditionally -- it always overrides the BOOTP info. If it begins with [old]nfs: then the BOOTP code uses it instead of the server-provided info. If it specifies some other filesystem then the bootp code will not panic like it used to and the code in vfs_mountroot.c will invoke the right filesystem to do the mount. If the kernel is compiled with the ROOTDEVNAME option, then that name is used by the BOOTP code if either * The server doesn't provide a pathname. * The boothowto flags include RB_DFLTROOT. The latter allows the user to compile in alternate path in ROOTDEVNAME such as ufs:/dev/da0s1a and boot from that path by setting boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8). The one thing not provided here is automatic failover from a server-provided path to a compiled-in one without the user manually requesting that. The code just isn't currently structured in a way that makes that possible with a lot of rewrite. I think the ability to set vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server doesn't provide a name covers the most common needs. A set of patches submitted by Lars Eggert provided the part I couldn't figure out by myself when I tried to do this last year; many thanks. Reviewed by: rodrigc
# 248645	23-Mar-2013	avg	post mountroot event after a real/final root is mounted not every time an intermediate root (including the first devfs) is mounted. This is also consistent with waking up via root_mount_complete. Reviewed by: jhb MFC after: 13 days
# 243868	04-Dec-2012	kib	Do not allocate buffer of the 255 bytes length on the stack. Reported and tested by: sig6247@gmail.com MFC after: 1 week
# 241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 231949	20-Feb-2012	kib	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
# 228634	17-Dec-2011	avg	replace uses of libkern gets with cngets MFC after: 2 months
# 226673	23-Oct-2011	marcel	Don't terminate the interactive root mount prompt on mount failure. This restores the previous behaviour. While here, match '?' and '.' inputs exactly and improve the error message. Requested by: avg@ Derived from a patch by: Arnaud Lacombe <lacombar@gmail.com>
# 223919	11-Jul-2011	ae	Include sys/sbuf.h directly.
# 217163	08-Jan-2011	nwhitehorn	Make RB_CDROM work. This should probably check for a disc in cd1 and acd1 as well.
# 215299	14-Nov-2010	ed	Add support for asterisk characters when filling in the GELI password during boot. Change the last argument of gets() to indicate a visibility flag and add definitions for the numerical constants. Except for the value 2, gets() will behave exactly the same, so existing consumers shouldn't break. We only use it in two places, though. Submitted by: lme (older version)
# 214067	19-Oct-2010	ae	ZFS pool name is not a real device in devfs. Do not wait for device appear when mounting root from ZFS. Reviewed by: marcel Approved by: mav (mentor)
# 214006	18-Oct-2010	marcel	Re-implement the root mount logic using a recursive approach, whereby each root file system (starting with devfs and a synthesized configuration) can contain directives for mounting another file system as root. The old root file system is re-mounted under the new root file system (with /.mount or /mnt as the mount point) to allow access to the underlying file system. The configuration allows for creating vnode-backed memory disks that can subsequently be mounted as root. This allows for an efficient and low- cost way to distribute and boot FreeBSD software images that reside on some storage media. When trying a mount, the kernel will wait for the device in question to arrive. The timeout is configurable and is part of the configuration. This allows arbitrarily complex GEOM configurations to be constructed on the fly. A side-effect of this change is that all root specifications, whether compiled into the kernel or typed at the prompt can contain root mount options.
# 213365	02-Oct-2010	marcel	Split the root mount logic from the (generic) mount code and move it (the root mount code) into a new file called vfs_mountroot.c The split is almost trivial, as the code is almost perfectly non-intertwined. The only adjustment needed was to move the UMA zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and into vfs_mount.c, where it had to be done as a SYSINIT [see vfs_mount_init()]. There are no functional changes with this commit.