History log of /freebsd-11-stable/sys/kern/vfs_mountroot.c
Revision Date Author Comments
# 353717 18-Oct-2019 kp

MFC r353443

mountroot: run statfs after mounting devfs

The usual flow for mounting a file system is to VFS_MOUNT() and then
immediately VFS_STATFS().

That's not done in vfs_mountroot_devfs(), which means the
mp->mnt_stat.f_iosize field is not correctly populated, which in turn
causes us to mark valid aio operations as unsafe (because the io size is
set to 0), ultimately causing the aio_test:md_waitcomplete test to fail.
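
As a rough illustration of the problem described above, the toy model below
shows why skipping the statfs step leaves the cached I/O size at zero. All
names here (toy_mount, toy_vfs_mount, toy_vfs_statfs, toy_io_size_known) are
hypothetical stand-ins, not the kernel's structures or the actual aio check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the per-mount cached statistics (hypothetical). */
struct toy_statfs {
	uint64_t f_iosize;
};

struct toy_mount {
	struct toy_statfs mnt_stat;	/* cached copy consulted by consumers */
	uint64_t fs_iosize;		/* value the filesystem would report */
	bool mounted;
};

/* Mounting alone does not fill in the cached statistics. */
void
toy_vfs_mount(struct toy_mount *mp, uint64_t fs_iosize)
{
	mp->fs_iosize = fs_iosize;
	mp->mnt_stat.f_iosize = 0;
	mp->mounted = true;
}

/* The statfs step copies the filesystem's values into the cache. */
void
toy_vfs_statfs(struct toy_mount *mp)
{
	assert(mp->mounted);
	mp->mnt_stat.f_iosize = mp->fs_iosize;
}

/* A consumer that treats an I/O size of zero as "unknown". */
bool
toy_io_size_known(const struct toy_mount *mp)
{
	return (mp->mnt_stat.f_iosize != 0);
}
```

In this model, a consumer asking toy_io_size_known() right after
toy_vfs_mount() gets false until toy_vfs_statfs() has run.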

Sponsored by: Axiado


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 331262 20-Mar-2018 ian

MFC r330745:

Make root mount timeout logic work for filesystems other than ufs.

The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5)
file allow specifying a wait timeout for the device(s) hosting the root
filesystem to become usable. The current mechanism for waiting for devices
and detecting their availability can't be used for zfs-hosted filesystems.
See the comment #20 in the PR for some expanded detail on these points.

This change adds retry logic to the actual root filesystem mount. That is,
instead of relying on device availability using device name lookups, it uses
the kernel_mount() call itself to detect whether the filesystem can be
mounted, and loops until it succeeds or the configured timeout is exceeded.
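
The retry idea described above can be sketched as follows. This is a minimal
userland model, not the code from the commit: mount_with_retry(), the
mount_attempt_fn callback, fake_mount() and the abstract "tick" time units
are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one mount attempt; returns 0 or an errno value. */
typedef int (*mount_attempt_fn)(void *ctx);

/*
 * Use the mount attempt itself as the availability probe: keep trying
 * until it succeeds or the timeout budget (in abstract ticks) is spent.
 */
int
mount_with_retry(mount_attempt_fn try_mount, void *ctx, int timeout_ticks,
    int delay_ticks)
{
	int elapsed, error;

	assert(try_mount != NULL && delay_ticks > 0);
	elapsed = 0;
	for (;;) {
		error = try_mount(ctx);
		if (error == 0)
			return (0);
		if (elapsed >= timeout_ticks)
			return (error);		/* give up, report last error */
		elapsed += delay_ticks;		/* a kernel would sleep here */
	}
}

/* Example attempt: fails with ENOENT until the counter reaches zero. */
int
fake_mount(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return (ENOENT);
	}
	return (0);
}
```

With a counter of 3 and a timeout of 10 ticks the call succeeds on the fourth
attempt; with a counter larger than the budget allows it returns ENOENT.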

These changes are based on the patch attached to the PR, but it's rewritten
enough that all mistakes belong to me.

PR: 208882


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 324268 04-Oct-2017 trasz

MFC r323183:

Make root_mount_rel(9) ignore NULL arguments, like it used to before r313351.
It would be better to fix API consumers to not pass NULL there - most of them,
such as gmirror, already contain the necessary checks - but this is easier
and much less error-prone.

One known user-visible result is that it fixes panic on a failed "graid label".
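
A minimal sketch of a release function that tolerates NULL, assuming a simple
singly linked list of hold tokens. The names hold_token, hold_take(),
hold_release() and holds_pending() are hypothetical and only model the
behaviour described above; they are not the root_mount_hold(9) implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hold token kept on a global list. */
struct hold_token {
	const char *who;
	struct hold_token *next;
};

static struct hold_token *hold_list;

/* Take a hold; returns NULL if the token cannot be allocated. */
struct hold_token *
hold_take(const char *who)
{
	struct hold_token *h;

	h = malloc(sizeof(*h));
	if (h == NULL)
		return (NULL);
	h->who = who;
	h->next = hold_list;
	hold_list = h;
	return (h);
}

/* Release a hold; a NULL token is silently ignored. */
void
hold_release(struct hold_token *h)
{
	struct hold_token **pp;

	if (h == NULL)
		return;
	for (pp = &hold_list; *pp != NULL && *pp != h; pp = &(*pp)->next)
		;
	assert(*pp == h);	/* the token must be on the list */
	*pp = h->next;
	free(h);
}

/* Number of holds still outstanding. */
size_t
holds_pending(void)
{
	const struct hold_token *h;
	size_t n;

	n = 0;
	for (h = hold_list; h != NULL; h = h->next)
		n++;
	return (n);
}
```

A caller whose setup failed before taking a hold can then call
hold_release(NULL) unconditionally on its error path.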


# 315540 19-Mar-2017 trasz

MFC r313351:

Make root_mount_hold() work after boot. This is important for two
reasons. The first is rerooting into a USB-mounted device that happens
to be not yet enumerated. The second is mounting a (non-root)
filesystem on a USB device on a hub that's enumerated later than the root
mount: the rc scripts explicitly wait for the root mount holds to be
released, but each USB bus takes the hold asynchronously, and if that
happens after root mount, it would just get ignored.


# 315539 19-Mar-2017 trasz

MFC r313350:

In r290196 the root mount hold mechanism was changed to make it not wait
for mount hold release if the root device already exists. So, if your
rootdev is not on USB - i.e. in the usual case - the root mount won't wait
for USB. However, the old behaviour was sometimes used as "wait until USB
is fully enumerated", and r290196 broke that.

This commit adds vfs.root_mount_always_wait tunable, to force the kernel
to always wait for root mount holds, even if the root is already there.
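
The decision described above reduces to a small predicate. The sketch below
is an assumption about its shape, written for illustration; the function name
root_mount_should_wait() and its parameters are hypothetical, and only the
tunable name vfs.root_mount_always_wait comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the root mount should wait for outstanding holds.
 * always_wait models the vfs.root_mount_always_wait tunable.
 */
bool
root_mount_should_wait(int holds_pending, bool rootdev_exists,
    bool always_wait)
{
	assert(holds_pending >= 0);
	if (holds_pending == 0)
		return (false);		/* nothing to wait for */
	if (always_wait)
		return (true);		/* restores the pre-r290196 behaviour */
	return (!rootdev_exists);	/* r290196: wait only if root is missing */
}
```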

Relnotes: yes


# 310959 31-Dec-2016 mjg

MFC r305378,r305379,r305386,r305684,r306224,r306608,r306803,r307650,r307685,
r308407,r308665,r308667,r309067:

cache: put all negative entry management code into dedicated functions

==
cache: manage negative entry list with a dedicated lock

Since negative entries are managed with a LRU list, a hit requires a
modification.

Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.

Provide a dedicated lock for use when the cache is only shared-locked.

==

cache: defer freeing entries until after the global lock is dropped

This also defers vdrop for held vnodes.

==

cache: improve scalability by introducing bucket locks

An array of bucket locks is added.

All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.

This is an intermediate step towards removal of the global lock.

==

cache: get rid of the global lock

Add a table of vnode locks and use them along with bucketlocks to provide
concurrent modification support. The approach taken is to preserve the
current behaviour of the namecache and just lock all relevant parts before
any changes are made.

Lookups still require the relevant bucket to be locked.

==

cache: ignore purgevfs requests for filesystems with few vnodes

purgevfs is purely optional and induces lock contention in workloads
which frequently mount and unmount filesystems.

In particular, poudriere will do this for filesystems with 4 vnodes or
less. Full cache scan is clearly wasteful.

Since there is no explicit counter for namecache entries, the number of
vnodes used by the target fs is checked.

The default limit is the number of bucket locks.

== (by kib)

Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

==

cache: split negative entry LRU into multiple lists

This splits the ncneg_mtx lock while preserving the hit ratio at least
during buildworld.

Create N dedicated lists for new negative entries.

Entries with at least one hit get promoted to the hot list, where they
get requeued every M hits.

Shrinking demotes one hot entry and performs a round-robin shrinking of
regular lists.

==

cache: fix up a corner case in r307650

If no negative entry is found on the last list, the ncp pointer will be
left uninitialized and a non-null value will make the function assume an
entry was found.

Fix the problem by initializing to NULL on entry.

== (by kib)

vn_fullpath1() checked VV_ROOT and then dereferenced
vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race.
Lock vnode after we noticed the VV_ROOT flag. See comments for
explanation why unlocked check for the flag is considered safe.

==

cache: fix a race between entry removal and demotion

The negative list shrinker can demote an entry with only hotlist + neglist
locks held. On the other hand, entry removal possibly sets the NCF_DVDROP
flag without the aforementioned locks held prior to detaching it from the
respective neglist, which can lose the update made by the shrinker.

==

cache: plug a write-only variable in cache_negative_zap_one

==

cache: ensure that the number of bucket locks does not exceed hash size

The size can be changed by side effect of modifying kern.maxvnodes.

Since numbucketlocks was not modified, setting a sufficiently low value
would give more locks than actual buckets, which would then lead to
corruption.

Force the number of buckets to be not smaller.

Note this should not matter for real world cases.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h


# 299523 12-May-2016 trasz

Stop hiding errors that result in failure to mount /dev. Otherwise,
missing /dev directory makes one end up with a completely deaf (init
without stdout/stderr) system with no hints on the console, unless
you've booted up with bootverbose.

MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# 298819 29-Apr-2016 pfg

sys/kern: spelling fixes in comments.

No functional change.


# 297190 22-Mar-2016 trasz

Wait for root mount tokens before showing the root mount prompt.
This restores the pre-r290196 behaviour, eliminating the need to manually
press '.' a couple of times to get USB to finish probing.

Note that there's still something wrong with the console (character
echoing doesn't quite work), and there's also a reported problem with
BHyVe, but those two don't seem related to the problem above.

MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# 290197 30-Oct-2015 trasz

After r290196, the kernel won't wait for stuff like gmirror nodes
if they are not required for mounting rootfs. However, it's possible
that some setups try to mount them in mountcritlocal (ie from fstab).

Export the list of current root mount holds using a new sysctl,
vfs.root_mount_hold, and make mountcritlocal retry if "mount -a" fails
and the list is not empty.

MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3709


# 290196 30-Oct-2015 trasz

Make root mount wait mechanism smarter, by making it wait only if the root
device doesn't yet exist.

Reviewed by: kib@, marcel@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3709


# 289449 17-Oct-2015 ngie

Replace /dev/acd0 with /dev/cd1

atapicd(4) has been removed since r249083, and if a system has more than one
optical drive, it will likely be /dev/cd1

Update mount.conf(8) to reflect the change in behavior

MFC after: never
Sponsored by: EMC / Isilon Storage Division


# 289064 09-Oct-2015 trasz

Remove root_mount_wait(). It's not used anywhere.

Reviewed by: bapt@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3787


# 288091 22-Sep-2015 bdrewery

vfs_mountroot_shuffle() never returns non-zero.


# 287964 18-Sep-2015 trasz

Kernel part of reroot support - a way to change rootfs without reboot.

Note that the mountlist manipulations are somewhat fragile, and not very
pretty. The reason for this is to avoid changing vfs_mountroot(), which
is (obviously) rather mission-critical, but not very well documented,
and thus hard to test properly. It might be possible to rework it to use
its own simple root mount mechanism instead of vfs_mountroot().

Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2698


# 287190 27-Aug-2015 marcel

An error of -1 from parse_mount() indicates that the specification
was invalid. Don't trigger a mount failure (which by default means
a panic), but instead just move on to the next directive in the
configuration. This typically has us ask for the root mount.
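
A minimal sketch of that control flow, assuming a list of specifications and
a parser callback. The names try_directives(), parse_fn, SPEC_INVALID and
demo_parse() are hypothetical, and demo_parse() uses made-up rules purely so
the example has something to exercise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	SPEC_INVALID	(-1)	/* models the -1 "invalid specification" result */

/* Hypothetical parser: 0 on success, SPEC_INVALID, or an errno value. */
typedef int (*parse_fn)(const char *spec);

/*
 * Walk the directives in order. An invalid specification is skipped; a
 * real mount error stops the walk and is returned. If every directive
 * was invalid, SPEC_INVALID is returned so the caller can fall back to
 * something else, such as prompting.
 */
int
try_directives(const char *const *specs, size_t nspecs, parse_fn parse,
    size_t *mounted)
{
	size_t i;
	int error;

	assert(specs != NULL && parse != NULL && mounted != NULL);
	for (i = 0; i < nspecs; i++) {
		error = parse(specs[i]);
		if (error == SPEC_INVALID)
			continue;	/* move on to the next directive */
		if (error == 0)
			*mounted = i;
		return (error);
	}
	return (SPEC_INVALID);
}

/* Demo rules: no ':' means invalid; only "ufs:" specifications mount. */
int
demo_parse(const char *spec)
{
	if (strchr(spec, ':') == NULL)
		return (SPEC_INVALID);
	return (strncmp(spec, "ufs:", 4) == 0 ? 0 : ENODEV);
}
```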

PR: 163245


# 287107 24-Aug-2015 trasz

Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467


# 274476 13-Nov-2014 kib

Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 273174 16-Oct-2014 davide

Follow up to r225617. In order to maximize the re-usability of kernel code
in userland, rename in-kernel getenv()/setenv() to kern_getenv()/kern_setenv().
This fixes a namespace collision with libc symbols.

Submitted by: kmacy
Tested by: make universe


# 267351 11-Jun-2014 mav

Move root_mount_hold() functionality to separate mutex.

It has nothing to share with mutex protecting list of mounted file systems.


# 259892 25-Dec-2013 dim

In sys/kern/vfs_mountroot.c, remove static function parse_isspace(),
which is unused since r214006.

MFC after: 3 days


# 255412 09-Sep-2013 delphij

In r243868, the error message buffer errmsg was changed from
an on-stack array to a pointer, so sizeof(errmsg) became
4 or 8 bytes depending on the architecture.

Fix this by using ERRMSGL in place of sizeof().
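
The pitfall can be reproduced in a few lines. This is an illustrative sketch
only: the helper names are hypothetical, and the value 255 for ERRMSGL is
assumed from the "255 bytes" mentioned in the r243868 message.

```c
#include <assert.h>
#include <stddef.h>

#define	ERRMSGL	255	/* assumed buffer length, per the r243868 message */

/* The bug only bites because a pointer is much smaller than the buffer. */
static_assert(sizeof(char *) < ERRMSGL, "pointer smaller than buffer");

/* With an on-stack array, sizeof yields the full buffer length. */
size_t
on_stack_size(void)
{
	char errmsg[ERRMSGL];

	return (sizeof(errmsg));
}

/* With a pointer, sizeof yields the pointer size, not the buffer length. */
size_t
via_pointer_size(void)
{
	char *errmsg = NULL;

	return (sizeof(errmsg));
}
```

Passing ERRMSGL explicitly as the length avoids depending on how the buffer
happens to be declared.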

Submitted by: J David <j.david.lists@gmail.com>
MFC after: 3 days
Approved by: re (kib)


# 253910 03-Aug-2013 marcel

Add a tunable for the default timeout.


# 253847 31-Jul-2013 ian

Changes to allow using BOOTP_NFSROOT and mounting an nfs root filesystem
other than the one specified by the BOOTP server. This configures NFS
using the BOOTP protocol while also respecting other root-path options such
as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT
boot option. It allows you to override the root path provided by the
server, or to supply a root path when the server provides IP configuration
but no root path info.

This maintains the historical BOOTP_NFSROOT behavior of panicking on a
failure to mount the root path provided by the server, unless you've
provided an alternative via the ROOTDEVNAME kernel option or by setting
vfs.root.mountfrom. The behavior of panicking when given no other options
is preserved because it amounts to a bit of a retry loop that could
eventually recover from a transient network or server problem.

The user can now override the root path from loader(8) even if the
kernel is compiled with BOOTP_NFSROOT. If vfs.root.mountfrom is set in
the environment it is used unconditionally -- it always overrides the
BOOTP info. If it begins with [old]nfs: then the BOOTP code uses it
instead of the server-provided info. If it specifies some other
filesystem then the bootp code will not panic like it used to and the code
in vfs_mountroot.c will invoke the right filesystem to do the mount.

If the kernel is compiled with the ROOTDEVNAME option, then that name is
used by the BOOTP code if either
* The server doesn't provide a pathname.
* The boothowto flags include RB_DFLTROOT.
The latter allows the user to compile in an alternate path in ROOTDEVNAME
such as ufs:/dev/da0s1a and boot from that path by setting
boot_dfltroot=1 in loader(8) or using the '-r' option in boot(8).

The one thing not provided here is automatic failover from a
server-provided path to a compiled-in one without the user manually
requesting that. The code just isn't currently structured in a way that
makes that possible without a lot of rewriting. I think the ability to set
vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server
doesn't provide a name covers the most common needs.

A set of patches submitted by Lars Eggert provided the part I couldn't
figure out by myself when I tried to do this last year; many thanks.

Reviewed by: rodrigc


# 248645 23-Mar-2013 avg

post mountroot event after a real/final root is mounted

not every time an intermediate root (including the first devfs) is
mounted.
This is also consistent with waking up via root_mount_complete.

Reviewed by: jhb
MFC after: 13 days


# 243868 04-Dec-2012 kib

Do not allocate buffer of the 255 bytes length on the stack.

Reported and tested by: sig6247@gmail.com
MFC after: 1 week


# 241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 231949 20-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows SSIZE_MAX-sized i/o requests to be issued from
usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


# 228634 17-Dec-2011 avg

replace uses of libkern gets with cngets

MFC after: 2 months


# 226673 23-Oct-2011 marcel

Don't terminate the interactive root mount prompt on mount failure.
This restores the previous behaviour. While here, match '?' and '.'
inputs exactly and improve the error message.

Requested by: avg@
Derived from a patch by: Arnaud Lacombe <lacombar@gmail.com>


# 223919 11-Jul-2011 ae

Include sys/sbuf.h directly.


# 217163 08-Jan-2011 nwhitehorn

Make RB_CDROM work. This should probably check for a disc in cd1 and acd1
as well.


# 215299 14-Nov-2010 ed

Add support for asterisk characters when filling in the GELI password
during boot.

Change the last argument of gets() to indicate a visibility flag and add
definitions for the numerical constants. Except for the value 2, gets()
will behave exactly the same, so existing consumers shouldn't break. We
only use it in two places, though.

Submitted by: lme (older version)


# 214067 19-Oct-2010 ae

A ZFS pool name is not a real device in devfs. Do not wait for
the device to appear when mounting root from ZFS.

Reviewed by: marcel
Approved by: mav (mentor)


# 214006 18-Oct-2010 marcel

Re-implement the root mount logic using a recursive approach, whereby each
root file system (starting with devfs and a synthesized configuration) can
contain directives for mounting another file system as root. The old root
file system is re-mounted under the new root file system (with /.mount or
/mnt as the mount point) to allow access to the underlying file system.

The configuration allows for creating vnode-backed memory disks that can
subsequently be mounted as root. This allows for an efficient and low-
cost way to distribute and boot FreeBSD software images that reside on
some storage media.

When trying a mount, the kernel will wait for the device in question to
arrive. The timeout is configurable and is part of the configuration.
This allows arbitrarily complex GEOM configurations to be constructed
on the fly.

A side-effect of this change is that all root specifications, whether
compiled into the kernel or typed at the prompt can contain root mount
options.


# 213365 02-Oct-2010 marcel

Split the root mount logic from the (generic) mount code and move
it (the root mount code) into a new file called vfs_mountroot.c

The split is almost trivial, as the code is almost perfectly
non-intertwined. The only adjustment needed was to move the UMA
zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and
into vfs_mount.c, where it had to be done as a SYSINIT [see
vfs_mount_init()].

There are no functional changes with this commit.