History log of /freebsd-current/sys/kern/vfs_mount.c
Revision Date Author Comments
# 21ccdb41 14-May-2024 Konstantin Belousov <kib@FreeBSD.org>

vfs_domount_update(): postpone setting MNT_UNION until VFS_MOUNT() is done

The file system that handles updating the mount point might do lookups
during the update, in which case it could find the flag MNT_UNION set on
the mp while the mount point is still not updated. In particular,
rootvp->v_mount->mnt_vnodecovered is not yet set.

Delay setting MNT_UNION until the mount is performed.

PR: 265311
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mckusick, olce
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D45208


# 5a061a38 15-May-2024 Konstantin Belousov <kib@FreeBSD.org>

vfs_domount_update(): style, use space instead of tab

Noted by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 61cc4830 18-Jan-2024 Alfredo Mazzinghi <am2419@cl.cam.ac.uk>

Abstract UIO allocation and deallocation.

Introduce the allocuio() and freeuio() functions to allocate and
deallocate struct uio. This hides the actual allocator interface, so it
is easier to modify the sub-allocation layout of struct uio and the
corresponding iovec array.

Obtained from: CheriBSD
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: CHaOS, EPSRC grant EP/V000292/1
Differential Revision: https://reviews.freebsd.org/D43711
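
A minimal user-space sketch of the idea, assuming the header and its iovec
array live in one allocation. The names sketch_uio, sketch_allocuio() and
sketch_freeuio() are hypothetical; they do not reproduce the kernel's
struct uio or the allocator that allocuio()/freeuio() actually wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical stand-in for the kernel's struct uio. */
struct sketch_uio {
	struct iovec *uio_iov;	/* points into the same allocation */
	int uio_iovcnt;
};

/*
 * Allocate the header and its iovec array as one block, so callers never
 * depend on how the two are laid out relative to each other.
 */
static struct sketch_uio *
sketch_allocuio(unsigned int iovcnt)
{
	struct sketch_uio *uio;

	uio = calloc(1, sizeof(*uio) + iovcnt * sizeof(struct iovec));
	if (uio == NULL)
		return (NULL);
	uio->uio_iov = (struct iovec *)(uio + 1);
	uio->uio_iovcnt = (int)iovcnt;
	return (uio);
}

/* Release everything sketch_allocuio() handed out. */
static void
sketch_freeuio(struct sketch_uio *uio)
{
	free(uio);
}
```

Because callers only see the alloc/free pair, the layout behind it could
change (for example, two separate allocations) without touching them.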


# 099d25c3 25-Dec-2023 Mark Johnston <markj@FreeBSD.org>

nmount: Ignore errors when copying out an error string

In general we copy error strings as part of reporting an error from
lower layers, so if the copyout() fails there's nothing to do since we'd
prefer to preserve the original error.

This is in preparation for annotating copyin() and related functions
with __result_use_check.

Reviewed by: olce, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43147
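
The pattern described above can be sketched in user space as follows.
sketch_copyout() and sketch_report_error() are hypothetical helpers, not
the kernel's copyout() or the nmount code; the only point illustrated is
that a failed copy of the error string never replaces the original error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copyout(): fails when the destination is NULL. */
static int
sketch_copyout(const char *src, char *dst, size_t len)
{
	if (dst == NULL)
		return (EFAULT);
	memcpy(dst, src, len);
	return (0);
}

/*
 * Report a failure from a lower layer.  The error string is copied out on
 * a best-effort basis; the return value is always the original error.
 */
static int
sketch_report_error(int error, const char *msg, char *ubuf, size_t ubuflen)
{
	size_t len;

	len = strlen(msg) + 1;
	if (len <= ubuflen)
		(void)sketch_copyout(msg, ubuf, len);	/* failure ignored */
	return (error);
}
```

The explicit (void) cast is the kind of annotation that keeps such a call
quiet once the copy routine is marked as requiring its result to be used.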


# 2a1d50fc 24-Dec-2023 Andrew Gierth <andrew@tao146.riddles.org.uk>

vfs_domount_update(): correct fsidcmp() usage

MFC after: 3 days


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# f5f27772 23-Nov-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix NFS access to .zfs/snapshot snapshots

When a process attempts to access a snapshot under
/<dataset>/.zfs/snapshot, the snapshot is automounted.
However, without this patch, the automount does not
set mnt_exjail, which results in the snapshot not being
accessible over NFS.

This patch defines a new function called vfs_exjail_clone()
which sets mnt_exjail from another mount point and
then uses that function to set mnt_exjail in the snapshot
automount. A separate patch that is currently a pull request
for OpenZFS, calls this function to fix the problem.

PR: 275200
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42672


# 3eed4803 18-Nov-2023 John Baldwin <jhb@FreeBSD.org>

vfs mount: Consistently use ENODEV internally for an invalid fstype

Change vfs_byname_kld to always return an error value of ENODEV to
indicate an unsupported fstype leaving ENOENT to indicate errors such
as a missing mount point or invalid path. This allows nmount(2) to
better distinguish these cases and avoid treating a missing device
node as an invalid fstype after commit 6e8272f317b8.

While here, change mount(2) to return EINVAL instead of ENODEV for an
invalid fstype to match nmount(2).

PR: 274600
Reviewed by: pstef, markj
Differential Revision: https://reviews.freebsd.org/D42327
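
A small sketch of the error convention, under stated assumptions: the
table of filesystem names and the functions sketch_fstype_lookup() and
sketch_mount() are hypothetical, and where exactly the kernel translates
ENODEV into EINVAL is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of supported filesystem types. */
static const char *sketch_known_fstypes[] = { "ufs", "nullfs", "tmpfs" };

/* Internal lookup: ENODEV means "unsupported fstype" and nothing else. */
static int
sketch_fstype_lookup(const char *fstype)
{
	size_t i, n;

	n = sizeof(sketch_known_fstypes) / sizeof(sketch_known_fstypes[0]);
	for (i = 0; i < n; i++) {
		if (strcmp(fstype, sketch_known_fstypes[i]) == 0)
			return (0);
	}
	return (ENODEV);
}

/*
 * Hypothetical syscall-level entry point.  path_exists models the lookup
 * of the mount point or device node, which reports ENOENT on failure.
 */
static int
sketch_mount(const char *fstype, int path_exists)
{
	int error;

	error = sketch_fstype_lookup(fstype);
	if (error == ENODEV)
		return (EINVAL);	/* invalid fstype, as seen by userland */
	if (!path_exists)
		return (ENOENT);	/* missing path stays distinguishable */
	return (0);
}
```

Keeping ENODEV reserved for the fstype lookup is what lets the caller tell
a bad filesystem name apart from a missing device node.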


# ede4c412 09-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

vfs_domount_update(): ensure that 'goto end' works

We need to vfs_op_enter()/vn_seqc_write_begin() before jumping to
cleanup.

PR: 274992
Reported by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Fixes: 9ef7a491a4236810e50f0a2ee8d52f5c4bb02c64


# 9ef7a491 29-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

nmount(MNT_UPDATE): add optional generic fsid parameter

to check looked up path against specific mounted filesystem.

Reviewed by: mjg
Tested by: Andrew Gierth <andrew@tao146.riddles.org.uk>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42023


# c584bb9c 19-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

vfs_remount_ro(): mnt_lockref should be only accessed after vfs_op_enter()

PR: 273953
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 2544b8e0 28-Apr-2023 Olivier Certner <olce.freebsd@certner.fr>

vfs: Rename vfs_emptydir() to vn_dir_check_empty()

No functional change. While here, adapt comments to style(9).

Reviewed by: kib
MFC after: 1 week


# bb24eaea 05-Apr-2023 Konstantin Belousov <kib@FreeBSD.org>

vn_lock_pair(): allow to request shared locking

If either of the vnodes is shared locked, the lock must not be recursed.

Requested by: rmacklem
Reviewed by: markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39444


# 4bbbd587 02-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_mount.c: Allow mountd(8) to do exports in a vnet prison

To run mountd in a vnet prison, three checks in vfs_domount()
and vfs_domount_update() related to doing exports needed
to be changed, so that a file system visible within the
prison but mounted outside the prison can be exported.

I did all three in a minimal way, only changing the checks for
the specific case of a process (typically mountd) doing exports
within a vnet prison and not updating the mount point in other
ways. The changes are:
- Ignore the error return from vfs_suser(), since the file
system being mounted outside the prison will cause it to fail.
- Use the priv_check(PRIV_NFS_DAEMON) for this specific case
within a prison.
- Skip the call to VFS_MOUNT(), since it will return an error,
due to the "from" argument not being set correctly. VFS_MOUNT()
does not appear to do anything for the case of doing exports only.

Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D37741


# 88175af8 21-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add mnt_exjail to control exports done in prisons

If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system. This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded. If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.

The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.

Reviewed by: markj
Discussed with: jamie
Differential Revision: https://reviews.freebsd.org/D38371
MFC after: 3 months
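
The ownership rule can be sketched in user space as below. This is an
illustration only: the sketch records a prison pointer where the kernel
records credentials, the struct and function names are hypothetical, and
EPERM is an assumed error value since the message above does not name one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical prison with the export counter described above. */
struct sketch_prison {
	int pr_exportcnt;	/* file systems exported from this prison */
};

/* Hypothetical mount: mnt_exjail == NULL means "not exported". */
struct sketch_mount {
	struct sketch_prison *mnt_exjail;
};

/* Record the exporting prison, refusing exports from a different one. */
static int
sketch_export(struct sketch_mount *mp, struct sketch_prison *pr)
{
	if (mp->mnt_exjail == NULL) {
		mp->mnt_exjail = pr;
		pr->pr_exportcnt++;
		return (0);
	}
	if (mp->mnt_exjail != pr)
		return (EPERM);		/* assumed error value */
	return (0);
}

/* Drop all of a prison's export ownership so it can be removed. */
static void
sketch_exjail_destroy(struct sketch_prison *pr, struct sketch_mount *mounts,
    size_t nmounts)
{
	size_t i;

	if (pr->pr_exportcnt == 0)
		return;			/* nothing exported: skip the scan */
	for (i = 0; i < nmounts; i++) {
		if (mounts[i].mnt_exjail == pr) {
			mounts[i].mnt_exjail = NULL;
			pr->pr_exportcnt--;
		}
	}
}
```

The counter exists only to short-circuit the scan; correctness does not
depend on it as long as it is kept in step with the ownership pointers.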


# db565512 04-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_mount.c: Free exports structures in vfs_mount_destroy()

During testing of exporting file systems in jails, I
noticed that the export structures on a mount
were not being free'd when the mount is dismounted.

This bug appears to have been in the system for a
very long time. It would have resulted in a slow memory
leak when exported file systems were dismounted.

Prior to r362158, freeing the structures during dismount
would not have been safe, since VFS_CHECKEXP() returned
a pointer into an export structure, which might still have been
used by the NFS server for an in-progress RPC when the file system
is dismounted. r362158 fixed this, so it should now be safe
to free the structures in vfs_mount_destroy(), which is what
this patch does.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D38385


# 71e9be1b 06-Dec-2022 Doug Rabson <dfr@FreeBSD.org>

Don't allow stacking of file mounts

Reviewed by: mjg, kib
Tested by: pho


# a1d74b2d 04-Dec-2022 Doug Rabson <dfr@FreeBSD.org>

Allow realpath to work for file mounts

For file mounts, the directory vnode is not available from namei and this
prevents the use of vn_fullpath_hardlink. In this case, we can use the
vnode which was covered by the file mount with vn_fullpath.

This also disallows file mounts over files with link counts greater than
one to ensure a deterministic path to the mount point.

Reviewed by: mjg, kib
Tested by: pho


# 521fbb72 23-Nov-2022 Doug Rabson <dfr@FreeBSD.org>

Add support for mounting single files in nullfs

The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.

This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.

Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo

Reviewed by: mjg, kib
Tested by: pho
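
A sketch of the mount point check this implies, assuming a per-filesystem
capability flag. SKETCH_VFCF_FILEMOUNT, the sketch_vtype enum and
sketch_check_mountpoint() are hypothetical names, and ENOTDIR is an
assumed error value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical capability flag: filesystem supports file mounts. */
#define	SKETCH_VFCF_FILEMOUNT	0x1

/* Hypothetical vnode types for the mount point. */
enum sketch_vtype { SKETCH_VREG, SKETCH_VDIR, SKETCH_VOTHER };

/*
 * A directory is always an acceptable mount point.  A regular file is
 * acceptable only when the filesystem type advertises file mount support.
 */
static int
sketch_check_mountpoint(int vfc_flags, enum sketch_vtype type)
{
	if (type == SKETCH_VDIR)
		return (0);
	if (type == SKETCH_VREG && (vfc_flags & SKETCH_VFCF_FILEMOUNT) != 0)
		return (0);
	return (ENOTDIR);	/* assumed error value */
}
```

Filesystems that never set the flag keep the old directory-only behavior.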


# 195f1b12 16-Dec-2022 Rick Macklem <rmacklem@FreeBSD.org>

vfs_mount.c: fix vfs_domount() for PRIV_VFS_MOUNT_EXPORTED

It appears that, prior to r158857 vfs_domount() checked
suser() when MNT_EXPORTED was specified.

r158857 appears to have broken this, since MNT_EXPORTED
was no longer set when mountd.c was converted to use nmount(2).
r164033 replaced the suser() check with
priv_check(td, PRIV_VFS_MOUNT_EXPORTED), which does the
same thing (i.e. checks for effective uid == 0 assuming suser_enabled
is set).

This patch restores this check by setting MNT_EXPORTED when the
"export" mount option is specified to nmount().

I think this is reasonable since only mountd(8) should be setting
exports and I doubt any non-root mounted file system would
be setting its own exports.

Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D37718


# 6b69465e 27-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

vfs_domount(): ensure that v_mountedhere and VIRF_MOUNTPOINT are set under the vnode lock

Fixes: f7833196bd6ba9bfc060a41b353422b15d6aa95b
Reported and tested by: pho
Reviewed by: jah, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37198


# 61a1d5dd 10-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: stop using the V_MNTREF flag

Reviewed by: kib, mckusick
Differential Revision: https://reviews.freebsd.org/D36521


# ad175a10 29-Jun-2022 Konstantin Belousov <kib@FreeBSD.org>

vfs_mount.c: convert explicit panics and KASSERTs to MPASSERT/MPPASS

Reviewed by: imp, mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35652


# 1e543628 20-Jun-2022 Konstantin Belousov <kib@FreeBSD.org>

vfs_op_exit(): assert that mnt_vfs_ops stays non-zero for unmount or suspend

Reviewed by: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35639


# ce00b119 14-Jun-2022 Doug Ambrisko <ambrisko@FreeBSD.org>

mount: revert the active vnode reporting feature

Revert the computing of active vnode reporting since statfs is used
by a lot of tools. Only report the vnodes used.

Reported by: mjg


# 7565431f 14-Jun-2022 Mark Johnston <markj@FreeBSD.org>

mount: Fix an incorrect assertion in kernel_mount()

The pointer to the mount values may be null if an error occurred while
copying them in, so fix the assertion condition to reflect that
possibility.

While here, move some initialization code into the error == 0 block. No
functional change intended.

Reported by: syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# 6468cd8e 13-Jun-2022 Doug Ambrisko <ambrisko@FreeBSD.org>

mount: add vnode usage per file system with mount -v

This avoids the need to drop into the ddb to figure out vnode
usage per file system. It helps to see if they are or are not
being freed. Suggestion to report active vnode count was from
kib@

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35436


# 31d1b816 28-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

sysent: Get rid of bogus sys/sysent.h include.

Where appropriate hide sysent.h under proper condition.

MFC after: 2 weeks


# bb92cd7b 24-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)


# 4a4b059a 25-Dec-2021 Konstantin Belousov <kib@FreeBSD.org>

Add vfs_remount_ro()

a helper to remount filesystem from rw to ro.

Tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721


# 7e1d3eef 25-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the unused thread argument from NDINIT*

See b4a58fbf640409a1 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.


# 8981a100 20-Nov-2021 Robert Wing <rew@FreeBSD.org>

mount: retire kernel_vmount()

The last usage of this function was removed in e3b1c847a4237ad9.

There are no in-tree consumers of kernel_vmount().

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32607


# f10a8d09 15-Nov-2021 Kirk McKusick <mckusick@FreeBSD.org>

Allow the MNT_FORCE flag to be passed through to an initial mount.

When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.

Sanity check: kib
Sponsored by: Netflix


# 03d5820f 12-Oct-2021 Mark Johnston <markj@FreeBSD.org>

mount: Check for !VDIR mount points before handling -o emptydir

To implement -o emptydir, vfs_emptydir() checks that the passed
directory is empty. This should be done after checking whether the
vnode is of type VDIR, though, or vfs_emptydir() may end up calling
VOP_READDIR on a non-directory.

Reported by: syzbot+4006732c69fb0f792b2c@syzkaller.appspotmail.com
Reviewed by: kib, imp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32475


# 6e8272f3 14-Aug-2021 Piotr Pawel Stefaniak <pstef@FreeBSD.org>

mount: improve error message for invalid filesystem names

For an invalid filesystem name used like this:
mount -t asdfs /dev/ada1p5 /usr/obj

emit an error message like this:
mount: /dev/ada1p5: Invalid fstype: Invalid argument

instead of:
mount: /dev/ada1p5: Operation not supported by device

Differential Revision: https://reviews.freebsd.org/D31540


# f1e2cc1c 26-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop dedicated sysinit for mountlist_mtx

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 0d28d014 26-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: refactor kern_unmount

Split unmounting by path and id in preparation for other changes.

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 7b2561b4 26-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: stop open-coding vfs_getvfs in kern_unmount

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 614faa32 22-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: fix cache-related LOR introduced in the previous change

Reported by: kib
Sponsored by: Rubicon Communications, LLC ("Netgate")


# e81e71b0 07-Aug-2021 Jason A. Harmening <jah@FreeBSD.org>

Use interruptible wait for blocking recursive unmounts

Now that we allow recursive unmount attempts to be abandoned upon
exceeding the retry limit, we should avoid leaving an unkillable
thread when a synchronous unmount request was issued against the
base filesystem.

Reviewed by: kib (earlier revision), mckusick
Differential Revision: https://reviews.freebsd.org/D31450


# a8c732f4 07-Aug-2021 Jason A. Harmening <jah@FreeBSD.org>

VFS: add retry limit and delay for failed recursive unmounts

A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount. In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.

Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up. Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount.

Create a new sysctl node, vfs.deferred_unmount, to export the total
number of failed recursive unmount attempts since boot, and to allow
the retry limit and retry grace period to be tuned.

Reviewed by: kib (earlier revision), mckusick
Differential Revision: https://reviews.freebsd.org/D31450
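
The retry policy can be sketched in user space as below. The function
names are hypothetical, the unmount attempt is simulated, and counting the
first attempt separately from the retries is an assumption of this sketch
rather than something the message above specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

#define	SKETCH_RETRY_LIMIT	10	/* default from the commit message */

/* Simulated unmount attempt: fails until *failures_left reaches zero. */
static int
sketch_try_unmount(int *failures_left)
{
	if (*failures_left > 0) {
		(*failures_left)--;
		return (EBUSY);
	}
	return (0);
}

/*
 * Retry a failed unmount up to retry_limit times, sleeping delay_ms
 * between retries.  Returns 0 on success, or the last error after
 * logging a message once the limit is reached.
 */
static int
sketch_deferred_unmount(int *failures_left, int retry_limit, long delay_ms,
    int *attempts)
{
	struct timespec ts;
	int error, retries;

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;
	*attempts = 0;
	for (retries = 0;; retries++) {
		(*attempts)++;
		error = sketch_try_unmount(failures_left);
		if (error == 0)
			return (0);
		if (retries >= retry_limit) {
			fprintf(stderr, "giving up after %d retries\n",
			    retries);
			return (error);
		}
		nanosleep(&ts, NULL);	/* grace period before retrying */
	}
}
```

Without the limit, a permanently failing unmount would keep the loop (in
the kernel, the taskqueue) busy forever; the delay keeps a transient
failure from being retried back to back.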


# dbc689cd 18-Aug-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use vn_lock_pair to avoid establishing an ordering on mount

This fixes some of the LORs seen on mount/unmount.

Complete fix will require taking care of unmount as well.

Reviewed by: kib
Tested by: pho (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31611


# c746ed72 12-Jun-2021 Jason A. Harmening <jah@FreeBSD.org>

Allow stacked filesystems to be recursively unmounted

In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem. The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.

This change addresses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount(). This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.

To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount. Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.

Reviewed by: kib, mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D31016


# 6475667f 24-Jul-2021 Warner Losh <imp@FreeBSD.org>

devctl: don't publish the mount options

Mount options aren't solely ASCII strings. In addition, experience to
date suggests that the mount options are much less useful than was
originally supposed and the mount flags suffice to make decisions. Drop
the reporting of options for the mount/remount/unmount events.

Reviewed by: markj
Reported by: KASAN
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31287


# 59409cb9 17-May-2021 Jason A. Harmening <jah@FreeBSD.org>

Add a generic mechanism for preventing forced unmount

This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount. Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions, vfs_pin_from_vp() and vfs_unpin(),
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30401


# 2425f5e9 05-Apr-2021 Mark Johnston <markj@FreeBSD.org>

mount: Disallow mounting over a jail root

Discussed with: jamie
Approved by: so
Security: CVE-2020-25584
Security: FreeBSD-SA-21:10.jail_mount


# a15f787a 15-Feb-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add vfs_ref_from_vp

This generalizes what vop_stdgetwritemount used to be doing.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28695


# 82397d79 31-Dec-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: denote vnode being a mount point with VIRF_MOUNTPOINT

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D27794


# 164438a7 26-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

More careful handling of the mount failure.

- VFS_UNMOUNT() requires vn_start_write() around it [*].
- call VFS_PURGE() before unmount.
- do not destroy mp if cleanup unmount did not succeed.
- set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF
for VFS_UNMOUNT() in cleanup.

PR: 251320 [*]
Reported by: Tong Zhang <ztong0001@gmail.com>
Reviewed by: markj, mjg
Discussed with: rmacklem
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27327


# f6dd1aef 09-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: group mount per-cpu vars into one struct

While here move frequently read stuff into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by: pho


# f1084587 05-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Suspend all writeable local filesystems on power suspend.

This ensures that no writes are pending in memory, either metadata or
user data, but not including dirty pages not yet converted to fs writes.

Only filesystems declared local are suspended.

Note that this does not guarantee absence of the metadata errors or
leaks if resume is not done: for instance, on UFS unlinked but opened
inodes are leaked and require fsck to gc.

Reviewed by: markj
Discussed with: imp
Tested by: imp (previous version), pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27054


# 2dee296a 05-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

Rationalize per-cpu zones.

The 2 provided zones had inconsistent naming between each other
("int" and "64") and other allocator zones (which use bytes).

Follow malloc by naming them "pcpu-" + size in bytes.

This is a step towards replacing ad-hoc per-cpu zones with
general slabs.


# ad89066a 17-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: annotate mountlist_mtx with __exclusive_cache_line


# a3d9bf49 23-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

cache: drop the force flag from purgevfs

The optional scan is wasteful, thus it is removed altogether from unmount.

Callers which always want it anyway remain unaffected.


# df665abd 26-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount.

r363210 introduced v_seqc_users to the vnodes. This change requires
a vn_seqc_write_end() to match the vn_seqc_write_begin() in
vfs_cache_root_clear().
mjg@ provided this patch which seems to fix the panic.

Tested for an NFS mount where the VFS_STATFS() call will fail.

Submitted by: mjg
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D26160


# 773e541e 20-Aug-2020 Warner Losh <imp@FreeBSD.org>

Use devctl.h instead of bus.h to reduce newbus pollution.

There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@


# 0f2c2c1c 20-Aug-2020 Warner Losh <imp@FreeBSD.org>

Use names suggested by kib@ in review D25969, move call for unmount to not call
with vnode locked, use NOWAIT alloc and only report when we don't overflow.

These changes were accidentally omitted from r364402, except for the not
reporting on overflow. They were lumped in with a debugging commit in my tree
that I omitted w/o realizing this.

Other issues from the review are pending some other changes I need to do first.


# 8ef773d1 19-Aug-2020 Warner Losh <imp@FreeBSD.org>

Add VFS FS events for mount and unmount to devctl/devd

Report when a filesystem is mounted, remounted or unmounted via devd, along with
details about the mount point and mount options.

Discussed with: kib@
Reviewed by: kirk@ (prior version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25969


# 4b3208a9 18-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: sanity check mount counters in vfs_op_enter


# 0379ff6a 25-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce vnode sequence counters

Modified on each permission change and link/unlink.

Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25573


# 8c1f410c 10-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: avoid spurious memcpy in vfs_statfs

It is quite often called for the very same buffer.


# 33b39b66 16-Jun-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Apply default security flavor in vfs_export

There may be some version of mountd out there that does not supply a default
security flavor when none is given for an export.

Set the default security flavor in vfs_export if none is given, and remove the
workaround for oexport compat.

Reported by: npn
Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25300


# 1f7104d7 13-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags.

Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it holds a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size).
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by: kib, freqlabs
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25088
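
The width problem can be shown with a short sketch. The flag values and
struct names are hypothetical, chosen only to sit below and above bit 31,
and the old field is modeled as uint32_t (the original was an int) so
that the truncation is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, one in the low 32 bits and one above them. */
#define	SKETCH_FLAG_LOW		(UINT64_C(1) << 7)
#define	SKETCH_FLAG_HIGH	(UINT64_C(1) << 40)

/* Old layout: 32-bit copy of the 64-bit mount flags. */
struct sketch_old_export_args {
	uint32_t ex_flags;
};

/* New layout: the copy is as wide as the mount flags themselves. */
struct sketch_new_export_args {
	uint64_t ex_flags;
};

/* Store flags in the old layout and read them back. */
static uint64_t
sketch_roundtrip_old(uint64_t mnt_flags)
{
	struct sketch_old_export_args args;

	args.ex_flags = (uint32_t)mnt_flags;	/* high bits are dropped */
	return (args.ex_flags);
}

/* Store flags in the new layout and read them back. */
static uint64_t
sketch_roundtrip_new(uint64_t mnt_flags)
{
	struct sketch_new_export_args args;

	args.ex_flags = mnt_flags;
	return (args.ex_flags);
}
```

As long as every export flag lives in the low 32 bits, both layouts
behave the same, which is why the old one happened to work.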


# c13e414d 01-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix build issue introduced by r361699.

Reported by: cy (and others)


# 1cfffed8 01-Jun-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Assign default security flavor when converting old export args

vfs_export requires security flavors be explicitly listed when
exporting as of r360900.

Use the default AUTH_SYS flavor when converting old export args to
ensure compatibility with the legacy mount syscall.

Reported by: rmacklem
Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25045


# f9122b64 22-Mar-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix an NFS mount attempt where VFS_STATFS() fails.

r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the
VFS_STATFS() called just after VFS_MOUNT() returns an error.
Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY.
Then the thread gets stuck sleeping on "mntref" in vfs_mount_destroy().
This patch fixes this problem.

Reviewed by: kib, mjg
Differential Revision: https://reviews.freebsd.org/D24022


# ed67a63c 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop remaining zpcpu casts


# 123c5197 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit

In particular on amd64 this eliminates an atomic op in the common case,
trading it for IPIs in the uncommon case of catching CPUs executing the
code while the filesystem is getting suspended or unmounted.


# 3eb6b656 08-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove now useless ENODEV handling from vn_fullpath consumers

Noted by: ngie


# b3fb13eb 24-Jan-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_unmount() and use in Linuxulator. No functional changes.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22646


# bbb1e07d 15-Jan-2020 Kirk McKusick <mckusick@FreeBSD.org>

Peter Holm reports that his test, which does an umount(8) on an active
mount point while numerous tests are writing to files on that mount
point, causes the umount(8) to hang forever.

The unmount(2) system call is handled in the kernel by the dounmount()
function. The cause of the hang is that prior to dounmount() calling
VFS_UNMOUNT() it is calling VFS_SYNC(mp, MNT_WAIT). The MNT_WAIT
flag indicates that VFS_SYNC() should not return until all the dirty
buffers associated with the mount point have been written to disk.
Because user processes are allowed to continue writing and can do
so faster than the data can be written to disk, the call to VFS_SYNC()
can never finish.

Unlike VFS_SYNC(), the VFS_UNMOUNT() routine can suspend all processes
when they request to do a write thus having a finite number of dirty
buffers to write that cannot be expanded. There is no need to call
VFS_SYNC() before calling VFS_UNMOUNT(), because VFS_UNMOUNT() needs
to flush everything again anyway after suspending writes, to catch
anything that was dirtied between the VFS_SYNC() and writes being
suspended.

The fix is to simply remove the unnecessary call to VFS_SYNC() from
dounmount().

Reported by: Peter Holm
Analysis by: Chuck Silvers
Tested by: Peter Holm
MFC after: 7 days
Sponsored by: Netflix


# cc3593fb 12-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: rework vnode list management

The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22997


# 57083d25 12-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add per-mount vnode lazy list and use it for deferred inactive + msync

This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22995


# c8b3463d 07-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)

The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by: kib (previous version)
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D23036


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# dc20b834 06-Oct-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add optional root vnode caching

Root vnodes are looked up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by: jeff (previous version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646


# 50bb04b7 27-Sep-2019 Andrew Turner <andrew@FreeBSD.org>

Check the vfs option length is valid before accessing through

When a VFS option passed to nmount is present but NULL the kernel will
place an empty option in its internal list. This will have a NULL
pointer and a length of 0. When we come to read one of these the kernel
will try to load from the last address of virtual memory. This is
normally invalid so will fault resulting in a kernel panic.

Fix this by checking if the length is valid before dereferencing.
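
A minimal sketch of the ordering of the checks, assuming a simplified
option record. The struct and function names below are hypothetical and
are not the kernel's own. The point is that the length is tested before
value[len - 1] is evaluated, so a zero length never indexes the last
address of memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the kernel's option list. */
struct opt_sketch {
	const char *value;	/* NULL when the option was passed as NULL */
	size_t len;		/* length of value in bytes, 0 when NULL */
};

/*
 * Return 0 if the option holds a NUL-terminated string, EINVAL otherwise.
 * The length is checked first, so value[len - 1] is never evaluated
 * with len == 0.
 */
int
opt_is_string(const struct opt_sketch *opt)
{
	assert(opt != NULL);
	if (opt->len == 0 || opt->value == NULL)
		return (EINVAL);
	if (opt->value[opt->len - 1] != '\0')
		return (EINVAL);
	return (0);
}
```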

MFC after: 3 days
Sponsored by: DARPA, AFRL


# ba7a55d9 22-Sep-2019 Sean Eric Fagan <sef@FreeBSD.org>

Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover: Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir: Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by: allanjude, kib
Differential Revision: https://reviews.freebsd.org/D21458


# b488246b 19-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: group fields used for per-cpu ops in one cacheline

Sponsored by: The FreeBSD Foundation


# 4cace859 16-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: convert struct mount counters to per-cpu

There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s

Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637


# a8c8e44b 16-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: manage mnt_ref with atomics

A new primitive is introduced to denote sections which can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspension or other events.

mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.

Reviewed by: kib, jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21425


# e671edac 23-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Decommission the MNTK_NOINSMNTQ kernel mount flag.

After all the changes, its dynamic scope is the same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4b3f7673 19-Aug-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS")

The fs-specific parts of the vfs_statfs routines only fill in a small portion
of the structure. Previous code was always copying everything at a higher
layer to accommodate this, and this patch does the same.

'df' (no arguments) worked fine because the caller uses mnt_stat itself as the
target buffer, making all the copying a no-op for its own case.
'df /' and similar use a different consumer which passes its own buffer and
this is where you can run into trouble.

Reported by: cy
Fixes: r351193
Sponsored by: The FreeBSD Foundation


# e7c1709a 18-Aug-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: stop always overwriting ->mnt_stat in VFS_STATFS

The struct is already populated on each mount (and remount). Fields are either
constant or not used by filesystem in the first place.

Some infrequently used functions use it to avoid having to allocate a new buffer
and are left alone.

The current code results in avoidable copying when single-threaded and
significant cache line bouncing when multithreaded.

While here deduplicate initial filling of the struct.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21317


# daec9284 21-May-2019 Conrad Meyer <cem@FreeBSD.org>

Include ktr.h in more compilation units

Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes. Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone. Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by: peterj (arm64/mp_machdep.c)
X-MFC-With: r347984
Sponsored by: Dell EMC Isilon


# 13c31c29 20-Dec-2018 Kirk McKusick <mckusick@FreeBSD.org>

Some filesystems (like cd9660 and ext3) require that VFS_STATFS()
be called before VFS_ROOT() is called. Move the call for VFS_STATFS()
so that it is done after VFS_MOUNT(), but before VFS_ROOT().
This change actually improves the robustness of the mount system
call because it returns an error rather than failing silently
when VFS_STATFS() returns failure.

Reported by: Rebecca Cran <rebecca@bluestop.org>
Sponsored by: Netflix


# e04d2a3c 15-Dec-2018 Kirk McKusick <mckusick@FreeBSD.org>

Under UFS/FFS the VFS_ROOT() function will return an error if the inode
check-hash fails. Panicking is not an appropriate response. So, check
for an error return from VFS_ROOT() and when an error is reported,
unwind and return the error.

Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix


# cc426dd3 11-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove unused argument to priv_check_cred.

Patch mostly generated with coccinelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by: The FreeBSD Foundation


# 970a174f 25-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Add FALLTHROUGH comments to appease Coverity.

CID: 1017862-1017864, 1017866-1017868
MFC after: 2 weeks


# 4fceda62 24-Oct-2018 Konstantin Belousov <kib@FreeBSD.org>

Correct condition to detect mount(2) support by a filesystem.

Reported and tested by: cy
Sponsored by: The FreeBSD Foundation
Approved by: re (rgrimes)


# 8ff7fad1 23-Oct-2018 Konstantin Belousov <kib@FreeBSD.org>

Only call sigdeferstop() for NFS.

Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
want to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17658


# 0e5c6bd4 04-May-2018 Jamie Gritton <jamie@FreeBSD.org>

Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of. This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by: kib
Differential Revision: D14681


# 31260bf0 27-Mar-2018 Andriy Gapon <avg@FreeBSD.org>

vfs_donmount: in certain cases try r/o mount if r/w mount fails

If the operation is not an update, if neither r/w nor r/o mode is
explicitly requested, if the error code hints at the possibility of the
media being read-only, and if the fallback is allowed, then we can try
to automatically downgrade to the readonly mode.

This is especially useful for auto-mounting of removable media that
sometimes can happen to be write-protected.

The fallback to r/o is not enabled by default. It can be requested on a
per-mount basis with a new mount option, 'autoro'. Or it can be
globally allowed by setting vfs.default_autoro.

Reviewed by: cem, kib
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D13361


# ac579135 07-Jan-2018 Ian Lepore <ian@FreeBSD.org>

Use EVENTHANDLER_DIRECT_INVOKE for [un]mount events, for better performance.


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# f92e3400 13-Oct-2017 Andriy Gapon <avg@FreeBSD.org>

remove process and jail directory machinations from dounmount

The manipulations done by mountcheckdirs() are not that useful during
the unmount, and they can bring about unexpected security consequences.

This change effectively reverts the change in r73241.

The change also allows simplifying the handling of the rootvnode global
variable.

Discussed with: mckusick, mjg, kib
Reviewed by: trasz
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12366


# 9770475c 19-Sep-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not vrele() covered vnode under the mp mutex.

If vrele() changes the hold count to zero, it needs to acquire the
vnode lock.

Sponsored by: The FreeBSD Foundation
Discussed with: avg
X-MFC with: r323578


# cbc785c2 14-Sep-2017 Andriy Gapon <avg@FreeBSD.org>

dounmount: do not release the mount point's reference on the covered vnode

As long as mnt_ref is not zero there can be a consumer that might try
to access mnt_vnodecovered. For this reason the covered vnode must not
be freed until mnt_ref goes to zero.
So, move the release of the covered vnode to vfs_mount_destroy.

Reviewed by: kib
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D12329


# 3e85b721 16-May-2017 Ed Maste <emaste@FreeBSD.org>

Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193


# 2f304845 05-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not allocate struct statfs on kernel stack.

Right now the size of the structure is 472 bytes on amd64, which is
already large, and stack allocations are undesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 714b7df5 13-Nov-2016 Konstantin Belousov <kib@FreeBSD.org>

Provide simple mutual exclusion between mount point update and unmount.

Currently mount update keeps vfs_busy(9) reference on the mount point
during MNT_UPDATE VFS_MOUNT() vfsops call. This already provides the
exclusion, but is problematic for filesystems which need to perform
namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh
mnt_from path, because namei(9) must not be called while the
vfs_busy(9) reference is owned.

Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for
MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing
syscalls with EBUSY if conflict is detected. Keep vfs_busy(9)
reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS
KPI.

In the update path in ffs_mount(), drop vfs_busy() reference around
namei(), which is now safe due to unmount never executing in parallel
with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 9eb8f495 13-Nov-2016 Konstantin Belousov <kib@FreeBSD.org>

Move common cleanup code into helper.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 45571f88 08-Oct-2016 Mateusz Guzik <mjg@FreeBSD.org>

vfs: assert empty tmp free list on unmount


# f71d0856 07-Oct-2016 Konstantin Belousov <kib@FreeBSD.org>

Limit scope of the optimization in r306608 to dounmount() caller only.
Other uses of cache_purgevfs() do rely on the cache purge for correct
operations, when paths are invalidated without unmount.

Reported and tested by: jkim
Discussed with: mjg
Sponsored by: The FreeBSD Foundation


# 5bb81f9b 30-Sep-2016 Mateusz Guzik <mjg@FreeBSD.org>

vfs: batch free vnodes in per-mnt lists

Previously free vnodes would always be directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.

syncer runs always return the batch regardless of its size.

While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.

Reviewed by: kib
Tested by: pho


# e313b4dd 20-Sep-2016 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix bug introduced with r302388, which could cause processes accessing
automounted shares to hang with "vfs_busy" wchan.

(As a workaround one can run 'automount -u' from cron.)

Reviewed by: kib@
MFC after: 1 month


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# 411455a8 10-Aug-2016 Edward Tomasz Napierala <trasz@FreeBSD.org>

Replace all remaining calls to vprint(9) with vn_printf(9), and remove
the old macro.

MFC after: 1 month


# debc480e 07-Jul-2016 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add new unmount(2) flag, MNT_NONBUSY, to check whether there are
any open vnodes before proceeding. Make autounmountd(8) use this flag.
Without it, even an unsuccessful unmount causes filesystem flush,
which interferes with normal operation.

Reviewed by: kib@
Approved by: re (gjb@)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7047


# 9fdbfd3b 15-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Do not assume that we own the use reference on the covered vnode until
we set MNTK_UNMOUNT flag on the mp. Otherwise parallel unmount which
wins race with us could dereference the covered vnode, and we are
left with the locked freed memory.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week


# 8614f45b 16-May-2016 Andriy Gapon <avg@FreeBSD.org>

dounmount: do not call mountcheckdirs() for mounts with MNT_IGNORE

This is a bit hackish, but the flag is currently set only for ZFS
snapshots mounted under .zfs. mountcheckdirs() can change cdir/rdir
references to a covered vnode. But for the said snapshots the covered
vnode is really ephemeral and it must never be accessed (except
for a few specific cases).

To do: consider removing mountcheckdirs() entirely

MFC after: 5 days


# 76c404fc 04-Feb-2016 Konstantin Belousov <kib@FreeBSD.org>

Do not copy by field when converting struct oexport_args to struct
export_args on mount update, bzero() is consistent with
vfs_oexport_conv().
Make the code structure more explicit by using switch.
Return EINVAL if export option layout (deduced from size) is unknown.

Based on the submission by: bde
Sponsored by: The FreeBSD Foundation


# c9ba6504 24-Aug-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.

Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467


# 1965f86c 02-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

Vnode is not referenced by the vfs_domount() at the point where
asserts are made. Remove them, since we might dereference freed
memory. Leaked locks are asserted by the syscall return code anyway.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 780dca1b 27-May-2015 Konstantin Belousov <kib@FreeBSD.org>

Right now, dounmount() is called with an unreferenced mount point.
Nothing stops a parallel unmount from succeeding before the given call to
dounmount() checks and locks the covered vnode. Prevent dounmount()
from acting on the freed (although type-stable) memory by changing the
interface to require the mount point to be referenced. dounmount()
consumes the reference on return, regardless of whether the result is
successful or erroneous.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 5d6f5b24 10-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

Mountd iterating over the mount points may race with the parallel
unmount, which causes error from nmount(2) call when performing
MNT_DELEXPORT over the directory which ceased to be a mount point.

The race is legitimate and innocent, but results in a chatty mountd.
Silence it by providing a distinguished error code for the situation,
and ignoring the error in the mountd loop.

Based on the patch by: Andreas Longwitz <longwitz@incore.de>
Prodded and tested by: bdrewery
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# b2344ab5 09-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount.

Since VFS does not/cannot stop writes, sync might run indefinitely, or
be a wrong thing to do at all. E. g. NFS ignores VFS_SYNC() for
forced unmounts, since non-responding server does not allow sync to
finish. On the other hand, filesystems can and do stop writes using
fs-specific facilities, and should already fully flush caches in
VFS_UNMOUNT() due to the race.

Adjust msdosfs to sync in unmount for forced call, to accommodate the
new behaviour. Note that it is still racy, since writes are not
stopped.

Discussed with: avg, bjk, mckusick
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 3914ddf8 17-Aug-2014 Edward Tomasz Napierala <trasz@FreeBSD.org>

Bring in the new automounter, similar to what's provided in most other
UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.

There are still a few outstanding problems; they will be fixed shortly.

Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric: D523
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation


# 168f4ee0 02-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Remove Giant acquisition from the mount and unmount paths.

It could be claimed that two things were reasonably protected by
Giant. One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it had under
Giant: the unload of filesystem modules can happen while the
module is still in use.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 97c0df73 12-Apr-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Use proper MFSNAMELEN for fs type.

MFC after: 2 weeks
Reviewed by: rodrigc
Also spotted by: ambrisko


# d3baefa8 03-Oct-2013 Sean Bruno <sbruno@FreeBSD.org>

Change len checks for fstypelen and fspathlen to be against absolute len
not strlen as they are *not* strings.

Discovered by GSOC student, Mike Ma <mikemandarine@gmail.com> during his
fuse.glusterfs port to FreeBSD.

Final patch from mckusick@

Submitted by: mckusick@
Approved by: re (hrs)
MFC after: 2 weeks


# 8fe6bddf 01-Sep-2013 Rick Macklem <rmacklem@FreeBSD.org>

Forced dismounts of NFS mounts can fail when thread(s) are stuck
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.

Reported by: sbruno
Reviewed by: kib
MFC after: 2 weeks


# 8939c069 10-Jul-2013 Marcel Moolenaar <marcel@FreeBSD.org>

Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

This change differs from r253224 in the following way:
o The vfs_mounted handler is called before mountcheckdirs() and with
newdp locked. vp is unlocked.
o The event handlers are declared in <sys/eventhandler.h> and not in
<sys/mount.h>. The <sys/mount.h> header is used in user land code
that pretends to be kernel code and as such creates a very convoluted
environment. It's hard to untangle.

Submitted by: stevek@juniper.net
Discussed with: pjd@
Obtained from: Juniper Networks, Inc.


# 4612275f 10-Jun-2013 Marcel Moolenaar <marcel@FreeBSD.org>

Revert r251590. It unexpectedly broke the build and there were some
questions on locking. As part of commit-bit grooming, I'd like Steve
to handle this, but can't leave things broken in the mean time.


# 8c7ca16f 09-Jun-2013 Marcel Moolenaar <marcel@FreeBSD.org>

Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

Submitted by: stevek@juniper.net
Obtained from: Juniper Networks, Inc


# ab3f6b34 17-Apr-2013 Gabor Kovesdan <gabor@FreeBSD.org>

- Correct misspellings of the word occurrence

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


# 8d6884ce 20-Mar-2013 Konstantin Belousov <kib@FreeBSD.org>

When the journaled FFS volume is suspended due to the journal space
becoming too low, the softdep flush thread processes the workitems,
which frees the space in journal, and then unsuspends the fs. The
softdep_flush() and other workitem processing functions busy the
filesystem before iterating over the worklist, to prevent the parallel
unmount from freeing the mount data. The vfs_busy() is called with
MBF_NOWAIT flag.

Now, if the unmount is already started and the filesystem is suspended
due to low journal space, the journal is never flushed and filesystem
is never unsuspended, because vfs_busy(MBF_NOWAIT) call cannot succeed
for the unmounting fs, and softdep_flush() does not process the
workitems. Unmount needs to write metadata, where it hangs in the
"suspfs" state.

Move the vn_start_write() call in the dounmount() before setting the
MNTK_UNMOUNT flag. This practically ensures that softdep_flush()
processed the pending journal writes by making dounmount() wait for
the lift of the suspension.

Sponsored by: The FreeBSD Foundation
Reported and tested by: pho
MFC after: 2 weeks


# eea8d86d 04-Jan-2013 David Xu <davidxu@FreeBSD.org>

Revert revision 244760 because strncpy pads trailing space with zero,
this prevents kernel data from being leaked.
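
The padding behaviour referred to here is standard strncpy() semantics:
when the source is shorter than the given size, the rest of the
destination is filled with NUL bytes. A small userland sketch follows.
The function name is hypothetical, and the explicit termination of the
last byte is an addition of this sketch, not something stated in the
commit.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy msg into a fixed-size buffer. Bytes after the end of the string
 * are overwritten with '\0', so stale buffer contents are not exposed.
 */
void
copy_errmsg_padded(char *buf, size_t buflen, const char *msg)
{
	assert(buf != NULL && msg != NULL && buflen > 0);
	/* strncpy() fills the rest of buf with '\0' when msg is shorter. */
	strncpy(buf, msg, buflen);
	/* Explicit termination is an addition of this sketch. */
	buf[buflen - 1] = '\0';
}
```

By contrast, strlcpy() stops writing after the terminating NUL, so the
tail of the destination keeps whatever it held before the copy.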

Noticed by: Joerg Sonnenberger <joerg at britannica dot bec dot de>


# d1c5e3f8 03-Jan-2013 Konstantin Belousov <kib@FreeBSD.org>

Remove the deprecated MNT_VNODE_FOREACH interface. Use the
MNT_VNODE_FOREACH_ALL instead.


# 9d4bf0db 27-Dec-2012 David Xu <davidxu@FreeBSD.org>

Use strlcpy to NUL-terminate error message even if user provided a short
buffer.


# b1308d72 21-Dec-2012 Attilio Rao <attilio@FreeBSD.org>

Fixup r218424: uio_yield() was scaling directly to userland priority.
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There is no evidence that consumers could bear with such a change in
semantics, and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.

Tested by: pho
Reviewed by: kib, mdf
MFC after: 1 week


# 796fa4fb 09-Dec-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix typo.

MFC after: 3 days


# e1216d13 30-Nov-2012 Pawel Jakub Dawidek <pjd@FreeBSD.org>

IFp4 @208450:

Remove redundant call to AUDIT_ARG_UPATH1().
Path will be remembered by the following NDINIT(AUDITVNODE1) call.

Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks


# 5050aa86 22-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# bcd5bb8e 09-Sep-2012 Konstantin Belousov <kib@FreeBSD.org>

Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation. Typical use is for the bypass mounts like
nullfs to get a notification about lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with: pho
MFC after: 3 weeks


# f257ebbb 20-Apr-2012 Kirk McKusick <mckusick@FreeBSD.org>

This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


# 71469bb3 17-Apr-2012 Kirk McKusick <mckusick@FreeBSD.org>

Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if it is DOOMED or other possible end-of-life scenarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


# 0ff93c48 07-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Add vfs_getopt_size. Support human readable file system options in tmpfs.
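
As an illustration of what a human readable size option involves, here
is a userland sketch of parsing values such as "512k" or "4G". It is not
the vfs_getopt_size() implementation: the function name, the accepted
suffix set (k, m, g, t in either case) and the error handling are
assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<number>[k|m|g|t]" into bytes; return 0 or EINVAL. */
int
parse_size_sketch(const char *s, int64_t *out)
{
	char *end;
	long long v;
	int shift = 0;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtoll(s, &end, 0);
	if (errno != 0 || end == s || v < 0)
		return (EINVAL);
	switch (*end) {
	case 't': case 'T': shift = 40; end++; break;
	case 'g': case 'G': shift = 30; end++; break;
	case 'm': case 'M': shift = 20; end++; break;
	case 'k': case 'K': shift = 10; end++; break;
	case '\0': break;
	default: return (EINVAL);
	}
	/* Reject trailing characters after the suffix. */
	if (*end != '\0')
		return (EINVAL);
	/* Reject values that would overflow after scaling. */
	if (v > (INT64_MAX >> shift))
		return (EINVAL);
	*out = (int64_t)v << shift;
	return (0);
}
```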

Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with: delphij
MFC after: 2 weeks


# 38ddb572 08-Mar-2012 Konstantin Belousov <kib@FreeBSD.org>

Decommission mnt_noasync. Introduce the MNTK_NOASYNC mnt_kern_flag which
allows a filesystem to request VFS to not allow MNTK_ASYNC.

MFC after: 1 week


# a91d2201 05-Feb-2012 Martin Matuska <mm@FreeBSD.org>

Analogous to r230407 a separate path buffer in vfs_mount.c is required
for r230129. Fixes an out-of-bounds write to fspath.

MFC after: 10 days


# cc672d35 16-Jan-2012 Kirk McKusick <mckusick@FreeBSD.org>

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks


# f6e633a9 14-Jan-2012 Martin Matuska <mm@FreeBSD.org>

Introduce vn_path_to_global_path()

This function updates path string to vnode's full global path and checks
the size of the new path string against the pathlen argument.

In vfs_domount(), sys_unmount() and kern_jail_set() this new function
is used to update the supplied path argument to the respective global path.

Unbreaks jailed zfs(8) with enforce_statfs set to 1.

Reviewed by: kib
MFC after: 1 month


# ed1f6dc2 08-Nov-2011 Attilio Rao <attilio@FreeBSD.org>

Introduce the option VFS_ALLOW_NONMPSAFE and turn it on by default on
all the architectures.
The option allows mounting non-MPSAFE filesystems. Without it, the
kernel will refuse to mount a non-MPSAFE filesystem.

This patch is part of the effort of killing non-MPSAFE filesystems
from the tree.

No MFC is expected for this patch.

Tested by: gianni
Reviewed by: kib


# d745c852 06-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


# cd795a6e 11-Oct-2011 Kirk McKusick <mckusick@FreeBSD.org>

When unmounting a filesystem always wait for the vfs_busy lock to clear
so that if no vnodes in the filesystem are actively in use the unmount
will succeed rather than failing with EBUSY.

Reported by: Garrett Cooper
Reviewed by: Attilio Rao and Kostik Belousov
Tested by: Garrett Cooper
PR: kern/161016
MFC after: 3 weeks


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# d16eac5c 08-Aug-2011 Martin Matuska <mm@FreeBSD.org>

Revert r224655 and r224614 because vn_fullpath* does not always work
on nullfs mounts.

Change shall be reconsidered after 9.0 is released.

Requested by: re (kib)
Approved by: re (kib)


# 5388625f 05-Aug-2011 Martin Matuska <mm@FreeBSD.org>

The change in r224615 didn't take into account that vn_fullpath_global()
doesn't operate on locked vnode. This could cause a panic.

Fix by unlocking vnode, re-locking afterwards and verifying that it wasn't
renamed or deleted. To improve readability and reduce code size, move code
to a new static function vfs_verify_global_path().

In addition, fix missing giant unlock in unmount().

Reported by: David Wolfskill <david@catwhisker.org>
Reviewed by: kib
Approved by: re (bz)
MFC after: 2 weeks


# f6c1d63e 02-Aug-2011 Martin Matuska <mm@FreeBSD.org>

For mount, discover f_mntonname from supplied path argument
using vn_fullpath_global(). This fixes f_mntonname if mounting
inside chroot, jail or with relative path as argument.

For unmount in jail, use vn_fullpath_global() to discover
global path from supplied path argument. This fixes unmount in jail.

Reviewed by: pjd, kib
Approved by: re (kib)
MFC after: 2 weeks


# 6beb3bb4 24-Jul-2011 Kirk McKusick <mckusick@FreeBSD.org>

This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field. The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)


# fef7c585 10-Jul-2011 Andrey V. Elsukov <ae@FreeBSD.org>

Include sys/sbuf.h directly.


# 3d08a76b 12-May-2011 Matthew D Fleming <mdf@FreeBSD.org>

Use a name instead of a magic number for kern_yield(9) when the priority
should not change. Fetch the td_user_pri under the thread lock. This
is probably not necessary but a magic number also seems preferable to
knowing the implementation details here.

Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >


# 1b0fe69d 22-Apr-2011 Jaakko Heinonen <jh@FreeBSD.org>

Utilize vfs_sanitizeopts() in vfs_mergeopts() to merge options. Because
vfs_sanitizeopts() can handle "ro" and "rw" options properly, there is
no more need to add "noro" in vfs_donmount() to cancel "ro".

This also fixes a problem of canceling options beginning with "no".
For example, "noatime" didn't cancel "nonoatime". Thus it was possible
that both "noatime" and "nonoatime" were active at the same time.

Reviewed by: bde
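
The "no" prefix rule described above can be sketched as a small predicate in C. This is an assumption-laden illustration, not the code in vfs_sanitizeopts(); the name opts_cancel_sketch is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical predicate, not the kernel's code: two option names
 * cancel each other when one is the other with a single "no" prefix.
 */
static bool
opts_cancel_sketch(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	if (strncmp(a, "no", 2) == 0 && strcmp(a + 2, b) == 0)
		return (true);
	if (strncmp(b, "no", 2) == 0 && strcmp(b + 2, a) == 0)
		return (true);
	return (false);
}
```

Under this rule "noatime" cancels both "atime" and "nonoatime", which is the case the commit message says used to be missed.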


# 9dc6abbd 26-Mar-2011 Jaakko Heinonen <jh@FreeBSD.org>

Fix some style issues in r219925.

Reported by: bde
MFC after: 1 month


# 3fd8fe5b 23-Mar-2011 Jaakko Heinonen <jh@FreeBSD.org>

Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in
vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant
occurrences of these options. It was possible that for example both "ro"
and "rw" options became active concurrently.

PR: kern/133614
Discussed on: freebsd-hackers
MFC after: 1 month
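
A rough C sketch of treating the read-only/read-write spellings as one option follows. It is only an illustration of the idea; the function names are hypothetical and the real vfs_equalopts() may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Spellings that all select the read-only/read-write state of a mount. */
static const char *const rw_family_sketch[] = {
	"ro", "rdonly", "norw", "rw", "noro"
};

static bool
in_rw_family_sketch(const char *opt)
{
	size_t i, n;

	n = sizeof(rw_family_sketch) / sizeof(rw_family_sketch[0]);
	for (i = 0; i < n; i++)
		if (strcmp(opt, rw_family_sketch[i]) == 0)
			return (true);
	return (false);
}

/*
 * Hypothetical stand-in for the comparison: two names refer to the same
 * option if they are identical or both belong to the ro/rw family, so a
 * later occurrence can replace an earlier one.
 */
static bool
same_option_sketch(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	if (in_rw_family_sketch(a) && in_rw_family_sketch(b))
		return (true);
	return (strcmp(a, b) == 0);
}
```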


# da2e368f 19-Feb-2011 Jaakko Heinonen <jh@FreeBSD.org>

Don't restore old mount options and flags if VFS_MOUNT(9) succeeds but
vfs_export() fails. Restoring old options and flags after successful
VFS_MOUNT(9) call may cause the file system internal state to become
inconsistent with mount options and flags. Specifically the FFS super
block fs_ronly field and the MNT_RDONLY flag may get out of sync.

PR: kern/133614
Discussed on: freebsd-hackers


# e7ceb1e9 07-Feb-2011 Matthew D Fleming <mdf@FreeBSD.org>

Based on discussions on the svn-src mailing list, rework r218195:

- entirely eliminate some calls to uio_yield() as being unnecessary,
such as in a sysctl handler.

- move should_yield() and maybe_yield() to kern_synch.c and move the
prototypes from sys/uio.h to sys/proc.h

- add a slightly more generic kern_yield() that can replace the
functionality of uio_yield().

- replace source uses of uio_yield() with the functional equivalent,
or in some cases do not change the thread priority when switching.

- fix a logic inversion bug in vlrureclaim(), pointed out by bde@.

- instead of using the per-cpu last switched ticks, use a per thread
variable for should_yield(). With PREEMPTION, the only reasonable
use of this is to determine if a lock has been held a long time and
relinquish it. Without PREEMPTION, this is essentially the same as
the per-cpu variable.


# 08b163fa 02-Feb-2011 Matthew D Fleming <mdf@FreeBSD.org>

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


# 1a4fbae8 24-Jan-2011 Jaakko Heinonen <jh@FreeBSD.org>

Replace spaces with tabs.


# f03749ca 23-Nov-2010 Sergey Kandaurov <pluknet@FreeBSD.org>

Update MNT_ROOTFS comments after changes in the root mount logic.

Reported by: arundel
Suggested by: marcel (phrasing)
Approved by: kib (mentor)


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# c1f0aabb 17-Oct-2010 Marcel Moolenaar <marcel@FreeBSD.org>

In vfs_filteropt(), only print the errmsg when there's no errmsg
mount option. Otherwise errors tend to get printed multiple times.


# d0cc54f3 10-Oct-2010 Konstantin Belousov <kib@FreeBSD.org>

The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by: bde
MFC after: 2 weeks


# 24e01f59 02-Oct-2010 Marcel Moolenaar <marcel@FreeBSD.org>

Split the root mount logic from the (generic) mount code and move
it (the root mount code) into a new file called vfs_mountroot.c

The split is almost trivial, as the code is almost perfectly
non-intertwined. The only adjustment needed was to move the UMA
zone allocation out of vfs_mountroot() [in vfs_mountroot.c] and
into vfs_mount.c, where it had to be done as a SYSINIT [see
vfs_mount_init()].

There are no functional changes with this commit.


# 9a24dc07 11-Sep-2010 Konstantin Belousov <kib@FreeBSD.org>

Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by: jh (previous version of the patch)
Tested by: pho
MFC after: 3 weeks


# 4946fa67 09-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove VI_MOUNT flag from vnode on VFS_MOUNT() failure.


# 7443b79b 08-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Doing first mount and updating mount points are both handled by the same
syscall and the same function, but are very different and share almost no code.
To make it easier to read and analyze, split vfs_domount() into
vfs_domount_first() and vfs_domount_update().

Reviewed by: kib


# a34512e3 08-Sep-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Log all the problems in devfs_fixup().

- Correct error paths. The system will be useless on devfs_fixup() failure, so
why bother? Maybe for the same reason why a dead body is washed and dressed
in a nice suit before it is put into a coffin? Maybe system's last will is to
panic without any locks held?

Reviewed by: kib


# c87f1ad4 28-Aug-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

There is a bug in vfs_allocate_syncvnode() failure handling in mount code.
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn frees us from handling its failures
in the mount code.

Reviewed by: kib
MFC after: 1 month


# d779d443 18-Feb-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Reduce scope of vnode lock. vfs_mount_alloc() doesn't need vnode to be
locked.
- Remove code duplication.


# e2b36efd 29-Jan-2010 Antoine Brodin <antoine@FreeBSD.org>

MFC r201145 to stable/8:
(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR: 137213
Submitted by: Eygene Ryabinkin (initial version)


# 13e403fd 28-Dec-2009 Antoine Brodin <antoine@FreeBSD.org>

(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month


# 9a6d3188 26-Nov-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r199227:
Add the possibility for vfs.root.mountfrom tunable to accept a list of
items rather than a single one.
While there fix also a nit in a comment.

Sponsored by: Sandvine Incorporated


# d1133049 12-Nov-2009 Attilio Rao <attilio@FreeBSD.org>

Add the possibility for vfs.root.mountfrom tunable to accept a list of
items rather than a single one. The list is a space separated collection
of items defined as the current one accepted.

While there fix also a nit in a comment.

Obtained from: Sandvine Incorporated
Reviewed by: emaste
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>
Sponsored by: Sandvine Incorporated
MFC: 2 weeks
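
Walking a space-separated list of this kind can be sketched in C as follows. The commit message does not say how the kernel chooses among the items; trying them in order and stopping at the first success is an assumption, and the callback type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns 0 when the item could be mounted. */
typedef int (*try_mount_fn)(const char *item, size_t len);

/*
 * Walk a space-separated list and stop at the first item the callback
 * accepts.  Returns the 0-based index of that item, or -1 if none works.
 */
static int
first_mountable_sketch(const char *list, try_mount_fn try_mount)
{
	int idx = 0;

	assert(list != NULL && try_mount != NULL);
	for (;;) {
		size_t len;

		list += strspn(list, " ");	/* skip separators */
		if (*list == '\0')
			return (-1);
		len = strcspn(list, " ");	/* length of this item */
		if (try_mount(list, len) == 0)
			return (idx);
		list += len;
		idx++;
	}
}

/* Demo callback: pretend that only "ufs:" items can be mounted. */
static int
accept_ufs_sketch(const char *item, size_t len)
{
	return (len >= 4 && strncmp(item, "ufs:", 4) == 0 ? 0 : -1);
}
```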


# 7ba889b8 08-Nov-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add suggestion for zfs root.


# 87eca70e 31-Jul-2009 John Baldwin <jhb@FreeBSD.org>

Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
sanitizing stdin/out/err file descriptors during execve().

Submitted by: kib
Approved by: re (rwatson)
MFC after: 1 week


# 791b0ad2 29-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and instead
provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2()
to capture path information for audit records. This allows us to
move the definitions of ARG_* out of the public audit header file,
as they are an implementation detail of our current kernel-internal
audit record, which may change.

Approved by: re (kensmith)
Obtained from: TrustedBSD Project
MFC after: 1 month


# 6d5a6156 01-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

When auditing unmount(2), capture FSID arguments as regular text strings
rather than as paths, which would lead to them being treated as relative
pathnames and hence confusingly converted into absolute pathnames.

Capture flags to unmount(2) via an argument token.

Approved by: re (audit argument blanket)
MFC after: 3 days


# 14961ba7 27-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Replace AUDIT_ARG() with variable argument macros with a set of more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 0c349f08 31-May-2009 Craig Rodrigues <rodrigc@FreeBSD.org>

sys/boot/common.c
=================
Extend the loader to parse the root file system mount options in /etc/fstab,
and set a new loader variable vfs.root.mountfrom.options with these options.
The root mount options must be a comma-delimited string, as specified in
/etc/fstab.
Only set the vfs.root.mountfrom.options variable if it has not been
set in the environment.

sys/kern/vfs_mount.c
====================
When mounting the root file system, pass the mount options
specified in vfs.root.mountfrom.options, but filter out "rw" and "noro",
since the initial mount of the root file system must be done as "ro".
While we are here, try to add a few hints to the mountroot prompt
to give users an idea what might have gone wrong during mounting
of the root file system.

Reviewed by: jhb (an earlier patch)
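
Filtering "rw" and "noro" out of a comma-delimited option string can be sketched in C like this. The function name and the buffer-based interface are assumptions for the illustration; the kernel builds its option list differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the comma-separated options in 'in' to 'out', skipping "rw" and
 * "noro".  Returns 0 on success, -1 if 'out' is too small.
 */
static int
filter_root_opts_sketch(const char *in, char *out, size_t outsz)
{
	size_t used = 0;

	assert(in != NULL && out != NULL);
	if (outsz == 0)
		return (-1);
	out[0] = '\0';
	while (*in != '\0') {
		size_t len = strcspn(in, ",");

		if (len > 0 &&
		    !(len == 2 && strncmp(in, "rw", 2) == 0) &&
		    !(len == 4 && strncmp(in, "noro", 4) == 0)) {
			/* Room for an optional separator, the option and NUL. */
			if (used + (used > 0) + len + 1 > outsz)
				return (-1);
			if (used > 0)
				out[used++] = ',';
			memcpy(out + used, in, len);
			used += len;
			out[used] = '\0';
		}
		in += len;
		if (*in == ',')
			in++;
	}
	return (0);
}
```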


# 0304c731 27-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal this
situation.


# 853a10a5 09-Apr-2009 Andrew Thompson <thompsa@FreeBSD.org>

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


# 626fc9fe 03-Apr-2009 Andrew Thompson <thompsa@FreeBSD.org>

Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isn't allowed.


# 46b70f07 29-Mar-2009 Andrew Thompson <thompsa@FreeBSD.org>

Further rate limit the root wait status, it will be printed once per
root_mount_rel() wakeup.


# d24d45d9 26-Mar-2009 Andrew Thompson <thompsa@FreeBSD.org>

Skip the allocation of the root hold token if the mount already happened.


# f86bce5e 02-Mar-2009 Jamie Gritton <jamie@FreeBSD.org>

Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by: bz (mentor)


# feabc903 05-Feb-2009 Attilio Rao <attilio@FreeBSD.org>

Add more KTR_VFS logging points in order to have more effective tracing.

Reviewed by: brueffer, kib
Tested by: Gianni Trematerra <giovanni D trematerra A gmail D com>


# 4a0f8076 16-Dec-2008 Attilio Rao <attilio@FreeBSD.org>

1) Fix a deadlock in the VFS:
- threadA runs vfs_rel(mp1)
- threadB unmounts the mp1 fs, sets MNTK_UNMOUNT and drops MNT_ILOCK()
- threadA runs vfs_busy(mp1) and, since MNTK_UNMOUNT is set, sleeps
waiting for threadB to complete the unmount
- threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting
for the refcount to expire.

Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals that
the unmounter is waiting for mnt_ref to expire.
The vfs_busy contenders are woken up and fail, and if they retry,
MNTK_REFEXPIRE won't allow them to sleep again.

2) Significantly simplify the code of vfs_mount_destroy() by trimming
unnecessary code:
- once every reference has gone away, it is no longer possible to have
write ops (primary and secondary) in progress.
- there is no need to drop and reacquire the mount lock.
- filling the structure with dummy values is useless since
it is going to be freed.

Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
Discussed with: kib


# ccc55b33 30-Nov-2008 Attilio Rao <attilio@FreeBSD.org>

Fix an inverted check introduced in r184554.

Submitted by: tegge
Pointy hat to: me


# 30f60d8c 03-Nov-2008 Attilio Rao <attilio@FreeBSD.org>

Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.
Really, the concept of holdcnt in the struct mount is represented by
the mnt_ref (which prevents the type-stable structure from being
"recycled") handled through vfs_ref() and vfs_rel().
Accordingly, switch the holdcnt acquisition into an emulated vfs_ref()
(and subsequent release into vfs_rel()).

Discussed with: kib
Tested by: pho


# a9148abd 03-Nov-2008 Doug Rabson <dfr@FreeBSD.org>

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


# 83b3bdbc 02-Nov-2008 Attilio Rao <attilio@FreeBSD.org>

Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
mnt_lock and using instead a refcount in order to keep track of lock
requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
useless.
- Due to the change above, vfs_busy() is no longer linked to a lockmgr.
Change its KPI accordingly by removing the interlock argument and defining
2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
old version (which was already unlinked from the lockmgr) and
MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
once the mnt interlock is held (ability still desired by most consumers).
- The stub used in vfs_mount_destroy(), which allows overriding the
mnt_ref if running for more than 3 seconds, makes it totally useless.
Remove it, as it was meant to work in older versions.
If a problem of "refcount held never going away" should appear, we will
need to fix it properly rather than rely on such a hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
leaving the MNTK_MWAIT flag on even if the waiters were actually
woken up. Only the place in vfs_mount_destroy() is left because it is
going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with: kib
Tested by: pho


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 0d7935fd 10-Oct-2008 Attilio Rao <attilio@FreeBSD.org>

Remove the useless struct thread argument from the bufobj interface.
In particular, the KPI of the following functions is modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider the 'curthread' argument passed to
VOP_SYNC() (in bufsync()) as temporary, as it will be axed ASAP.

Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 6e6049e9 19-Sep-2008 David E. O'Brien <obrien@FreeBSD.org>

Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)


# 59ca51ad 03-Sep-2008 Simon L. B. Nielsen <simon@FreeBSD.org>

- Fix amd64 local privilege escalation. [08:07]
- Fix nmount(2) local privilege escalation. [08:08]
- Fix IPv6 remote kernel panics. [08:09]

Fix for [08:07] is merge of r181823.

Submitted by: kib [08:07], csjp [08:08], bz [08:09]
Reviewed by: peter [08:07], jhb [08:07]
Reviewed by: jinmei [08:09], rwatson [08:09]
Approved by: re (SA blanket)
Approved by: so (simon)
Security: FreeBSD-SA-08:07.amd64
Security: FreeBSD-SA-08:08.nmount
Security: FreeBSD-SA-08:09.icmp6


# 59d49325 31-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.

Manpages are updated accordingly.

Tested by: Diego Sardina <siarodx at gmail dot com>


# 0359a12e 28-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally useless.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# d5bdb2f6 22-Aug-2008 Craig Rodrigues <rodrigc@FreeBSD.org>

In nmount(), when we see the "force" option,
set the MNT_FORCE flag, but do not persist "force"
in the options list, since it is a command, not a persistent property
of a mount.

Similarly, when we see "reload", set MNT_RELOAD,
but delete "reload" from the options list.

MFC after: 1 week
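
The "command, not property" handling can be sketched in C as below. The flag values and the array-based option list are assumptions for the illustration (the kernel's MNT_* constants and option list type differ); "update" is included because commit 1aad294b in this log treats it the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; the kernel's MNT_* constants differ. */
#define	SK_FORCE	UINT64_C(0x1)
#define	SK_RELOAD	UINT64_C(0x2)
#define	SK_UPDATE	UINT64_C(0x4)

/*
 * Turn command-like options into flag bits and compact the array so
 * that only persistent options remain.  Returns the number kept.
 */
static size_t
strip_command_opts_sketch(const char **opts, size_t n, uint64_t *flags)
{
	size_t i, kept = 0;

	assert(opts != NULL && flags != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(opts[i], "force") == 0)
			*flags |= SK_FORCE;
		else if (strcmp(opts[i], "reload") == 0)
			*flags |= SK_RELOAD;
		else if (strcmp(opts[i], "update") == 0)
			*flags |= SK_UPDATE;
		else
			opts[kept++] = opts[i];	/* persistent option */
	}
	return (kept);
}
```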


# e792b09b 09-Aug-2008 Konstantin Belousov <kib@FreeBSD.org>

Revert r181345.
Move the NULL pointer check to the vfs_deleteopt() function.

Discussed with: rodrigc
MFC after: 3 days


# 2616144e 09-Aug-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from: Varnish
MFC after: 2 weeks


# 1aad294b 12-Jul-2008 Craig Rodrigues <rodrigc@FreeBSD.org>

In nmount(), if we see "update" in the mount options,
set MNT_UPDATE in fsflags, and delete the
"update" option from the global mount options.

MNT_UPDATE is a command, and not a property of a mount
that should persist after the command is executed.

We need to do similar things for MNT_FORCE and MNT_RELOAD.

All mount flags are prefixed by MNT_..... it would
be nice if flags which were commands were named differently
from flags which are persistent properties of a mount.
This was not such a big deal in the pre-nmount() days,
but with nmount() it is more important.

Requested by: yar
MFC after: 2 weeks


# a7053783 09-Jun-2008 Konstantin Belousov <kib@FreeBSD.org>

Provide the mutual exclusion between the nfs export list modifications
and nfs requests processing. Lockmgr lock provides the shared locking for
nfs requests, while exclusive mode is used for modifications. The writer
starvation is handled by lockmgr too.

Reported by: kris, pho, many
Based on the submission by: mohan
Tested by: pho
MFC after: 2 weeks


# 2e75877f 08-Jun-2008 Wojciech A. Koszek <wkoszek@FreeBSD.org>

Remove checks against DDB, which isn't used in this file.

My intention is to bring no functional change.

Discussion on: IRC
Reviewed by: ed, kan, rink


# a9722ace 23-May-2008 Craig Rodrigues <rodrigc@FreeBSD.org>

Do not convert the "snapshot" string to the MNT_SNAPSHOT flag here, since
we do it further down in ffs_vfsops.c

MFC after: 1 month


# d6891277 29-Apr-2008 Roman Divacky <rdivacky@FreeBSD.org>

Lock filedesc exclusively when modifying fd_[cr]dir.
This is probably harmless but it's better to lock it
correctly.

Approved by: kib (mentor)


# 9b4a8ab7 22-Apr-2008 Poul-Henning Kamp <phk@FreeBSD.org>

Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.


# 00c71fb7 08-Apr-2008 Sam Leffler <sam@FreeBSD.org>

o add a mountroot event handler that fires when / is mounted; this information
was lost when root started being mounted by init
o remove SI_SUB_MOUNT_ROOT since it's no longer meaningful

MFC after: 2 weeks


# 57b4252e 30-Mar-2008 Konstantin Belousov <kib@FreeBSD.org>

Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho


# 1be222e9 23-Mar-2008 Konstantin Belousov <kib@FreeBSD.org>

Yield the cpu in the kernel while iterating the list of the
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.

It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.

Discussed: on stable@, with bde
Reviewed by: tegge, jeff [1]
MFC after: 2 weeks


# c6446de0 18-Feb-2008 Yaroslav Tykhiy <ytykhiy@gmail.com>

Undo the damage I did in sys/kern/vfs_mount.c #1.274 and
sbin/mount_nfs/mount_nfs.c #1.76. Let the dragons sleep.

Requested by: rodrigc, des
PR: kern/120319 (welcome the bug back)


# 37ed722f 18-Feb-2008 Yaroslav Tykhiy <ytykhiy@gmail.com>

Add a remark on a questionable property of vfs_mergeopts().


# 38a7fd05 14-Feb-2008 Yaroslav Tykhiy <ytykhiy@gmail.com>

In the new order of things dictated by nmount(2), a read-only mount
is to be requested via a "ro" option. At the same time, MNT_RDONLY
is gradually becoming an indicator of the current state of the FS
instead of a command flag. Today passing MNT_RDONLY alone to the
kernel's mount machinery will lead to various glitches. (See the
PRs for examples.)

Therefore mount the root FS with a "ro" option instead of the
MNT_RDONLY flag. (Note that MNT_RDONLY still is added to the mount
flags internally, by vfs_donmount(), if "ro" was specified.)

To be able to pass "ro" cleanly to kernel_vmount(), teach the latter
function to accept options with NULL values.

Also correct the comment explaining how mount_arg() handles length
of -1.

PR: bin/106636 kern/120319
Submitted by: Jaakko Heinonen <see PR kern/120319 for email> (originally)


# 0e9eb108 23-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is really bogus and not currently used.
Hopefully this will soon be replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
present

Additionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

The KPI is heavily broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjunction with a 'thread' argument that is always curthread.
Remove the unneeded extra argument and pass curthread explicitly to lower
layer functions, when necessary.

The KPI is broken by this change, which should affect several ports, so
a version bump and manpage update will be committed separately.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# cb05b60a 09-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

vn_lock() is currently only used with 'curthread' passed as argument.
Remove this argument and pass curthread directly to the underlying
VOP_LOCK1() VFS method. This modification makes the code cleaner and, in
particular, removes an annoying dependence, helping the next lockmgr() cleanup.
The KPI has, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, it is worth saying that the next commits will address
a similar cleanup of VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 450ea867 31-Dec-2007 Craig Rodrigues <rodrigc@FreeBSD.org>

In vfs_scanopt(), make sure that the mount option value is not NULL
before calling vsscanf().

PR: 118531
Submitted by: Jaakko Heinonen <jh saunalahti fi>
MFC after: 3 days


# b27aa20e 27-Dec-2007 Warner Losh <imp@FreeBSD.org>

A partial solution to some of the 'pull the umass device with a
mounted FS' problems. These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices. We
now close the barn door after the horse has gotten loose and has been
hit by a truck, as it were. The barn no longer catches fire in this
case, but the horse is still dead :-).

The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO. In that case, the buffer
is treated like any other clean buffer that's being returned. ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.

The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system. If the device is gone, retrying later won't help and we'll
never be able to unmount the device.

These two are part of a larger patch set submitted by the author. The
other patches will be forthcoming. I added comments to these two
patches.

Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)


# 62bdb328 04-Dec-2007 Craig Rodrigues <rodrigc@FreeBSD.org>

In nmount(), internally convert the mount option: "rdonly" to "ro".
This makes update mounts such as:
"mount -u -o rdonly" work more like, "mount -u -o ro".

References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.

Requested by: rwatson
MFC after: 1 week


# b4b5bf35 27-Oct-2007 Craig Rodrigues <rodrigc@FreeBSD.org>

In nmount(), if MNT_ROOTFS is in the mount flags, filter it
out instead of returning an error.
(1) This makes the behavior consistent with mount(2).
(2) This makes update mounts on the root file system work properly.
(3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
and src/usr.sbin/mountd/mountd.c which were put in to
eliminate errors during update mounts on the root file system
can be removed.

The only place where MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().

Reviewed by: phk
MFC after: 3 days


# 30d239bc 24-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 245b2044 12-Sep-2007 Konstantin Belousov <kib@FreeBSD.org>

When restoring the mount after a failed umount, the MNTK_UNMOUNT flag
prevents insmntque() from placing the reallocated syncer vnode on the mount
list, which causes a panic in vfs_allocate_syncvnode().

Introduce the MNTK_NOINSMNTQ flag, which marks the period when insmntque()
is not allowed to succeed, instead of MNTK_UNMOUNT. MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on the umount error
path, where it is cleared just before the syncer vnode is going to be
allocated.

Reported by: Peter Jeremy <peterjeremy optushome com au>
Suggested by: tegge
Approved by: re (rwatson)


# 1dc5b1cc 15-Aug-2007 John Baldwin <jhb@FreeBSD.org>

On 6.x this works:

% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc
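
Illustration (not part of the commit): a minimal userland sketch of how an
option and its "no"-prefixed form can cancel each other, which is why
"atime" and "nonoatime" both undo "noatime". The helper name opt_negates()
is hypothetical and is not the kernel's implementation.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true when 'opt' is the negation of
 * 'existing', i.e. one of the two names is the other with a leading "no".
 */
static bool
opt_negates(const char *opt, const char *existing)
{
	/* "nonoatime" negates "noatime": opt is "no" + existing. */
	if (strncmp(opt, "no", 2) == 0 && strcmp(opt + 2, existing) == 0)
		return (true);
	/* "atime" negates "noatime": existing is "no" + opt. */
	if (strncmp(existing, "no", 2) == 0 && strcmp(existing + 2, opt) == 0)
		return (true);
	return (false);
}
```

Under this model, both opt_negates("atime", "noatime") and
opt_negates("nonoatime", "noatime") are true, matching the equivalence
described above.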


# 68c1a246 26-Jul-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

The v_mountedhere field is protected by the vnode lock, not vnode's internal
lock.

Approved by: re (rwatson)


# d7f81adb 14-Jul-2007 Craig Rodrigues <rodrigc@FreeBSD.org>

Revert previous commits which I committed by mistake.

Approved by: re (implicit)
Pointy hat to: me


# d678780e 14-Jul-2007 Craig Rodrigues <rodrigc@FreeBSD.org>

The last entry in the ext2_opts array must be NULL,
otherwise the kernel will crash in vfs_filteropt() if an invalid
mount option is passed to ext2fs.

Approved by: re (kensmith)


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# e5ea32c2 26-Apr-2007 Konstantin Belousov <kib@FreeBSD.org>

Allow the dounmount() to proceed even for doomed coveredvp.

In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successful. In the second case, the unmount shall be allowed.

Submitted by: sobomax
MFC after: 2 weeks


# 7760d840 17-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Export vfs_mount_alloc() as it is used in ZFS.


# 24b0502e 13-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
for jailed root from different jails, etc.

OK'ed by: rwatson


# a363f67a 09-Apr-2007 Nate Lawson <njl@FreeBSD.org>

Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost. We're still single-threaded at this point, but just
be safe for the future.


# 6b1e469e 09-Apr-2007 Nate Lawson <njl@FreeBSD.org>

Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.


# 2eb68d49 08-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add root_mounted() function that returns true if the root file system is
already mounted.


# f3a8d2f9 05-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add security.jail.mount_allowed sysctl, which allows mounting and
unmounting jail-friendly file systems from within a jail.
Specifically, it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system whose driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There are currently no jail-friendly file systems; ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by: rwatson


# 5e3f7694 04-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
acquisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by: kris
Discussed with: jhb, kris, attilio, jeff


# afd894bb 03-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf, that have to access file system.


# 5c1c2e82 01-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

I think the code I'm removing here is completely bogus.
The vfs_flags field is used for VFCF_* flags, which are given at file system
driver creation time (via the VFS_SET(9) macro).

What this code did was basically this:

If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).

If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).

The latter would be quite dangerous, but those flags are reset later in
vfs_domount().

MFC after: 1 month


# 695919ad 31-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them.


# 9a2fd584 17-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't deny unmounting file systems for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is supposed not to change current (security) behaviour
in any way.

This change is similar to the change of PRIV_VFS_MOUNT in the previous revision.


# 75336520 14-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Don't deny mounting for jailed processes immediately, allow
prison_priv_check() to decide what to do.

This change is supposed not to change current (security) behaviour
in any way.

Reviewed by: rwatson


# f7d4e990 13-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

White space nits.


# 873fbcd7 05-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 38cc2a5c 12-Feb-2007 Olivier Houchard <cognet@FreeBSD.org>

Make vfs_getopts() set *error to ENOENT if the option wasn't found, so that
consumers don't have to check for both error and the return value (some of
them actually don't do it).

MFC After: 1 week
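
Illustration (not part of the commit): a simplified userland model of the
calling convention described above, where a missing option is reported
through *error as ENOENT so callers only need to test one thing. The names
struct model_opt and getopts_model() are hypothetical; the real vfs_getopts()
operates on a struct vfsoptlist and may report other errors as well.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one name/value mount option. */
struct model_opt {
	const char *name;
	const char *value;
};

/*
 * Return the value of option 'name' and set *error to 0, or return NULL
 * and set *error to ENOENT when the option is not present.
 */
static const char *
getopts_model(const struct model_opt *opts, size_t n, const char *name,
    int *error)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(opts[i].name, name) == 0) {
			*error = 0;
			return (opts[i].value);
		}
	}
	*error = ENOENT;
	return (NULL);
}
```

A caller of this sketch can check only 'error' after the call; it does not
also have to test the returned pointer against NULL.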


# 2892f3bb 16-Dec-2006 Craig Rodrigues <rodrigc@FreeBSD.org>

Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.
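
Illustration (not part of the commit): a minimal sketch of deleting options
by name from a linked list. It uses a hand-rolled singly linked list rather
than the kernel's queue macros, unlinks every matching entry, and returns
ENOENT when nothing matched; those choices, and the names struct list_opt
and deleteopt_model(), are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list node holding only the option name. */
struct list_opt {
	struct list_opt *next;
	const char *name;
};

/*
 * Unlink every option called 'name'. The caller owns the node storage,
 * so nothing is freed here. Returns 0 if at least one entry was removed,
 * ENOENT otherwise.
 */
static int
deleteopt_model(struct list_opt **head, const char *name)
{
	struct list_opt **pp = head;
	int error = ENOENT;

	while (*pp != NULL) {
		if (strcmp((*pp)->name, name) == 0) {
			*pp = (*pp)->next;	/* skip over the match */
			error = 0;
		} else {
			pp = &(*pp)->next;	/* advance to the next link */
		}
	}
	return (error);
}
```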


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 30af7119 03-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix the remaining race in the revs. 1.232, 1.233 that could occur during
unmount when mp structure is reused while waiting for coveredvp lock.
Introduce struct mount generation count, increment it on each reuse and
compare the generations before and after obtaining the coveredvp lock.

Reviewed by: tegge, pjd
Approved by: pjd (mentor)
MFC after: 2 weeks
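
Illustration (not part of the commit): a minimal sketch of the generation
count idea, assuming a stand-in structure with a single counter. The names
struct gen_mount, gen_snapshot(), gen_reuse() and gen_unchanged() are
hypothetical; the point is only that a value sampled before sleeping can be
compared afterwards to detect that the structure was reused.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct mount with a generation counter. */
struct gen_mount {
	unsigned int mnt_gen;
};

/* Sample the generation before sleeping on a lock. */
static unsigned int
gen_snapshot(const struct gen_mount *mp)
{
	return (mp->mnt_gen);
}

/* Called each time the structure is recycled for a new mount. */
static void
gen_reuse(struct gen_mount *mp)
{
	mp->mnt_gen++;
}

/* After waking up, verify the structure still describes the same mount. */
static bool
gen_unchanged(const struct gen_mount *mp, unsigned int gen)
{
	return (mp->mnt_gen == gen);
}
```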


# f645b0b5 01-Oct-2006 Poul-Henning Kamp <phk@FreeBSD.org>

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.


# e60c3612 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Reduce fluctuations of mnt_flag to allow unlocked readers to get a
slightly more consistent view.


# fba924ce 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Don't restore MNT_QUOTA bit in mnt_flag after a failed mount with
MNT_UPDATE flag, closing a race between nmount() and quotactl().


# a1e363f2 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
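
Illustration (not part of the commit): a minimal single-threaded model of the
counter described above. The derived flag is set only while the async option
is requested and no caller has asked for async to be held off. The structure,
the MODEL_* constants and the function names are hypothetical, and locking is
omitted.

```c
#define MODEL_MNT_ASYNC		0x1	/* hypothetical user-requested flag */
#define MODEL_MNTK_ASYNC	0x1	/* hypothetical derived kernel flag */

struct async_mount {
	int mnt_flag;		/* requested options */
	int mnt_kern_flag;	/* derived state checked before async I/O */
	int mnt_noasync;	/* number of callers holding async off */
};

/* Recompute the derived flag from the request and the counter. */
static void
async_update(struct async_mount *mp)
{
	if ((mp->mnt_flag & MODEL_MNT_ASYNC) != 0 && mp->mnt_noasync == 0)
		mp->mnt_kern_flag |= MODEL_MNTK_ASYNC;
	else
		mp->mnt_kern_flag &= ~MODEL_MNTK_ASYNC;
}

/* Temporarily forbid async I/O; calls may nest or interleave. */
static void
noasync_enter(struct async_mount *mp)
{
	mp->mnt_noasync++;
	async_update(mp);
}

/* Drop one hold; async resumes only when the last hold is gone. */
static void
noasync_exit(struct async_mount *mp)
{
	mp->mnt_noasync--;
	async_update(mp);
}
```

Because the requested flag is never cleared in this model, interleaved
enter/exit pairs cannot lose the original async request.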


# cea9d840 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Don't restore mnt_kern_flag on failed MNT_UPDATE mount, it can race
with dounmount(), causing loss of MNTK_UNMOUNT flag.


# 5da56ddb 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


# f37e6338 19-Sep-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix the bug in rev. 1.232. If vfs_suser returned false, coveredvp shall be
unlocked only if it really exists.

Found with: Coverity Prevent(tm)
CID: 1535
Approved by: pjd (mentor)


# 4dec8579 18-Sep-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix the race while waiting for coveredvp lock during unmount. The vnode may
be recycled during the sleep, wrap the vn_lock with vhold/vdrop.
Check that coveredvp still points to the same mp after sleep (needed
because sleep dropped Giant).
Move check for user rights for unmount after coveredvp lock is obtained.

Tested by: Peter Holm
Reviewed by: tegge
Approved by: kan (mentor)
MFC after: 2 weeks


# aed760ef 26-Aug-2006 Marius Strobl <marius@FreeBSD.org>

Fix another bug introduced with rev. 1.204; in vfs_donmount() if
the 'vfs_getopt(optlist, "errmsg", (void **)&errmsg, &errmsg_len)'
call fails, 'errmsg' is left uninitialized, making the later tests
against NULL meaningless, and the uses bogus. Thus initialize
'errmsg' to NULL beforehand. [1]
While at it, remove the superfluous assignment of 0 to 'errmsg_len'
if the above mentioned call fails as it's already initialized to 0.

Submitted by: Michael Plass [1]


# bebabf24 25-Aug-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix comment.


# 3a30d178 24-Aug-2006 Marius Strobl <marius@FreeBSD.org>

Fix a bug introduced with rev. 1.204; in vfs_donmount() use
copyout(9) instead of copystr(9) for copying the errmsg from
kernel- to user-space. This fixes a panic on sparc64 when
using the nmount(2)-converted mountd(8).
While at it, use bcopy(3) instead of strncpy(3) in the kernel-
to kernel-space case for consistency with vfs_buildopts() and
between kernel- to user-space and kernel- to kernel-space case.


# 597d608f 27-Jun-2006 John Baldwin <jhb@FreeBSD.org>

- Expand the scope of Giant some in mount(2) to protect the vfsp structure
from going away. mount(2) is now MPSAFE.
- Expand the scope of Giant some in unmount(2) to protect the mp structure
(or rather, to handle concurrent unmount races) from going away.
umount(2) is now MPSAFE, as well as linux_umount() and linux_oldumount().
- nmount(2) and linux_mount() were already MPSAFE.


# 7ebfc8df 05-Jun-2006 Robert Watson <rwatson@FreeBSD.org>

Audit some arguments to nmount(), mount(), umount().

Submitted by: wsalamon
Obtained from: TrustedBSD Project


# 1f58dd49 02-Jun-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix a problem introduced in revision 1.220. On mount(2) failure, don't
forget to unbusy file system before its destruction.

This fixes the following warning on mount failure:

Mount point <X> had 1 dangling refs

Tested by: wkoszek


# 0c89bb0a 25-May-2006 Craig Rodrigues <rodrigc@FreeBSD.org>

Add "update" mount option to global_opts array,
for use with vfs_filteropt().


# 5eb304a9 25-May-2006 Craig Rodrigues <rodrigc@FreeBSD.org>

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.


# c9ad8a67 15-May-2006 Kelly Yancey <kbyanc@FreeBSD.org>

Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

* Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
linprocfs). This (mostly) restores the ability to mount these
filesystems using the old mount(2) system call (see below for the
rest of the fix).

* Remove not-NULL check for the data argument from the mount(2) entry
point. Per the mount(2) man page, it is up to the individual
filesystem being mounted to verify data. Or, in the case of procfs,
etc. the filesystem is free to ignore the data parameter if it does
not use it. Enforcing data to be not-NULL in the mount(2) system call
entry point prevented passing NULL to filesystems which ignored the
data pointer value. Apparently, passing NULL was common practice
in such cases, as even our own mount_std(8) used to do it in the
pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change. One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.


# 5250012a 13-May-2006 Craig Rodrigues <rodrigc@FreeBSD.org>

For nmount(), if "rw" is specified as a mount option,
add "noro" to the list of mount options. This allows
a read-only mount to be converted to read-write via:
mount -u -o rw

Requested by: kris


# ba5eb429 31-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- When there are dangling vnodes at unmount print them before we panic.

Sponsored by: Isilon Systems, Inc.


# a218edce 30-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- Allocate mounts from a uma zone that uses UMA_ZONE_NOFREE to prevent
mount memory from being reclaimed. This resolves a number of race
conditions described in vfs_default.c and introduced with the
VFS_LOCK_GIANT macros.
- Let the mtx and lock remain valid after the mount structure has been
freed by using init and fini calls. Technically fini will never be
called but is included for completeness.
- Consistently use lockmgr directly rather than lockmgr to lock and
vfs_unbusy to unlock.

Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.


# 936ddefc 13-Mar-2006 Ruslan Ermilov <ru@FreeBSD.org>

The mount(8) manpage says: "In case of conflicting options being
specified, the rightmost option takes effect." Fix code to obey
this. This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
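
Illustration (not part of the commit): a minimal sketch of "the rightmost
option takes effect" for the read-only/read-write pair, scanning option
names left to right so that later entries override earlier ones. The helper
name resolve_rdonly() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: start from 'initial' and let each "ro" or "rw"
 * encountered overwrite the result, so the rightmost one wins.
 */
static bool
resolve_rdonly(const char *const *opts, size_t nopts, bool initial)
{
	bool rdonly = initial;
	size_t i;

	for (i = 0; i < nopts; i++) {
		if (strcmp(opts[i], "ro") == 0)
			rdonly = true;
		else if (strcmp(opts[i], "rw") == 0)
			rdonly = false;
		/* Unrelated options do not affect the result. */
	}
	return (rdonly);
}
```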


# 791dd2fa 08-Mar-2006 Tor Egge <tegge@FreeBSD.org>

Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.


# a4aeaefe 21-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- We can not hold a vnode lock while we do a lookup. Search for and load
modules prior to looking up the directory which we will cover to avoid
this problem in mount.
- We must hold the coveredvp locked before we can busy the mountpoint to
prevent a lock order reversal with the vfs_busy() in lookup which holds
the directory lock prior to doing a vfs_busy(). The directory lock is
required to safely clear the v_mountedhere field on the directory.

MFC After: 1 week


# 04f6d3ef 06-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0. We don't print an
error if we are rebooting as the root mount always retains some references
by init proc.
- Acquire a mnt ref for every vnode allocated to a mount point. Drop this
ref only once vdestroy() has been called and the mount has been freed.
- No longer NULL the v_mount pointer in delmntque() so that we may release
the ref after vgone() has been called. This allows us to guarantee
that the mount point structure will be valid until the last vnode has
lost its last ref.
- Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by: Isilon Systems, Inc.
MFC After: 1 week


# c270875f 28-Jan-2006 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Don't try to load KLDs if we're mounting the root. We'd otherwise panic.

Tested by: kris
MFC after: 3 days


# 323203d3 15-Jan-2006 Christian S.J. Peron <csjp@FreeBSD.org>

vfs_busy can only return something useful if MNTK_UNMOUNT has been set.
Since we are using vfs_busy() on a freshly allocated mount structure, use
(void) to show that we do not care about the return value.

Found with: Coverity Prevent (tm)
MFC after: 2 weeks


# 6994eebc 15-Jan-2006 Robert Watson <rwatson@FreeBSD.org>

Cast VFS_STATFS() in vfs_domount() to (void) to indicate that ignoring the
return value is intentional: this is simply an attempt to pre-cache the
statfs state.

Found with: Coverity Prevent (tm)
MFC after: 3 days


# 6a61c14e 14-Jan-2006 Ruslan Ermilov <ru@FreeBSD.org>

AMD64 also supports disk slices.


# 82be0a5a 09-Jan-2006 Tor Egge <tegge@FreeBSD.org>

Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by: truckman


# ade9b797 19-Dec-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

vfs_mount_alloc() always returns 0, but what we really want is newly
allocated 'struct mount *' pointer, so simplify code a bit and return
the pointer directly.

Reviewed by: ssouhlal


# 003ba8a0 19-Dec-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use 'td' instead of 'curthread'.


# d5989f64 07-Dec-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

In devfs_first(), set mp->mnt_opt to a valid empty list of mount options
instead of leaving it NULL. This eliminates a kernel panic
when trying to do a mount -o update of /dev.

Noticed by: cjsp
Reviewed by: phk


# 8539ca4c 07-Dec-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Add "errmsg" to list of global mount options.


# 1245b343 02-Dec-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Add "rdonly" to global_opts, and parse it in vfs_donmount().

Requested by: rwatson


# ec528a34 02-Dec-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

- Add "rw" mount option to global_opts.
- In vfs_donmount(), parse "ro", "noro", and "rw", in order to set or
unset the MNT_RDONLY filesystem flag.


# 5e6b93a0 23-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

In nmount() and vfs_donmount(), do not strcmp() the options in the iovec
directly. We need to copyin() the strings in the iovec before
we can strcmp() them. Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.

Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found. This allows us to locate an option in
the iovec without actually manipulating the iovec members directly via
strcmp().

Noticed by: kris on sparc64
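
Illustration (not part of the commit): a simplified userland model of a
position lookup, assuming the options are laid out as alternating name and
value strings (as the nmount() iovec is) and have already been copied into
kernel-accessible memory. The helper name getopt_pos_model() and the
flat-array layout are assumptions for the example.

```c
#include <string.h>

/*
 * 'items' holds name, value, name, value, ... with 'nitems' entries in
 * total. Return the array index of the entry holding 'name', or -1 if no
 * name matches. Only name slots (even indices) are compared.
 */
static int
getopt_pos_model(const char *const *items, int nitems, const char *name)
{
	int i;

	for (i = 0; i + 1 < nitems; i += 2) {
		if (strcmp(items[i], name) == 0)
			return (i);
	}
	return (-1);
}
```

With the index in hand, a caller can address the matching value slot at
index + 1 without comparing strings in the user-supplied array itself.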


# 60b78239 19-Nov-2005 Marcel Moolenaar <marcel@FreeBSD.org>

Fix bug introduced in revision 1.186:
When all file systems have a time stamp of zero, which is the case
for example when the root file system is on a read-only medium, we
ended up not calling inittodr() at all. A potential uncleanliness
existed as well. If multiple file systems had a non-zero time stamp,
we would call inittodr() multiple times. While this should not be
harmful, it's definitely not ideal.
Fix both issues by iterating over the mounted file systems to find
the largest time stamp and call inittodr() exactly once with that
time stamp. This could of course be a zero time stamp if none of the
mounted file systems have a non-zero time stamp. In that case the
annoying errors mentioned in the commit log for revision 1.186 still
haven't been avoided. The bottom line is that inittodr() should not
complain when it gets a time base of zero. At the time of this
commit only alpha seems to have that problem.

Reported by: Dario Freni (saturnero at freesbie dot org)
MFC after: 1 week
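
Illustration (not part of the commit): a minimal sketch of the selection step
described above, assuming an array of stand-in mount records. It finds the
largest time stamp in one pass so that the time-of-day initialization can be
called exactly once with the result. The names struct time_mount and
largest_mount_time() are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the per-mount time stamp (mnt_time). */
struct time_mount {
	time_t mnt_time;
};

/*
 * Return the largest time stamp among the mounts, or 0 when every mount
 * reports 0 (for example, a root file system on read-only media).
 */
static time_t
largest_mount_time(const struct time_mount *mounts, size_t n)
{
	time_t timebase = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (mounts[i].mnt_time > timebase)
			timebase = mounts[i].mnt_time;
	}
	return (timebase);
}
```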


# 425e5b62 19-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Parse more mount options in vfs_donmount(), before vfs_domount()
is called. It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on. The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.

Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.


# 8fd860cf 17-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

In vfs_nmount(), check to see if "update" mount option was passed
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set. This is very important
when we want to do an update mount of the root filesystem, using nmount().


# d5328381 12-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

style(9) cleanups.

Spotted by: njl, bde


# 4560dfb5 08-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

For nmount(), allow a text string error message to be propagated back
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.

Discussed with: phk, pjd


# 84e69560 07-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Add utility function to propagate mount errors as text string messages.

Discussed with: phk


# 2611e5a6 02-Sep-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Don't unbusy the devfs mount in vfs_mountroot_try() as it gets accessed
and unbusied in devfs_fixup(), which assumes that the devfs mount is
still locked.

Glanced at by: phk
MFC after: 3 days


# b578b0bd 18-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

devfs_first() return value isn't used, remove it.


# 07ebf8c8 11-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

We don't use 'mp' variable, but we do want to mount devfs, ehh.


# b8bc5373 11-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove unused variable introduced by accident in rev 1.168.

Found by: Coverity Prevent analysis tool


# f850b278 11-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Plug memory leaks.

Found by: Coverity Prevent analysis tool


# 194dfed9 30-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- Remove an old splcam hack.


# f163441e 19-Apr-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Call g_waitidle() before every check of whether the list of holds is empty.

Suggested by: phk


# d1c712ed 19-Apr-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Call g_waitidle() instead of GEOM using the root_mount_hold() KPI.
GEOM could (and will) get events as a result of drivers coming in
late so a one-shot method is not good enough for GEOM.


# 73fbaa74 18-Apr-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add a named reference-count KPI to hold off mounting of the root filesystem.

While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
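
A named hold list of the kind described above might look roughly like this in user space. It is a sketch, not the kernel KPI: hold_acquire(), hold_release() and holds_report() are hypothetical names, and the single-threaded list stands in for whatever locking the real code needs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token: one entry per subsystem holding off the root mount. */
struct hold_token {
	const char		*name;	/* who is holding us back */
	struct hold_token	*next;
};

static struct hold_token *holds;	/* head of the hold list */

/* Register a named hold and hand back a token for the later release. */
static struct hold_token *
hold_acquire(const char *name)
{
	struct hold_token *h = malloc(sizeof(*h));

	assert(h != NULL);
	h->name = name;
	h->next = holds;
	holds = h;
	return (h);
}

/* Drop a hold; unknown tokens are ignored. */
static void
hold_release(struct hold_token *h)
{
	for (struct hold_token **pp = &holds; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			free(h);
			return;
		}
	}
}

/* Print the current holders on one line; returns how many remain. */
static int
holds_report(void)
{
	int n = 0;

	for (struct hold_token *h = holds; h != NULL; h = h->next) {
		printf("%s%s", n == 0 ? "Root mount waiting for: " : " ",
		    h->name);
		n++;
	}
	if (n > 0)
		printf("\n");
	return (n);
}
```

The root mount path would then loop on holds_report(), sleeping one second between calls, until it returns 0.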


# bdb35646 18-Apr-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Initialize mountlist_mtx with an MTX_SYSINIT(), we need it to be ready
earlier.


# f247a524 30-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- LK_NOPAUSE is a nop now.

Sponsored by: Isilon Systems, Inc.


# 379ba853 24-Mar-2005 Marcel Moolenaar <marcel@FreeBSD.org>

Fix inittodr() invocation. Now that devfs is mounted before the
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.


# d830f828 24-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by: Isilon Systems, Inc.


# 9068e776 16-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a memory leak in case of failed root filesystem mount.

Spotted by: Coverity via sam


# 2a77000b 16-Mar-2005 John-Mark Gurney <jmg@FreeBSD.org>

MFp4: print a more useful error when we don't have a /dev to mount devfs
on..


# 78bb3c21 16-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add mnt_hashseed to struct mount and initialize it with PRNG bits, use
it to get better hashing in vfs_hash.

In case of an insert collision in vfs_hash_insert(), put the losing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.

Drop the VI_HASHED flag.


# e8ed9330 20-Feb-2005 David Schultz <das@FreeBSD.org>

Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with: mckusick
Reviewed by: phk


# ebbfc2f8 09-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Make various mountpoint related functions static.


# f627315f 03-Feb-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Move gets() function to libkern (I want to use it outside vfs_mount.c).
- Add buffer size limitations (overflow will not be possible anymore).
- Add 'visible' option, which will allow for passphrase reading in the
future.
- Remove special treatment of '@' and '#', those two are only confusing.

Discussed with: rwatson
MFC after: 2 weeks


# fc48b760 24-Jan-2005 Jeff Roberson <jeff@FreeBSD.org>

- Protect mnt_kern_flag with the mountpoint's mutex. This is required
to make the suspend related functions mpsafe.

Sponsored By: Isilon Systems, Inc.


# 7c0745ee 14-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate unused and unnecessary "cred" argument from vinvalbuf()


# 8df6bac4 11-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to be the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# aa6f98d1 26-Dec-2004 Alexander Kabaev <kan@FreeBSD.org>

Do not vput(9) unlocked vnode and do not VREF it with the sole purpose
of vputting it back immediately.

Complained by: DEBUG_VFS_LOCKS


# 72e8dfe5 20-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Hide/remove various printfs, now that root mounting doesn't seem to explode
on people.


# 12b18fda 14-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Move the checkdirs() function from vfs_mount.c to kern_descrip.c and
call it mountcheckdirs().


# 1ab58cc2 11-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Copy the entire stats structure. Let compiler decide how.


# e40da1f1 11-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Fix whitespace.

Spotted by: njl


# 494ea31a 10-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the /dev/dev -> / symlink after we are done with it.


# 20a92a18 07-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworthy anyway. Correct information is provided by
statfs(/).


# 46d2b418 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Instead of complaining about it, just silently filter out MNT_ROOTFS.

This fixes the "fsck /" problem various people have reported overnight.


# 1e8ca0f0 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Always call VFS_STATFS() on mp->mnt_stat when we have mounted a filesystem,
this way individual filesystems don't have to do it.


# 53a05b7c 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add more functions for handling mount arguments in VFS_MOUNT():

vfs_flagopt() for binary/boolean options.
vfs_getopts() for string options
vfs_filteropt() to check for unknown options.
vfs_scanopt() for scanf() like processing of options.

Also add function for setting the stat.f_mntfromname field.
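
The three simplest helper roles listed above (boolean option, string option, unknown-option check) can be modelled like this. The struct and the opt_*() names are hypothetical and much simpler than the kernel's option list; the sketch only illustrates the intended division of labour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified option record. */
struct opt {
	const char *name;
	const char *value;	/* NULL for a bare flag such as "ro" */
};

/* Return the option named `name`, or NULL if it is absent. */
static const struct opt *
opt_find(const struct opt *opts, int nopts, const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < nopts; i++)
		if (strcmp(opts[i].name, name) == 0)
			return (&opts[i]);
	return (NULL);
}

/* Boolean option: set `bit` if present, clear it otherwise. */
static int
opt_flag(const struct opt *opts, int nopts, const char *name,
    uint64_t *flags, uint64_t bit)
{
	if (opt_find(opts, nopts, name) != NULL) {
		*flags |= bit;
		return (1);
	}
	*flags &= ~bit;
	return (0);
}

/* String option: return its value, or NULL if absent. */
static const char *
opt_string(const struct opt *opts, int nopts, const char *name)
{
	const struct opt *o = opt_find(opts, nopts, name);

	return (o != NULL ? o->value : NULL);
}

/*
 * Unknown-option check: return the first option name that is not in
 * the NULL-terminated `legal` list, or NULL if all are acceptable.
 */
static const char *
opt_filter(const struct opt *opts, int nopts, const char *const *legal)
{
	for (int i = 0; i < nopts; i++) {
		int ok = 0;

		for (int j = 0; legal[j] != NULL; j++)
			if (strcmp(opts[i].name, legal[j]) == 0)
				ok = 1;
		if (!ok)
			return (opts[i].name);
	}
	return (NULL);
}
```

A filesystem's mount routine would typically run the filter first and reject the mount if it reports an unknown name, then pull out the individual options.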


# 5ddb0739 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Change the first argument of vfs_cmount() to a handy struct mntarg* and
call it accordingly.

(No filesystems implement vfs_cmount() yet, so this is a no-op commit)


# 49bfeeb8 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add a few convenient functions in the mount_arg() family and collect the
entire family at the end of the source file.


# f0df0367 05-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Collapse two almost identical license copies, preserving the rights of
all listed authors, rightholders and contributors.


# def7671a 05-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the kern.rootdev sysctl.

Root filesystems (like NFS) don't have an associated disk device,
and even if they had, the exact semantics would be filesystem
dependent and should be implemented there.


# a804d99c 05-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make struct vfsopt{list} private to vfs_mount.c


# 74331236 05-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases
don't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.


# 6c12df5a 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Implement a function, mount_arg() for accumulating a list of mount parameters
to nmount.

Make kernel_mount() accept the output from mount_arg() and know how to
free the malloc'ed space.

Make kernel_vmount() use the new function.


# b74f4d8b 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

When omount() is called, check if the filesystem has a cmount method
and if so call it.

The cmount method will gather and interpret omount() style arguments,
and issue a kern_[v]mount() call to execute the corresponding nmount
operation.


# 2a8b79eb 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add early checks for MNT_ROOTFS since we need to allow it later on in
the code path.


# a08805c7 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Retire unused vfs_mount() function in the name of nmount migration.


# 32ba8e93 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce vfs_byname_kld() which will try to load the filesystem
as a module if possible.

Use it so we don't have linker magic in the middle of the already
complex mount code.


# a7db6b6e 28-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Use FILEDESC_LOCK_FAST in checkdirs()


# 6518a5aa 26-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate MNT_NODEV usage, it doesn't have any meaning any more.

Keep a #define MNT_NODEV 0 around to avoid dealing with contrib
userland like mount_smbfs.


# 1b52747b 24-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Allow a filesystem to have both old and new mount methods at the same
time. This will be necessary for transitioning.


# 19da2efc 24-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Assert Giant held in vfs_domount() and vfs_dounmount()

Explicitly grab Giant before calling these.


# de4cbbf5 25-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Integrate the relevant bits of vfs_rootmountalloc() where it matters.


# 7cc9fb79 18-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Pass path to filesystem when mounting root


# b6eb6699 10-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

remove unused variable


# c2597f2d 10-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove hack which mounts the root filesystem R/W if the device is
named 'md<something>'. While convenient, it does not belong here,
if anywhere at all.


# e207b52a 09-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make getdiskbyname() static to vfs_mount.c.

Eliminate use of vn_todev() while here.


# 186e51cb 23-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Drop Giant around the call to g_waitidle().
This is necessary to allow any geom events which need it to pick up Giant.


# 8d02a378 05-Oct-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Back out changes which were introduced to delay mounting root file system.
Those changes were made for gmirror's needs, but now gmirror handles this
by itself.


# d0257d9c 24-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Rename 'mount_root_delay' tunable to 'vfs.root.mountdelay', which fits
a bit better to our current naming scheme.

Discussed with: ru


# fe0b8275 24-Sep-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate devsw() call, we are not dereferencing the pointer.


# 5a19f8b0 23-Sep-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Introduce new /boot/loader.conf variable: root_mount_delay.
It can be used to delay mounting the root partition to give GEOM
providers a chance to show up.
Now, when the needed provider is missing, the vfs_rootmount() function
will look for it every second and, if it can't be found within the
defined time, ask for the root device name (before this change it was
done immediately).

This allows booting from a gmirror device in degraded mode.


# 4c0bef62 05-Sep-2004 Alfred Perlstein <alfred@FreeBSD.org>

It's too easy to panic the machine when INVARIANTS are turned on
and you botch a call to nmount(2).

This is because there is an INVARIANTS check that asserts that
opt->len must be zero if opt->val is not NULL. The problem is that
the code does not actually follow this invariant if there is an
error while processing mount options.

Fix the code to honor the INVARIANT.

Silence on: fs@


# 5e8c582a 30-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activities on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the nameidata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


# 3dfe213e 27-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Convert the vfsconf list to a TAILQ.

Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.


# 65a311fc 13-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Give kldunload a -f(orce) argument.

Add a MOD_QUIESCE event for modules. This should return error (EBUSY)
if the module is in use.

MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module. Valid reasons are memory references
into the module which cannot be tracked down and eliminated.

When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.

For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.

Document that modules should return EOPNOTSUPP for unknown events.


# f257b7a5 12-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


# 552afd9c 10-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Clean up and wash struct iovec and struct uio handling.

Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.

Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.

Add cloneuio() which returns a malloc'ed copy. Caller frees.

Use them throughout.


# 1ea60617 06-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Use vfs_suser() where appropriate.


# c713aaae 06-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

NFS mobility PHASE I, II & III (phases IV and V pending):

Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.


# 94ed9c8a 04-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.


# e3c5a7a4 04-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

    MNT_ILOCK(mp);
  loop:
    for (vp = TAILQ_FIRST(&mp...);
        (vp = nvp) != NULL;
        nvp = TAILQ_NEXT(vp,...)) {
            if (vp->v_mount != mp)
                    goto loop;
            MNT_IUNLOCK(mp);
            ...
            MNT_ILOCK(mp);
    }
    MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously gets to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
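
The essence of the delmntque() fix can be shown with a small single-threaded model. Everything here is an assumption made for illustration: the model_*() names are hypothetical, and the `locked` flag merely stands in for the mountpoint mutex so that the ordering of operations is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct mount;

struct vnode {
	struct mount		*v_mount;
	TAILQ_ENTRY(vnode)	 v_link;
};

struct mount {
	int			 locked;	/* stand-in for the mutex */
	TAILQ_HEAD(, vnode)	 vnodes;
};

static void
model_mount_init(struct mount *mp)
{
	mp->locked = 0;
	TAILQ_INIT(&mp->vnodes);
}

static void
mnt_lock(struct mount *mp)
{
	assert(!mp->locked);
	mp->locked = 1;
}

static void
mnt_unlock(struct mount *mp)
{
	assert(mp->locked);
	mp->locked = 0;
}

/* Put vp on mp's list and record the owner under the lock. */
static void
model_insmntque(struct vnode *vp, struct mount *mp)
{
	mnt_lock(mp);
	vp->v_mount = mp;
	TAILQ_INSERT_TAIL(&mp->vnodes, vp, v_link);
	mnt_unlock(mp);
}

/* Unlink vp and clear v_mount inside the same critical section. */
static void
model_delmntque(struct vnode *vp)
{
	struct mount *mp = vp->v_mount;

	if (mp == NULL)
		return;
	mnt_lock(mp);
	TAILQ_REMOVE(&mp->vnodes, vp, v_link);
	vp->v_mount = NULL;	/* cleared before the lock is dropped */
	mnt_unlock(mp);
}

/* With the lock held, every vnode on the list must point back at mp. */
static int
model_count(struct mount *mp)
{
	struct vnode *vp;
	int n = 0;

	mnt_lock(mp);
	TAILQ_FOREACH(vp, &mp->vnodes, v_link) {
		assert(vp->v_mount == mp);
		n++;
	}
	mnt_unlock(mp);
	return (n);
}
```

Because list membership and v_mount now change together under one lock, a traversal that holds the lock can trust the (vp->v_mount != mp) test as a sign that the vnode left the list.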


# 3971dcfa 20-Jun-2004 Thomas Moestl <tmm@FreeBSD.org>

Initialize ni_cnd.cn_cred before calling lookup() (this is normally done
by namei(), which cannot easily be used here however). This fixes boot
time crashes on sparc64 and probably other platforms.

Reviewed by: phk


# b90c8559 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Reduce the thaumaturgical level of root filesystem mounts: Instead of using
an otherwise redundant clone routine in geom_disk.c, mount a temporary
DEVFS and do a proper lookup.

Submitted by: thomas


# f3732fd1 17-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 0b68054f 27-Mar-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Add a description for vfs.usermount sysctl.
- Add the vfs_equalopts() function for mount options comparison.
Now it looks much more clear.
- Style fixed.

In co-operation with: bde


# 6c8cc8ec 27-Mar-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Loudly disallow MNT_SUIDDIR mount flag for unprivileged users mounts.
- Style fixed.

Submitted by: bde


# 2c6040bb 26-Mar-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

We probably shouldn't allow users to mount file systems with MNT_SUIDDIR.
There should be no shell access when SUIDDIR is compiled in, but
better be sure.

Reviewed by: rwatson


# 537370d0 16-Mar-2004 Tim J. Robbins <tjr@FreeBSD.org>

Make vfs_nmount() public. The Linux emulator needs this in order to mount
linprocfs filesystems.


# 2b348f74 11-Mar-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused mnt_reservedvnlist field.


# 3a1bdbf8 17-Feb-2004 Colin Percival <cperciva@FreeBSD.org>

Don't ignore errors from vfs_allocate_syncvnode.

PR: kern/18503
Submitted by: Anatoly Vorobey <mellon@pobox.com>
Approved by: rwatson (mentor)


# 3410b193 02-Feb-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix many issues related to mount/unmount:

1. Root from inside a jail was able to unmount any file system
(except /).
2. Unprivileged root was able to unmount file systems mounted by
privileged root (except /).
3. User from inside a jail was able to mount file system when
sysctl vfs.usermount was set to 1.
4. User was able to mount file system when vfs.usermount was set to 1
(that's ok) and unmount it even if vfs.usermount was equal to 0
(that's not correct).

Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl>

Only a part of this fix will be MFC'ed (if approved).

PR: kern/60149
Reviewed by: rwatson
Approved by: scottl (mentor)
MFC after: 3 days


# 25cb5d7a 30-Nov-2003 Ian Dowse <iedowse@FreeBSD.org>

In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.

The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.

Reported by: Erez Zadok <ezk@cs.sunysb.edu>
Approved by: re (scottl)


# 97c43a54 23-Nov-2003 Alexander Kabaev <kan@FreeBSD.org>

Do not attempt to destroy NULL vfs options list.

Approved by: re (scottl)
Reported by: Christian Laursen <xi atborderworlds dot dk>


# 3b39740d 13-Nov-2003 Alexander Kabaev <kan@FreeBSD.org>

Fix a number of style(9) bugs introduced in r1.113 by me.

Suggested by: bde


# cde6302b 12-Nov-2003 Peter Wemm <peter@FreeBSD.org>

MNAMELEN is back to an int again after Kirk's statfs commit

kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4)
*** Error code 1


# 5c957adb 11-Nov-2003 Alexander Kabaev <kan@FreeBSD.org>

1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
complexity recently and depending on each failure path to destroy it
completely isn't working anymore.

2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.

3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.

4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.

Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by: rwatson


# ca430f2e 04-Nov-2003 Alexander Kabaev <kan@FreeBSD.org>

Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff


# 3d4274a5 26-Sep-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Update the list of CDROM device names to try for booting with RB_CDROM
flag set.


# ffe40c80 08-Sep-2003 Ian Dowse <iedowse@FreeBSD.org>

In the !MNT_BYFSID case, return EINVAL from unmount(2) when the
specified directory is not found in the mount list. Before the
MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent
path and EINVAL for a non-mountpoint, but we can no longer distinguish
between these cases. Of the two error codes, EINVAL was more likely
to occur in practice, and it was the only one of the two that was
documented.

Update the manual page to match the current behaviour.

Suggested by: tjr
Reviewed by: tjr


# 318f2fb4 01-Jul-2003 Ian Dowse <iedowse@FreeBSD.org>

Add a new mount flag MNT_BYFSID that can be used to unmount a file
system by specifying the file system ID instead of a path. Use this
by default in umount(8). This avoids the need to perform any vnode
operations to look up the mount point, so it makes it possible to
unmount a file system whose root vnode cannot be looked up (e.g.
due to a dead NFS server, or a file system that has become detached
from the hierarchy because an underlying file system was unmounted).
It also provides an unambiguous way to specify which file system is
to be unmounted.

Since the ability to unmount using a path name is retained only for
compatibility, that case now just uses a simple string comparison
of the supplied path against f_mntonname of each mounted file system.

Discussed on: freebsd-arch
mdoc help from: ru
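
The difference between the two lookups described above can be illustrated with a small table of mounts. The struct and both lookup functions are hypothetical, and scanning the table from the newest entry for the path case is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down mount record for illustration only. */
struct mnt {
	int32_t		 fsid[2];	/* unique per mounted file system */
	const char	*mntonname;	/* path it was mounted on */
};

/* Lookup by file system ID: no path or vnode operations are needed. */
static const struct mnt *
lookup_by_fsid(const struct mnt *tab, int n, int32_t id0, int32_t id1)
{
	for (int i = 0; i < n; i++)
		if (tab[i].fsid[0] == id0 && tab[i].fsid[1] == id1)
			return (&tab[i]);
	return (NULL);
}

/*
 * Compatibility lookup: plain string comparison against the recorded
 * mount-on name, newest entry first (an assumption of this sketch).
 */
static const struct mnt *
lookup_by_path(const struct mnt *tab, int n, const char *path)
{
	assert(path != NULL);
	for (int i = n - 1; i >= 0; i--)
		if (strcmp(tab[i].mntonname, path) == 0)
			return (&tab[i]);
	return (NULL);
}
```

When two file systems are stacked on the same directory, the path lookup can only ever name one of them, while the ID lookup can select either.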


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 84c080a8 07-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Improve the root-dev prompt facility for printing devices which could
possibly be a root filesystem.


# 38dd7dee 24-Apr-2003 Tim J. Robbins <tjr@FreeBSD.org>

Free mount credentials (mnt_cred) when freeing the mount struct
in failure cases to avoid leaking struct ucreds, and ultimately
leaking struct uidinfo references.


# 2603007a 22-Apr-2003 David E. O'Brien <obrien@FreeBSD.org>

Add /dev to the Alpha manual mount root example.


# 6b080461 26-Mar-2003 Tor Egge <tegge@FreeBSD.org>

Adjust the number of vnodes scanned by vlrureclaim() according to the
size of the vnode list.


# 838a6d03 21-Feb-2003 Robert Watson <rwatson@FreeBSD.org>

Export the name of the device used to mount the root file system as
kern.rootdev. If rootdev is undefined (NFS mount, etc), export an
empty string.

Desired by: peter


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# edf6699a 14-Feb-2003 Alfred Perlstein <alfred@FreeBSD.org>

Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a
barrier between free'ing filedesc structures. Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.


# af2eed66 14-Feb-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Style nit.


# 3dc593c8 14-Feb-2003 Alfred Perlstein <alfred@FreeBSD.org>

KASSERT format string does not need newline termination


# 0c5f7aaa 14-Feb-2003 Alfred Perlstein <alfred@FreeBSD.org>

Add kasserts to catch bad API usage.

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 13438f68 31-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to their own header
(sigio.h) which is included from proc.h for the time being.


# f97182ac 14-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

unwrap lines made short enough by SCARGS removal


# b80521fe 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

remove syscallarg().

Suggested by: peter


# d1e405c5 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

SCARGS removal take II.


# bc9e75d7 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

Backout removal SCARGS, the code freeze is only "selectively" over.


# 0bbe7292 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove SCARGS.

Reviewed by: md5


# b65d1ba9 07-Nov-2002 Maxime Henrion <mux@FreeBSD.org>

- Use a better definition for MNAMELEN which doesn't require
to have one #ifdef per architecture.
- Change a space to a tab after a nearby #define.

Obtained from: bde


# df6b615a 25-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

#include <geom/geom.h> to get proper prototypes. Contrary to my fears we
seem to have all the prerequisites already.

Call g_waitidle() as the first thing in vfs_mountroot() so that we have
it out of the way before we even decide if we should call .._ask() or
.._try().

Call the g_dev_print() function to provide better guidance for the
root-mount prompt.


# 7c0c26b4 24-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make sure GEOM has stopped rattling the disks before we try to mount
the root filesystem, this may be implicated in the PC98 issue.


# 9e4b381a 21-Oct-2002 Kirk McKusick <mckusick@FreeBSD.org>

This update removes a race between unmount and lookup. The lookup
locks the mount point directory while waiting for vfs_busy to clear.
Meanwhile the unmount which holds the vfs_busy lock tried to lock
the mount point vnode. The fix is to observe that it is safe for the
unmount to remove the vnode from the mount point without locking it.
The lookup will wait for the unmount to complete, then recheck the
mount point when the vfs_busy lock clears.

Sponsored by: DARPA & NAI Labs.


# c177d125 21-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

GEOM does not (and shall not) propagate flags like D_MEMDISK, so we will
revert to checking the name to determine if our root device is a ramdisk,
md(4) specifically to determine if we should attempt the root-mount RW

Sponsored by: DARPA & NAI Labs.


# 609058e8 24-Sep-2002 Jeff Roberson <jeff@FreeBSD.org>

- Don't protect mountedhere with the vn interlock.
- Protect mountedhere with the vn lock.


# e2587e98 19-Sep-2002 Maxime Henrion <mux@FreeBSD.org>

Switch to using strlcpy() in several places. It seems there
were cases where we could get unterminated strings before.


# fee7d450 19-Aug-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Keep a copy of the credential used to mount filesystems around so
we can check and use it later on.

Change the pieces of code which relied on mount->mnt_stat.f_owner
to check which user mounted the filesystem.

This became needed as the EA code needs to be able to allocate
blocks for "system" EA users like ACLs.

There seems to be some half-baked (probably only quarter- actually)
notion that the superuser for a given filesystem is the user who
mounted it, but this has far from been carried through. It is
unclear if it should be.

Sponsored by: DARPA & NAI Labs.


# e6e370a7 04-Aug-2002 Jeff Roberson <jeff@FreeBSD.org>

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


# f2b17113 02-Aug-2002 Maxime Henrion <mux@FreeBSD.org>

Make the consumers of the linker_load_file() function use
linker_load_module() instead.

This fixes a bug where the kernel was unable to properly locate and
load a kernel module in vfs_mount() (and probably in the netgraph
code as well since it was using the same function). This is because
the linker_load_file() does not properly search the module path.

Problem found by: peter
Reviewed by: peter
Thanks to: peter


# f9d0d524 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by: bde


# a87cdf83 30-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on
mount structures. In particular, invoke entry points for
initialization and destruction in various scenarios (root,
non-root). Also introduce an entry point in the boot procedure
following the mount of the root file system, but prior to the
start of the userland init process to permit policies to
perform further initialization.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# a562685f 29-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

- Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here
were hiding the real problem of the missing unlock in sync_inactive.
- Add the missing unlock in sync_inactive.

Submitted by: iedowse


# dae0abed 24-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Fix a stupid bug where I wasn't initializing the names
of 0-length mount options.


# 72fda5bc 19-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

- Merge the mount options at MNT_UPDATE time with vfs_mergeopts().
- Sanity check the mount options list (remove duplicates) with
vfs_sanitizeopts().
- Fix some malloc(0)/free(NULL) bugs.

Reviewed by: rwatson (some time ago)


# 25b286d6 09-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

- Use standard locking functions in syncer's opv
- vput instead of vrele syncer vnodes in vfs_mount
- Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP


# 43088e98 08-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Add a VFS_START() call in vfs_mountroot_try() for the sake
of being correct. None of the root mountable filesystems
do anything at VFS_START().

Shorten a comment to fix a style bug while I'm here.

PR: kern/18505


# 2efc89d4 04-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

Include systm.h before vnode.h so Debugger() and printf() are available when
full vnode lock debugging is enabled.


# d7f9ecc8 03-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot()
which was #if 0'd and is not likely to be used now.


# 563af2ec 03-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Remove an unused argument in vfs_mountroot().


# 534ab2e1 02-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

I didn't pay enough attention when copy/pasting disclaimers.
The disclaimer in vfs_conf.c was slightly different. Fix this.


# 2b4edb69 02-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Move all code related to mount(2) into a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and helps reduce
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by: rwatson
Repo-copy by: peter


# d786139c 17-Apr-2002 Maxime Henrion <mux@FreeBSD.org>

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment:
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


# 8d19a265 31-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


# fb92273b 08-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Move the mount of the root filesystem to happen in the init process before
the exec of /sbin/init.

This allows the scheduler to get started and kthreads a chance to run
before we start filesystem operations.


# eb8e6d52 05-Mar-2002 Eivind Eklund <eivind@FreeBSD.org>

Document all functions, global and static variables, and sysctls.
Includes some minor whitespace changes, and re-ordering to be able to document
properly (e.g, grouping of variables and the SYSCTL macro calls for them, where
the documentation has been added.)

Reviewed by: phk (but all errors are mine)


# d970bcc9 23-Nov-2001 David E. O'Brien <obrien@FreeBSD.org>

Remove the use of _PATH_DEV in the example.

The kernel certainly doesn't use _PATH_DEV or even /dev/ to find the device.
It cannot, since "/" has not been mounted. Maybe the only effect of using
/dev/ is that it gets put in the mounted-from name for "/", so that mount(8),
etc., display an absolute path before "/" has been remounted. Many have
never bothered typing the full path, and code that constructs a path in
rootdevnames[] never bothered to construct a full path, so the example
shouldn't have it.

Submitted by: bde


# cabb03fc 20-Nov-2001 David E. O'Brien <obrien@FreeBSD.org>

We only have slices on i386 and IA-64.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 60fb0ce3 28-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Revert consequences of changes to mount.h, part 2.

Requested by: bde


# d98dc34f 23-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Correct #includes to work with fixed sys/mount.h.


# f3a90da9 01-Mar-2001 Adrian Chadd <adrian@FreeBSD.org>

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since it's a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is initialised with "/" in the case of a root mount.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 1a37aa56 09-Dec-2000 David E. O'Brien <obrien@FreeBSD.org>

Add `_PATH_DEVZERO'.
Use _PATH_* where possible.


# 53ce36d1 29-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded #include <sys/proc.h> lines.


# a18b1f1d 03-Oct-2000 Jason Evans <jasone@FreeBSD.org>

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


# db901281 02-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


# a481b90b 24-Aug-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Fix panic when removing open device (found by bp@)
Implement subdirs.
Build the full "devicename" for cloning functions.
Fix panic when deleted device goes away.
Collapse devfs_dir and devfs_dirent structures.
Add proper cloning to the /dev/fd* "device-"driver.
Fix a bug in make_dev_alias() handling which made aliases appear
multiple times.
Use devfs_clone to implement getdiskbyname()
Make specfs maintain the stat(2) timestamps per dev_t


# b38f58db 22-May-2000 Mike Smith <msmith@FreeBSD.org>

Make a trip to Pointy-Hats-R-Us and actually include the header that
defines ROOTDEVNAME.

Submitted by: "Jeffrey S. Sharp" <jss@subatomix.com>


# 16aae9cb 20-Mar-2000 Brian Feldman <green@FreeBSD.org>

Split the logic of
static int setrootbyname(char *name);
out into
dev_t getdiskbyname(char *name);

This makes it easy to create a new DDB command, which is the big reason
for the change. You can now do the following in DDB:

Example rc.conf entry:
dumpdev="/dev/ad0s1b" # Device name to crashdump to (if enabled).

db> show disk/ad0s1b
dev_t = 0xc0b7ea00
db> p *dumpdev
c0b7ea00


# bb328b87 17-Feb-2000 Mike Smith <msmith@FreeBSD.org>

Change the mountroot prompt to something that doesn't look at all like a
firmware prompt. Several sleepy folk mistook the '>>>' for the SRM
prompt, which was never the desired idea.

Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
Approved by: jkh


# cfdd2383 12-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Put on asbestos suit and put a splcam() around the 'Mounting root from..'
message to stop it splitting. Every single scsi machine I've seen seems
to reliably collide with this and it's rather annoying.


# e8452d59 08-Dec-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Scan cdevs for potential root devices, rather than bdevs.


# 99e659dc 29-Nov-1999 Matthew Dillon <dillon@FreeBSD.org>

Make BOOTP work again.

Submitted by: Doug Ambrisko <ambrisko@whistle.com>


# c0da4cac 28-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Use the correct mounted-from path when allocating the root mount, if we know
what it is.

Be more correct in unbusying the mountpoint (especially before freeing it).

Remove support for mounting 'r' devices as root. You don't mount 'r'
devices anywhere else, and they're going away anyway.

Submitted by: bde


# 71e4fff8 26-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Retire MFS_ROOT and MFS_ROOT_SIZE options from the MFS implementation.

Add MD_ROOT and MD_ROOT_SIZE options to the md driver.

Make the md driver handle MFS_ROOT and MFS_ROOT_SIZE options for compatibility.

Add md driver to GENERIC, PCCARD and LINT.

This is a cleanup which removes the need for some of the worse hacks in
MFS: We really want to have a rootvnode but MFS on a preloaded image
doesn't really have one. md is a true device, so it is less trouble.

This has been tested with make release, and if people remember to add
the "md" pseudo-device to their kernels, PicoBSD should be just fine
as well. If people have no other use for MFS, it can be removed from
the kernel.


# b3be35ee 21-Nov-1999 Mike Smith <msmith@FreeBSD.org>

If vfs_mountroot_try() isn't given a path to try mounting, return a silent
error rather than complaining about it verbosely. No path is not really
a failure, but the diagnostic was confusing and unuseful.


# 0429e37a 20-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967
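
The effect of the CIRCLEQ to TAILQ change on list walks can be sketched in
userland with <sys/queue.h>. This is only an illustration: struct demo_mount,
struct demo_mntlist and demo_count() are hypothetical stand-ins, not the
kernel's struct mount or mountlist. The point it shows is that a tail queue
walk ends on NULL, so no comparison against the address of the list head is
needed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mount; only the list linkage matters. */
struct demo_mount {
	int id;
	TAILQ_ENTRY(demo_mount) mnt_list;
};

/* Hypothetical stand-in for struct mountlist. */
TAILQ_HEAD(demo_mntlist, demo_mount);

/* Count the entries; the walk ends when the cursor becomes NULL. */
static int
demo_count(struct demo_mntlist *head)
{
	struct demo_mount *mp;
	int n = 0;

	assert(head != NULL);
	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, mnt_list))
		n++;
	return (n);
}
```

A caller would TAILQ_INIT() the head, add entries with TAILQ_INSERT_TAIL() or
TAILQ_INSERT_HEAD(), and then call demo_count().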


# 325f1398 08-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Ignore leading 'r' in base of root device name.


# 91eef0b8 06-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Clean up a couple of initialisations in order to suppress a correct
but un-useful warning.


# c161a875 05-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Guard against freeing NULL if vfs_mountroot_try is called with NULL
as an argument (this is legal to make other code simpler).


# 7a0beaf1 04-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Expand the sscanf buffer to 32 bytes to make room for the expanded
pattern, with some space left over to avoid this mistake next time it's
improved.

Submitted by: luoqi


# 586aaa0f 04-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Allow vfs names to include the digits 0-9 as well as the letters a-z.
This should let 'cd9660' filesystems be allowed.

Submitted by: ghelmer
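
A minimal userland sketch of the kind of check described above, assuming a
root specification of the form "fstype:device". parse_rootspec() and its
buffer sizes are hypothetical and are not the pattern vfs_conf.c actually
uses; the sketch only shows how an sscanf() scanset of a-z plus 0-9 accepts a
name such as cd9660.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the kernel's code: split "fstype:device" and
 * accept only lowercase letters and digits in the fstype part.
 * Returns 0 on success, -1 if the specification does not match.
 */
static int
parse_rootspec(const char *spec, char fstype[16], char dev[64])
{
	assert(spec != NULL);
	fstype[0] = '\0';
	dev[0] = '\0';
	/* %15[a-z0-9] stops at the first character outside the set. */
	if (sscanf(spec, "%15[a-z0-9]:%63s", fstype, dev) < 1)
		return (-1);
	/* Require the ':' separator directly after the fstype. */
	if (spec[strlen(fstype)] != ':')
		return (-1);
	return (0);
}
```

With this sketch, "cd9660:/dev/acd0c" (an example device name) yields fstype
"cd9660", "mfs:" yields fstype "mfs" with an empty device, and a name that
starts with an uppercase letter is rejected.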


# 90ebaea9 03-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Re-implement the handling of RB_CDROM in a machine-independent fashion.
We currently only search SCSI and IDE CDROMs; if there's felt to be a
need for supporting the very old and rare soundcard etc. drives for this
application they can be trivially added.


# 88d4183b 03-Nov-1999 Mike Smith <msmith@FreeBSD.org>

Make MFS work with the new root filesystem search process.

In order to achieve this, root filesystem mount is moved from
SI_ORDER_FIRST to SI_ORDER_SECOND in the SI_SUB_MOUNT_ROOT sysinit
group. Now, modules which wish to usurp the default root mount
can use SI_ORDER_FIRST.

A compiled-in or preloaded MFS filesystem will become the root
filesystem unless the vfs.root.mountfrom environment variable refers
to a valid bootable device. This will normally only be the case when
the kernel and MFS image have been loaded from a disk which has a
valid /etc/fstab file. In this case, the variable should be manually
overridden in the loader, or the kernel booted with -a. In either
case "mfs:" should be supplied as the new value.

Also fix a typo in one DFLTROOT case that would not have compiled.


# ed9f9797 01-Nov-1999 Mike Smith <msmith@FreeBSD.org>

This is a complete rewrite of vfs_conf.c, which changes the way the root
filesystem is discovered. Preference is given to using the kernel
environment variable vfs.root.mountfrom, which is set by the loader
according to the contents of /etc/fstab. Changes in the MD code
provide fallback mechanisms for systems not using the loader.

A more robust fallback path is also provided, with the last recourse
being to prompt on the console for a root device.

These changes drastically simplify the machine-dependent parts of
the root configuration process. In addition, support for CDROM root
devices has been removed; it was a nasty hack and didn't work.


# e6f71111 19-Sep-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse
addaliasu() into addalias() (no operational change) and clarify comments
relating to a trick that vclean() uses.

The fix to BOOTP is yet another hack. Actually, rootfsid handling
is already a major hack. The whole thing needs to be cleaned up.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# ca224f89 03-Jul-1999 Peter Wemm <peter@FreeBSD.org>

Fix warnings in last commit (dev_t is not an int, and not even int
compatible in arg lists on the Alpha)


# ad6cb559 03-Jul-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Be more informative and try to ask the user in some instances if we can't
figure out the root device.


# fec1aafc 26-Jun-1999 Peter Wemm <peter@FreeBSD.org>

I'm tired of having a 'hanging root device'.. This isn't a "fix", just
a workaround for a specific case where cam interrupts right in the middle
of this printf.


# 35018515 23-May-1999 John Birrell <jb@FreeBSD.org>

Back out my previous change (phk didn't like it) in favour of setting
rootdev in the mfs initialisation code iff MFS_ROOT (which Bruce doesn't
like). Damned if I do - damned if I don't.


# d4706682 23-May-1999 John Birrell <jb@FreeBSD.org>

Make MFS_ROOT work again. MFS_ROOT means that rootdev is not set.

Broken by: phk
Problem ignored by: phk


# d024c955 14-Sep-1998 Søren Schmidt <sos@FreeBSD.org>

Remove the SLICE code.
This clearly needs a lot more thought, and we don't need this to hunt
us down in 3.0-RELEASE.


# 1afde994 08-Jun-1998 Bruce Evans <bde@FreeBSD.org>

Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available. This is useful after
booting with -s. If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.


# c0bab11d 19-Apr-1998 Julian Elischer <julian@FreeBSD.org>

Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)


# 3e425b96 19-Apr-1998 Julian Elischer <julian@FreeBSD.org>

Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will delineate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistence. (My head is still hurting from the last time we discussed this)
ATAPI floppies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitrary and dynamically assigned.


# 617dd81f 10-Mar-1998 Mike Smith <msmith@FreeBSD.org>

If the root mount fails from a device that is not the compatibility slice
of a disk, because that slice does not exist, try again mounting from the
compatibility slice.

This handles the case where a disk has been initialised by 'disklabel
auto', which places a bogus and invalid slice entry on the disk.
The bootstrap is not smart enough to reject this slice, and pretends to
boot from it. Believing the bootstrap at this point is unwise.

Booting from non-'wd' disks thus prepared is still broken, as
'disklabel -rwB xdN auto' does not initialise the disk type field, and
the bootstrap mistakenly claims that the disk is handled by 'wd'.

Behaviour is now consistent with DEVFS expected characteristics.


# 303b270b 08-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Staticize.


# e4f4247a 08-Jan-1998 Eivind Eklund <eivind@FreeBSD.org>

Make the BOOTP family new-style options (in opt_bootp.h)


# 1b049398 01-Dec-1997 Julian Elischer <julian@FreeBSD.org>

Cleanup my last patch here
Reviewed by: sef@kthrup.com and phk@freebsd.org


# 95802bf8 25-Nov-1997 Julian Elischer <julian@FreeBSD.org>

Shift a few SYSINIT() calls around.
this results in a few functions becoming static, and
the SYSINITs being close to the code they are related to.
setting up the dump device is with dumpsys() and
kicking off the scheduler is with the scheduler.
Mounting root is with the code that does it.

Reviewed by: phk


# 4a11ca4e 07-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 4199ed40 16-Oct-1997 Julian Elischer <julian@FreeBSD.org>

We are mounting the root.
mount it at the HEAD of the queue, DEVFS might already be there..


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# d841aaa7 02-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Finished (?) cleaning up sysinit stuff.


# a98ca469 29-Oct-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


# 4590fd3a 09-Sep-1995 David Greenman <dg@FreeBSD.org>

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 41a93c86 29-Aug-1995 Bruce Evans <bde@FreeBSD.org>

Fix benign type mismatch in a sysinit function arg.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadable modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root off disk still works just fine..
mfs should work but is untested. (tomorrow's task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# e113d763 11-Nov-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Make a kernel sans FFS possible.


# c901836c 20-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


# 27a0bc89 19-Sep-1994 Doug Rabson <dfr@FreeBSD.org>

Added msdosfs.

Obtained from: NetBSD


# e0e9c421 20-Aug-1994 David Greenman <dg@FreeBSD.org>

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources