History log of /netbsd-current/sys/kern/vfs_mount.c
Revision Date Author Comments
# 1.105 19-Apr-2024 riastradh

dounmount: Avoid &((struct vnode_impl *)NULL)->vi_vnode.

Member access of a null pointer is undefined, even if the result
should also be null because vi_vnode is at the start of vnode_impl.

Reported-by: syzbot+a4b2d13c0d6d4dac2d07@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?extid=a4b2d13c0d6d4dac2d07
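
The pattern this change moves toward can be sketched as follows. This is a minimal illustration, not the code in dounmount(): the two struct definitions are simplified stand-ins, and impl_to_vnode() is a hypothetical helper name. It tests the outer pointer for NULL before naming any member, so the conversion is well defined even though vi_vnode sits at offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the layout matters here. */
struct vnode {
	int v_type;
};

struct vnode_impl {
	struct vnode vi_vnode;	/* first member, so its offset is 0 */
	int vi_state;
};

/*
 * Hypothetical helper: convert an implementation pointer to the public
 * vnode pointer.  The NULL test comes first, so a member of a null
 * pointer is never accessed.
 */
static struct vnode *
impl_to_vnode(struct vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```

Writing &vip->vi_vnode with vip == NULL would usually also yield a null pointer in practice, but the expression is still undefined and sanitizers such as kUBSan report it.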


# 1.104 17-Jan-2024 hannken

Print dangling vnode before panic() to help debug.

PR kern/57775 "panic: unmount: dangling vnode" while umounting procfs


# 1.103 28-Dec-2023 hannken

Include "veriexec.h" and <sys/verified_exec.h> to run
veriexec_unmountchk() on "NVERIEXEC > 0".


Revision tags: thorpej-ifq-base thorpej-altq-separation-base
# 1.102 24-Feb-2023 riastradh

kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.

I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86. (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them. If the procedure call
overhead is measurable we just could change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html


Revision tags: netbsd-10-0-RELEASE netbsd-10-0-RC6 netbsd-10-0-RC5 netbsd-10-0-RC4 netbsd-10-0-RC3 netbsd-10-0-RC2 netbsd-10-0-RC1 netbsd-10-base
# 1.101 09-Dec-2022 hannken

branches: 1.101.2;
Harden layered file systems usage of field "mnt_lower" against
forced unmounts of the lower layer.

- Don't allow "dead_rootmount" as lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com


# 1.100 10-Nov-2022 hannken

If built with DEBUG, limit the depth of the file system stack so kernel
sanitizers may stress mount/unmount without exhausting the kernel stack.


# 1.99 04-Nov-2022 hannken

Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105


# 1.98 26-Oct-2022 riastradh

sys/filedesc.h: New home for extern cwdi0.


Revision tags: bouyer-sunxi-drm-base
# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
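
The same `last one out hit the lights' release can be sketched in portable C11, as a rough user-space analogue rather than the kernel code. The assumptions: struct obj and obj_release() are hypothetical names, memory_order_release on the decrement plays the role of membar_exit(), and the acquire fence plays the role of membar_enter(). Note that atomic_fetch_sub_explicit() returns the old value, whereas atomic_dec_uint_nv() returns the new one, so the comparison is against 1 rather than 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and is therefore responsible for freeing the object.
 */
static bool
obj_release(struct obj *o)
{
	/*
	 * Release ordering: all of this thread's earlier accesses to *o
	 * are ordered before the decrement becomes visible.
	 */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;	/* not the last reference */

	/*
	 * Acquire fence: pairs with the release decrements of the other
	 * threads, so their accesses happen before anything that follows
	 * here, such as checking invariants and freeing.
	 */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

A caller would free the object only when obj_release() returns true. Every other caller must not touch the object after its own call returns.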


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
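
The calling convention described above (next() returns false with *vpp == NULL when done, true with a valid pointer otherwise) can be illustrated with a small user-space model. This is only a sketch of the loop shape: struct node, node_iterator_init(), node_iterator_next() and count_nodes() are hypothetical stand-ins, and the model has none of the locking, reference counting or marker vnodes that the kernel interface handles for its callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode on a per-mount list. */
struct node {
	int id;
	struct node *next;
};

/* Hypothetical iterator state; the kernel uses a marker vnode instead. */
struct node_iterator {
	struct node *cur;
};

static void
node_iterator_init(struct node *head, struct node_iterator *it)
{
	it->cur = head;
}

/*
 * Same contract as described above: return false and set *np to NULL
 * when the list is exhausted, otherwise return true with *np set to
 * the next element.
 */
static bool
node_iterator_next(struct node_iterator *it, struct node **np)
{
	if (it->cur == NULL) {
		*np = NULL;
		return false;
	}
	*np = it->cur;
	it->cur = it->cur->next;
	return true;
}

/* Typical caller: a single while loop with no list bookkeeping. */
static int
count_nodes(struct node *head)
{
	struct node_iterator it;
	struct node *n;
	int count = 0;

	node_iterator_init(head, &it);
	while (node_iterator_next(&it, &n))
		count++;
	return count;
}
```

The point of the interface is visible in count_nodes(): the caller never touches the list links or intermediate states itself.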


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.103 28-Dec-2023 hannken

Include "veriexec.h" and <sys/verified_exec.h> to run
veriexec_unmountchk() on "NVERIEXEC > 0".


Revision tags: thorpej-ifq-base thorpej-altq-separation-base
# 1.102 24-Feb-2023 riastradh

kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.

I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86. (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them. If the procedure call
overhead is measurable we just could change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html


Revision tags: netbsd-10-0-RC1 netbsd-10-base
# 1.101 09-Dec-2022 hannken

Harden layered file systems usage of field "mnt_lower" against
forced unmounts of the lower layer.

- Dont allow "dead_rootmount" as lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com


# 1.100 10-Nov-2022 hannken

If built with DEBUG Limit the depth of file system stack so kernel sanitizers
may stress mount/unmount without exhausting the kernel stack.


# 1.99 04-Nov-2022 hannken

Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105


# 1.98 26-Oct-2022 riastradh

sys/filedesc.h: New home for extern cwdi0.


Revision tags: bouyer-sunxi-drm-base
# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

    obj->foo = 42;
    mumble(&obj->bar);

in thread A happens before

    KASSERT(invariant(obj->foo, obj->bar));
    free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
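
The release pattern described in revision 1.88 can be sketched in portable
userland C. This is a minimal sketch, not the kernel code: it assumes C11
<stdatomic.h> fences as stand-ins for membar_exit()/membar_enter() (which
revision 1.93 replaces with membar_release()/membar_acquire()), and
struct obj, obj_create(), obj_hold() and obj_rele() are hypothetical names
introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; field names are illustrative. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object that starts with a single reference. */
static struct obj *
obj_create(int foo)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = foo;
	return o;
}

/* Take an additional reference; the caller must already hold one. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference, "last one out hits the lights" style.
 * Returns true if this call dropped the last reference and freed o.
 */
static bool
obj_rele(struct obj *o)
{
	/* Release fence: stands in for membar_exit()/membar_release(). */
	atomic_thread_fence(memory_order_release);

	/* atomic_fetch_sub returns the old value; 1 means we were last. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;

	/* Acquire fence: stands in for membar_enter()/membar_acquire(). */
	atomic_thread_fence(memory_order_acquire);

	/* Every other thread's stores to *o are visible from here on. */
	free(o);
	return true;
}
```

Only the thread that sees the count reach zero pays for the acquire fence;
every thread that drops a reference pays for the release fence.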


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
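
A hedged sketch of the idea behind this change, in userland C. The
structures below are toy stand-ins that only mimic a layout with the
public vnode embedded as the first member; impl_to_vnode() is a
hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real kernel structures have many more members. */
struct vnode {
	int v_type;
};

struct vnode_impl {
	struct vnode vi_vnode;	/* first member, so offset is zero */
	int vi_state;
};

/*
 * Convert an implementation pointer to the embedded public structure.
 * The NULL check is made on the structure pointer itself.  Forming
 * &vip->vi_vnode from a null vip and testing the result afterwards
 * would be undefined behaviour, even though the offset is zero.
 */
static struct vnode *
impl_to_vnode(struct vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```

Checking the outer pointer first also keeps undefined-behaviour sanitizers
quiet, which matches the stated motivation of the commit.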


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
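
The calling convention of the iterator can be illustrated with a toy
userland model. This is only a sketch under stated assumptions: every
toy_* name is hypothetical, the list is a plain singly linked list, and
the locking, reference counting and marker vnodes that the real
vfs_vnode_iterator_*() functions deal with are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for a vnode, a mount and the iterator state. */
struct toy_vnode {
	int id;
	struct toy_vnode *next;
};

struct toy_mount {
	struct toy_vnode *head;
};

struct toy_iterator {
	struct toy_vnode *cursor;
};

/* Allocate an iterator positioned at the first vnode of the mount. */
static void
toy_iterator_init(struct toy_mount *mp, struct toy_iterator **itp)
{
	struct toy_iterator *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->cursor = mp->head;
	*itp = it;
}

static void
toy_iterator_destroy(struct toy_iterator *it)
{
	free(it);
}

/*
 * Same contract as described above: returns false with *vpp == NULL
 * when done, or true with *vpp != NULL for the next vnode.
 */
static bool
toy_iterator_next(struct toy_iterator *it, struct toy_vnode **vpp)
{
	*vpp = it->cursor;
	if (it->cursor == NULL)
		return false;
	it->cursor = it->cursor->next;
	return true;
}

/* Typical caller: init, loop on next() until false, destroy. */
static int
toy_count_vnodes(struct toy_mount *mp)
{
	struct toy_iterator *it;
	struct toy_vnode *vp;
	int n = 0;

	toy_iterator_init(mp, &it);
	while (toy_iterator_next(it, &vp))
		n++;
	toy_iterator_destroy(it);
	return n;
}
```

The point of the interface is that the caller's loop stays this simple
while the list and vnode state handling lives behind the three functions.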

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.102 24-Feb-2023 riastradh

kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.

I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86. (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them. If the procedure call
overhead is measurable we just could change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html


Revision tags: netbsd-10-base
# 1.101 09-Dec-2022 hannken

Harden layered file systems usage of field "mnt_lower" against
forced unmounts of the lower layer.

- Dont allow "dead_rootmount" as lower layer.

- Take file system busy before a vfs operation walks down the stack.

Reported-by: syzbot+27b35e5675b1753cec03@syzkaller.appspotmail.com
Reported-by: syzbot+99071492e3de2eff49e9@syzkaller.appspotmail.com


# 1.100 10-Nov-2022 hannken

If built with DEBUG Limit the depth of file system stack so kernel sanitizers
may stress mount/unmount without exhausting the kernel stack.


# 1.99 04-Nov-2022 hannken

Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105


# 1.98 26-Oct-2022 riastradh

sys/filedesc.h: New home for extern cwdi0.


Revision tags: bouyer-sunxi-drm-base
# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
	return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
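
The release pattern described in revision 1.88 can be sketched in portable C11. This is only an illustration under stated assumptions, not the kernel code: `struct obj` and `obj_release()` are hypothetical names, and the C11 fences `atomic_thread_fence(memory_order_release)` and `atomic_thread_fence(memory_order_acquire)` are assumed here as stand-ins for membar_exit()/membar_enter() (renamed to membar_release()/membar_acquire() in revision 1.93).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and is now responsible for freeing the object.
 */
static bool
obj_release(struct obj *o)
{
	unsigned old;

	/* Order all of this thread's prior accesses to *o before the decrement. */
	atomic_thread_fence(memory_order_release);

	/* atomic_fetch_sub returns the value before the subtraction. */
	old = atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(old != 0);	/* underflow would indicate a double release */
	if (old != 1)
		return false;	/* not the last reference */

	/* Order the decrement before the caller's later reads and the free. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

A caller would free the object only when obj_release() returns true; every other caller must not touch the object after the call.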


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
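
The idea behind revision 1.72 (and the later revision 1.105 fix in dounmount) can be sketched as follows. The structures below are simplified stand-ins that only reproduce the layout, and `impl_to_vnode()` is a hypothetical helper name, not a function from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the layout matters. */
struct vnode {
	int v_type;
};

struct vnode_impl {
	struct vnode vi_vnode;	/* first member, so its offset is zero */
	int vi_state;
};

/* The conversion below relies on vi_vnode being the first member. */
static_assert(offsetof(struct vnode_impl, vi_vnode) == 0,
    "vi_vnode must be the first member");

/*
 * Convert an implementation pointer to the public vnode pointer.
 * The NULL test is applied to the structure pointer itself, before any
 * member is named, so no member access through a null pointer occurs.
 */
static struct vnode *
impl_to_vnode(struct vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```

Testing `&vip->vi_vnode == NULL` instead would yield the same address on typical compilers, but it forms a member access through a possibly null pointer, which is what the commit messages describe as undefined.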


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
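
The calling convention of the iterator interface from revision 1.27 (init, repeated next until it returns false, then destroy) can be illustrated with a toy model. Everything below is hypothetical and user-space only: `struct node`, `struct node_list` and the `node_iterator_*()` functions merely mirror the shape of the vfs_vnode_iterator_*() signatures quoted above, and the sketch does no locking, reference counting or marker handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy list element and list head; stand-ins for vnodes on a mount. */
struct node {
	int id;
	struct node *next;
};

struct node_list {
	struct node *head;
};

/* Opaque cursor, playing the role of the iterator marker. */
struct node_iterator {
	struct node *cursor;
};

/* Allocate an iterator positioned at the start of the list. */
static void
node_iterator_init(struct node_list *list, struct node_iterator **itp)
{
	struct node_iterator *it = malloc(sizeof(*it));

	if (it == NULL)
		abort();
	it->cursor = list->head;
	*itp = it;
}

/*
 * Return true with *np set to the next element, or false with
 * *np == NULL once the list is exhausted.
 */
static bool
node_iterator_next(struct node_iterator *it, struct node **np)
{
	assert(it != NULL && np != NULL);
	*np = it->cursor;
	if (it->cursor == NULL)
		return false;
	it->cursor = it->cursor->next;
	return true;
}

/* Release the iterator. */
static void
node_iterator_destroy(struct node_iterator *it)
{
	free(it);
}
```

A caller loops with `while (node_iterator_next(it, &n))` and always pairs node_iterator_init() with node_iterator_destroy(), which is the usage shape the commit message describes for vnodes.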


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.




Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
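
The difference can be sketched in a small userland example. The toy_*
names below are hypothetical stand-ins, not the kernel's definitions; the
only property borrowed from the real code is that the embedded vnode is
the first member of its container. The container pointer is tested
before any member is named, so no member access is ever performed on a
null pointer:

```c
#include <stddef.h>

/* Toy stand-ins (assumed for illustration; not the kernel structures). */
struct toy_vnode {
	int v_type;
};

struct toy_vnode_impl {
	struct toy_vnode vi_vnode;	/* first member of the container */
	int vi_state;
};

/*
 * Return the embedded vnode, or NULL when there is no container.
 * The structure pointer itself is checked; &vip->vi_vnode is only
 * evaluated once vip is known to be non-NULL.
 */
struct toy_vnode *
toy_impl_to_vnode(struct toy_vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```

Testing &vip->vi_vnode against NULL instead would usually give the same
answer in practice, but it relies on evaluating a member of a null
pointer, which is what undefined-behaviour sanitizers flag.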


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
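
The calling convention of the "next" operation can be illustrated with a
toy model. Everything named toy_* below is hypothetical and runs over a
plain linked list; it takes no locks and no references and does not
allocate a marker, all of which the real interface has to deal with. It
only mirrors the contract stated above: "false / *npp == NULL" when done,
"true / *npp != NULL" otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy list node and iterator (assumed for illustration only). */
struct toy_node {
	int id;
	struct toy_node *next;
};

struct toy_iterator {
	struct toy_node *cursor;	/* next node to hand out */
};

void
toy_iterator_init(struct toy_node *head, struct toy_iterator *it)
{
	it->cursor = head;
}

/* Return false with *npp == NULL when done, true with *npp != NULL else. */
bool
toy_iterator_next(struct toy_iterator *it, struct toy_node **npp)
{
	if (it->cursor == NULL) {
		*npp = NULL;
		return false;
	}
	*npp = it->cursor;
	it->cursor = it->cursor->next;
	return true;
}

/* A caller only needs the loop; list internals stay hidden. */
int
toy_count(struct toy_node *head)
{
	struct toy_iterator it;
	struct toy_node *np;
	int n = 0;

	toy_iterator_init(head, &it);
	while (toy_iterator_next(&it, &np))
		n++;
	return n;
}
```

The point of the shape is that callers write one loop and never touch
list linkage or intermediate node states themselves.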

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.100 10-Nov-2022 hannken

If built with DEBUG, limit the depth of the file system stack so kernel sanitizers
may stress mount/unmount without exhausting the kernel stack.


# 1.99 04-Nov-2022 hannken

Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105


# 1.98 26-Oct-2022 riastradh

sys/filedesc.h: New home for extern cwdi0.


Revision tags: bouyer-sunxi-drm-base
# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
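
As a rough userland analogue of the release pattern described in this
message, here is a minimal sketch using C11 <stdatomic.h> rather than the
kernel's membar_ops(3). The names toy_obj and toy_obj_release are
hypothetical. The release-ordered decrement and the acquire fence are
assumed to play approximately the roles of the barrier before and the
barrier after the decrement; they are not claimed to be exact
equivalents. Note that atomic_fetch_sub returns the old value, whereas
atomic_dec_uint_nv returns the new one, so the comparison is against 1.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reference-counted object (assumed for illustration only). */
struct toy_obj {
	atomic_uint refcnt;
	int foo;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and is now responsible for freeing the object.
 */
bool
toy_obj_release(struct toy_obj *obj)
{
	/*
	 * Release ordering: this thread's earlier accesses to *obj are
	 * ordered before the decrement becomes visible.
	 */
	if (atomic_fetch_sub_explicit(&obj->refcnt, 1,
	    memory_order_release) != 1)
		return false;		/* not the last reference */

	/*
	 * Acquire fence: accesses made by other threads before their
	 * own decrements are ordered before anything we do from here
	 * on, such as checking invariants and freeing.
	 */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

A caller would free the object only when toy_obj_release() returns true;
every other caller simply returns after its decrement.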


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().
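
A helper of the "highest generation < arg" kind can be sketched as follows. This is only an illustration over a plain array: `toy_mount`, its `gen` field and `toy_mount_below()` are hypothetical names, and the assumption that a larger generation means a later mount is mine, not stated in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mount; only the generation matters. */
struct toy_mount {
	unsigned long gen;	/* assumed: larger value = mounted later */
};

/*
 * Return the entry with the highest generation strictly below "bound",
 * or NULL if no entry qualifies.
 */
static struct toy_mount *
toy_mount_below(struct toy_mount *mounts, size_t n, unsigned long bound)
{
	struct toy_mount *best = NULL;
	size_t i;

	assert(mounts != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (mounts[i].gen >= bound)
			continue;	/* not below the bound */
		if (best == NULL || mounts[i].gen > best->gen)
			best = &mounts[i];
	}
	return best;
}
```

Calling it repeatedly, each time passing the generation of the entry just returned, visits the entries from newest to oldest without holding a pointer into the list between calls.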


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
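
The calling convention described above ("false / *vpp == NULL" when done, "true / *vpp != NULL" otherwise) can be sketched with a toy singly linked list. Everything named `toy_*` is hypothetical; the real interface allocates its iterator and takes care of locking, reference counts and marker vnodes, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a vnode and the iterator state. */
struct toy_vnode {
	int id;
	struct toy_vnode *next;
};

struct toy_iter {
	struct toy_vnode *cursor;	/* next element to hand out */
};

static void
toy_iter_init(struct toy_vnode *head, struct toy_iter *it)
{
	assert(it != NULL);
	it->cursor = head;
}

/* Same contract as described: false with *vpp == NULL when done. */
static bool
toy_iter_next(struct toy_iter *it, struct toy_vnode **vpp)
{
	assert(it != NULL && vpp != NULL);

	*vpp = it->cursor;
	if (*vpp == NULL)
		return false;
	it->cursor = (*vpp)->next;
	return true;
}

/* Typical caller loop: no list internals leak into the caller. */
static int
toy_sum_ids(struct toy_vnode *head)
{
	struct toy_iter it;
	struct toy_vnode *vp;
	int sum = 0;

	toy_iter_init(head, &it);
	while (toy_iter_next(&it, &vp))
		sum += vp->id;
	return sum;
}
```

The caller sees only init, next and the loop body, which is the simplification the commit message is after.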

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
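
For readers unfamiliar with the difference, a traversal of a TAILQ from <sys/queue.h> ends at NULL, whereas a CIRCLEQ traversal ended at a sentinel derived from the list head, hence the follow-up "*_END(head) -> NULL" change. The sketch below uses a hypothetical `toy_mount` type and is not the kernel's mountlist code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; "link" holds the TAILQ pointers. */
struct toy_mount {
	int id;
	TAILQ_ENTRY(toy_mount) link;
};

TAILQ_HEAD(toy_mountlist, toy_mount);

/* Count the entries; the loop terminates when the cursor becomes NULL. */
static int
toy_count(struct toy_mountlist *head)
{
	struct toy_mount *mp;
	int n = 0;

	assert(head != NULL);

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```

With a NULL terminator the end-of-list test no longer needs the list head, so a cursor can be compared against NULL anywhere it is passed around.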


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.99 04-Nov-2022 hannken

Add a helper to set or clear lower mount and use it.
Always add a reference to the lower mount.

Ride 9.99.105


# 1.98 26-Oct-2022 riastradh

sys/filedesc.h: New home for extern cwdi0.


Revision tags: bouyer-sunxi-drm-base
# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
	return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
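
The release pattern described in this message can be approximated in userland with C11 atomics. This is a sketch under stated assumptions, not kernel code: `struct obj`, `obj_ref()` and `obj_rele()` are hypothetical, a release-ordered decrement stands in for the barrier before the decrement, and an acquire fence stands in for the barrier after it. The `freed` flag merely records where the object would be freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
	int foo;
	bool freed;	/* stands in for actually freeing the memory */
};

static void
obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = 0;
	o->freed = false;
}

static void
obj_ref(struct obj *o)
{
	/* Taking a reference needs no ordering of its own. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this was the last one. */
static bool
obj_rele(struct obj *o)
{
	unsigned int old;

	/* Release: our earlier accesses to *o are ordered before the drop. */
	old = atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_release);
	assert(old > 0);
	if (old != 1)
		return false;	/* not last: someone else will free */

	/* Acquire: see every other thread's accesses before freeing. */
	atomic_thread_fence(memory_order_acquire);
	o->freed = true;
	return true;
}
```

Only the thread that observes the count reaching zero pays for the acquire fence; every other caller returns right after the decrement.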


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
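
The calling convention described above (init, loop on next until it returns
false, destroy) can be illustrated with a userland toy model. This is only a
sketch: every toy_* name is hypothetical, the list is a plain singly linked
list, and the vnode references, mutexes and marker vnodes that the kernel
interface handles are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; the real struct vnode and struct mount live in the kernel. */
struct toy_vnode {
	int id;
	struct toy_vnode *next;
};

struct toy_mount {
	struct toy_vnode *first;
};

struct toy_iterator {
	struct toy_vnode *cursor;	/* next element to hand out */
};

/* Same shape as vfs_vnode_iterator_init(): the iterator comes back via *markerp. */
static void
toy_iterator_init(struct toy_mount *mp, struct toy_iterator **markerp)
{
	struct toy_iterator *it;

	assert(mp != NULL);
	it = malloc(sizeof(*it));
	if (it == NULL)
		abort();
	it->cursor = mp->first;
	*markerp = it;
}

static void
toy_iterator_destroy(struct toy_iterator *it)
{
	free(it);
}

/* Returns false with *vpp == NULL when done, true with *vpp != NULL otherwise. */
static bool
toy_iterator_next(struct toy_iterator *it, struct toy_vnode **vpp)
{
	if (it->cursor == NULL) {
		*vpp = NULL;
		return false;
	}
	*vpp = it->cursor;
	it->cursor = it->cursor->next;
	return true;
}

/* Typical caller: no list or per-element state handling at the call site. */
static int
toy_count_vnodes(struct toy_mount *mp)
{
	struct toy_iterator *marker;
	struct toy_vnode *vp;
	int n = 0;

	toy_iterator_init(mp, &marker);
	while (toy_iterator_next(marker, &vp))
		n++;
	toy_iterator_destroy(marker);
	return n;
}
```

In the kernel interface each vnode returned by the iterator is referenced, so
a real caller would also have to drop that reference inside the loop; the toy
model has nothing to release.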

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.98 26-Oct-2022 riastradh

sys/filedesc.h: New home for extern cwdi0.


Revision tags: bouyer-sunxi-drm-base
# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
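
For readers without a kernel tree at hand, the "last one out hit the lights"
release described above can be sketched in portable C11. This is an
illustration under assumptions, not the NetBSD code: <stdatomic.h> fences
stand in for membar_exit()/membar_enter(), and struct obj, obj_create(),
obj_hold() and obj_release() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* the creator holds the first reference */
	o->foo = 0;
	return o;
}

static void
obj_hold(struct obj *o)
{
	/* Taking a reference needs no ordering of its own. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference and freed the object. */
static bool
obj_release(struct obj *o)
{
	unsigned int old;

	/* Stand-in for membar_exit(): order our earlier stores before the decrement. */
	atomic_thread_fence(memory_order_release);
	old = atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(old > 0);		/* releasing more often than holding is a bug */
	if (old != 1)
		return false;		/* not the last reference */
	/* Stand-in for membar_enter(): see every other thread's stores before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Note that atomic_fetch_sub_explicit() returns the value before the decrement,
so the comparison is against 1, whereas atomic_dec_uint_nv() in the commit
message returns the new value and is compared against 0.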


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
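
The distinction can be shown with a small standalone example. The struct and
function names are hypothetical and only mirror the situation where the
embedded member sits at offset zero; the point is that the pointer is tested
before any member access is formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of structures with the inner one as the first member. */
struct inner {
	int x;
};

struct outer {
	struct inner in;	/* at offset 0, like an embedded first member */
	int extra;
};

/*
 * Convert an outer pointer to a pointer to its first member.
 * Writing "&op->in" with op == NULL and then testing the result relies on
 * undefined behaviour, even though the address would numerically be zero.
 * Testing op itself first avoids forming the member access at all.
 */
static struct inner *
outer_to_inner(struct outer *op)
{
	if (op == NULL)
		return NULL;
	return &op->in;
}
```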


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
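
The calling convention described above can be illustrated with a small userland sketch. This is only a toy model under stated assumptions: struct toy_node, struct toy_iter and the toy_*() functions are hypothetical stand-ins invented for illustration, not the kernel's struct vnode or vfs_vnode_iterator_*() implementation, and the real iterator also deals with locking and marker vnodes, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's struct vnode or iterator. */
struct toy_node {
	int id;
	int usecount;		/* references currently held on this node */
	struct toy_node *next;
};

struct toy_iter {
	struct toy_node *pos;	/* next node to hand out */
};

static void
toy_iter_init(struct toy_node *head, struct toy_iter *it)
{
	it->pos = head;
}

/*
 * Mirrors the contract in the commit message: return false with
 * *npp == NULL when done, or true with *npp pointing at a node on
 * which a reference has been taken.
 */
static bool
toy_iter_next(struct toy_iter *it, struct toy_node **npp)
{
	struct toy_node *n = it->pos;

	if (n == NULL) {
		*npp = NULL;
		return false;
	}
	it->pos = n->next;
	n->usecount++;
	*npp = n;
	return true;
}

/* Drop the reference handed out by toy_iter_next(). */
static void
toy_rele(struct toy_node *n)
{
	assert(n->usecount > 0);
	n->usecount--;
}

/* Visit every node and release each reference once done with it. */
static int
toy_sum_ids(struct toy_node *head)
{
	struct toy_iter it;
	struct toy_node *n;
	int sum = 0;

	toy_iter_init(head, &it);
	while (toy_iter_next(&it, &n)) {
		sum += n->id;
		toy_rele(n);
	}
	return sum;
}
```

The point of the sketch is the loop shape: the caller never touches list internals, and each node returned is referenced until the caller releases it.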

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.97 13-Sep-2022 riastradh

vflush(9): Insert `involuntary' preemption point at each vnode.

Currently there is a voluntary yield every 100ms, but that's a long
time. Should help to avoid hogging the CPU while flushing lots of
data to big disks on systems without kpreemption.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
	return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)
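
For readers more used to userland code, the same "last one out hits the lights" pattern can be sketched with C11 atomics. This is an analogy under stated assumptions, not the kernel code: struct obj and the obj_*() functions are hypothetical, the release-ordered decrement plays the role of membar_exit() before the atomic decrement, and the acquire fence plays the role of membar_enter() after it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; names are illustrative only. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = 0;
	return o;
}

static void
obj_hold(struct obj *o)
{
	/* Taking a reference needs no ordering of its own. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference and freed o. */
static bool
obj_release(struct obj *o)
{
	assert(o != NULL);
	/* Release: our earlier stores are ordered before the decrement. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;	/* not the last reference */
	/* Acquire: see every other releaser's stores before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count dropping from 1 to 0 pays for the acquire fence; every other releaser returns right after its release-ordered decrement.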

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
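
A minimal sketch of the distinction, assuming a hypothetical pair of structures (struct outer and struct inner are invented for illustration and are not the kernel's vnode_impl/vnode): the check is made on the containing structure's pointer before any member is named, instead of forming the member's address first and comparing that against NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; "first" sits at offset 0 of struct outer. */
struct inner {
	int x;
};

struct outer {
	struct inner first;
	int extra;
};

/*
 * Test the structure pointer itself.  Writing &o->first and then
 * comparing the result with NULL would name a member through a
 * possibly-null pointer, which is what sanitizers complain about.
 */
static struct inner *
outer_to_inner(struct outer *o)
{
	if (o == NULL)
		return NULL;
	return &o->first;
}
```

Both forms tend to yield the same address in practice when the member is at offset 0, but only the form above stays within defined behaviour for a null argument.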


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
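
The calling convention described above ("true / *vpp != NULL" for each
referenced vnode, "false / *vpp == NULL" when done) can be sketched in
userland C. This is only an illustration under stated assumptions: the
toy_node and toy_iterator types and all toy_* functions are hypothetical
stand-ins, not the kernel's struct vnode or vfs_vnode_iterator_*()
implementation, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the kernel's struct vnode and iterator are not used here. */
struct toy_node {
	int id;
	int usecount;
	struct toy_node *next;
};

struct toy_iterator {
	struct toy_node *cursor;	/* next node to hand out */
};

void
toy_iterator_init(struct toy_node *head, struct toy_iterator *it)
{
	it->cursor = head;
}

/* Returns true with *npp set to a referenced node, or false with NULL. */
bool
toy_iterator_next(struct toy_iterator *it, struct toy_node **npp)
{
	struct toy_node *n = it->cursor;

	if (n == NULL) {
		*npp = NULL;
		return false;
	}
	it->cursor = n->next;
	n->usecount++;		/* the caller now owns this reference */
	*npp = n;
	return true;
}

/* Walk the list, dropping each reference; returns how many nodes we saw. */
int
toy_count_nodes(struct toy_node *head)
{
	struct toy_iterator it;
	struct toy_node *n;
	int count = 0;

	toy_iterator_init(head, &it);
	while (toy_iterator_next(&it, &n)) {
		assert(n != NULL);
		count++;
		n->usecount--;	/* stands in for releasing the reference */
	}
	return count;
}
```

The point of the shape is that the caller never touches list or node
mutexes and never sees a node without holding a reference to it.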


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
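
As a rough illustration of why the "*_END(head) -> NULL" follow-up was
needed: a TAILQ traversal terminates on NULL rather than on a sentinel
derived from the list head. The sketch below uses the <sys/queue.h>
TAILQ macros; struct toy_mount, toy_mountlist and toy_mountlist_sum()
are hypothetical names, not the kernel's mountlist code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; not the kernel's struct mount. */
struct toy_mount {
	int id;
	TAILQ_ENTRY(toy_mount) link;
};

TAILQ_HEAD(toy_mountlist, toy_mount);

/* Sum the ids on the list; the loop ends when the cursor becomes NULL. */
int
toy_mountlist_sum(struct toy_mountlist *list)
{
	struct toy_mount *mp;
	int sum = 0;

	for (mp = TAILQ_FIRST(list); mp != NULL; mp = TAILQ_NEXT(mp, link))
		sum += mp->id;
	return sum;
}
```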


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.96 26-Aug-2022 hannken

Two defects in vfs_getnewfsid():

- Parallel mounts may get the same fsid. Always increment "xxxfs_mntid"
to make it unlikely.

- Directly walk "mountlist" to prevent a rare deadlock where one thread
holds a vnode locked, calls vfs_getnewfsid() and the iterator has to
wait for a suspended file system while the thread suspending needs
this vnode lock.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

	Thread A                        Thread B
	obj->foo = 42;                  obj->baz = 73;
	mumble(&obj->bar);              grumble(&obj->quux);
	/* membar_exit(); */            /* membar_exit(); */
	atomic_dec -- not last          atomic_dec -- last
	                                /* membar_enter(); */
	                                KASSERT(invariant(obj->foo,
	                                    obj->bar));
	                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

	membar_exit();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
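
For readers outside the kernel, the "last one out hit the lights"
release pattern described above can be approximated with C11
<stdatomic.h>. This is a sketch under stated assumptions, not the
kernel code: struct obj and the obj_*() functions are hypothetical, a
release-ordered decrement stands in for membar_exit() followed by
atomic_dec_uint_nv(), and an acquire fence stands in for
membar_enter().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

struct obj *
obj_create(int foo)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = foo;
	return o;
}

void
obj_hold(struct obj *o)
{
	unsigned int prev;

	/* Taking a reference needs no ordering by itself. */
	prev = atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(prev != 0);	/* the caller must already hold a reference */
}

/* Drop one reference; returns true if this call freed the object. */
bool
obj_rele(struct obj *o)
{
	/* Release: our earlier accesses to *o are ordered before the drop. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/* Acquire: observe all other threads' accesses before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count dropping from 1 to 0 pays for
the acquire fence; every other release is a single ordered decrement.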


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
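
A minimal sketch of the idea, assuming a hypothetical pair of types
where the public structure is the first member of the implementation
structure (the names pub_node, impl_node and impl_to_pub() are
illustrative only, not the kernel's vnode definitions): the NULL test
is applied to the structure pointer before any member is named.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: public part embedded first in the private part. */
struct pub_node {
	int type;
};

struct impl_node {
	struct pub_node in_pub;	/* first member, so its offset is zero */
	int private_state;
};

/* Convert impl pointer to public pointer without touching a null pointer. */
struct pub_node *
impl_to_pub(struct impl_node *ip)
{
	if (ip == NULL)		/* check the structure pointer itself */
		return NULL;
	return &ip->in_pub;	/* member access only on a valid object */
}
```

Writing "&ip->in_pub" first and comparing the result with NULL would
usually yield the same address, but it names a member through a null
pointer, which is what undefined-behaviour sanitizers flag.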


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.95 22-Aug-2022 hannken

Protect changing "v_mountedhere" with file system suspension instead
of vnode lock.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
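
The release pattern described in revision 1.88 can be approximated in
portable user-space C. The sketch below assumes C11 <stdatomic.h>; it uses
a release fence where the commit uses membar_exit() (later
membar_release()) and an acquire fence where it uses membar_enter() (later
membar_acquire()). That mapping is an approximation made for illustration,
and struct obj and obj_release() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
	int foo;
	bool freed;	/* stands in for actually freeing the memory */
};

/*
 * Drop one reference.  Returns true if this caller released the last
 * reference and therefore "freed" the object.
 */
static bool
obj_release(struct obj *o)
{
	/* The caller must hold a reference. */
	assert(atomic_load_explicit(&o->refcnt, memory_order_relaxed) > 0);

	/* Release fence: our earlier stores happen before the decrement. */
	atomic_thread_fence(memory_order_release);

	/* fetch_sub returns the old value; 1 means we dropped the last one. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;

	/* Acquire fence: other holders' stores happen before the teardown. */
	atomic_thread_fence(memory_order_acquire);
	o->freed = true;
	return true;
}
```

Only the thread that observes the count reaching zero pays for the acquire
fence; every releasing thread pays for the release fence.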


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
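
The distinction drawn in revision 1.72 (and again in 1.105) can be shown
with a small sketch: test the structure pointer itself, and only then take
the address of a member. The types and function names below are
hypothetical, loosely modelled on the vnode/vnode_impl split, and are not
copied from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical public and private halves of one object. */
struct pub_node {
	int type;
};

struct impl_node {
	struct pub_node pub;	/* first member, so the addresses coincide */
	int private_state;
};

/*
 * Convert an implementation pointer to its public part.  The NULL test
 * is on the structure pointer; &ip->pub is only evaluated once ip is
 * known to be valid, so no member of a null pointer is ever accessed.
 */
static struct pub_node *
impl_to_pub(struct impl_node *ip)
{
	if (ip == NULL)
		return NULL;
	return &ip->pub;
}

/* Reverse conversion; asserts validity in the spirit of a KASSERT. */
static struct impl_node *
pub_to_impl(struct pub_node *pp)
{
	assert(pp != NULL);
	return (struct impl_node *)(void *)pp;
}
```

Writing "&ip->pub == NULL" instead would usually give the same answer on
real hardware, but it evaluates a member access on a possibly null pointer,
which is the undefined behaviour that sanitizers report.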


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().
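
The "highest generation < arg" lookup mentioned in revision 1.53 is a
simple bounded-maximum search. The sketch below shows the idea over a
plain array; struct mnt, its gen field and find_highest_below() are
hypothetical names, and the real helper works on the kernel's mount list
under its own locking, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mount record carrying only a generation number. */
struct mnt {
	uint64_t gen;
};

/*
 * Return the entry with the largest generation strictly below limit,
 * or NULL if no entry qualifies.
 */
static struct mnt *
find_highest_below(struct mnt *tab, size_t n, uint64_t limit)
{
	struct mnt *best = NULL;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].gen >= limit)
			continue;	/* at or above the bound: skip */
		if (best == NULL || tab[i].gen > best->gen)
			best = &tab[i];
	}
	return best;
}
```

Calling such a helper repeatedly, each time passing the generation of the
previous result as the new limit, visits entries from newest to oldest
without keeping a pointer into the list between calls.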


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
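
The calling convention of the iterator interface above (init, repeated
next until it returns false, then destroy) can be illustrated with a
simplified user-space analogue. Everything below is a hypothetical sketch:
node_iterator_*() only mirrors the shape of vfs_vnode_iterator_*(), and it
deliberately omits the locking, reference counting and marker vnodes that
the real implementation has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list element and list head. */
struct node {
	int id;
	struct node *next;
};

struct node_list {
	struct node *first;
};

/* Iterator state: just a cursor in this simplified version. */
struct node_iterator {
	struct node *cursor;
};

static void
node_iterator_init(struct node_list *l, struct node_iterator **itp)
{
	struct node_iterator *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->cursor = l->first;
	*itp = it;
}

/* Returns false with *npp == NULL when done, true with *npp set otherwise. */
static bool
node_iterator_next(struct node_iterator *it, struct node **npp)
{
	if (it->cursor == NULL) {
		*npp = NULL;
		return false;
	}
	*npp = it->cursor;
	it->cursor = it->cursor->next;
	return true;
}

static void
node_iterator_destroy(struct node_iterator *it)
{
	free(it);
}

/* Typical caller: no knowledge of the list internals is needed. */
static int
sum_ids(struct node_list *l)
{
	struct node_iterator *it;
	struct node *np;
	int sum = 0;

	node_iterator_init(l, &it);
	while (node_iterator_next(it, &np))
		sum += np->id;
	node_iterator_destroy(it);
	return sum;
}
```

The point of the interface is visible in sum_ids(): the caller never
touches list mutexes or intermediate states, which is the burden the
commit message says every caller previously had to carry.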

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.94 08-Jul-2022 hannken

Suspend file system after VFS_MOUNT() and before taking mnt_updating.
Prevents deadlock against concurrent unmounts of layered file systems.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
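
The "last one out hit the lights" release pattern described in this entry can be sketched in portable userspace C11. This is only an illustration under stated assumptions: struct obj, obj_create(), obj_hold() and obj_rele() are hypothetical names, and the C11 release and acquire fences stand in for the kernel's membar_exit() and membar_enter(); it is not the code from this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a kernel structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	assert(o != NULL);
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = 0;
	return o;
}

static void
obj_hold(struct obj *o)
{
	/* Taking a reference needs no ordering of its own here. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop one reference; returns true if this call freed the object. */
static bool
obj_rele(struct obj *o)
{
	/* Stand-in for membar_exit(): earlier accesses complete first. */
	atomic_thread_fence(memory_order_release);
	/* fetch_sub returns the old value; 1 means the count is now 0. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;		/* not the last one out */
	/* Stand-in for membar_enter(): see the other threads' writes. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

A caller that took an extra reference with obj_hold() would see obj_rele() return false for the first drop and true for the second, at which point the object has been freed.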


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
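
The difference between testing the structure pointer and testing the address of its first field can be sketched as follows. The toy_vnode and toy_vnode_impl types and the helper toy_impl_to_vnode() are hypothetical stand-ins, not the kernel definitions; the point is only that the NULL test happens before any member address is formed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real structures have many more members. */
struct toy_vnode {
	int v_type;
};

struct toy_vnode_impl {
	struct toy_vnode vi_vnode;	/* first member, at offset 0 */
	int vi_state;
};

/* Convert an implementation pointer to its public part, or NULL. */
static struct toy_vnode *
toy_impl_to_vnode(struct toy_vnode_impl *vip)
{
	/* Test the structure pointer itself ... */
	if (vip == NULL)
		return NULL;
	/* ... and form the member address only once it is known valid. */
	return &vip->vi_vnode;
}
```

Writing the check as `&vip->vi_vnode == NULL` would usually yield the same answer because the member sits at offset 0, but it forms a member address from a null pointer first, which is what sanitizers such as kUBSan flag.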


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
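
The calling convention of the iterator interface in this entry (init, repeated next until it returns false, destroy) can be illustrated with a userspace toy. Everything below is an assumption made for illustration: the toy_* types and functions are hypothetical, walk a plain singly linked list, and omit the locking, marker vnodes and reference counting the kernel interface handles for its callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_node {
	int id;
	struct toy_node *next;
};

struct toy_list {
	struct toy_node *head;
};

struct toy_iterator {
	struct toy_node *pos;	/* next node to hand out */
};

static void
toy_iterator_init(struct toy_list *lp, struct toy_iterator **itp)
{
	struct toy_iterator *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->pos = lp->head;
	*itp = it;
}

static void
toy_iterator_destroy(struct toy_iterator *it)
{
	free(it);
}

/* Returns false with *np == NULL when done, else true with *np set. */
static bool
toy_iterator_next(struct toy_iterator *it, struct toy_node **np)
{
	*np = it->pos;
	if (*np == NULL)
		return false;
	it->pos = (*np)->next;
	return true;
}

/* Typical caller shape: sum the ids of all nodes on the list. */
static int
toy_sum_ids(struct toy_list *lp)
{
	struct toy_iterator *it;
	struct toy_node *np;
	int sum = 0;

	toy_iterator_init(lp, &it);
	while (toy_iterator_next(it, &np))
		sum += np->id;
	toy_iterator_destroy(it);
	return sum;
}
```

The benefit the entry describes is visible in toy_sum_ids(): the caller contains no list or state handling of its own, only the init/next/destroy sequence.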


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
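
A minimal sketch of what iterating a TAILQ looks like, assuming the <sys/queue.h> macros; struct toy_mount, toy_mountlist and toy_count() are hypothetical names, not the kernel's mount list. With a TAILQ the walk ends when the next pointer is NULL, which is presumably why end-of-list comparisons had to change along with the list type.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct toy_mount {
	int id;
	TAILQ_ENTRY(toy_mount) link;
};

TAILQ_HEAD(toy_mountlist, toy_mount);

/* Count entries by walking until the next pointer is NULL. */
static int
toy_count(struct toy_mountlist *head)
{
	struct toy_mount *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```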


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.93 09-Apr-2022 riastradh

sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
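
A minimal userland sketch of the idea in this revision, not the actual
vfs_mount.c code: test the containing structure pointer for NULL before
taking the address of its first member, instead of forming the member
address from a possibly NULL pointer and testing that. The names
struct outer, struct inner and outer_to_inner() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container whose first member sits at offset 0. */
struct inner {
	int value;
};

struct outer {
	struct inner first;
	int other;
};

/*
 * Return the embedded member, or NULL if the container is NULL.
 * The NULL test is done on the container itself, so no member
 * access is ever performed through a null pointer.
 */
static struct inner *
outer_to_inner(struct outer *o)
{
	if (o == NULL)
		return NULL;
	return &o->first;
}
```

Checking the container first keeps the code well defined even though the
member address would numerically equal the container address.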


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than passing NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.92 28-Mar-2022 riastradh

specfs: Let spec_node_lookup_by_dev wait for reclaim to finish.

vdevgone relies on this to ensure that if there is a concurrent
revoke in progress, it will wait for that revoke to finish -- that
way, it can guarantee all I/O operations have completed and the
device is closed.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

	Thread A                        Thread B
	obj->foo = 42;                  obj->baz = 73;
	mumble(&obj->bar);              grumble(&obj->quux);
	/* membar_exit(); */            /* membar_exit(); */
	atomic_dec -- not last          atomic_dec -- last
	                                /* membar_enter(); */
	                                KASSERT(invariant(obj->foo,
	                                    obj->bar));
	                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

	membar_exit();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)
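
For readers outside the kernel, the same release pattern can be sketched
in portable C11. This is only an illustration under stated assumptions,
not NetBSD code: <stdatomic.h> stands in for atomic_dec_uint_nv(), the
release ordering on the decrement plays roughly the role of
membar_exit(), and the acquire fence plays roughly the role of
membar_enter(). The names struct obj, obj_create(), obj_acquire() and
obj_release() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; names are illustrative only. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object holding one reference, owned by the caller. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

/* Take an extra reference; the caller must already hold one. */
static void
obj_acquire(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop one reference.  Returns true if this was the last reference
 * and the object was freed, false otherwise.
 */
static bool
obj_release(struct obj *o)
{
	/* Release: our earlier accesses to *o are ordered before the decrement. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/* Acquire: see every other thread's accesses before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count dropping from 1 to 0 pays for the
acquire fence; every other release is a single release-ordered decrement.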

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add a mount.
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
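
The calling convention described above (true with a non-NULL object, false with NULL when done) can be sketched in standalone C. This is a hypothetical miniature over an array, not the kernel implementation: struct item, item_iterator_init(), item_iterator_next() and sum_items() are names invented for illustration, and the real interface additionally deals with markers, locking and vnode references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the iterator contract; not the kernel code. */
struct item {
	int value;
};

struct item_iterator {
	struct item *it_items;	/* backing array being walked */
	size_t it_count;	/* number of entries in it_items */
	size_t it_pos;		/* index of the next entry to hand out */
};

static void
item_iterator_init(struct item_iterator *it, struct item *items, size_t count)
{
	it->it_items = items;
	it->it_count = count;
	it->it_pos = 0;
}

/* Returns true and sets *ipp to the next item, or false and *ipp = NULL. */
static bool
item_iterator_next(struct item_iterator *it, struct item **ipp)
{
	if (it->it_pos >= it->it_count) {
		*ipp = NULL;
		return false;
	}
	*ipp = &it->it_items[it->it_pos++];
	return true;
}

/* Typical caller loop: no list or state handling leaks into the caller. */
static int
sum_items(struct item *items, size_t count)
{
	struct item_iterator it;
	struct item *ip;
	int sum = 0;

	item_iterator_init(&it, items, count);
	while (item_iterator_next(&it, &ip))
		sum += ip->value;
	return sum;
}
```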

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
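
For readers unfamiliar with the queue macros, a minimal userspace sketch of a TAILQ walk follows. It assumes a <sys/queue.h> that provides the TAILQ macros; struct mnt, struct mntlist and count_mounts() are hypothetical names, not the kernel's mount list. The relevant property is that a TAILQ traversal ends at NULL, whereas a circular queue ends by comparing against the list head.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; not the kernel's struct mount. */
struct mnt {
	int m_id;
	TAILQ_ENTRY(mnt) m_list;	/* linkage for the tail queue */
};

TAILQ_HEAD(mntlist, mnt);

/* Walk the queue from first to last; the walk terminates at NULL. */
static int
count_mounts(struct mntlist *head)
{
	struct mnt *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, m_list))
		n++;
	return n;
}
```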


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.91 24-Mar-2022 riastradh

vfs(9): Add missing vnode lock around VOP_CLOSE in vfs_mountroot.

Maybe vnode_if.c should be taught to KASSERT the vnode lock now that
locks always work.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
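
The release pattern described in this message can be sketched in portable userspace C. This is an analogue under stated assumptions, not the kernel code: it uses C11 <stdatomic.h> release and acquire ordering in place of membar_exit()/membar_enter(), and struct obj, obj_create(), obj_ref() and obj_rele() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object holding one reference for the caller. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

/* Take an additional reference; the caller must already hold one. */
static void
obj_ref(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true if this was the last reference and
 * the object was freed.
 */
static bool
obj_rele(struct obj *o)
{
	/*
	 * Release ordering: this thread's earlier stores to *o are
	 * ordered before the decrement (the role of membar_exit above).
	 */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/*
	 * Acquire fence: the last thread out sees every other thread's
	 * stores before it frees (the role of membar_enter above).
	 */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count dropping from 1 to 0 pays for the acquire fence; every other release is a single atomic decrement.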


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.90 19-Mar-2022 hannken

Lock vnode across VOP_OPEN.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
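The return contract described above can be illustrated with a small userland model. This is only a sketch: the toy_* types and functions are hypothetical stand-ins invented for illustration, they take no locks and hold no references, and they are not the kernel's vfs_vnode_iterator_*() implementation. The point is the loop shape: the caller keeps calling next() until it returns false, at which point the output pointer is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's definitions. */
struct toy_vnode {
	int id;
	struct toy_vnode *next;
};

struct toy_iter {
	struct toy_vnode *cur;	/* next node to hand out */
};

static void
toy_iter_init(struct toy_vnode *list, struct toy_iter *it)
{
	it->cur = list;
}

/* Same contract as in the text: true with *vpp set, or false with NULL. */
static bool
toy_iter_next(struct toy_iter *it, struct toy_vnode **vpp)
{
	*vpp = it->cur;
	if (it->cur == NULL)
		return false;
	it->cur = it->cur->next;
	return true;
}

/* Typical caller: loop until next() reports the end of the list. */
static int
toy_sum_ids(struct toy_vnode *list)
{
	struct toy_iter it;
	struct toy_vnode *vp;
	int sum = 0;

	toy_iter_init(list, &it);
	while (toy_iter_next(&it, &vp))
		sum += vp->id;
	return sum;
}
```

In the real interface the caller would presumably also have to drop the reference on each returned vnode and destroy the iterator afterwards; the model leaves that out.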

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
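As a rough illustration of why this conversion goes together with the "*_END(head) -> NULL" follow-up in 1.25: a TAILQ traversal terminates when the cursor becomes NULL, whereas a CIRCLEQ traversal had to compare the cursor against the list head. The sketch below uses the standard <sys/queue.h> macros; struct mnt, struct mntlist, and count_mounts() are hypothetical names, not the kernel's mount list.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; not the kernel's struct mount. */
struct mnt {
	int id;
	TAILQ_ENTRY(mnt) link;
};

TAILQ_HEAD(mntlist, mnt);

/* Walk the list by hand; with a TAILQ the end of the list is NULL. */
static int
count_mounts(struct mntlist *head)
{
	struct mnt *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```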


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.89 16-Mar-2022 andvar

s/paniced/panicked/ and s/borken/broken/ in comments.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
            return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)
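For readers more familiar with C11 than with membar_ops(3), the same "last one out hit the lights" release can be sketched in userland with <stdatomic.h>. This is an analogue under stated assumptions, not the kernel code: struct obj and the obj_*() helpers are hypothetical, the release-ordered decrement stands in for membar_exit() followed by atomic_dec_uint_nv(), and the acquire fence stands in for membar_enter().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; names are illustrative only. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);	/* the creator holds one reference */
	o->foo = 0;
	return o;
}

static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference and freed o. */
static bool
obj_release(struct obj *o)
{
	/* Release ordering: our earlier stores to *o precede the decrement. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/* Acquire fence: other threads' stores are visible before we free. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that observes the count going from 1 to 0 pays for the acquire fence; every other release returns right after the decrement.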

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
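A minimal sketch of the pattern this change describes, assuming a wrapper structure whose first member is the public structure. The toy_* names are hypothetical and only mirror the vnode_impl/vi_vnode layout. In C, forming &p->member from a null p is undefined behaviour even when the member sits at offset zero, so the check belongs on the structure pointer itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the public struct is the first member of the impl. */
struct toy_vnode {
	int v_type;
};

struct toy_vnode_impl {
	struct toy_vnode vi_vnode;
	int vi_state;
};

static struct toy_vnode *
toy_impl_to_vnode(struct toy_vnode_impl *vip)
{
	/* Test the structure pointer itself, never &vip->vi_vnode. */
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```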


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.88 12-Mar-2022 riastradh

sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

    obj->foo = 42;
    mumble(&obj->bar);

in thread A happens before

    KASSERT(invariant(obj->foo, obj->bar));
    free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
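
The release pattern described in revision 1.88 can be sketched in portable
C11. This is an illustration only, not the kernel code: struct obj and the
obj_*() helpers are hypothetical, and the C11 release and acquire fences
are assumed here to stand in for membar_exit() and membar_enter().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object that starts with a single reference. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

/* Take an extra reference; no ordering is needed for the increment. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this call freed the object. */
static bool
obj_release(struct obj *o)
{
	/* Stand-in for membar_exit(): our earlier accesses come first. */
	atomic_thread_fence(memory_order_release);
	/* fetch_sub returns the old value, so 1 means we were the last. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;
	/* Stand-in for membar_enter(): see everyone else's accesses. */
	atomic_thread_fence(memory_order_acquire);
	assert(atomic_load_explicit(&o->refcnt, memory_order_relaxed) == 0);
	free(o);
	return true;
}
```

Only the caller that observes the count dropping to zero frees the object,
and the two fences are what order the other holders' earlier stores before
that free.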


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
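
A minimal sketch of the idea behind revision 1.72, using hypothetical
struct outer and struct inner types rather than the kernel's own: the
pointer to the containing structure is tested before the address of its
first member is formed, instead of forming the member address and testing
that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; "first" sits at offset 0 of struct outer. */
struct inner {
	int value;
};

struct outer {
	struct inner first;
	int other;
};

/* Return the embedded member, or NULL if there is no outer object. */
static struct inner *
outer_to_inner(struct outer *op)
{
	/* Check the structure pointer itself ... */
	if (op == NULL)
		return NULL;
	/* ... and only then take the address of the member. */
	return &op->first;
}
```

Checking `&op->first == NULL` instead would compute a member address
from a null pointer first, which is the kind of construct a sanitizer such
as kUBSan reports.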


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
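
The calling convention described in revision 1.27 can be modeled in
userland as follows. This is a simplified sketch under stated assumptions:
struct node, struct node_list and the node_iterator_*() functions are
hypothetical stand-ins that only mirror the shape of the three
vfs_vnode_iterator_*() prototypes quoted above. The model keeps a plain
cursor and does no locking, reference counting or marker handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a vnode and a per-mount vnode list. */
struct node {
	int id;
	struct node *next;
};

struct node_list {
	struct node *first;
};

struct node_iterator {
	struct node *cursor;	/* next node to hand out */
};

/* Create an iterator positioned at the start of the list. */
static void
node_iterator_init(struct node_list *lp, struct node_iterator **itp)
{
	struct node_iterator *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->cursor = lp->first;
	*itp = it;
}

static void
node_iterator_destroy(struct node_iterator *it)
{
	free(it);
}

/* Returns false with *np == NULL when done, else true with *np set. */
static bool
node_iterator_next(struct node_iterator *it, struct node **np)
{
	*np = it->cursor;
	if (it->cursor == NULL)
		return false;
	it->cursor = it->cursor->next;
	return true;
}

/* Caller's view: a plain loop with no list internals exposed. */
static int
sum_ids(struct node_list *lp)
{
	struct node_iterator *it;
	struct node *n;
	int sum = 0;

	node_iterator_init(lp, &it);
	while (node_iterator_next(it, &n))
		sum += n->id;
	node_iterator_destroy(it);
	return sum;
}
```

The point of the interface is visible in sum_ids(): the caller never
touches the list linkage or any intermediate node state.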


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
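
As a hedged illustration of why the list type matters to callers: a TAILQ
traversal ends at NULL, whereas a CIRCLEQ traversal ends when the cursor
reaches the head sentinel, which is presumably what the "*_END(head) ->
NULL" follow-up in revision 1.25 adjusted. The struct mnt type and
count_mounts() below are hypothetical; only the TAILQ macros come from
<sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; the real struct mount is much larger. */
struct mnt {
	int id;
	TAILQ_ENTRY(mnt) link;
};

TAILQ_HEAD(mntlist, mnt);

/* Count the entries, stopping at the NULL that ends a TAILQ. */
static int
count_mounts(struct mntlist *head)
{
	struct mnt *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```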


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The only real bit differences, all of which are harmless, are:
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.87 04-Feb-2022 hannken

Stop clearing "v_mountedhere" in mount_domount() error path.

We did not set it and may clear the value from another mount.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Adresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
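
Illustration (not part of the commit): a minimal userspace sketch of the pattern this entry describes, assuming a public structure embedded as the first member of a private one. The names `struct pub`, `struct impl`, `impl_to_pub()`, `pub_to_impl()` and `pub_usecount()` are hypothetical stand-ins, and `assert()` plays the role of KASSERT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a public structure and its private wrapper. */
struct pub {
	int usecount;
};

struct impl {
	struct pub vi_pub;	/* first member: same address as the wrapper */
	int vi_state;
};

/* The cast in pub_to_impl() relies on this layout, so check it at build time. */
_Static_assert(offsetof(struct impl, vi_pub) == 0,
    "vi_pub must be the first member");

/*
 * Test the structure pointer itself. &ip->vi_pub is only formed once
 * ip is known to be non-NULL, so no member of a null pointer is accessed.
 */
static struct pub *
impl_to_pub(struct impl *ip)
{
	if (ip == NULL)
		return NULL;
	return &ip->vi_pub;
}

/* Reverse conversion, again checking the pointer that was passed in. */
static struct impl *
pub_to_impl(struct pub *pp)
{
	if (pp == NULL)
		return NULL;
	return (struct impl *)(void *)pp;
}

/* Accessor that asserts its precondition instead of silently tolerating NULL. */
static int
pub_usecount(struct pub *pp)
{
	assert(pp != NULL);
	return pp->usecount;
}
```

The point of the sketch is only the order of operations: the comparison against NULL happens before any member expression is evaluated.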


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
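
Illustration (not part of the commit): a minimal userspace sketch of a reference/release pair in the spirit of vfs_ref()/vfs_rele(), using C11 atomics. Everything here is an assumption made for the example: `struct mini_mount`, the `mini_*` function names, the use of malloc()/free(), and the boolean return value of the release function, which exists only so the last-reference case can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature of a mount structure: only the reference count. */
struct mini_mount {
	atomic_uint mnt_refcnt;
};

/* Allocate a mount that starts with one reference owned by the caller. */
static struct mini_mount *
mini_mountalloc(void)
{
	struct mini_mount *mp = malloc(sizeof(*mp));

	if (mp != NULL)
		atomic_init(&mp->mnt_refcnt, 1);
	return mp;
}

/* Add a reference. The caller must already hold one. */
static void
mini_vfs_ref(struct mini_mount *mp)
{
	unsigned old = atomic_fetch_add(&mp->mnt_refcnt, 1);

	assert(old > 0);
	(void)old;
}

/*
 * Drop a reference and free the mount when the last one goes away.
 * Returns true if this call freed the mount (sketch-only convenience).
 */
static bool
mini_vfs_rele(struct mini_mount *mp)
{
	unsigned old = atomic_fetch_sub(&mp->mnt_refcnt, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	free(mp);
	return true;
}
```

Keeping the increment and decrement behind two functions, rather than touching the counter directly, leaves a single place where the final-release work is done.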


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
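
Illustration (not part of the commit): a minimal userspace sketch of the init/next/destroy calling pattern described above. It only mirrors the shape of the interface and the "false with *vpp == NULL when done" contract. The `mini_*` types and functions are hypothetical; the marker vnode, the list and vnode locking, and the reference taken on each returned vnode are all omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature types: a singly linked vnode list per mount. */
struct mini_vnode {
	int id;
	struct mini_vnode *next;
};

struct mini_mount {
	struct mini_vnode *first;
};

struct mini_iter {
	struct mini_vnode *pos;	/* next vnode to hand out */
};

/* Create an iterator positioned at the start of the mount's list. */
static void
mini_iter_init(struct mini_mount *mp, struct mini_iter **itp)
{
	struct mini_iter *it = malloc(sizeof(*it));

	if (it != NULL)
		it->pos = mp->first;
	*itp = it;
}

static void
mini_iter_destroy(struct mini_iter *it)
{
	free(it);
}

/* Return true with *vpp set, or false with *vpp == NULL when done. */
static bool
mini_iter_next(struct mini_iter *it, struct mini_vnode **vpp)
{
	assert(vpp != NULL);
	if (it == NULL || it->pos == NULL) {
		*vpp = NULL;
		return false;
	}
	*vpp = it->pos;
	it->pos = it->pos->next;
	return true;
}

/* Typical caller loop: visit every vnode on the mount. */
static int
mini_count_vnodes(struct mini_mount *mp)
{
	struct mini_iter *it;
	struct mini_vnode *vp;
	int n = 0;

	mini_iter_init(mp, &it);
	while (mini_iter_next(it, &vp))
		n++;	/* a real caller would use vp, then drop its reference */
	mini_iter_destroy(it);
	return n;
}
```

With this shape the caller no longer touches list linkage or intermediate vnode states; it only sees vnodes handed out by the next function.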

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
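
Illustration (not part of the commit): a minimal userspace sketch of walking a TAILQ from <sys/queue.h>. With a tail queue the end of the list is a NULL pointer, whereas a CIRCLEQ traversal ends when it arrives back at the list head. The names `struct mini_mount`, `mnt_link`, `struct mini_mountlist` and `mini_mountlist_sum()` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical miniature mount with a tail-queue linkage field. */
struct mini_mount {
	int id;
	TAILQ_ENTRY(mini_mount) mnt_link;
};

TAILQ_HEAD(mini_mountlist, mini_mount);

/* Walk the list explicitly; the loop terminates on NULL. */
static int
mini_mountlist_sum(struct mini_mountlist *head)
{
	struct mini_mount *mp;
	int sum = 0;

	assert(head != NULL);
	for (mp = TAILQ_FIRST(head); mp != NULL;
	    mp = TAILQ_NEXT(mp, mnt_link))
		sum += mp->id;
	return sum;
}
```

The explicit loop is written out instead of TAILQ_FOREACH only to make the NULL termination test visible.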


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.86 16-Feb-2021 hannken

Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)


Revision tags: thorpej-futex-base
# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


# 1.84 13-Oct-2020 hannken

branches: 1.84.2;
Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
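
The calling convention described in this entry can be sketched in user-space C.
This is only an illustration under stated assumptions: toy_vnode, toy_mount and
the toy_iter_*() functions are hypothetical stand-ins that mirror the shape of
the three signatures above. They are not the kernel implementation and do no
locking or reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vnode and struct mount. */
struct toy_vnode {
	int id;
	struct toy_vnode *next;
};

struct toy_mount {
	struct toy_vnode *first;
};

/* Hypothetical iterator state: the next node to hand out. */
struct toy_iter {
	struct toy_vnode *cursor;
};

void
toy_iter_init(struct toy_mount *mp, struct toy_iter **itp)
{
	struct toy_iter *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->cursor = mp->first;
	*itp = it;
}

void
toy_iter_destroy(struct toy_iter *it)
{
	free(it);
}

/* Returns false with *vpp == NULL when done, else true with *vpp != NULL. */
bool
toy_iter_next(struct toy_iter *it, struct toy_vnode **vpp)
{
	*vpp = it->cursor;
	if (it->cursor == NULL)
		return false;
	it->cursor = it->cursor->next;
	return true;
}

/* Caller pattern: init, loop until next() returns false, destroy. */
int
toy_count_vnodes(struct toy_mount *mp)
{
	struct toy_iter *it;
	struct toy_vnode *vp;
	int n = 0;

	toy_iter_init(mp, &it);
	while (toy_iter_next(it, &vp))
		n++;
	toy_iter_destroy(it);
	return n;
}
```

The point of the pattern is that the caller only sees init, next and destroy;
list mutexes and intermediate vnode states stay behind the interface.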


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.85 19-Nov-2020 hannken

We have to ignore interrupts when suspending here the same way
we have to do with revoke.

Reported-by: syzbot+0cfb253b382a9836450a@syzkaller.appspotmail.com


Revision tags: thorpej-futex-base
# 1.84 13-Oct-2020 hannken

Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
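
A minimal user-space sketch of the pattern this entry describes. The types
struct inner and struct outer and the helper outer_to_inner() are hypothetical
names chosen for illustration; the kernel code uses its own structures and
KASSERT.

```c
#include <assert.h>
#include <stddef.h>

struct inner {
	int value;
};

struct outer {
	struct inner first;	/* first member, so its offset is 0 */
	int extra;
};

/*
 * Test the structure pointer itself. Forming &op->first from a null
 * op is undefined behaviour even though the offset is zero, so the
 * check must come before the member access.
 */
struct inner *
outer_to_inner(struct outer *op)
{
	if (op == NULL)
		return NULL;
	return &op->first;
}
```

Checking the outer pointer first keeps the member access on a valid object,
which is the kind of access an undefined-behaviour sanitizer flags otherwise.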


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
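
The ref/rele pairing can be sketched with C11 atomics. This is an assumption-laden
illustration, not the kernel code: toy_mount, toy_mount_init(), toy_ref() and
toy_rele() are hypothetical names, and the "destroyed" flag merely stands in for
whatever teardown happens when the last reference goes away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted mount structure. */
struct toy_mount {
	atomic_uint refcnt;
	bool destroyed;
};

/* Start with one reference, held by the creator. */
void
toy_mount_init(struct toy_mount *mp)
{
	atomic_init(&mp->refcnt, 1);
	mp->destroyed = false;
}

/* Add a reference instead of incrementing the counter by hand. */
void
toy_ref(struct toy_mount *mp)
{
	atomic_fetch_add(&mp->refcnt, 1);
}

/* Drop a reference; returns true if this was the last one. */
bool
toy_rele(struct toy_mount *mp)
{
	/* atomic_fetch_sub() returns the value before the subtraction. */
	if (atomic_fetch_sub(&mp->refcnt, 1) != 1)
		return false;
	mp->destroyed = true;
	return true;
}
```

Funnelling every count change through two functions gives one place to add
assertions or tracing, which open-coded increments do not.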


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.84 13-Oct-2020 hannken

Suspend file system before unmounting in mount_domount() error path
to prevent diagnostic assertions from unmount/flush.

Reported-by: syzbot+8d557f49c8b7888182eb@syzkaller.appspotmail.com
Reported-by: syzbot+e87fe1e769a3426d9bf3@syzkaller.appspotmail.com
Reported-by: syzbot+9c5b86e651e98c5bf438@syzkaller.appspotmail.com
Reported-by: syzbot+610b614af0d66179ca78@syzkaller.appspotmail.com
Reported-by: syzbot+7818ff113a1535ebc724@syzkaller.appspotmail.com


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
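The idea in this entry can be sketched in userland C. This is a minimal illustration only: `struct outer`, `struct inner`, and `outer_to_inner()` are hypothetical names, not the kernel's types or the code in this revision. The point is that the containing pointer is tested before any member address is formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; "in" is deliberately the first member. */
struct inner {
	int value;
};

struct outer {
	struct inner in;
	int extra;
};

/*
 * Test the structure pointer itself. Forming &o->in from a null "o"
 * is undefined behaviour even though the member sits at offset 0.
 */
static struct inner *
outer_to_inner(struct outer *o)
{
	if (o == NULL)
		return NULL;
	assert(offsetof(struct outer, in) == 0);
	return &o->in;
}
```

A caller can then compare the result against NULL without ever evaluating a member access on a null pointer.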


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
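The reference-counting pattern behind a ref/rele pair can be sketched as follows. This is a userland illustration under stated assumptions: `struct obj`, `obj_create()`, `obj_ref()`, and `obj_rele()` are hypothetical names, and C11 atomics stand in for whatever the kernel actually uses. It is not the implementation of vfs_ref() or vfs_rele().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
};

/* Create an object holding one reference for the caller. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o != NULL)
		atomic_init(&o->refcnt, 1);
	return o;
}

/* Take an additional reference. */
static void
obj_ref(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

/*
 * Drop one reference. Returns true if this was the last one and the
 * object has been freed, false otherwise.
 */
static bool
obj_rele(struct obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	free(o);
	return true;
}
```

Routing every increment through one helper, rather than touching the counter directly, keeps the counting rules in a single place.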


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
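The calling convention described for vfs_vnode_iterator_next() (false with *vpp == NULL when done, true with *vpp != NULL otherwise) can be sketched in userland C. Everything below is a hypothetical stand-in: `struct node`, `struct node_list`, `struct node_iter`, and the `node_iter_*` functions are illustrative names and do no locking or reference counting, unlike the kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list element; not the kernel's struct vnode. */
struct node {
	int id;
	struct node *next;
};

/* Hypothetical list head; not the kernel's struct mount. */
struct node_list {
	struct node *head;
};

/* Iterator state: where the walk resumes on the next call. */
struct node_iter {
	struct node *cursor;
};

static void
node_iter_init(struct node_list *list, struct node_iter *it)
{
	assert(list != NULL && it != NULL);
	it->cursor = list->head;
}

/*
 * Returns false with *np == NULL when the list is exhausted,
 * true with *np != NULL otherwise.
 */
static bool
node_iter_next(struct node_iter *it, struct node **np)
{
	*np = it->cursor;
	if (*np == NULL)
		return false;
	it->cursor = (*np)->next;
	return true;
}

/* Example caller: sum the ids of every node on the list. */
static int
node_list_sum(struct node_list *list)
{
	struct node_iter it;
	struct node *n;
	int sum = 0;

	node_iter_init(list, &it);
	while (node_iter_next(&it, &n))
		sum += n->id;
	return sum;
}
```

The caller's loop needs no knowledge of list internals, which is the benefit the entry above claims for the iterator interface.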


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.83 23-May-2020 ad

Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
to be reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than a hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
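
The ref/rele pairing described above can be sketched in user space. This is a
minimal illustration, not the kernel implementation: struct toy_mount,
toy_ref() and toy_rele() are hypothetical names, the counter is a plain
integer (kernel code would need atomic operations), and "destroyed" is a flag
standing in for actually freeing the structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted mount structure. */
struct toy_mount {
	unsigned int refcnt;	/* number of outstanding references */
	bool destroyed;		/* set when the last reference is dropped */
};

/* Add a reference; the object must still be alive. */
static void
toy_ref(struct toy_mount *mp)
{
	assert(!mp->destroyed);
	mp->refcnt++;
}

/* Drop a reference; the last drop tears the object down. */
static void
toy_rele(struct toy_mount *mp)
{
	assert(mp->refcnt > 0);
	if (--mp->refcnt == 0)
		mp->destroyed = true;	/* real code would free the memory here */
}
```

Routing every increment through one function, rather than touching the
counter directly, keeps the lifetime rule in a single place.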


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
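
The return contract above can be sketched in user space. This is a minimal
illustration under stated assumptions, not the kernel code: struct toy_node,
struct toy_iterator and the toy_*() functions are hypothetical, the list is a
plain singly linked list, and there is no locking or reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct vnode or iterator. */
struct toy_node {
	int id;
	struct toy_node *next;
};

struct toy_iterator {
	struct toy_node *cursor;	/* next node to hand out */
};

static void
toy_iterator_init(struct toy_node *head, struct toy_iterator *it)
{
	assert(it != NULL);
	it->cursor = head;
}

/*
 * Same contract as described above: return true with *npp != NULL
 * while nodes remain, false with *npp == NULL when done.
 */
static bool
toy_iterator_next(struct toy_iterator *it, struct toy_node **npp)
{
	assert(it != NULL && npp != NULL);
	*npp = it->cursor;
	if (*npp == NULL)
		return false;
	it->cursor = (*npp)->next;
	return true;
}

/* Typical caller: loop until the iterator reports completion. */
static int
toy_sum_ids(struct toy_node *head)
{
	struct toy_iterator it;
	struct toy_node *np;
	int sum = 0;

	toy_iterator_init(head, &it);
	while (toy_iterator_next(&it, &np))
		sum += np->id;
	return sum;
}
```

The caller never inspects list internals; it only loops on the boolean
result, which is the simplification the interface is meant to provide.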

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.82 01-May-2020 hannken

Undo Rev. 1.79, it breaks root-on-raid where it destroys the component
disks before the raid:

forcefully unmounting / (/dev/raid0a)...
sd1: detached
sd0: detached
raid0: cache flush to component /dev/sd0a failed.
raid0: cache flush to component /dev/sd1a failed.
fatal page fault in supervisor mode
Stopped in pid 2356.2356 (reboot) at netbsd:sdstrategy+0x36

Reopens PR kern/54969: Disk cache is no longer flushed on shutdown


Revision tags: bouyer-xenpvh-base2
# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
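
The pattern can be sketched as follows. This is a minimal illustration with
hypothetical types (struct toy_public, struct toy_impl), not the kernel's
vnode structures: the structure pointer itself is tested, and the address of
the first member is only formed once the pointer is known to be non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the public part sits first in the private wrapper. */
struct toy_public {
	int refs;
};

struct toy_impl {
	struct toy_public pub;	/* must remain the first member */
	int private_state;
};

/*
 * Convert wrapper to public part. Test the structure pointer itself
 * instead of computing &ip->pub and comparing that against NULL.
 */
static struct toy_public *
toy_impl_to_public(struct toy_impl *ip)
{
	assert(offsetof(struct toy_impl, pub) == 0);
	if (ip == NULL)
		return NULL;
	return &ip->pub;
}
```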


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

branches: 1.70.4;
Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
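
A minimal userland sketch of the calling pattern described above. The
three prototypes are taken from this log entry; the structure layouts,
the array-backed list, toy_vrele() and count_vnodes() are hypothetical
stand-ins for illustration only and do not reflect the kernel's locking
or marker-vnode handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userland stand-ins (assumption): the kernel structures are far richer. */
struct vnode { int v_usecount; };
struct mount { struct vnode *mnt_vnodes; size_t mnt_nvnodes; };
struct vnode_iterator { struct mount *mp; size_t pos; };

void
vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
{
	struct vnode_iterator *vi = malloc(sizeof(*vi));

	assert(vi != NULL);	/* toy model: no error path */
	vi->mp = mp;
	vi->pos = 0;
	*marker = vi;
}

void
vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
{
	free(marker);
}

bool
vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)
{
	if (marker->pos >= marker->mp->mnt_nvnodes) {
		*vpp = NULL;		/* "false / *vpp == NULL" when done */
		return false;
	}
	*vpp = &marker->mp->mnt_vnodes[marker->pos++];
	(*vpp)->v_usecount++;		/* hand back a referenced vnode */
	return true;
}

/* Hypothetical stand-in for dropping the reference the iterator took. */
static void
toy_vrele(struct vnode *vp)
{
	vp->v_usecount--;
}

/* Hypothetical caller: visit every vnode on a mount and count them. */
size_t
count_vnodes(struct mount *mp)
{
	struct vnode_iterator *marker;
	struct vnode *vp;
	size_t n = 0;

	vfs_vnode_iterator_init(mp, &marker);
	while (vfs_vnode_iterator_next(marker, &vp)) {
		n++;
		toy_vrele(vp);	/* caller owns the reference it was given */
	}
	vfs_vnode_iterator_destroy(marker);
	return n;
}
```

The point of the sketch is the shape of the loop: init, call next until
it returns false, release each returned vnode, then destroy the iterator.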

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit differences (which are harmless) are:
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.81 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421
# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.
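
As a rough userland analogue of an accessor that hides a relaxed atomic
load, assuming C11 <stdatomic.h> in place of the kernel's
atomic_load_relaxed(). The names toy_vnode and toy_vrefcnt are
hypothetical and are not the kernel's definitions.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a vnode with an atomically updated use count. */
struct toy_vnode { atomic_uint v_usecount; };

/* Read the use count with relaxed ordering; callers never touch the field. */
unsigned int
toy_vrefcnt(struct toy_vnode *vp)
{
	return atomic_load_explicit(&vp->v_usecount, memory_order_relaxed);
}
```

Routing reads through one function means the field's representation can
change later without touching every caller.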


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
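
A small sketch of the pattern, using hypothetical structure names
(struct outer, struct inner) rather than the ones touched by this
revision: test the containing pointer first, and only then form the
address of its first member.

```c
#include <stddef.h>

/* Hypothetical layout: the inner structure is the first member. */
struct inner { int value; };
struct outer { struct inner first; int other; };

/* Return the embedded structure, or NULL when there is no outer object. */
struct inner *
outer_to_inner(struct outer *op)
{
	if (op == NULL)		/* check the structure pointer itself */
		return NULL;
	return &op->first;	/* member access only on a valid object */
}
```

Writing "&op->first == NULL" instead would rely on member access through
a null pointer, which is undefined even though the member sits at offset
zero.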


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit differences (which are harmless) are:
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.80 20-Apr-2020 ad

Rename buf_syncwait() to vfs_syncwait(), and have it wait on v_numoutput
rather than BC_BUSY. Removes the dependency on bufhash.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
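
A minimal userland sketch of the reference/release pairing described here, assuming a hypothetical `mount_ref_sketch` structure and C11 atomics. The real vfs_ref()/vfs_rele() operate on struct mount and free it on the last release; the sketch only records that the last reference went away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mount. */
struct mount_ref_sketch {
	atomic_uint refcnt;
	bool destroyed;		/* set instead of freeing, so it can be observed */
};

/* A new object starts with one reference held by its creator. */
void
mount_sketch_init(struct mount_ref_sketch *mp)
{
	atomic_init(&mp->refcnt, 1);
	mp->destroyed = false;
}

/* Add a reference. */
void
mount_sketch_ref(struct mount_ref_sketch *mp)
{
	atomic_fetch_add(&mp->refcnt, 1);
}

/* Drop a reference; the caller that drops the last one tears down. */
void
mount_sketch_rele(struct mount_ref_sketch *mp)
{
	unsigned int old = atomic_fetch_sub(&mp->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		mp->destroyed = true;
}
```

Funnelling every change of the count through two functions is what lets callers stop incrementing the field by hand.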


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
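
The calling convention of the three iterator functions above (init, next returning a referenced item or false when done, destroy) can be modelled in userland as follows. This is a simplified sketch under stated assumptions: `node_sketch`, `node_iterator` and the `node_iterator_*` functions are hypothetical, and a plain cursor replaces the marker kept on the kernel's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode on a mount's list. */
struct node_sketch {
	struct node_sketch *next;
	int usecount;		/* references handed out by the iterator */
	bool reclaiming;	/* item is going away and must be skipped */
};

/* Hypothetical iterator state: a cursor into the list. */
struct node_iterator {
	struct node_sketch *cursor;
};

void
node_iterator_init(struct node_sketch *head, struct node_iterator *it)
{
	assert(it != NULL);
	it->cursor = head;
}

/*
 * Return true with *np set to the next usable item, holding a new
 * reference on it, or false with *np == NULL when the list is exhausted.
 */
bool
node_iterator_next(struct node_iterator *it, struct node_sketch **np)
{
	while (it->cursor != NULL) {
		struct node_sketch *n = it->cursor;

		it->cursor = n->next;
		if (n->reclaiming)
			continue;
		n->usecount++;
		*np = n;
		return true;
	}
	*np = NULL;
	return false;
}

void
node_iterator_destroy(struct node_iterator *it)
{
	it->cursor = NULL;
}
```

A caller loops with `while (node_iterator_next(&it, &n))`, releases each returned item when finished with it, and calls the destroy function once at the end, so the list and intermediate-state handling stays out of every call site.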


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
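
For context on why end-of-list tests become comparisons against NULL after such a conversion, here is a small sketch using the TAILQ macros from <sys/queue.h>. The names `mnt_entry`, `mntlist_sketch` and `count_mounts` are hypothetical and unrelated to the kernel's mountlist.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element. */
struct mnt_entry {
	int id;
	TAILQ_ENTRY(mnt_entry) link;
};

TAILQ_HEAD(mntlist_sketch, mnt_entry);

/*
 * Walk the list.  A TAILQ ends with NULL, whereas a CIRCLEQ ended at a
 * sentinel derived from the list head, so the loop condition is a plain
 * NULL comparison.
 */
int
count_mounts(struct mntlist_sketch *head)
{
	struct mnt_entry *mp;
	int n = 0;

	assert(head != NULL);
	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```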


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


Revision tags: bouyer-xenpvh-base1
# 1.79 19-Apr-2020 hannken

Destroy anonymous device vnodes on reboot once the last file system
got unmounted and the mount list is empty.

PR kern/54969: Disk cache is no longer flushed on shutdown


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

branches: 1.75.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
  - vfs_syncer_add_to_worklist(struct mount *mp) to add
  - vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
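
The calling pattern this interface implies can be sketched in userland. This is only a model under stated assumptions: the struct layouts, the array-backed vnode list, vrele_model() and count_vnodes() are hypothetical stand-ins, not the kernel's definitions; only the three function signatures are taken from the commit message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland stand-ins for the kernel structures. */
struct vnode { int v_id; int v_usecount; };
struct mount { struct vnode *mnt_vnodes; size_t mnt_nvnodes; };
struct vnode_iterator { struct mount *vi_mp; size_t vi_pos; };

/* Allocate an iterator positioned at the start of the mount's list. */
void
vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
{
	struct vnode_iterator *vi = malloc(sizeof(*vi));

	if (vi == NULL)
		abort();
	vi->vi_mp = mp;
	vi->vi_pos = 0;
	*marker = vi;
}

void
vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
{
	free(marker);
}

/* Return true with a referenced vnode, or false with *vpp == NULL. */
bool
vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)
{
	struct mount *mp = marker->vi_mp;

	assert(vpp != NULL);
	if (marker->vi_pos >= mp->mnt_nvnodes) {
		*vpp = NULL;
		return false;
	}
	*vpp = &mp->mnt_vnodes[marker->vi_pos++];
	(*vpp)->v_usecount++;	/* the caller now holds a reference */
	return true;
}

/* Model of dropping the reference handed out by the iterator. */
static void
vrele_model(struct vnode *vp)
{
	vp->v_usecount--;
}

/* Typical caller: init, loop until false, release each vnode, destroy. */
size_t
count_vnodes(struct mount *mp)
{
	struct vnode_iterator *marker;
	struct vnode *vp;
	size_t n = 0;

	vfs_vnode_iterator_init(mp, &marker);
	while (vfs_vnode_iterator_next(marker, &vp)) {
		n++;
		vrele_model(vp);
	}
	vfs_vnode_iterator_destroy(marker);
	return n;
}
```

The point of the design is that the caller never touches list or vnode mutexes; it only has to release the reference it receives for each vnode.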

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
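
A small userland sketch of what this conversion means for list walkers: a TAILQ traversal ends at NULL instead of at a pointer back to the list head, which is why comparisons against *_END(head) become comparisons against NULL. The names mount_model, mountlist_model and sum_ids() are hypothetical; only the TAILQ macros from <sys/queue.h> are real.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mount with a TAILQ linkage. */
struct mount_model {
	int id;
	TAILQ_ENTRY(mount_model) link;
};
TAILQ_HEAD(mountlist_model, mount_model);

/* Walk the list; the loop terminates when the next pointer is NULL. */
int
sum_ids(struct mountlist_model *head)
{
	struct mount_model *mp;
	int sum = 0;

	assert(head != NULL);
	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		sum += mp->id;
	return sum;
}
```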


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to prevent "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.78 13-Apr-2020 ad

Replace most uses of vp->v_usecount with a call to vrefcnt(vp), a function
that hides the details and does atomic_load_relaxed(). Signature matches
FreeBSD.


# 1.77 13-Apr-2020 maxv

hardclock_ticks -> getticks()


Revision tags: phil-wifi-20200411
# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
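
The difference can be illustrated with a minimal sketch. The struct layouts and the helper impl_to_vnode() are hypothetical and only mirror the shape described here (an embedded first member); this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: the public vnode is the first member. */
struct vnode { int v_type; };
struct vnode_impl { struct vnode vi_vnode; int vi_state; };

/*
 * Check the structure pointer itself.  Writing &vip->vi_vnode with
 * vip == NULL and then testing the result is member access through a
 * null pointer, which is undefined even though the offset is zero.
 */
struct vnode *
impl_to_vnode(struct vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```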


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to prevent "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.76 10-Apr-2020 ad

vfs_mountroot(): don't needlessly grab a second reference to the root vnode
(the kernel never chdirs) nor a lock on it.


Revision tags: bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.75 23-Feb-2020 ad

Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
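
As a rough illustration of a ref/rele pair like the one described here, the sketch below uses C11 atomics in userland. Everything in it is an assumption made for illustration: `toy_mount`, `toy_ref()`, `toy_rele()` and the `toy_mounts_freed` counter are hypothetical names, and the real kernel functions do more work than this on the final release.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mount; only the counter is modelled. */
struct toy_mount {
	atomic_uint mnt_refcnt;
};

/* Counts final releases so the behaviour can be observed. */
static unsigned toy_mounts_freed;

/* Allocate a mount that starts with one reference held by the caller. */
static struct toy_mount *
toy_mount_alloc(void)
{
	struct toy_mount *mp = malloc(sizeof(*mp));

	if (mp == NULL)
		return NULL;
	atomic_init(&mp->mnt_refcnt, 1);
	return mp;
}

/* Add a reference; callers no longer touch mnt_refcnt directly. */
static void
toy_ref(struct toy_mount *mp)
{
	assert(mp != NULL);
	atomic_fetch_add(&mp->mnt_refcnt, 1);
}

/* Drop a reference and free the mount when the last one goes away. */
static void
toy_rele(struct toy_mount *mp)
{
	assert(mp != NULL);
	/* atomic_fetch_sub returns the previous value. */
	if (atomic_fetch_sub(&mp->mnt_refcnt, 1) == 1) {
		toy_mounts_freed++;
		free(mp);
	}
}
```

Hiding the counter behind two functions means every holder follows the same rule: one `toy_ref()` per additional holder, one `toy_rele()` when done, and the object is freed exactly once.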


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
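
The calling pattern implied by this interface can be sketched in userland. The code below is not the kernel implementation: the `toy_` types, the array-backed vnode list and the plain integer use count are all assumptions made for illustration. Only the shape of the three calls and the "false / *vpp == NULL" versus "true / *vpp != NULL" contract follow the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; a real mount keeps a locked list, not an array. */
struct toy_vnode {
	int v_id;
	int v_usecount;
};

struct toy_mount {
	struct toy_vnode *mnt_vnodes;
	size_t mnt_nvnodes;
};

struct toy_vnode_iterator {
	struct toy_mount *mp;
	size_t next;		/* index of the next vnode to return */
};

static void
toy_vnode_iterator_init(struct toy_mount *mp,
    struct toy_vnode_iterator **marker)
{
	struct toy_vnode_iterator *it = malloc(sizeof(*it));

	assert(it != NULL);
	it->mp = mp;
	it->next = 0;
	*marker = it;
}

static void
toy_vnode_iterator_destroy(struct toy_vnode_iterator *marker)
{
	free(marker);
}

/* Return true with a referenced vnode, or false with *vpp == NULL. */
static bool
toy_vnode_iterator_next(struct toy_vnode_iterator *marker,
    struct toy_vnode **vpp)
{
	if (marker->next >= marker->mp->mnt_nvnodes) {
		*vpp = NULL;
		return false;
	}
	*vpp = &marker->mp->mnt_vnodes[marker->next++];
	(*vpp)->v_usecount++;	/* the caller receives a reference */
	return true;
}

/* Typical caller loop: visit every vnode, then drop the reference. */
static size_t
toy_count_vnodes(struct toy_mount *mp)
{
	struct toy_vnode_iterator *marker;
	struct toy_vnode *vp;
	size_t n = 0;

	toy_vnode_iterator_init(mp, &marker);
	while (toy_vnode_iterator_next(marker, &vp)) {
		n++;
		vp->v_usecount--;	/* stands in for releasing the vnode */
	}
	toy_vnode_iterator_destroy(marker);
	return n;
}
```

The point of the interface is visible in `toy_count_vnodes()`: the caller never touches list linkage, mutexes or intermediate vnode states, only the three iterator calls.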

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
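
A small sketch of what the TAILQ form looks like to callers, assuming a `<sys/queue.h>` that provides the TAILQ macros (as on the BSDs and glibc). The `toy_mount` and `toy_mountlist` names are hypothetical. With a tail queue, both an empty list and the end of a traversal are represented by NULL, so loops compare against NULL rather than against a list-end sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical mount structure linked on a tail queue. */
struct toy_mount {
	int mnt_id;
	TAILQ_ENTRY(toy_mount) mnt_list;
};

TAILQ_HEAD(toy_mountlist, toy_mount);

/* Count the entries by walking the list until NULL is reached. */
static int
toy_mountlist_count(struct toy_mountlist *head)
{
	struct toy_mount *mp;
	int n = 0;

	assert(head != NULL);
	for (mp = TAILQ_FIRST(head); mp != NULL;
	    mp = TAILQ_NEXT(mp, mnt_list))
		n++;
	return n;
}
```

Code that previously compared an element against the list's end marker has to be changed to compare against NULL, which is the kind of follow-up fix the 1.25 entry describes.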


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.75 23-Feb-2020 ad

Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


Revision tags: ad-namecache-base2 ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
to be reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than a hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successfull mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argment from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesytstem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similiar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, with
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
2) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested at, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
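
The calling convention described above can be sketched in userland C. This is
only a toy model under stated assumptions: every toy_* type and function is
hypothetical and stands in for the kernel's struct mount, struct vnode and
vfs_vnode_iterator_*(); reference counting is reduced to a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical); NOT the kernel's vnode or mount. */
struct toy_vnode { int id; int usecount; };
struct toy_mount { struct toy_vnode *vnodes; size_t nvnodes; };
struct toy_iterator { struct toy_mount *mp; size_t pos; };

/* Models vfs_vnode_iterator_init(): allocate the iterator state. */
void
toy_iterator_init(struct toy_mount *mp, struct toy_iterator **itp)
{
	struct toy_iterator *it;

	assert(mp != NULL);
	it = malloc(sizeof(*it));
	if (it == NULL)
		abort();
	it->mp = mp;
	it->pos = 0;
	*itp = it;
}

/* Models vfs_vnode_iterator_destroy(). */
void
toy_iterator_destroy(struct toy_iterator *it)
{
	free(it);
}

/*
 * Models vfs_vnode_iterator_next(): "false / *vpp == NULL" when done,
 * "true / *vpp != NULL" with a referenced vnode otherwise.
 */
bool
toy_iterator_next(struct toy_iterator *it, struct toy_vnode **vpp)
{
	if (it->pos >= it->mp->nvnodes) {
		*vpp = NULL;
		return false;
	}
	*vpp = &it->mp->vnodes[it->pos++];
	(*vpp)->usecount++;	/* the caller now holds a reference */
	return true;
}

/* Typical caller loop: visit every vnode, then drop the reference. */
int
toy_sum_ids(struct toy_mount *mp)
{
	struct toy_iterator *it;
	struct toy_vnode *vp;
	int sum = 0;

	toy_iterator_init(mp, &it);
	while (toy_iterator_next(it, &vp)) {
		sum += vp->id;
		vp->usecount--;	/* stands in for releasing the vnode */
	}
	toy_iterator_destroy(it);
	return sum;
}
```

The point of the sketch is that the caller never touches list or vnode
mutexes and never sees intermediate vnode states; it only handles referenced
vnodes and must release each one.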

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
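
A minimal userland sketch of what this kind of conversion implies for list
traversal, assuming the <sys/queue.h> TAILQ macros: a TAILQ walk terminates
when the element pointer is NULL, rather than at a head sentinel. The
toy_mount type and toy_count() are hypothetical and are not the kernel's
mountlist code.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; the kernel's struct mount is far larger. */
struct toy_mount {
	int id;
	TAILQ_ENTRY(toy_mount) link;
};

TAILQ_HEAD(toy_mountlist, toy_mount);

/* Walk the list; the loop ends when the element pointer becomes NULL. */
int
toy_count(struct toy_mountlist *head)
{
	struct toy_mount *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```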


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


Revision tags: ad-namecache-base1
# 1.74 17-Jan-2020 ad

VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.


Revision tags: ad-namecache-base
# 1.73 22-Dec-2019 ad

branches: 1.73.2;
Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
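
A minimal sketch of the pattern this message describes, assuming a containing
structure whose first member is the public structure. The toy_* names are
hypothetical and only mirror the shape of vnode_impl and vi_vnode; this is not
the code that was committed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the public and implementation structures. */
struct toy_vnode { int v_type; };
struct toy_vnode_impl {
	struct toy_vnode vi_vnode;	/* first member, at offset 0 */
	int vi_state;
};

/*
 * Test the structure pointer itself. Forming &vip->vi_vnode from a null
 * vip and then comparing the result against NULL relies on member access
 * through a null pointer, which is undefined even when the member sits
 * at offset 0.
 */
struct toy_vnode *
toy_impl_to_vnode(struct toy_vnode_impl *vip)
{
	if (vip == NULL)
		return NULL;
	return &vip->vi_vnode;
}
```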


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount was still busy,
UFS1 extended attributes structures were left freed while the kernel
assumed extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.73 22-Dec-2019 ad

Make mntvnode_lock per-mount, and address false sharing of struct mount.


Revision tags: phil-wifi-20191119
# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
from being reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.
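
The calling convention described above (init, repeated next until it returns false, then destroy) can be sketched in userland as follows. This is only a mock of the loop shape: struct mock_vnode, struct mock_mnt, struct mock_vnode_iterator and the mock_iterator_*() functions are hypothetical, the "vnode list" is a plain array, and none of the locking or reference handling of the real interface is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real types are struct vnode and struct mount. */
struct mock_vnode {
	int	v_id;
};

struct mock_mnt {
	struct mock_vnode	*mnt_vnodes;	/* array standing in for the list */
	size_t			 mnt_nvnodes;
};

struct mock_vnode_iterator {
	struct mock_mnt	*mi_mp;
	size_t		 mi_pos;	/* index of the next vnode to return */
};

/* Create an iterator positioned before the first vnode of "mp". */
static void
mock_iterator_init(struct mock_mnt *mp, struct mock_vnode_iterator **marker)
{
	struct mock_vnode_iterator *it = malloc(sizeof(*it));

	if (it == NULL)
		abort();
	it->mi_mp = mp;
	it->mi_pos = 0;
	*marker = it;
}

static void
mock_iterator_destroy(struct mock_vnode_iterator *marker)
{
	free(marker);
}

/* Return "false / *vpp == NULL" when done, else "true / *vpp != NULL". */
static bool
mock_iterator_next(struct mock_vnode_iterator *marker, struct mock_vnode **vpp)
{
	if (marker->mi_pos >= marker->mi_mp->mnt_nvnodes) {
		*vpp = NULL;
		return false;
	}
	*vpp = &marker->mi_mp->mnt_vnodes[marker->mi_pos++];
	return true;
}

/* Typical caller: walk every vnode of a mount without touching the list. */
static int
mock_count_vnodes(struct mock_mnt *mp)
{
	struct mock_vnode_iterator *marker;
	struct mock_vnode *vp;
	int n = 0;

	mock_iterator_init(mp, &marker);
	while (mock_iterator_next(marker, &vp))
		n++;
	mock_iterator_destroy(marker);
	return n;
}
```

With this shape the caller sees only complete vnodes and a boolean "done" signal, so the list mutexes and intermediate vnode states mentioned in the entry stay hidden behind the iterator.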

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.72 16-Nov-2019 maxv

NULL-check the structure pointer, not the address of its first field. Also
add KASSERT. For clarity, and to appease kUBSan.
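
A minimal sketch of the pattern this entry describes, using hypothetical types (struct inner, struct outer and outer_to_inner() are not names from the kernel source): test the structure pointer itself before forming the address of any member, rather than forming the member address first and comparing that against NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: "inner" is embedded as the first member of "outer". */
struct inner {
	int	value;
};

struct outer {
	struct inner	o_inner;	/* first member, so it sits at offset 0 */
	int		o_extra;
};

/*
 * Questionable form (not used here):
 *
 *	struct inner *ip = &op->o_inner;
 *	if (ip == NULL) ...
 *
 * It forms a member address through a possibly-NULL pointer, which is
 * undefined behaviour even though the member is at offset 0, and it is
 * the kind of access a sanitizer such as kUBSan reports.
 *
 * Preferred form: check the structure pointer, then take the address.
 */
static struct inner *
outer_to_inner(struct outer *op)
{
	if (op == NULL)
		return NULL;
	return &op->o_inner;
}
```

Both forms may compile to the same instructions on common platforms; the difference is that only the second has defined behaviour when the pointer is NULL.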


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. Prevents the mount's memory
to be reused as a new mount before fstrans released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer any need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.71 19-Aug-2019 christos

If we could not start extattr for some reason, don't advertise extattr in the
mount.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

branches: 1.67.4;
Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 from less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
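
The reference-counting pattern behind vfs_ref()/vfs_rele() can be sketched in userland as follows. This is a minimal illustration only, not the kernel code: struct toy_mount, toy_ref() and toy_rele() are hypothetical names, and the "destroyed" flag stands in for whatever teardown the real vfs_rele() performs when the last reference goes away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mount; only the counter matters here. */
struct toy_mount {
	atomic_uint refcnt;	/* number of outstanding references */
	bool destroyed;		/* set once the last reference is dropped */
};

/* Start life with one reference, owned by the creator. */
static void
toy_mount_init(struct toy_mount *mp)
{
	atomic_init(&mp->refcnt, 1);
	mp->destroyed = false;
}

/* Add a reference; the caller must already hold one. */
static void
toy_ref(struct toy_mount *mp)
{
	atomic_fetch_add_explicit(&mp->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference; tear the object down when the count reaches zero. */
static void
toy_rele(struct toy_mount *mp)
{
	/* fetch_sub returns the previous value, so 1 means we were last. */
	if (atomic_fetch_sub_explicit(&mp->refcnt, 1,
	    memory_order_acq_rel) != 1)
		return;
	mp->destroyed = true;	/* assumption: real code would free here */
}
```

Routing every increment through one function, instead of touching the counter directly, keeps the acquire and release points easy to audit.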


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().
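
A helper that finds the mount with the "highest generation < arg" might look roughly like the sketch below. It is a userland model under stated assumptions: struct toy_mount, its gen field and toy_highest_gen_below() are hypothetical names, and a plain singly linked list stands in for the kernel's mount list and its locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mount list entry. */
struct toy_mount {
	struct toy_mount *next;
	uint64_t gen;		/* generation number assigned at mount time */
};

/*
 * Return the entry with the highest generation strictly below "limit",
 * or NULL if no entry qualifies.
 */
static struct toy_mount *
toy_highest_gen_below(struct toy_mount *head, uint64_t limit)
{
	struct toy_mount *best = NULL;

	for (struct toy_mount *mp = head; mp != NULL; mp = mp->next) {
		if (mp->gen < limit && (best == NULL || mp->gen > best->gen))
			best = mp;
	}
	return best;
}
```

Calling it repeatedly, each time passing the generation of the previous result, visits the entries from newest to oldest without holding a position in the list between calls.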


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount was still busy,
UFS1 extended attribute structures were left freed while the kernel
assumed extended attributes were still enabled. This led to using
UFS1 extended attribute structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clearing the MNT_EXTATTR flag after extended attribute structures are freed
2) attempting to restart extended attributes after a failed unmount
3) setting MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
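
The calling convention described above can be modelled in userland like this. The sketch only illustrates the "true with a referenced node, false with NULL" contract; struct toy_vnode, struct toy_iterator and the toy_* functions are hypothetical, and the real iterator relies on a marker vnode and the kernel's locking rather than a bare cursor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the kernel's struct vnode. */
struct toy_vnode {
	struct toy_vnode *next;	/* next entry on the per-mount list */
	int refs;		/* references held by callers */
};

/* Hypothetical iterator state; the kernel uses a marker vnode instead. */
struct toy_iterator {
	struct toy_vnode *cursor;
};

static void
toy_iterator_init(struct toy_vnode *head, struct toy_iterator *it)
{
	it->cursor = head;
}

/* Return true with a referenced node in *vpp, or false with *vpp == NULL. */
static bool
toy_iterator_next(struct toy_iterator *it, struct toy_vnode **vpp)
{
	struct toy_vnode *vp = it->cursor;

	if (vp == NULL) {
		*vpp = NULL;
		return false;
	}
	vp->refs++;		/* caller now owns one reference */
	it->cursor = vp->next;
	*vpp = vp;
	return true;
}

/* Walk the whole list, dropping each reference again; return the count. */
static int
toy_count(struct toy_vnode *head)
{
	struct toy_iterator it;
	struct toy_vnode *vp;
	int n = 0;

	toy_iterator_init(head, &it);
	while (toy_iterator_next(&it, &vp)) {
		n++;
		vp->refs--;	/* stands in for vrele() */
	}
	return n;
}
```

With this shape a caller never touches the list linkage or intermediate node states; it only has to release the reference it was handed.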

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
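
One visible effect of moving from a CIRCLEQ to a TAILQ is that traversal ends at NULL rather than at an end marker derived from the list head. The userland sketch below uses the <sys/queue.h> TAILQ macros to show that termination test; struct toy_mount, struct toy_mountlist and toy_count_mounts() are hypothetical names, not the kernel's mountlist code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list entry standing in for struct mount. */
struct toy_mount {
	int id;
	TAILQ_ENTRY(toy_mount) link;	/* linkage for the tail queue */
};

TAILQ_HEAD(toy_mountlist, toy_mount);

/* Count the entries; the walk stops when the next pointer is NULL. */
static int
toy_count_mounts(struct toy_mountlist *head)
{
	int n = 0;

	for (struct toy_mount *mp = TAILQ_FIRST(head); mp != NULL;
	    mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```

Because the end of the list is plain NULL, loop conditions no longer need the list head to decide when to stop.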


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode; for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer any need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


Revision tags: isaki-audio2-base
# 1.70 20-Feb-2019 hannken

Move fstrans_unmount() to vfs_rele(), just before it would free the mount.
Don't take a mount reference for fstrans as it gets notified about the release.

Defer the final free of the mount to fstrans_mount_dtor() when fstrans
has released all references to this mount. This prevents the mount's memory
from being reused as a new mount before fstrans has released all references.

Address PR kern/53928 modules/t_builtin:disable test case randomly fails.


# 1.69 20-Feb-2019 hannken

Attach "mnt_transinfo" to "dead_rootmount" so every mount has a
valid "mnt_transinfo" and remove now unneeded flag IMNT_HAS_TRANS.

Run fstrans_start()/fstrans_done() on dead_rootmount if FSTRANS_DEAD_ENABLED.
Should become the default for DIAGNOSTIC in the future.


# 1.68 05-Feb-2019 hannken

Allow dounmount() with file system already suspended.

Remove no longer valid test for layered mounts,
ZFS will unmount snapshots bottom up.


Revision tags: pgoyette-compat-merge-20190127 pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base tls-maxphys-base-20171202 nick-nhusb-base-20170825
# 1.67 21-Aug-2017 hannken

Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 from less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
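The calling convention described above can be sketched in userspace as follows. This is only an illustrative model, not the kernel implementation: struct demo_vnode, struct demo_iterator, demo_iterator_init() and demo_iterator_next() are hypothetical names, and the singly linked list stands in for mnt_vnodelist without any of the locking or marker handling the real iterator needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct demo_vnode {
	int dv_id;
	int dv_usecount;		/* reference taken by the iterator */
	struct demo_vnode *dv_next;
};

struct demo_iterator {
	struct demo_vnode *di_cursor;	/* next node to hand out, or NULL */
};

/* Start an iteration at the first node of the list. */
static void
demo_iterator_init(struct demo_vnode *first, struct demo_iterator *it)
{
	assert(it != NULL);
	it->di_cursor = first;
}

/*
 * Mirror the documented contract: return false with *vpp == NULL when the
 * list is exhausted, otherwise true with *vpp set to a referenced node.
 */
static bool
demo_iterator_next(struct demo_iterator *it, struct demo_vnode **vpp)
{
	struct demo_vnode *vp = it->di_cursor;

	if (vp == NULL) {
		*vpp = NULL;
		return false;
	}
	vp->dv_usecount++;		/* the caller must drop this reference */
	it->di_cursor = vp->dv_next;
	*vpp = vp;
	return true;
}
```

A caller would typically loop with "while (demo_iterator_next(&it, &vp))" and release each returned node before asking for the next one.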

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
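
A visible consequence of this kind of switch is that a list walk ends when the cursor becomes NULL instead of when it reaches the list head. The sketch below shows that termination condition with the TAILQ macros from <sys/queue.h>; struct demo_mount, demo_list_append() and demo_list_count() are hypothetical names used only for illustration and are not taken from vfs_mount.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mount; only the list linkage matters. */
struct demo_mount {
	int dm_id;
	TAILQ_ENTRY(demo_mount) dm_list;
};

TAILQ_HEAD(demo_mountlist, demo_mount);

/* Append an entry at the tail of the list. */
static void
demo_list_append(struct demo_mountlist *head, struct demo_mount *mp)
{
	assert(head != NULL && mp != NULL);
	TAILQ_INSERT_TAIL(head, mp, dm_list);
}

/* Count entries; with a TAILQ the walk stops when the cursor is NULL. */
static int
demo_list_count(struct demo_mountlist *head)
{
	struct demo_mount *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, dm_list))
		n++;
	return n;
}
```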


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.67 21-Aug-2017 hannken

Change forced unmount to revert open device vnodes to anonymous devices.


Revision tags: perseant-stdc-iso10646-base
# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than an hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
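
The reference/release pairing can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the kernel code: struct demo_mount, demo_mount_init(), demo_ref() and demo_rele() are hypothetical names, the counter uses C11 atomics rather than the kernel's primitives, and "destroying" the object is reduced to setting a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mount with only a reference count. */
struct demo_mount {
	atomic_uint dm_refcnt;
	bool dm_destroyed;	/* set when the last reference is dropped */
};

/* A new object starts with one reference held by its creator. */
static void
demo_mount_init(struct demo_mount *mp)
{
	atomic_init(&mp->dm_refcnt, 1);
	mp->dm_destroyed = false;
}

/* Add a reference; callers no longer touch the counter directly. */
static void
demo_ref(struct demo_mount *mp)
{
	atomic_fetch_add(&mp->dm_refcnt, 1);
}

/* Drop a reference and tear the object down when it was the last one. */
static void
demo_rele(struct demo_mount *mp)
{
	unsigned int old = atomic_fetch_sub(&mp->dm_refcnt, 1);

	assert(old > 0);	/* releasing an unreferenced object is a bug */
	if (old == 1)
		mp->dm_destroyed = true;
}
```

Hiding the counter behind a pair of functions keeps every increment matched to a release that can be audited by name.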


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().
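
One way to read "highest generation < arg" is as a search for the newest entry that is still older than a given bound. The sketch below models that lookup over a plain array; struct demo_mount, its dm_gen field and demo_find_below() are hypothetical names, and the real helper works on the kernel's mount list rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mount with only a generation number. */
struct demo_mount {
	uint64_t dm_gen;
};

/*
 * Return the entry with the highest generation strictly below "gen",
 * or NULL when no entry qualifies.
 */
static struct demo_mount *
demo_find_below(struct demo_mount *mounts, size_t n, uint64_t gen)
{
	struct demo_mount *best = NULL;

	assert(mounts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (mounts[i].dm_gen >= gen)
			continue;
		if (best == NULL || mounts[i].dm_gen > best->dm_gen)
			best = &mounts[i];
	}
	return best;
}
```

Calling such a helper repeatedly, each time passing the generation of the entry just handled, visits the entries from newest to oldest without holding a position in the list between calls.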


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


# 1.66 04-Jun-2017 hannken

Operations fstrans_start() and fstrans_start_nowait() now always
use FSTRANS_SHARED as lock type so remove the lock type argument.

File system state FSTRANS_SUSPENDING is now unused so remove it.

Regen vnode_if files.

Ride 8.99.1 less than a hour ago.


Revision tags: netbsd-8-base
# 1.65 01-Jun-2017 chs

branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successfull mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.65 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.64 24-May-2017 hannken

With dounmount() working on a suspended file system remove no longer
needed fields mnt_busynest and mnt_unmounting from struct mount.

Welcome to 7.99.73


# 1.63 24-May-2017 hannken

Remove the syncer dance from dounmount(). The syncer skips
unmounting file systems as they are suspended.

Remove now unused syncer_mutex.


Revision tags: prg-localcount2-base3
# 1.62 17-May-2017 hannken

Suspend file system while unmounting. This way no operations run
on the mounted file system during unmount and all operations see
the state before or after the (possibly failed) unmount.


Revision tags: prg-localcount2-base2
# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
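
A minimal userland sketch of the reference-counting pattern this entry describes. It is not the kernel code: struct mount_sketch, sketch_alloc(), sketch_ref() and sketch_rele() are hypothetical stand-ins, and the log does not show how vfs_ref() and vfs_rele() are actually implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mount; only the counter matters here. */
struct mount_sketch {
	atomic_uint refcnt;
};

/* Allocate an object that starts with one reference held by the caller. */
static struct mount_sketch *
sketch_alloc(void)
{
	struct mount_sketch *mp = malloc(sizeof(*mp));

	if (mp != NULL)
		atomic_init(&mp->refcnt, 1);
	return mp;
}

/* Take an additional reference (assumed analogue of vfs_ref()). */
static void
sketch_ref(struct mount_sketch *mp)
{
	atomic_fetch_add(&mp->refcnt, 1);
}

/*
 * Drop a reference (assumed analogue of vfs_rele()).  The object is
 * freed when the last reference goes away; returns true in that case.
 */
static bool
sketch_rele(struct mount_sketch *mp)
{
	if (atomic_fetch_sub(&mp->refcnt, 1) != 1)
		return false;
	free(mp);
	return true;
}
```

Routing every increment and decrement through one pair of functions, rather than touching the counter field directly, keeps the "free on last release" rule in a single place.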


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
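
A minimal userland sketch of the calling convention described in this entry: initialise the iterator, call next until it reports false with the output pointer set to NULL, then destroy the iterator. All names here (node_sketch, iter_sketch_*) are hypothetical. A plain array stands in for the mount's vnode list, and the locking, marker and reference handling of the real vfs_vnode_iterator_*() functions are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vnode. */
struct node_sketch {
	int id;
};

/* Hypothetical iterator state: a cursor over a caller-owned array. */
struct iter_sketch {
	struct node_sketch *nodes;
	size_t count;
	size_t pos;
};

static void
iter_sketch_init(struct node_sketch *nodes, size_t count,
    struct iter_sketch **itp)
{
	struct iter_sketch *it = malloc(sizeof(*it));

	if (it != NULL) {
		it->nodes = nodes;
		it->count = count;
		it->pos = 0;
	}
	*itp = it;
}

/* Returns false with *npp == NULL when done, true with *npp set otherwise. */
static bool
iter_sketch_next(struct iter_sketch *it, struct node_sketch **npp)
{
	if (it == NULL || it->pos >= it->count) {
		*npp = NULL;
		return false;
	}
	*npp = &it->nodes[it->pos++];
	return true;
}

static void
iter_sketch_destroy(struct iter_sketch *it)
{
	free(it);
}

/* Typical caller: init, loop until next() reports false, destroy. */
static int
sum_ids(struct node_sketch *nodes, size_t count)
{
	struct iter_sketch *it;
	struct node_sketch *np;
	int sum = 0;

	iter_sketch_init(nodes, count, &it);
	while (iter_sketch_next(it, &np))
		sum += np->id;
	iter_sketch_destroy(it);
	return sum;
}
```

The point of the interface is that the caller's loop contains no list or mutex handling at all; everything the entry lists as the caller's burden moves behind next().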


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
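
A small userland sketch of why this conversion changes loop termination tests. With the TAILQ macros from <sys/queue.h>, a traversal ends when the next pointer is NULL, whereas a CIRCLEQ traversal ends on reaching the list head again. The struct mnt_sketch type and sketch_count() are hypothetical and are not the kernel's mountlist code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; the kernel's struct mount is far larger. */
struct mnt_sketch {
	int id;
	TAILQ_ENTRY(mnt_sketch) link;
};

TAILQ_HEAD(mnt_sketch_list, mnt_sketch);

/* Count entries by walking the list until the next pointer is NULL. */
static int
sketch_count(struct mnt_sketch_list *head)
{
	struct mnt_sketch *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```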


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to avoid "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.61 07-May-2017 hannken

Enter fstrans from _vfs_busy() and leave from vfs_unbusy().

Adapt sched_sync() and do_sys_sync().


# 1.60 07-May-2017 hannken

Move fstrans initialization to vfs_mountalloc().


# 1.59 07-May-2017 hannken

Remove now invalid comment.


Revision tags: prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumed extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.58 17-Apr-2017 hannken

branches: 1.58.2;
Add vfs_trybusy() and mountlist_iterator_trynext() and use it for the syncer.


# 1.57 17-Apr-2017 hannken

No need to keep a not yet visible mount busy. Move vfs_busy()
from vfs_mountalloc() to vfs_rootmountalloc().

XXX: Do we really need to vfs_busy() for vfs_mountroot?


# 1.56 17-Apr-2017 hannken

Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.


# 1.55 17-Apr-2017 hannken

Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).


# 1.54 17-Apr-2017 hannken

Cleanup after mountlist iterator:
- remove now unused field mnt_list.
- rename mount_list to mountlist and make it local to vfs_mount.c.
- make mountlist_lock local to vfs_mount.c.

Change pstat.c to retrieve vnodes by lru lists.


# 1.53 12-Apr-2017 hannken

Switch vfs_getvfs(), dounmount() and vfs_mountroot() to mountlist iterator.

Add a helper to retrieve a mount with "highest generation < arg" and
use it from vfs_unmount_forceone() and vfs_unmountall1().


# 1.52 11-Apr-2017 hannken

Add an iterator over the currently mounted file systems.

Ride 7.99.68


Revision tags: jdolecek-ncq-base
# 1.51 30-Mar-2017 hannken

Change _fstrans_start() to allocate per lwp info for layered file
systems to get a reference on the mount.

Set mnt_lower on successful mount only.


Revision tags: pgoyette-localcount-20170320
# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

branches: 1.45.2;
Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumed extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.50 06-Mar-2017 hannken

Always use the lowest mount for fstrans and suspend. This way we
enter/leave or suspend/resume the stack of layered file systems as a unit.


# 1.49 06-Mar-2017 hannken

Deny unmounting file systems below layered file systems.


# 1.48 22-Feb-2017 hannken

Enable fstrans on all file systems.

Welcome to 7.99.61


Revision tags: nick-nhusb-base-20170204
# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in dounmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
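The calling pattern described above can be sketched in userland C. This is
only an illustration under stated assumptions: the struct definitions and all
demo_* helpers below are hypothetical stand-ins, not the kernel's types or
functions, and the sketch leaves out the reference counting and locking that
the real iterator is responsible for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland stand-ins, NOT the kernel's definitions. */
struct vnode {
	int		 v_id;
	struct vnode	*v_next;
};

struct mount {
	struct vnode	*mnt_first;
};

struct vnode_iterator {
	struct vnode	*vi_cursor;	/* next vnode to hand out */
};

/* Mirrors the shape of the init call: allocate and position the iterator. */
static void
demo_iterator_init(struct mount *mp, struct vnode_iterator **marker)
{
	*marker = malloc(sizeof(**marker));
	assert(*marker != NULL);
	(*marker)->vi_cursor = mp->mnt_first;
}

static void
demo_iterator_destroy(struct vnode_iterator *marker)
{
	free(marker);
}

/* Returns "false / *vpp == NULL" when done, "true / *vpp != NULL" otherwise. */
static bool
demo_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)
{
	struct vnode *vp = marker->vi_cursor;

	if (vp == NULL) {
		*vpp = NULL;
		return false;
	}
	marker->vi_cursor = vp->v_next;
	*vpp = vp;
	return true;
}

/* Typical caller: init, loop on next until it returns false, destroy. */
static int
demo_count_vnodes(struct mount *mp)
{
	struct vnode_iterator *marker;
	struct vnode *vp;
	int n = 0;

	demo_iterator_init(mp, &marker);
	while (demo_iterator_next(marker, &vp)) {
		assert(vp != NULL);
		n++;
	}
	assert(vp == NULL);
	demo_iterator_destroy(marker);
	return n;
}

/* Small fixture: a mount with three vnodes on its list. */
static int
demo_three_vnodes(void)
{
	struct vnode c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
	struct mount m = { &a };

	return demo_count_vnodes(&m);
}
```

The point of the pattern is that the caller never touches the list or its
mutexes directly; it only sees the vnodes that the "next" call hands out.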

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
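A minimal sketch of what this change means for code that walks the list,
assuming a <sys/queue.h> that provides the TAILQ macros: a TAILQ traversal
ends when the cursor becomes NULL, rather than when it wraps around to the
list head. The demo_mount type and the demo_* functions are hypothetical and
are not taken from vfs_mount.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an entry on a mount list. */
struct demo_mount {
	int			 id;
	TAILQ_ENTRY(demo_mount)	 link;
};

TAILQ_HEAD(demo_mountlist, demo_mount);

/* Walk the list; the end-of-list test is a plain NULL comparison. */
static int
demo_sum_ids(struct demo_mountlist *list)
{
	struct demo_mount *mp;
	int sum = 0;

	for (mp = TAILQ_FIRST(list); mp != NULL; mp = TAILQ_NEXT(mp, link))
		sum += mp->id;
	return sum;
}

/* Small fixture: three entries appended in order. */
static int
demo_build_and_sum(void)
{
	struct demo_mountlist list;
	struct demo_mount a = { .id = 1 }, b = { .id = 2 }, c = { .id = 4 };

	TAILQ_INIT(&list);
	TAILQ_INSERT_TAIL(&list, &a, link);
	TAILQ_INSERT_TAIL(&list, &b, link);
	TAILQ_INSERT_TAIL(&list, &c, link);
	return demo_sum_ids(&list);
}
```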


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


# 1.47 27-Jan-2017 hannken

Vrecycle() cannot wait for the vnode lock. On a leaf file system this lock
will always succeed as we hold the last reference and prevent further
references. On layered file systems waiting for the lock would open a can of
deadlocks as the lower vnodes may have other active references.


# 1.46 27-Jan-2017 hannken

When called with WRITECLOSE vflush() must sync the vnode and take
care of unlinked but open vnodes.

PR kern/30525 remounting ffs read-only (mount -ur) does not sync metadata.


Revision tags: bouyer-socketcan-base
# 1.45 13-Jan-2017 hannken

Add file-local iterator variant vfs_vnode_iterator_next1() that
waits for vnodes to become reclaimed and use it from vflush().


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
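
The marker-based iteration described in the 1.27 entry can be illustrated
with a minimal userland sketch. Everything below is an assumption made for
illustration: struct node, struct node_iterator, and the iter_*() functions
are hypothetical stand-ins, not the kernel's struct vnode, struct mount, or
vfs_vnode_iterator_*() implementation, and all locking is omitted. The sketch
only shows the idea of keeping a marker element on the list so the iterator
remembers its position while returning one referenced element per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode; not the kernel structure. */
struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;		/* true for iterator markers */
	int refcnt;		/* reference count taken by iter_next() */
	int id;
};
TAILQ_HEAD(nodelist, node);

/* Hypothetical iterator: a marker element that lives on the list. */
struct node_iterator {
	struct nodelist *list;
	struct node marker;
};

/* Place the marker at the head of the list. */
static void
iter_init(struct node_iterator *it, struct nodelist *list)
{
	it->list = list;
	it->marker.is_marker = true;
	it->marker.refcnt = 0;
	it->marker.id = -1;
	TAILQ_INSERT_HEAD(list, &it->marker, link);
}

/*
 * Return true with *np set to the next referenced node, or false with
 * *np == NULL when the list is exhausted.
 */
static bool
iter_next(struct node_iterator *it, struct node **np)
{
	struct node *n;

	assert(it->list != NULL);
	n = TAILQ_NEXT(&it->marker, link);
	/* Skip markers that belong to other iterators. */
	while (n != NULL && n->is_marker)
		n = TAILQ_NEXT(n, link);
	if (n == NULL) {
		*np = NULL;
		return false;
	}
	/* Move the marker behind the node we are about to return. */
	TAILQ_REMOVE(it->list, &it->marker, link);
	TAILQ_INSERT_AFTER(it->list, n, &it->marker, link);
	n->refcnt++;
	*np = n;
	return true;
}

/* Take the marker off the list again. */
static void
iter_destroy(struct node_iterator *it)
{
	TAILQ_REMOVE(it->list, &it->marker, link);
	it->list = NULL;
}
```

A caller would loop on iter_next() until it returns false, drop the
reference it was handed for each node, and then call iter_destroy().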


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to keep "boothowto = AB_VERBOSE" output from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit differences (which are harmless) are:
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.44 11-Jan-2017 hannken

Move vnode member v_mntvnodes as vi_mntvnodes to vnode_impl.h.

Add an ugly hack so pstat.c may still traverse the list.


Revision tags: pgoyette-localcount-20170107
# 1.43 02-Jan-2017 hannken

Rename vget() to vcache_vget() and vcache_tryvget() respectively and
move the definitions to sys/vnode_impl.h.

No functional change intended.

Welcome to 7.99.54


# 1.42 14-Dec-2016 hannken

Remove the "target" argment from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesytstem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.


Revision tags: nick-nhusb-base-20150606
# 1.35 06-May-2015 hannken

Remove miscfs/syncfs and

- move the syncer into kern/vfs_subr.c.

- change the syncer to process the mountlist and VFS_SYNC as appropriate.

- use an API for mount points similiar to the API for vnodes:
- vfs_syncer_add_to_worklist(struct mount *mp) to add
- vfs_syncer_remove_from_worklist(struct mount *mp) to remove a mount.

No objections on tech-kern@


# 1.34 20-Apr-2015 riastradh

Make vget always return vnode unlocked.

Convert callers who want locks to use vn_lock afterward.

Add extra argument so the compiler will report stragglers.


Revision tags: nick-nhusb-base-20150406
# 1.33 09-Mar-2015 pooka

The use of root_device is not limited to vfs, so don't supply it in
vfs_mount.c, use subr_device.c instead.

Fixes rump kernels built with DEBUG by again not making the base depend
on the vfs faction, as reported by Patrick Welche.


# 1.32 08-Jan-2015 hannken

vfs_vnode_iterator_destroy: set v_usecount of marker to zero to prevent
an assertion from vnfree().


Revision tags: nick-nhusb-base
# 1.31 14-Nov-2014 manu

branches: 1.31.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, with
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
2) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.30 30-May-2014 hannken

branches: 1.30.2;
vfs_vnode_iterator_next(): if a vnode is reclaiming (VI_XLOCK) skip
the filter. Vget() will wait until the vnode has disappeared. No more
"dangling vnode" panics on unmount.


# 1.29 24-May-2014 christos

Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.28 18-Mar-2014 hannken

branches: 1.28.2;
Operations vmark(), vunmark() and vismarker() have been replaced by
vfs_vnode_iterator_*(), remove them.

Document vfs_vnode_iterator_*().

Make VI_MARKER private to vfs_vnode.c, vfs_mount.c and unfortunately
to ufs/lfs/lfs_segment.c.

Welcome to 6.99.37


Revision tags: riastradh-drm2-base3
# 1.27 05-Mar-2014 hannken

Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.
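
The calling convention described above can be sketched in userland C. This is
only a mock under stated assumptions: the demo_* types and functions are
hypothetical stand-ins that mirror the shape of the three prototypes, and they
ignore everything the kernel version has to handle (list and vnode mutexes,
reference counts, marker vnodes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vnode and struct mount. */
struct demo_vnode {
	int id;
	struct demo_vnode *next;
};

struct demo_mount {
	struct demo_vnode *first;
};

/* Hypothetical iterator state: just a cursor into the list. */
struct demo_iterator {
	struct demo_vnode *cursor;
};

/* Allocate the iterator and point it at the first vnode of the mount. */
static void
demo_iterator_init(struct demo_mount *mp, struct demo_iterator **itp)
{
	struct demo_iterator *it = malloc(sizeof(*it));

	if (it != NULL)
		it->cursor = mp->first;
	*itp = it;
}

static void
demo_iterator_destroy(struct demo_iterator *it)
{
	free(it);
}

/*
 * Same contract as described in the log message: return false with
 * *vpp == NULL when done, or true with *vpp != NULL for the next entry.
 */
static bool
demo_iterator_next(struct demo_iterator *it, struct demo_vnode **vpp)
{
	if (it == NULL || it->cursor == NULL) {
		*vpp = NULL;
		return false;
	}
	*vpp = it->cursor;
	it->cursor = it->cursor->next;
	return true;
}

/* Caller-side loop: init, call next until it returns false, destroy. */
static int
demo_sum_ids(struct demo_mount *mp)
{
	struct demo_iterator *it;
	struct demo_vnode *vp;
	int sum = 0;

	demo_iterator_init(mp, &it);
	while (demo_iterator_next(it, &vp))
		sum += vp->id;
	demo_iterator_destroy(it);
	return sum;
}
```

The point of the sketch is the caller side: the loop in demo_sum_ids() never
touches list linkage or intermediate states, which is the burden the new
interface is meant to take away from callers.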

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34


# 1.26 27-Feb-2014 hannken

Currently dead vnodes still reside on the vnodelist of the file system
they have been removed from.

Create a "dead mount" that takes dead vnodes until they get freed.

Discussed on tech-kern.


# 1.25 27-Nov-2013 christos

one more *_END(head) -> NULL


# 1.24 23-Nov-2013 christos

change the mountlist CIRCLEQ into a TAILQ
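
What this change means for code that walks the list can be sketched in
userland with <sys/queue.h>: a TAILQ walk ends when the cursor becomes NULL
instead of being compared against an end-of-list sentinel derived from the
head. The struct demo_mnt type and demo_count_mounts() are hypothetical
stand-ins for illustration, not the kernel's mountlist code.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mount; only the linkage matters. */
struct demo_mnt {
	int id;
	TAILQ_ENTRY(demo_mnt) link;
};

TAILQ_HEAD(demo_mountlist, demo_mnt);

/* Walk the list the TAILQ way: stop when the cursor is NULL. */
static int
demo_count_mounts(struct demo_mountlist *head)
{
	struct demo_mnt *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, link))
		n++;
	return n;
}
```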


# 1.23 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.22 25-Oct-2013 martin

Mark diagnostic-only variables


# 1.21 30-Sep-2013 hannken

Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>


# 1.20 30-Aug-2013 hannken

Dounmount() violates the locking protocol for member v_mountedhere.
A vnode lock is required to access or modify this field.

Lock/unlock the vnode when clearing v_mountedhere.

Reviewed by: David Holland <dholland@netbsd.org>

Should fix PR #48135 (Bad locking for umount)


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.19 28-Apr-2013 mlelstv

branches: 1.19.4;
fix locking order mountlist_lock -> mnt_unmounting.
Set IMNT_GONE early to protect against concurrent dounmount()
and vfs_busy() before the mountpoint is removed from
mount list.


# 1.18 26-Apr-2013 mlelstv

Correct umount semantics to return EBUSY when a filesystem is busy
instead of failing filesystem operations with EBUSY when attempting
an umount.
This fixes kern/38141.


Revision tags: agc-symver-base
# 1.17 13-Feb-2013 hannken

Make the spec_node table implementation private to spec_vnops.c.

To retrieve a spec_node, two new lookup functions (by device or by mount)
are implemented. Both return a referenced vnode, for an opened block device
the opened vnode is returned so further diagnostic checks "vp == ... sd_bdevvp"
will not fire. Otherwise any vnode matching the criteria gets returned.

No objections on tech-kern.

Welcome to 6.99.17


Revision tags: yamt-pagecache-base8
# 1.16 14-Dec-2012 pooka

Adjust unmount prints to prevent "boothowto = AB_VERBOSE" from being
cluttered like this:

unmounting file systems...unmounted kernfs on /kern type kernfs
unmounted etcetc.
done

tested: ~AB_VERBOSE, AB_VERBOSE, -DDEBUG


Revision tags: yamt-pagecache-base7 yamt-pagecache-base6
# 1.15 27-Oct-2012 chs

split device_t/softc for all remaining drivers.
replace "struct device *" with "device_t".
use device_xname(), device_unit(), etc.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5
# 1.14 08-May-2012 gson

branches: 1.14.2;
Move VFS_EXTATTRCTL to mount_domount(). This makes the
fs/puffs/t_fuzz:mountfuzz7, fs/puffs/t_fuzz:mountfuzz8,
and fs/zfs/t_zpool:create tests pass again. Patch from
manu, discussed on tech-kern and committed at his request.


Revision tags: jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.13 13-Mar-2012 elad

Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.


Revision tags: jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.12 18-Nov-2011 christos

branches: 1.12.4; 1.12.6;
- collect the long (and sometimes incomplete) lists of basic flags into
the header file and use that.
- sort the list of basic flags
- add MNT_RELATIME, ST_RELATIME
- mask all the op flags, for symmetry.

The real bit difference is (which is harmless):
- mount was missing MNT_EXTATTR
- update sets MNT_RDONLY twice
- ops also could or in MNT_GETARGS, but this is impossible because the
code would have chosen to do getargs then.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.11 14-Oct-2011 hannken

branches: 1.11.2;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.


# 1.10 07-Oct-2011 hannken

As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.


# 1.9 01-Sep-2011 christos

undo previous


# 1.8 01-Sep-2011 christos

fix typo.


# 1.7 01-Sep-2011 christos

Check for v_type before v_rdev because it is cheaper and safer.


# 1.6 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase rmind-uvmplock-base
# 1.5 05-Jun-2011 dsl

branches: 1.5.2;
Don't directly call sys_sync() from random bits of code, instead
add do_sys_sync() that takes an 'lwp' (for l_cred) as an argument.
Explicitly pass &lwp0 rather than passing NULL and expecting sys_sync to
substitute some random lwp.


Revision tags: cherry-xenmp-base
# 1.4 03-Apr-2011 rmind

branches: 1.4.2; 1.4.4;
vfs_insmntque: convert check to assert.


# 1.3 02-Apr-2011 rmind

Merge vfs_shutdown1() and vfs_shutdown().


# 1.2 02-Apr-2011 rmind

- Move vrele_list flush notify code into vrele_flush() routine.
- Make some structures static.


# 1.1 02-Apr-2011 rmind

Split off parts of vfs_subr.c into vfs_vnode.c and vfs_mount.c modules.

No functional change. Discussed on tech-kern@.


# 1.42 14-Dec-2016 hannken

Remove the "target" argument from vfs_drainvnodes() as it is
always equal to "desiredvnodes" and move its definition
from sys/vnode.h to sys/vnode_impl.h.

Extend vfs_drainvnodes() to also wait for deferred vrele to flush
and replace the call to vrele_flush() with a call to vfs_drainvnodes().


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.41 03-Nov-2016 hannken

Split sys/vnode.h into sys/vnode.h and sys/vnode_impl.h
- Move _VFS_VNODE_PRIVATE protected operations into vnode_impl.h.
- Move struct vnode_impl definition and operations into vnode_impl.h.
- Include vnode_impl.h where we include vnode.h with _VFS_VNODE_PRIVATE defined.
- Get rid of _VFS_VNODE_PRIVATE.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.40 07-Jul-2016 msaitoh

branches: 1.40.2;
KNF. Remove extra spaces. No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.39 19-May-2016 hannken

Change "ISSET(vp->v_iflag, VI_XLOCK)" to "vdead_check(vp, VDEAD_NOWAIT)".


# 1.38 19-May-2016 hannken

Add VFS_VNODE_PRIVATE protected operations vnalloc_marker() to create,
vnfree_marker() to destroy and vnis_marker() to test for marker vnodes.

Make operations vnalloc() and vnfree() local to vfs_vnode.c.


Revision tags: nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.37 19-Aug-2015 hannken

Redo Rev. 1.30: Change vfs_vnode_iterator_next() to skip reclaiming
vnodes (VI_XLOCK set) without waiting and change vflush() to wait for
these vnodes.


# 1.36 02-Aug-2015 manu

Do not VFS_SYNC before VFS_UNMOUNT on force unmount

VFS_SYNC does not consider whether we are performing a force unmount or not,
and therefore it can wait for a while if the filesystem is misbehaving.
Removing VFS_SYNC before VFS_UNMOUNT on forced unmount fixes the problem.

This should not cause harm as the VFS_SYNC seems just useless.
As noted by Chuck Silvers in
http://mail-index.netbsd.org/tech-kern/2015/07/13/msg019156.html
- Nothing seems to prevent vnodes from getting dirty again after VFS_SYNC call.
- Filesystems do flush data through vflush() in VFS_UNMOUNT anyway.

As a consequence, the VFS_SYNC call in do_unmount() could probably be
completely removed. But since such a change is quite dangerous, we just
remove it in the case of forced unmounts, which are situations where
the risk of data loss is known to the operator.

