History log of /freebsd-current/sys/fs/tmpfs/tmpfs_vfsops.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 765ad5b2 11-Aug-2023 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: add the "pgread" mount option to the allowed options list for mount

Fixes: 0f613ab85e5a5274704d179f39fb15163d46e7c4
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 0f613ab8 05-Aug-2023 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: add a knob to enable pgcache read for mount

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41334


# b61a5730 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-NetBSD identifier is obsolete, drop -NetBSD

The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 15df9021 23-Feb-2023 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: support the nosymfollow mount option

PR: 269772
Reported by: firk@cantconnect.ru
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 56242a4c 05-Dec-2022 Fedor Uporov <fsu@FreeBSD.org>

Add extended attributes

The extended attributes follow UFS semantics, meaning they cannot
be set on char/block devices and fifos. The attributes
are allocated using regular malloc with M_WAITOK
and their own malloc tag, M_TMPFSEA. The memory
consumed by extended attributes is limited, to avoid triggering
OOM, by the tmpfs_mount variable tm_ea_memory_max,
which is initially set to 16 MB. The extended attribute
entries are stored as a linked list in the tmpfs node.
The mount point lock is required only in setextattr
and deleteextattr, to update the extended attributes
memory-inuse counter; all other operations are done
under the vnode lock.

Reviewed by: kib
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D38052


# 1d9f3a37 06-Jan-2023 Konstantin Belousov <kib@FreeBSD.org>

Stop cleaning MNT_LOCAL on unmount

There is no point in clearing just this flag. Flags are reset on the
struct mount re-allocation for reuse anyway.

Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37966


# 37aea264 20-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: for used pages, account really allocated pages, instead of file sizes

This makes tmpfs size accounting correct for sparse files. Also
report st_blocks/va_bytes correctly. Previously the reported value did not
account for the swapped out pages.

PR: 223015
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097


# d9dc64f1 20-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: make vm_object point to the tmpfs node instead of vnode

The vnode could be reclaimed and allocated again during the lifecycle of
the node, but the node cannot. Also, referencing the node would allow
to reach it and tmpfs mount data from the object, regardless of the
state of the possibly absent vnode.

Still use swp_tmpfs for back-pointer, instead of using handle. Use of
named swap objects would incur taking the sw_alloc_sx on node allocation
and deallocation.

swp_tmpfs is renamed to swp_priv to remove the last bit of tmpfs in vm/.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097


# 83aff0f0 20-Oct-2022 Konstantin Belousov <kib@FreeBSD.org>

Add 'show tmpfs' ddb command

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37097


# 4dcdf398 17-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF

This makes it possible to stop maintaining the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33127


# c12118f6 24-Aug-2021 Ka Ho Ng <khng@FreeBSD.org>

tmpfs: Fix styles

A lot of return statements were in the wrong style before this commit.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# eec2e4ef 07-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: reimplement the mtime scan to use the lazy list

Tested by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D30065


# 28bc23ab 07-May-2021 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: dynamically register tmpfs pager

Remove OBJT_SWAP_TMPFS. Move tmpfs-specific swap pager bits into
tmpfs_subr.c.

There is no longer any code to directly support tmpfs in sys/vm, most
tmpfs knowledge is shared by non-anon swap object type implementation.
The tmpfs-specific methods are provided by registered tmpfs pager, which
inherits from the swap pager.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30168


# 4b8365d7 30-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

Add OBJT_SWAP_TMPFS pager

This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways. Replace (almost) all such places with
proper methods.

Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.

This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070


# 9f200bc4 05-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

tmpfs_free_tmp(): explicitly assert that tmp is locked

Although TMPFS_UNLOCK() is done in both paths later, unlocking a mutex
that is not locked provides a different failure mode.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 42bebbda 05-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: make M_TMPFSMNT static to tmpfs_vfsops.c

This malloc type is only used in this file.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 081e36e7 15-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Add tmpfs page cache read support.

Or it could be explained as lockless (for vnode lock) reads. Reads
are performed from the node tn_obj object. Tmpfs regular vnode object
lifecycle is significantly different from the normal OBJT_VNODE: it is
alive as long as ref_count > 0.

Ensure liveness of the tmpfs VREG node and consequently v_object
inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open().
Provide custom tmpfs fo_close() method on file, to ensure that close
is paired with open.

Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks.
It is quite cheap in code size sense to support page-ins for read for
tmpfs even if we do not own tmpfs vnode lock. Also, we can handle
holes in tmpfs node without additional effort, and do not have
a limitation on the transfer size.

Reviewed by: markj
Discussed with and benchmarked by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346


# a92a971b 16-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the thread argument from vget

It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)


# 03337743 10-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: clean MNTK_FPLOOKUP if MNT_UNION is set

Elides checking it during lookup.


# 172ffe70 25-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: add support for lockless lookup

Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25580


# 693d10a2 03-Jun-2020 Ryan Moeller <freqlabs@FreeBSD.org>

tmpfs: Preserve alignment of struct fid fields

On 64-bit platforms, the two short fields in `struct tmpfs_fid` are padded to
the 64-bit alignment of the long field. This pushes the offsets of the
subsequent fields by 4 bytes and makes `struct tmpfs_fid` bigger than
`struct fid`. `tmpfs_vptofh()` casts a `struct fid *` to `struct tmpfs_fid *`,
causing 4 bytes of adjacent memory to be overwritten when the struct fields are
set. Through several layers of indirection and embedded structs, the adjacent
memory for one particular call to `tmpfs_vptofh()` happens to be the stack
canary for `nfsrvd_compound()`. Half of the canary ends up being clobbered,
going unnoticed until eventually the stack check fails when `nfsrvd_compound()`
returns and a panic is triggered.

Instead of duplicating fields of `struct fid` in `struct tmpfs_fid`, narrow the
struct to cover only the unique fields for tmpfs and assert at compile time
that the struct fits in the allotted space. This way we don't have to
replicate the offsets of `struct fid` fields, we just use them directly.

Reviewed by: kib, mav, rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25077
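
The layout problem and the shape of the fix can be sketched in user-space C. This is a simplified model under stated assumptions: the demo_ types, the DEMO_MAXFIDSZ value, and the helper functions are hypothetical stand-ins, not the kernel's struct fid, struct tmpfs_fid, or tmpfs_vptofh().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXFIDSZ	16	/* assumed size of the opaque payload */

/* Stand-in for the generic handle: two shorts plus opaque bytes. */
struct demo_fid {
	uint16_t fid_len;
	uint16_t fid_data0;
	char	 fid_data[DEMO_MAXFIDSZ];
};

/*
 * Problematic layout: the header fields are duplicated, and on typical
 * 64-bit ABIs the 8-byte alignment of `id` inserts 4 bytes of padding
 * after the two shorts, so this struct outgrows struct demo_fid.
 */
struct demo_fs_fid_padded {
	uint16_t len;
	uint16_t pad;
	uint64_t id;
	uint64_t gen;
};

/* Narrowed layout: only the filesystem-specific fields. */
struct demo_fs_fid_data {
	uint64_t id;
	uint64_t gen;
};
_Static_assert(sizeof(struct demo_fs_fid_data) <= DEMO_MAXFIDSZ,
    "payload must fit in fid_data");

/* Store the payload inside the generic handle without casting it. */
static void
demo_encode(struct demo_fid *fhp, uint64_t id, uint64_t gen)
{
	struct demo_fs_fid_data d = { .id = id, .gen = gen };

	assert(fhp != NULL);
	fhp->fid_len = sizeof(d);
	fhp->fid_data0 = 0;
	memcpy(fhp->fid_data, &d, sizeof(d));
}

/* Read the payload back out of the generic handle. */
static struct demo_fs_fid_data
demo_decode(const struct demo_fid *fhp)
{
	struct demo_fs_fid_data d;

	assert(fhp != NULL);
	memcpy(&d, fhp->fid_data, sizeof(d));
	return (d);
}
```

Copying through the payload array means the code never depends on the offsets of the generic header fields, and the compile-time assertion turns any future size mismatch into a build failure instead of a stack overwrite.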


# 074ad60a 15-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: make write suspension mandatory

At the time opt-in was introduced, adding yourself as a writer was serializing
across the mount point. Nowadays it is fully per-cpu, the only impact being
a small single-threaded hit on top of what's there right now.

The vast majority of the overhead stems from the call to VOP_GETWRITEMOUNT,
which is done regardless.

Should someone want to microoptimize this single-threaded they can coalesce
looking the mount up with adding a write to it.


# c1e84733 04-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: add nomtime mount option,

which disables tracking mtime updates due to writes through the shared
mapped areas backed by tmpfs files. This removes the periodic scans which
downgrade rw mapped pages to ro to note the writes.

Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23432


# b66352b7 04-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

tmpfs_mount update: simplify, cache the value of VFS_TO_TMPFS() calculation.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# cc3593fb 12-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: rework vnode list management

The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22997


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# a51c8071 04-Dec-2019 Konstantin Belousov <kib@FreeBSD.org>

Stop using per-mount tmpfs zones.

Requested and reviewed by: jeff
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22643


# 67d0e293 29-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116


# 9c04e4c0 13-Oct-2019 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: use MNTK_NOMSYNC

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22009


# 2288078c 08-Oct-2019 Doug Moore <dougm@FreeBSD.org>

Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map.
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.

Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882


# 7682d0be 06-Oct-2019 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: add root vnode caching

See r353150.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646


# de4e1aeb 18-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix an issue with executing tmpfs binary.

Suppose that a binary was executed from tmpfs mount, and the text
vnode was reclaimed while the binary was still running. This is
possible even during normal operation, since the tmpfs vnode's
vm_object has swap type, and no reference on the vnode is held. Also
assume that the text vnode was revived for some reason. Then, on
process exit or exec, unmapping of the text mapping tries to remove
the text reference from the vnode, but since it went through a
recycle/instantiation cycle, there is no reference kept, and the assertion
in VOP_UNSET_TEXT_CHECKED() triggers.

Fix this by keeping a use reference on the tmpfs vnode for each exec
reference. This prevents the vnode reclamation while executable map
entry is active.

Do it by adding per-mount flag MNTK_TEXT_REFS that directs
vop_stdset_text() to add use ref on first vnode text use, and
per-vnode VI_TEXT_REF flag, to record the need on unref in
vop_stdunset_text() on last vnode text use going away. Set
MNTK_TEXT_REFS for tmpfs mounts.

Reported by: bdrewery
Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 28ce2bc1 12-Apr-2019 Konstantin Belousov <kib@FreeBSD.org>

Ignore doomed vnodes in tmpfs_update_mtime().

Otherwise we might dereference NULL vp->v_data after
VP_TO_TMPFS_NODE().

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 5c4ce6fa 02-Apr-2019 Konstantin Belousov <kib@FreeBSD.org>

tmpfs: plug holes on rw->ro mount update.

In particular:
- suspend the mount around vflush() to avoid new writes coming after the
vnode is processed;
- flush pending metadata updates (mostly node times);
- remap all rw mappings of files from the mount into ro.

It is not clear to me how to handle writeable mappings on rw->ro for
tmpfs best. Other filesystems, which use vnode vm object, call
vgone() on vnodes with writers, which sets the vm object type to
OBJT_DEAD, and keep the resident pages and installed ptes as is. In
particular, the existing mappings continue to work as long as the
application only accesses resident pages, but changes are not flushed
to file.

For tmpfs the vm object of VREG vnodes also serves as the data pages
container, giving single copy of the mapped pages, so it cannot be set
to OBJT_DEAD. Alternatives for making rw mappings ro could be either
invalidating them altogether, or marking them as CoW.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19737


# 4f207061 25-Mar-2019 Maxim Sobolev <sobomax@FreeBSD.org>

Refine r345425: get rid of superfluous helper macro that I have added.

MFC after: 2 weeks


# b4b3e349 25-Mar-2019 Allan Jude <allanjude@FreeBSD.org>

Make TMPFS_PAGES_MINRESERVED a kernel option

TMPFS_PAGES_MINRESERVED controls how much memory is reserved for the system
and not used by tmpfs.

On very small memory systems, the default value may be too high and this
prevents these small memory systems from using reroot, which is required
for them to install firmware updates.

Submitted by: Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by: mizhka
Differential Revision: https://reviews.freebsd.org/D13583
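
A common way to turn a hard-coded constant into a build-time option is to guard its default with #ifndef, so that a value supplied by the build configuration wins. The sketch below shows that pattern in user-space C. The DEMO_ names, the 4 MiB default, and the 4096-byte page size are illustrative assumptions, not a copy of the tmpfs sources.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096	/* assumed page size */

/* Build-time knob: -DDEMO_PAGES_MINRESERVED=... overrides the default. */
#ifndef DEMO_PAGES_MINRESERVED
#define DEMO_PAGES_MINRESERVED	(4 * 1024 * 1024 / DEMO_PAGE_SIZE)
#endif

/*
 * Pages a memory filesystem may use once the reserve is set aside.
 * Returns 0 rather than wrapping when the reserve exceeds what is free.
 */
static size_t
demo_pages_available(size_t free_pages)
{
	size_t reserved = DEMO_PAGES_MINRESERVED;

	return (free_pages > reserved ? free_pages - reserved : 0);
}
```

On a machine with fewer free pages than the reserve, the function returns 0, which matches the situation the commit describes, where small-memory systems could not use tmpfs at all until the reserve could be lowered.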


# ac1a10ef 22-Mar-2019 Maxim Sobolev <sobomax@FreeBSD.org>

Make it possible to update TMPFS mount point from read-only to read-write
and vice versa.

Reviewed by: delphij
Approved by: delphij
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19682


# 6d2e2df7 23-Nov-2018 Mark Johnston <markj@FreeBSD.org>

Ensure that directory entry padding bytes are zeroed.

Directory entries must be padded to maintain alignment; in many
filesystems the padding was not initialized, resulting in stack
memory being copied out to userspace. With the ino64 work there
are also some explicit pad fields in struct dirent. Add a subroutine
to clear these bytes and use it in the in-tree filesystems. The
NFS client is omitted for now as it was fixed separately in r340787.

Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 30e0cf49 20-Nov-2018 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: use unr64 for inode numbers

Sponsored by: The FreeBSD Foundation


# 0e5c6bd4 04-May-2018 Jamie Gritton <jamie@FreeBSD.org>

Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c.
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of. This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by: kib
Differential Revision: D14681


# 135beaf6 05-Dec-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Reduce pollution via tmpfs.h.


# d63027b6 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/fs: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# ba19246e 23-Oct-2017 Matt Joras <mjoras@FreeBSD.org>

Move clear_unrhdr to tmpfs_free_tmp.

Clearing the unr in tmpfs_unmount is not correct. In the case of
multiple references to the tmpfs mount (e.g. when there are lookup
threads using it) it will not be the one to finish tmpfs_free_tmp. In
those cases tmpfs_free_node_locked will be the final one to execute
tmpfs_free_tmp, and until then the unr must be valid.

Reported by: pho
Approved/reviewed by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12749


# 9aaf913e 11-Oct-2017 Matt Joras <mjoras@FreeBSD.org>

When unmounting a tmpfs, do not call free_unr.

tmpfs uses unr(9) to allocate inodes. Previously when unmounting it
would individually free the units when it freed each vnode. This is
unnecessary as we can use the newly-added unrhdr_clear function to clear
out the unr in one go. This measurably reduces the time to unmount a
tmpfs with many files.

Reviewed by: cem, lidl
Approved by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12591


# 00ac6a98 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Add mount option for tmpfs(5) to not use namecache.

The option "nonc" disables use of the namecache for the created mount;
by default the namecache is used. The rationale for the option is that
the namecache duplicates information which is already kept in memory
by tmpfs. Since it is believed that the namecache scales better than tmpfs,
or will scale better, the option is not enabled by default. On the
other hand, smaller machines may benefit from less namecache
pressure.

Discussed with: mjg
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 64c25043 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Refcount tmpfs nodes and mount structures.

On dotdot lookup and fhtovp operations, it is possible for the file
represented by tmpfs node to be removed after the thread calculated
the pointer. In this case, tmpfs_alloc_vp() accesses freed memory.

Introduce the reference count on the nodes. The allnodes list from
tmpfs mount owns 1 reference, and threads performing unlocked
operations on the node, add one transient reference. Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list. Both nodes and tmpfs_mounts are removed when
refcount goes to zero.

Note that this means that nodes and tmpfs_mounts might survive some
time after the node is deleted or tmpfs_unmount() finished. The
tmpfs_alloc_vp() in these cases returns error either due to node
removal (tn_nlinks == 0) or because of insmntque1(9) error.

Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
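
The ownership scheme described above can be modelled in a few lines of single-threaded C. This is only a sketch under stated assumptions: the demo_ types and functions are hypothetical, locking is omitted entirely, and the live-object counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Counters of objects not yet freed, for observation only. */
static int demo_live_mounts, demo_live_nodes;

struct demo_mount {
	unsigned refs;	/* 1 from the VFS mount + 1 per node */
};

struct demo_node {
	struct demo_mount *mnt;
	unsigned refs;	/* 1 from the node list + transient users */
};

static struct demo_mount *
demo_mount_create(void)
{
	struct demo_mount *m = calloc(1, sizeof(*m));

	if (m == NULL)
		return (NULL);
	m->refs = 1;		/* reference owned by the VFS mount */
	demo_live_mounts++;
	return (m);
}

static void
demo_mount_unref(struct demo_mount *m)
{
	assert(m->refs > 0);
	if (--m->refs == 0) {
		free(m);
		demo_live_mounts--;
	}
}

static struct demo_node *
demo_node_create(struct demo_mount *m)
{
	struct demo_node *n = calloc(1, sizeof(*n));

	if (n == NULL)
		return (NULL);
	n->mnt = m;
	n->refs = 1;		/* reference owned by the node list */
	m->refs++;		/* each node pins its mount */
	demo_live_nodes++;
	return (n);
}

/* Transient reference taken by a thread working without the lock. */
static void
demo_node_ref(struct demo_node *n)
{
	n->refs++;
}

static void
demo_node_unref(struct demo_node *n)
{
	struct demo_mount *m = n->mnt;

	assert(n->refs > 0);
	if (--n->refs == 0) {
		free(n);
		demo_live_nodes--;
		demo_mount_unref(m);	/* may free the mount as well */
	}
}
```

If a thread holds a transient reference while the file is removed and the filesystem is unmounted, both the node and the mount structure stay allocated until that thread drops its reference, which is the "might survive some time" behaviour noted in the commit message.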


# 280ffa5e 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Rename tmpfs_mount member allnode_lock to include namespace prefix.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# bba7ed20 19-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Style fixes and comment updates.

Edit comments which explain no longer relevant details, and add
locking annotations to the struct tmpfs_node members.

Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# ed2159c9 13-Jan-2017 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: manage tm_pages_used with atomics

Reviewed by: kib (previous version)


# 31e73fd4 06-Jan-2017 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: enabled MNTK_EXTENDED_SHARED

Discussed with: kib


# 5f34e93c 05-Jul-2015 Mark Johnston <markj@FreeBSD.org>

Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by: kib, mjg
Differential Revision: https://reviews.freebsd.org/D2937


# f40cb1c6 28-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync. Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object's OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied. Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.

Reported by: Ronald Klop <ronald-lists@klop.ws>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 3544b0f6 28-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

tmpfs does not use UVM on FreeBSD.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 12e2a30e 21-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

tmpfs: allow shared file lookups

Tested by: pho


# 4fce16e4 20-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Provide vfs suspension support only for filesystems which need it, take
two.

nullfs and unionfs need to request suspension if underlying filesystem(s)
use it. Utilize mnt_kern_flag for this purpose.

This is a fixup for 273271.

No strong objections from: kib
Pointy hat to: mjg
MFC after: 2 weeks


# 020b8f17 19-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Provide vfs suspension support only for filesystems which need it.

Need is expressed by providing vfs_susp_clean function in vfsops.

Differential Revision: D952
Reviewed by: kib (previous version)
MFC after: 2 weeks


# 4cda7f7e 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Rework the tmpfs unmount.

- Suspend filesystem for unmount. This prevents new tmpfs nodes from
instantiating, and also ensures that only unmount thread can destroy
nodes.

- Do not start tmpfs node deletion until all vnodes are reclaimed,
which guarantees that no thread can access tmpfs data. For this,
call vflush() in a loop until mnt_nvnodelistsize becomes zero.
Note that after mnt_nvnodelistsize becomes 0, insmntque() blocks
insertion of a vnode germ into the mount list of vnodes.

- Fail node allocation when the filesystem is being unmounted. This
is race-free due to the vflush() call in loop. This is mostly
cosmetic, avoiding some more work which might be done until
suspension in unmount is started.

Note that there is currently no way to prevent new vnode instantiation
from readers during the unmount. Due to this, forced unmount might
live-lock if vflush() loop cannot get to the zero vnode count due to
races with readers. The unmount would proceed after the load is
lifted.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# fca015d3 14-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Remove code separator lines which do not conform to style(9).

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 0742ebc9 13-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used.

Discussed with: kib
MFC after: 2 weeks


# 2454886e 23-Aug-2013 Xin LI <delphij@FreeBSD.org>

Allow tmpfs to be mounted inside a jail.


# 59169d91 23-Jul-2013 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

tmpfs works perfectly fine with -o union -- there is no reason to exclude it
from the list of options.


# 4fd5efe7 06-Jan-2013 Gleb Kurtsou <gleb@FreeBSD.org>

tmpfs: Replace directory entry linked list with RB-Tree.

Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search. Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies). Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.

Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead. Other file systems handle it similarly.

Work around the race-prone update of the tn_readdir_last[pn] fields.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c


# bc2258da 09-Nov-2012 Attilio Rao <attilio@FreeBSD.org>

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


# fc8fdae0 27-Sep-2012 Matthew D Fleming <mdf@FreeBSD.org>

Fix up kernel sources to be ready for a 64-bit ino_t.

Original code by: Gleb Kurtsou


# c5ab5ce3 16-Apr-2012 Jaakko Heinonen <jh@FreeBSD.org>

tmpfs: Allow update mounts only for certain options.

Since r230208 update mounts were allowed if the list of mount options
contained the "export" option. This is not correct as tmpfs doesn't
really support updating all options.

Reviewed by: kevlo, trociny


# 9295c628 07-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

tmpfs supports only INT_MAX nodes due to limitations of unit number
allocator.

Replace UINT32_MAX checks with INT_MAX. Keeping more than 2^31 nodes in
memory is not likely to become possible in the foreseeable future and would
require a new unit number allocator.

Discussed with: delphij
MFC after: 2 weeks


# 0ff93c48 07-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Add vfs_getopt_size. Support human readable file system options in tmpfs.

Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with: delphij
MFC after: 2 weeks
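
Human-readable size options are strings such as "512M" or "4k". The parser below is a user-space sketch of the general idea, not the kernel's vfs_getopt_size(): the function name demo_parse_size, the accepted suffixes, and the error convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a number with an optional binary suffix (k, m, g, t; either case).
 * Returns 0 and stores the byte count in *out on success, -1 on malformed
 * input or overflow.
 */
static int
demo_parse_size(const char *s, uint64_t *out)
{
	unsigned long long v;
	unsigned shift = 0;
	char *end;

	assert(out != NULL);
	/* Reject empty strings, signs, and leading whitespace up front. */
	if (s == NULL || !isdigit((unsigned char)s[0]))
		return (-1);
	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno != 0)
		return (-1);
	switch (tolower((unsigned char)*end)) {
	case 't': shift = 40; end++; break;
	case 'g': shift = 30; end++; break;
	case 'm': shift = 20; end++; break;
	case 'k': shift = 10; end++; break;
	case '\0': break;
	default: return (-1);
	}
	if (*end != '\0')
		return (-1);		/* trailing garbage after the suffix */
	if (v > (UINT64_MAX >> shift))
		return (-1);		/* scaling would overflow 64 bits */
	*out = (uint64_t)v << shift;
	return (0);
}
```

The overflow check is done before shifting, so a value that cannot be represented is rejected instead of silently wrapping to a small size.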


# da7aa277 07-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Add reserved memory limit sysctl to tmpfs.

Clean up available and used memory functions.
Check if free pages available before allocating new node.

Discussed with: delphij


# e0d3195b 16-Jan-2012 Kevin Lo <kevlo@FreeBSD.org>

Return EOPNOTSUPP since we only support update mounts for NFS export.

Spotted by: trociny


# 57eb5548 16-Jan-2012 Kevin Lo <kevlo@FreeBSD.org>

Add nfs export support to tmpfs(5)

Reviewed by: kib


# 82543c59 07-Nov-2011 Marcel Moolenaar <marcel@FreeBSD.org>

Don asbestos garment and remove the warning about TMPFS being experimental
-- highly experimental even. So far the closest to a bug in TMPFS that people
have gotten to relates to how ZFS can take away from the memory that TMPFS
needs. One can argue that such is not a bug in TMPFS. Irrespective, even if
there is a bug here and there in TMPFS, it's not to our own advantage to
scare people away from using TMPFS. I for one have been using it, even with
ZFS, very successfully.


# 694a586a 21-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by: kib


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# dec3772e 28-Jan-2010 Jaakko Heinonen <jh@FreeBSD.org>

Add "maxfilesize" mount option for tmpfs to allow specifying the
maximum file size limit. Default is UINT64_MAX when the option is
not specified. It was useless to set the limit to the total amount of
memory and swap in the system.

Use tmpfs_mem_info() rather than get_swpgtotal() in tmpfs_mount() to
check if there is enough memory available.

Remove now unused get_swpgtotal().

Reviewed by: Gleb Kurtsou
Approved by: trasz (mentor)


# 189ee6be 20-Jan-2010 Jaakko Heinonen <jh@FreeBSD.org>

- Change the type of nodes_max to u_int and use "%u" format string to
convert its value. [1]
- Set default tm_nodes_max to min(pages + 3, UINT32_MAX). It's more
reasonable than the old four nodes per page (with page size 4096) because
non-empty regular files always use at least one page. This fixes possible
overflow in the calculation. [2]
- Don't allow more than tm_nodes_max nodes allocated in tmpfs_alloc_node().

PR: kern/138367
Suggested by: bde [1], Gleb Kurtsou [2]
Approved by: trasz (mentor)
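
The default from the second item, min(pages + 3, UINT32_MAX), is only safe if the addition itself cannot wrap. A minimal sketch of an overflow-safe form follows; the function name and the use of a 64-bit page count are assumptions for illustration, and a later entry in this log replaces the UINT32_MAX bound with INT_MAX.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Default node limit: three more than the page count, capped at
 * UINT32_MAX. The comparison happens before the addition, so
 * pages + 3 can never wrap around.
 */
static uint32_t
demo_default_nodes_max(uint64_t pages)
{
	if (pages >= (uint64_t)UINT32_MAX - 3)
		return (UINT32_MAX);
	return ((uint32_t)(pages + 3));
}
```

Passing an "unlimited" page count such as UINT64_MAX yields the cap rather than a small wrapped value.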


# 5364a38d 13-Jan-2010 Jaakko Heinonen <jh@FreeBSD.org>

- Fix some style bugs in tmpfs_mount(). [1]
- Remove a stale comment about tmpfs_mem_info() 'total' argument.

Reported by: bde [1]


# 720c50b3 08-Jan-2010 Jaakko Heinonen <jh@FreeBSD.org>

- Change the type of size_max to u_quad_t because its value is converted
with vfs_scanopt(9) using the "%qu" format string.
- Limit the maximum value of size_max to (SIZE_MAX - PAGE_SIZE) to
prevent overflow in howmany() macro.

PR: kern/141194
Approved by: trasz (mentor)
MFC after: 2 weeks
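
The overflow that this entry guards against can be illustrated with a small sketch. It assumes a round-up division of the same shape as howmany(); the names HOWMANY, pages_unclamped() and pages_clamped() are hypothetical, and uint64_t stands in for the kernel's size type so that the behaviour is the same on every platform.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division, same shape as howmany() in <sys/param.h>. */
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Without a limit, size_max + page_size - 1 can wrap around to a small value. */
static uint64_t
pages_unclamped(uint64_t size_max, uint64_t page_size)
{
	assert(page_size != 0);
	return (HOWMANY(size_max, page_size));
}

/* Clamp to (maximum - page_size) first, so the addition cannot wrap. */
static uint64_t
pages_clamped(uint64_t size_max, uint64_t page_size)
{
	assert(page_size != 0);
	if (size_max > UINT64_MAX - page_size)
		size_max = UINT64_MAX - page_size;
	return (HOWMANY(size_max, page_size));
}
```

With a 4096-byte page, pages_unclamped(UINT64_MAX, 4096) wraps and yields 0 pages, while pages_clamped() yields a large, sensible page count.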


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal this
situation.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# e08d5567 03-Sep-2008 Xin LI <delphij@FreeBSD.org>

Reflect license change of NetBSD code.

Obtained from: NetBSD
MFC after: 3 days


# 0359a12e 28-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize the VOP_GETATTR / VOP_SETATTR pair, as the passed thread
was always curthread and served no purpose.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# eab626f1 16-Apr-2008 Konstantin Belousov <kib@FreeBSD.org>

Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjunction with a 'thread' argument that is always curthread.
Remove the unnecessary extra argument and pass curthread explicitly to lower
layer functions, when necessary.

This change breaks the KPI, which should affect several ports, so
a version bump and manpage update will be committed separately.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# cb05b60a 09-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

vn_lock() is currently only used with 'curthread' passed as argument.
Remove this argument and pass curthread directly to the underlying
VOP_LOCK1() VFS method. This change makes the code cleaner and, in
particular, removes an annoying dependency, helping the next lockmgr() cleanup.
The KPI, obviously, changes as a result.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, the next commits will address
a similar cleanup of VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 745973bd 06-Dec-2007 Xin LI <delphij@FreeBSD.org>

size_max should be unsigned, as such, use size_t here.


# 7871e52b 17-Nov-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Several fixes to tmpfs which make it survive pho@'s
stress2 suite. To quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.

For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.

Submitted by: kib


# e0f51ae7 17-Nov-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Fix several style(9) bugs.

Submitted by: des


# eed4ee29 12-Nov-2007 Xin LI <delphij@FreeBSD.org>

Correct a stack overflow which will trigger panics when
mode= is specified, caused by incorrect format string
specified to vfs_scanopt() and subsequently vsscanf().

Pointed out by: kib
Submitted by: des


# 3543c1b4 04-Oct-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Provide a dummy verb "export" to silence the message
shown at startup when NFS is enabled.

Reported by: rafan
Approved by: re (tmpfs blanket)


# 386c9692 04-Oct-2007 Xin LI <delphij@FreeBSD.org>

Additional work is still needed before we can claim that tmpfs
is stable enough for production usage. Warn user upon mount.

Approved by: re (tmpfs blanket)


# 0ae6383d 09-Aug-2007 Xin LI <delphij@FreeBSD.org>

MFp4:

- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL dereference
- Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)


# f62e5595 24-Jul-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Force 64-bit arithmetic when calculating the maximum file size.
This fixes tmpfs calculations on 32-bit systems equipped with more than
4GB swap.

Reported by: Craig Boston <craig xfoil gank org>
PR: kern/114870
Approved by: re (tmpfs blanket)
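
The kind of truncation this entry refers to can be sketched as below. The committed code is not reproduced here: bytes_from_pages_32() and bytes_from_pages_64() are hypothetical names, and the sketch assumes a 32-bit page count multiplied by a 32-bit page size.

```c
#include <assert.h>
#include <stdint.h>

/* The product is computed in 32 bits, so anything above 4GB is lost. */
static uint64_t
bytes_from_pages_32(uint32_t pages, uint32_t page_size)
{
	uint32_t narrow = pages * page_size;	/* wraps modulo 2^32 */

	return (narrow);
}

/* Widening one operand first forces the multiplication into 64 bits. */
static uint64_t
bytes_from_pages_64(uint32_t pages, uint32_t page_size)
{
	/* Page sizes are expected to be non-zero powers of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ((uint64_t)pages * page_size);
}
```

For example, 2097152 pages of 4096 bytes is 8GB: the 32-bit variant wraps to 0, while the 64-bit variant returns 8589934592.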


# 72800829 23-Jul-2007 Xin LI <delphij@FreeBSD.org>

MFp4: When swapping is not enabled, allow creating files by taking
physical memory pages into account for tm_maxfilesize.

Reported by: Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by: Howard Su
Approved by: re (tmpfs blanket)


# 8d9a89a3 11-Jul-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Make use of the kernel unit number allocation facility
for tmpfs nodes.

Submitted by: Mingyan Guo <guomingyan gmail com>
Approved by: re (tmpfs blanket)


# 1df86a32 08-Jul-2007 Xin LI <delphij@FreeBSD.org>

MFp4:
- Plug memory leak.
- Respect underlying vnode's properties rather than assuming that
the user wants root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein
in order to prevent data corruption.
- Fix several bugs reported by Coverity.

Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)


# 9b258fca 28-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4:

- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use malloc(9) KPI to allocate memory for string. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already acts that way. [2]

Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)


# a321f489 27-Jun-2007 Xin LI <delphij@FreeBSD.org>

Space/style cleanups after last set of commits.

Approved by: re (tmpfs blanket)


# 7adb1776 25-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Several clean-ups and improvements over tmpfs:

- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
it does not bring any value now.
- Use |= instead of = when applying VV_ROOT flag.
- Remove tm_avariable_nodes list. Use uma to hold the
released nodes.
- init/destroy interlock mutex of node when init/fini
instead of ctor/dtor.
- Change memory computing using u_int to fix negative
value in 2G mem machine.
- Remove unnecessary bzero's
- Rely on uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx
- Wire a page instead of holding it.
- Pass allocate_zero to obtain zeroed pages upon first
use.

Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)


# d1fa59e9 15-Jun-2007 Xin LI <delphij@FreeBSD.org>

MFp4: Add tmpfs, an efficient memory file system.

Please note that this is currently considered an
experimental feature, so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.

For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.

This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
worked on it to continue this effort.

Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)