History log of /freebsd-11-stable/sys/fs/tmpfs/tmpfs_vnops.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 357706 09-Feb-2020 kevans

MFC O_SEARCH: r357412, r357461, r357580, r357584, r357636, r357671, r357688

r357412:
Provide O_SEARCH

O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

r357461:
namei: preserve errors from fget_cap_locked

Most notably, we want to make sure we don't clobber any capabilities-related
errors. This is a regression from r357412 (O_SEARCH) that was picked up by
the capsicum tests.

r357580:
O_SEARCH test: drop O_SEARCH|O_RDWR local diff

In FreeBSD's O_SEARCH implementation, O_SEARCH in conjunction with O_RDWR or
O_WRONLY is explicitly rejected. In this case, O_RDWR was not necessary
anyways as the file will get created with or without it.

This was submitted upstream as misc/54940 and committed in rev 1.8 of the
file.

r357584:
Record-only MFV of r357583: netbsd-tests: import upstreamed changes

The changes in question originated in FreeBSD/head; no further action is
required.

r357636:
MFV r357635: imnport v1.9 of the O_SEARCH tests

The RCSID data was wrong, so this is effectively a record-only merge
with correction of said data. No further changes should be needed in this
area, as we've now upstreamed our local changes to this specific test.

r357671:
O_SEARCH test: mark revokex an expected fail on NFS

The revokex test does not work when the scratch directory is created on NFS.
Given the nature of NFS, it likely can never work without looking like a
security hole since O_SEARCH would rely on the server knowing that the
directory did have +x at the time of open and that it's OK for it to have
been revoked based on POSIX specification for O_SEARCH.

This does mean that O_SEARCH is only partially functional on NFS in general,
but I suspect the execute bit getting revoked in the process is likely not
common.

r357688:
MFV r357687: Import NFS fix for O_SEARCH tests

The version that ended upstream was ultimately slightly different than the
version committed here; notably, statvfs() is used but it's redefined
appropriately to statfs() on FreeBSD since we don't provide the fstypename
for the former interface.


# 355262 01-Dec-2019 kib

MFC r355061:
tmpfs: resolve deadlock between rename and unmount.


# 353387 10-Oct-2019 kib

MFC r353064:
tmpfs_rename: style.


# 353386 10-Oct-2019 kib

MFC r353033:
Remove unnecessary vm/vm_page.h and vm/vm_pager.h includes from
tmpfs/tmpfs_vnops.c.


# 346286 16-Apr-2019 kib

MFC r345425, r345514, r345799, r345800, r345803, r346157:
Enable tmpfs rw->ro remounts.


# 341074 27-Nov-2018 markj

MFC r340856:
Ensure that directory entry padding bytes are zeroed.


# 328298 23-Jan-2018 jhb

MFC 320900,323882,324224,324226,324228,326986,326988,326989,326990,326993,
326994,326995,327004: Various fixes for pathconf(2).

The original change to use vop_stdpathconf() more widely was motivated
by a panic due to recent AIO-related changes. However, bde@ reported
that vop_stdpathconf() contained too many settings that were not
filesystem-independent. The end result of this set of patches is to
fix the AIO-related panic via use of a trimmed-down vop_stdpathconf()
while also adding support for missing pathconf variables in various
filesystems (and removing a few settings incorrectly reported as
supported).

320900:
Consistently use vop_stdpathconf() for default pathconf values.

Update filesystems not currently using vop_stdpathconf() in pathconf
VOPs to use vop_stdpathconf() for any configuration variables that do
not have filesystem-specific values. vop_stdpathconf() is used for
variables that have system-wide settings as well as providing default
values for some values based on system limits. Filesystems can still
explicitly override individual settings.

323882:
Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.

Move handling of these three pathconf() variables out of vop_stdpathconf()
and into devfs_pathconf() as TTY devices can only be devfs files. In
addition, only return settings for these three variables for devfs devices
whose device switch has the D_TTY flag set.

324224:
Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX pathconf() requests in cd9660.

cd9660 only supports symlinks with Rock Ridge extensions, so
_PC_SYMLINK_MAX is conditional on Rock Ridge.

324226:
Return 64 for pathconf(_PC_FILESIZEBITS) on tmpfs.

324228:
Flesh out pathconf() on UDF.

- Return 64 bits for _PC_FILESIZEBITS.
- Handle _PC_SYMLINK_MAX.
- Defer _PC_PATH_MAX to vop_stdpathconf().

326986:
Add a custom VOP_PATHCONF method for fdescfs.

The method handles NAME_MAX and LINK_MAX explicitly. For all other
pathconf variables, the method passes the request down to the underlying
file descriptor. This requires splitting a kern_fpathconf() syscallsubr
routine out of sys_fpathconf(). Also, to avoid lock order reversals with
vnode locks, the fdescfs vnode is unlocked around the call to
kern_fpathconf(), but with the usecount of the vnode bumped.

326988:
Add a custom VOP_PATHCONF method for fuse.

This method handles _PC_FILESIZEBITS, _PC_SYMLINK_MAX, and _PC_NO_TRUNC.
For other values it defers to vop_stdpathconf().

326989:
Support _PC_FILESIZEBITS in msdosfs' VOP_PATHCONF().

326990:
Handle _PC_FILESIZEBITS and _PC_NO_TRUNC for smbfs' VOP_PATHCONF().

326993:
Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().

Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations. Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to
indicate hard links aren't supported.

326994:
Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().

326995:
Use FUSE_LINK_MAX for LINK_MAX in fuse' VOP_PATHCONF().

Should have included this in r326993.

327004:
Rework pathconf handling for FIFOs.

On the one hand, FIFOs should respect other variables not supported by
the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.).
These values are fs-specific and must come from a fs-specific method.
On the other hand, filesystems that support FIFOs are required to
support _PC_PIPE_BUF on directory vnodes that can contain FIFOs.
Given this latter requirement, once the fs-specific VOP_PATHCONF
method supports _PC_PIPE_BUF for directories, it is also suitable for
FIFOs permitting a single VOP_PATHCONF method to be used for both
FIFOs and non-FIFOs.

To that end, retire all of the FIFO-specific pathconf methods from
filesystems and change FIFO-specific vnode operation switches to use
the existing fs-specific VOP_PATHCONF method. For fifofs, set it's
VOP_PATHCONF to VOP_PANIC since it should no longer be used.

While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that
only filesystems supporting FIFOs will report a value. In addition,
only report a valid _PC_PIPE_BUF for directories and FIFOs.

PR: 219851
Sponsored by: Chelsio Communications


# 328295 23-Jan-2018 jhb

MFC 323993: Use tmpfs_print for tmpfs FIFOs.


# 313078 02-Feb-2017 kib

MFC r312432:
Add a mount option for tmpfs(5) to not use namecache.


# 313077 02-Feb-2017 kib

MFC r312430:
Implement VOP_VPTOCNP() for tmpfs.


# 313075 02-Feb-2017 kib

MFC r312428:
Refcount tmpfs nodes and mount structures.


# 312804 26-Jan-2017 kib

MFC r312410:
Rework some tmpfs lock assertions.

MFC r312412:
Protect macro argument.


# 312802 26-Jan-2017 kib

MFC r312409:
Style fixes and comment updates.

MFC r312435:
Remove mistakenly merged field.


# 312072 13-Jan-2017 kib

MFC r311522:
Use type-independent formats for printing nlink_t and ino_t.


# 312066 13-Jan-2017 kib

MFC r311525:
Lock tmpfs node tn_status updates done under the shared vnode lock.


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 298806 29-Apr-2016 pfg

sys/fs: spelling fixes in comments.

No functional change.


# 295574 12-Feb-2016 markj

Clear the cookie pointer on error in tmpfs_readdir().

It is otherwise left dangling, and callers that request cookies always free
the cookie buffer, even when VOP_READDIR(9) returns an error. This results
in a double free if tmpfs_readdir() returns an error to the NFS server or
the Linux getdents(2) emulation code.

Reported by: pho
MFC after: 1 week
Security: double free of malloc(9)-backed memory
Sponsored by: EMC / Isilon Storage Division


# 281378 10-Apr-2015 will

tmpfs_getattr(): Return more correct allocated byte counts.

For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE)
for the tmpfs node's anonymous VM object that stores actual file contents.

For all other vnodes, return the tmpfs_node's tn_size, which should not
be rounded to a page.

This change allows using stat(2) to identify a sparse file on tmpfs.

Reviewed by: kib
MFC after: 1 week


# 277972 31-Jan-2015 kib

Remove single-use boolean.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 277969 31-Jan-2015 kib

POSIX states that write(2) "shall mark for update the last data
modification and last file status change timestamps of the file".
Currently, tmpfs only modifies ctime when file was extended. Since
r277828 followed tmpfs_write(), mmaped writes also do not modify
ctime.

Fix this, by updating both ctime and mtime for writes to tmpfs files.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 277828 28-Jan-2015 kib

Update mtime for tmpfs files modified through memory mapping. Similar
to UFS, perform updates during syncer scans, which in particular means
that tmpfs now performs scan on sync. Also, this means that a mtime
update may be delayed up to 30 seconds after the write.

The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar
to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that
object could have been dirtied. Adapt fast page fault handler and
vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as
OBJT_VNODE.

Reported by: Ronald Klop <ronald-lists@klop.ws>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 276007 21-Dec-2014 kib

Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some
fs, e.g. smbfs, already did it.

Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 275897 18-Dec-2014 kib

The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268766 16-Jul-2014 kib

Do not ignore error from tmpfs_alloc_vp(). It results in access to
the random memory.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268765 16-Jul-2014 kib

Remove unused header.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268764 16-Jul-2014 kib

Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code. In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268614 14-Jul-2014 kib

Use tmpfs_vn_get_ino_gen() to handle the races with reclaim in tmpfs
dotdot lookup.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268613 14-Jul-2014 kib

Style. Add comment about lock mode.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268609 14-Jul-2014 kib

Add some assertions for the code handling vm_object for tmpfs vnode.
In particular, vnode must be exclusively locked when the tmpfs vnode
and object are divorced. When the vnode is opened, the object must be
still alive, since only live vnode can be opened, and the tmpfs node
owns a reference on the object.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268608 14-Jul-2014 kib

The tmpfs_link() must not dereference the filesystem-specific data for
a vnode until it is verified that the vnode indeed belongs to tmpfs
mount. Otherwise, it might access random memory, at least in the
debug kernel.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 268605 14-Jul-2014 kib

Remove code separator lines which do not conform to style(9).

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 267564 17-Jun-2014 kib

In msdosfs_setattr(), add a check for result of the utimes(2)
permissions test, forgotten in r164033.

Refactor the permission checks for utimes(2) into vnode helper
function vn_utimes_perm(9), and simplify its code comparing with the
UFS origin, by writing the call to VOP_ACCESSX only once. Use the
helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5).

Reported by: bde
Reviewed by: bde, trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 263174 14-Mar-2014 bdrewery

Rename cnt to maxcookies and change its use as the condition for when to
lookup cookies to be less obscure.

No functional change.

Since r245115, cnt has not really been needed in tmpfs_dir_getdents(). Keep
it for the MPASS() for now though.

Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks


# 254601 21-Aug-2013 kib

Extract the general-purpose code from tmpfs to perform uiomove from
the page queue of some vm object.

Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 254138 09-Aug-2013 attilio

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 253953 05-Aug-2013 attilio

Revert r253939:
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by: EMC / Isilon storage division
Reported by: kib


# 253939 04-Aug-2013 attilio

The page hold mechanism is fast but it has couple of fallouts:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages

Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).

Fixing such primitive can bring to complete removal of the page hold
mechanism.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho


# 253927 04-Aug-2013 attilio

Remove unnecessary soft busy of the page before to do vn_rdwr() in
kern_sendfile() which is unnecessary.
The page is already wired so it will not be subjected to pagefault.
The content cannot be effectively protected as it is full of races
already.
Multiple accesses to the same indexes are serialized through vn_rdwr().

Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: pho


# 251505 07-Jun-2013 alc

Add missing VM object unlocks in an error case.

Reviewed by: kib


# 251452 06-Jun-2013 alc

Don't busy the page unless we are likely to release the object lock.

Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


# 251383 04-Jun-2013 alc

Eliminate unnecessary vm object locking from tmpfs_nocacheread().


# 250310 06-May-2013 kib

Avoid deactivating the page if it is already on a queue, only requeue
the page. This both reduces the number of queues locking and avoids
moving the active page to inactive list just because the page was read
or written.

Based on the suggestion by: alc
Reviewed by: alc
Tested by: pho


# 250190 02-May-2013 kib

Fix the v_object leak for non-regular tmpfs vnodes.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


# 250189 02-May-2013 kib

For the new regular tmpfs vnode, v_object is initialized before
insmntque() is called. The standard insmntque destructor resets the
vop vector to deadfs one, and calls vgone() on the vnode. As result,
v_object is kept unchanged, which triggers an assertion in the reclaim
code, on instmntque() failure. Also, in this case, OBJ_TMPFS flag on
the backed vm object is not cleared.

Provide the tmpfs insmntque() destructor which properly clears
OBJ_TMPFS flag and resets v_object.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


# 250188 02-May-2013 kib

The page read or written could be wired. Do not requeue if the page
is not on a queue.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


# 250030 28-Apr-2013 kib

Rework the handling of the tmpfs node backing swap object and tmpfs
vnode v_object to avoid double-buffering. Use the same object both as
the backing store for tmpfs node and as the v_object.

Besides reducing memory use up to 2x times for situation of mapping
files from tmpfs, it also makes tmpfs read and write operations copy
twice bytes less.

VM subsystem was already slightly adapted to tolerate OBJT_SWAP object
as v_object. Now the vm_object_deallocate() is modified to not
reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle
VV_TEXT flag on the last dereference of the tmpfs backing object.

Reviewed by: alc
Tested by: pho, bf
MFC after: 1 month


# 248610 22-Mar-2013 pjd

- Constify local path variable for chflagsat().
- Use correct format characters (%lx) for u_long.

This fixes the build broken in r248599.


# 248422 17-Mar-2013 kib

Remove negative name cache entry pointing to the target name, which
could be instantiated while tdvp was unlocked.

Reported by: Rick Miller <vmiller at hostileadmin com>
Tested by: pho
MFC after: 1 week


# 248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 246603 09-Feb-2013 attilio

Remove a racy checks on resident and cached pages for
tmpfs_mapped{read, write}() functions:
- tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(),
which check before-hand to work only on valid VREG vnodes. Also the
vnode is locked for the duration of the work, making vnode reclaiming
impossible, during the operation. Hence, vobj can never be NULL.
- Currently check on resident pages and cached pages without vm object
lock held is racy and can do even more harm than good, as a page could
be transitioning between these 2 pools and then be skipped entirely.
Skip the checks as lookups on empty splay trees are very cheap.

Discussed with: alc
Tested by: flo
MFC after: 2 weeks


# 245115 06-Jan-2013 gleb

tmpfs: Replace directory entry linked list with RB-Tree.

Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search. Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies). Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.

Don't fail if previous VOP_READDIR() offset is no longer valid, start
with next dirent instead. Other file system handle it similarly.

Workaround race prone tn_readdir_last[pn] fields update.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c


# 239065 05-Aug-2012 kib

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


# 234607 23-Apr-2012 trasz

Remove unused thread argument to vrecycle().

Reviewed by: kib


# 234064 09-Apr-2012 attilio

- Introduce a cache-miss optimization for consistency with other
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.

Reviewed by: alc
MFC after: 2 weeks
X-MFC: r234039


# 232960 14-Mar-2012 gleb

Prevent tmpfs_rename() deadlock in a way similar to UFS

Unlock vnodes and try to lock them one by one. Relookup fvp and tvp.

Approved by: mdf (mentor)


# 232821 11-Mar-2012 kib

Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by: gianni


# 232401 02-Mar-2012 jhb

Similar to the fixes in 226967 and 226987, purge any name cache entries
associated with the previous vnode (if any) associated with the target of
a rename(). Otherwise, a lookup of the target pathname concurrent with a
rename() could re-add a name cache entry after the namei(RENAME) lookup
in kern_renameat() had purged the target pathname.

MFC after: 2 weeks


# 231669 14-Feb-2012 tijl

Replace PRIdMAX with "jd" in a printf call. Cast the corresponding value to
intmax_t instead of uintmax_t, because the original type is off_t.


# 230180 15-Jan-2012 alc

When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.

Reviewed by: kib
MFC after: 3 weeks


# 230120 14-Jan-2012 alc

Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.

Free invalid pages after an I/O error. There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)

vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.

There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.

Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager already spam your console if data corruption is
imminent.)

Reviewed by: kib
MFC after: 3 weeks


# 229363 03-Jan-2012 alc

Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and
tmpfs_nocacheread(). It is both unnecessary and a pessimization. It
results in either the page being zeroed twice or zeroed first and then
overwritten by an I/O operation.

MFC after: 3 weeks


# 227822 22-Nov-2011 ivoras

Avoid panics from recursive rename operations. Not a perfect patch but
good enough for now.

PR: kern/159418
Submitted by: Gleb Kurtsou
Reviewed by: kib
MFC after: 1 month


# 226987 01-Nov-2011 pho

Added missing cache purge of from argument for rename().

Reported by: Anton Yuzhaninov <citrin citrin ru>
In collaboration with: kib
MFC after: 1 week


# 225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# 218949 22-Feb-2011 alc

Eliminate two dubious attempts at optimizing the implementation of a
file's last accessed, modified, and changed times:

TMPFS_NODE_ACCESSED and TMPFS_NODE_CHANGED should be set unconditionally
in tmpfs_remove() without regard to the number of hard links to the file.
Otherwise, after the last directory entry for a file has been removed, a
process that still has the file open could read stale values for the last
accessed and changed times with fstat(2).

Similarly, tmpfs_close() should update the time-related fields even if all
directory entries for a file have been removed. In this case, the effect
is that the time-related fields will have values that are later than
expected. They will correspond to the time at which fstat(2) is called.

In collaboration with: kib
MFC after: 1 week


# 218863 19-Feb-2011 alc

tmpfs_remove() isn't modifying the file's data, so it shouldn't set
TMPFS_NODE_MODIFIED on the node.

PR: 152488
Submitted by: Anton Yuzhaninov
Reviewed by: kib
MFC after: 1 week


# 217633 20-Jan-2011 kib

In tmpfs_readdir(), normalize handling of the directory entries that
either overflow the supplied buffer, or cause uiomove fail.
Do not advance cached de when directory entry was not copied out.
Do not return EOF when no entries could be copied due to first entry
too large for supplied buffer, signal EINVAL instead.

Reported by: Beat G?tzi <beat chruetertee ch>
MFC after: 1 week


# 213735 12-Oct-2010 avg

tmpfs + sendfile: do not produce partially valid pages for vnode's tail

See r213730 for details of analogous change in ZFS.

MFC after: 3 days


# 212650 15-Sep-2010 avg

tmpfs, zfs + sendfile: mark page bits as valid after populating it with data

Otherwise, adding insult to injury, in addition to double-caching of data
we would always copy the data into a vnode's vm object page from backend.
This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).

PR: kern/141305
Reported by: Wiktor Niesiobedzki <bsd@vink.pl>
Reviewed by: alc
Tested by: tools/regression/sockets/sendfile
MFC after: 2 weeks


# 212305 07-Sep-2010 ivoras

Avoid "Entry can disappear before we lock fdvp" panic.

PR: 150143
Submitted by: Gleb Kurtsou <gk at FreeBSD.org>
Pretty sure it won't blow up: mckusick
MFC after: 2 weeks


# 211598 22-Aug-2010 ed

Add support for whiteouts on tmpfs.

Right now unionfs only allows filesystems to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.

This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.

MFC after: 1 month


# 209226 15-Jun-2010 alc

Eliminate unnecessary page queues locking.


# 207719 06-May-2010 trasz

Style fixes and removal of unneeded variable.

Submitted by: bde@


# 207662 05-May-2010 trasz

Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().

Reviewed by: kib


# 207644 05-May-2010 alc

Push down the acquisition of the page queues lock into vm_page_unwire().

Update the comment describing which lock should be held on entry to
vm_page_wire().

Reviewed by: kib


# 207573 03-May-2010 alc

Acquire the page lock around vm_page_unwire() and vm_page_wire().

Reviewed by: kib


# 207530 02-May-2010 alc

It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(),
to unconditionally set PG_REFERENCED on a page before sleeping. In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller to vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable. Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.


# 197953 11-Oct-2009 delphij

Add locking around access to parent node, and bail out when the parent
node is already freed rather than panicking the system.

PR: kern/122038
Submitted by: gk
Tested by: pho
MFC after: 1 week


# 197850 07-Oct-2009 delphij

Add a special workaround to handle UIO_NOCOPY case. This fixes data
corruption observed when sendfile() is being used.

PR: kern/127213
Submitted by: gk
MFC after: 2 weeks


# 197740 04-Oct-2009 delphij

Fix a bug that causes the fsx test case of mmap'ed page being out of sync
of read/write, inspired by ZFS's counterpart.

PR: kern/139312
Submitted by: gk@
MFC after: 1 week


# 194124 13-Jun-2009 alc

Eliminate unnecessary variables.


# 192917 27-May-2009 alc

Eliminate redundant setting of a page's valid bits and pointless clearing
of the same page's dirty bits.


# 191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


# 188929 22-Feb-2009 alc

Use uiomove_fromphys() instead of the combination of sf_buf and uiomove().

This is not only shorter; it also eliminates unnecessary thread pinning on
architectures that implement a direct map.

MFC after: 3 weeks


# 188921 22-Feb-2009 alc

Simplify the unwiring and activation of pages.

MFC after: 1 week


# 188318 08-Feb-2009 kib

Lookup up the directory entry for the tmpfs node that are deleted by
both node pointer and name component. This does the right thing for
hardlinks to the same node in the same directory.

Submitted by: Yoshihiro Ota <ota j email ne jp>
PR: kern/131356
MFC after: 2 weeks


# 184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


# 183299 23-Sep-2008 obrien

The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical. Where as,
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).


# 183215 20-Sep-2008 kib

fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month


# 183214 20-Sep-2008 kib

Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR().
NODEV is more appropriate when va_rdev doesn't have a meaningful value.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Suggested by: bde
Discussed on: freebsd-fs
MFC after: 1 month


# 183212 20-Sep-2008 kib

Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Reviewed by: bde
Discussed on: freebsd-fs
MFC after: 1 month


# 182739 03-Sep-2008 delphij

Reflect license change of NetBSD code.

Obtained from: NetBSD
MFC after: 3 days


# 182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 178243 16-Apr-2008 kib

Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks


# 177633 26-Mar-2008 dfr

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


# 176559 25-Feb-2008 attilio

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 174265 04-Dec-2007 wkoszek

Explicitly initialize 'error' to 0 (two places). It lets one to build tmpfs
from the latest source tree with older compiler--gcc3.

Reviewed by: kib@ (on freebsd-current@)
Approved by: cognet@ (mentor)


# 173725 18-Nov-2007 delphij

MFp4: Several fixes to tmpfs which makes it to survive from pho@'s
strees2 suite, to quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.

For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.

Submitted by: kib


# 171862 16-Aug-2007 delphij

MFp4: rework tmpfs_readdir() logic in terms of correctness.

Approved by: re (tmpfs blanket)
Tested with: fstest, fsx


# 171802 10-Aug-2007 delphij

MFp4:
- LK_RETRY prohibits vget() and vn_lock() to return error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
with the same way used in modern NetBSD versions. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)


# 171799 10-Aug-2007 delphij

MFp4:

- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL deference
- Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)


# 171489 19-Jul-2007 delphij

MFp4: Rework on tmpfs's mapped read/write procedures. This
should finally fix fsx test case.

The printf's added here would be eventually turned into
assertions.

Submitted by: Mingyan Guo (mostly)
Approved by: re (tmpfs blanket)


# 171308 08-Jul-2007 delphij

MFp4:
- Plug memory leak.
- Respect underlying vnode's properties rather than assuming that
the user want root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein
order to prevent data corruption.
- Fix several bugs reported by Coverity.

Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)


# 171087 29-Jun-2007 delphij

MFp4:

- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use malloc(9) KPI to allocate memory for string. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already act that way. [2]

Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)


# 171070 28-Jun-2007 delphij

Space/style cleanups after last set of commits.

Approved by: re (tmpfs blanket)


# 171069 28-Jun-2007 delphij

Staticify most of fifo/vn operations, they should not
be directly exposed outside.

Approved by: re (tmpfs blanket)


# 171029 25-Jun-2007 delphij

MFp4: Several clean-ups and improvements over tmpfs:

- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
they does not bring any value now.
- Use |= instead of = when applying VV_ROOT flag.
- Remove tm_avariable_nodes list. Use uma to hold the
released nodes.
- init/destory interlock mutex of node when init/fini
instead of ctor/dtor.
- Change memory computing using u_int to fix negative
value in 2G mem machine.
- Remove unnecessary bzero's
- Rely uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx
- Use wire instead of hold a page.
- Pass allocate_zero to obtain zeroed pages upon first
use.

Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)


# 170903 17-Jun-2007 delphij

MFp4: fix two locking problems:

- Hold TMPFS_LOCK while updating tm_pages_used.
- Hold vm page while doing uiomove.

This will hopefully fix all known panics.

Submitted by: Howard Su


# 170808 15-Jun-2007 delphij

MFp4: Add tmpfs, an efficient memory file system.

Please note that, this is currently considered as an
experimental feature so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.

For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.

This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
are worked on it to continue this effort.

Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)