History log of /freebsd-current/sys/fs/fuse/fuse_vnops.c
Revision Date Author Comments
# ff4fc43a 21-May-2024 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix build.


# 31223e68 18-May-2024 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Simplify the code.

Obtained from: Fudo Security
Reviewed by: asomers, imp
Approved by: oshogbo (mentor)
Differential Revision: https://reviews.freebsd.org/D45247


# 1c909c30 31-Dec-2023 Alan Somers <asomers@FreeBSD.org>

fusefs: fix an interaction between copy_file_range and mmap

If a copy_file_range operation tries to read from a page that was
previously written via mmap, that page must be flushed first.

MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43451


# c5405d1c 18-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

vn_copy_file_range(): provide ENOSYS fallback to vn_generic_copy_file_range()

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 4c6cded2 14-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

fuse_vnop_copy_file_range(): add safety

v_mount for unlocked vnode could be NULL, check for it. Explain why it
is safe to access fs-specific data for mp if it is read as non-NULL.

Reviewed by: asomers, jah
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42625


# 318c5671 14-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

fuse_vnop_copy_file_range(): use vn_lock_pair()

Reviewed by: asomers, jah
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42625


# 662ec2f7 04-Oct-2023 Alan Somers <asomers@FreeBSD.org>

fusefs: sanitize FUSE_READLINK results for embedded NULs

If VOP_READLINK returns a path that contains a NUL, it will trigger an
assertion in vfs_lookup. Sanitize such paths in fusefs, rejecting any
and warning the user about the misbehaving server.

PR: 274268
MFC after: 1 week
Sponsored by: Axcient
Reviewed by: mjg, markj
Differential Revision: https://reviews.freebsd.org/D42081
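
The check described above can be sketched in userspace C. This is only an illustration, not the fusefs code: the helper name readlink_has_embedded_nul is hypothetical, and it assumes the server's reply arrives as a buffer plus an explicit payload length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the link target returned by the
 * server contains a NUL byte anywhere in its first `len` bytes.
 * `len` is assumed to be the payload length reported by the server.
 */
static bool
readlink_has_embedded_nul(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	/* memchr is only reached when there is at least one byte to scan. */
	return (len > 0 && memchr(buf, '\0', len) != NULL);
}
```

A caller would reject the reply and warn about the server when the helper returns true, instead of handing the path on to the lookup code.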


# fb619c94 20-Sep-2023 Alan Somers <asomers@FreeBSD.org>

fusefs: fix some bugs updating atime during close

When using cached attributes, we must update a file's atime during
close, if it has been read since the last attribute refresh. But,

* Don't update atime if we lack write permissions to the file or if the
file system is readonly.
* If the daemon fails our atime update request for any reason, don't
report this as a failure for VOP_CLOSE.

PR: 270749
Reported by: Jamie Landeg-Jones <jamie@catflap.org>
MFC after: 1 week
Sponsored by: Axcient
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D41925


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ba8cc6d7 12-Mar-2023 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use __enum_uint8 for vtype and vstate

This whacks hackery around only reading v_type once.

Bump __FreeBSD_version to 1400093


# e3f7081b 22-May-2023 Mark Johnston <markj@FreeBSD.org>

fusefs: Remove an unused pbuf zone

The zone has been dead ever since commit
b9e20197551d ("fusefs: rewrite vop_getpages and vop_putpages")

No functional change intended.

Reviewed by: asomers
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D40143


# cb750f7f 31-Mar-2023 John Baldwin <jhb@FreeBSD.org>

fuse: Remove set but unused cr_gid variable.

Reviewed by: asomers
Reported by: GCC
Differential Revision: https://reviews.freebsd.org/D39350


# 1bdf879b 11-Feb-2023 Alan Somers <asomers@FreeBSD.org>

fusefs: fix some resource leaks

fusefs would leak tickets in three cases:
* After FUSE_CREATE, if the server returned a bad inode number.
* After a FUSE_FALLOCATE operation during VOP_ALLOCATE
* After a FUSE_FALLOCATE operation during VOP_DEALLOCATE

MFC after: 3 days
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D38526


# f6e53195 11-Oct-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: fix VOP_ADVLOCK with SEEK_END

When the user specifies SEEK_END, unlike SEEK_CUR, VOP_ADVLOCK must
adjust lock offsets itself.

Sort-of related to bug 266886.

MFC after: 2 weeks
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D37040


# 3c3b906b 11-Oct-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: After successful F_GETLK, l_whence should be SEEK_SET

PR: 266886
Reported by: John Millikin <jmillikin@gmail.com>
MFC after: 2 weeks
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D37014


# 46fcf947 07-Oct-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: during F_GETLK, don't change l_pid if no lock is found

PR: 266885
MFC after: 2 weeks
Submitted by: John Millikin <jmillikin@gmail.com>
Sponsored by: Axcient
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D36905


# 52360ca3 25-Sep-2022 Alan Somers <asomers@FreeBSD.org>

copy_file_range: truncate write if it would exceed RLIMIT_FSIZE

PR: 266611
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36706


# 0a192b3a 25-Sep-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: respect RLIMIT_FSIZE during truncate

PR: 164793
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36703


# 5b5b7e2c 17-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: always retain path buffer after lookup

This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542


# 0bef4927 04-May-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: handle evil servers that return illegal inode numbers

* If during FUSE_CREATE, FUSE_MKDIR, etc the server returns the same
inode number for the new file as for its parent directory, reject it.
Previously this would trigger a recurse-on-non-recursive lock panic.

* If during FUSE_LINK the server returns a different inode number for
the new name as for the old one, reject it. Obviously, that can't be
a hard link.

* If during FUSE_LOOKUP the server returns the same inode number for the
new file as for its parent directory, reject it. Nothing good can
come of this.

PR: 263662
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D35128
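
The three rules above reduce to simple comparisons of inode numbers. The sketch below is illustrative only: the enum, the function name check_reply_ino, and the use of EIO as the rejection code are all assumptions, and special names such as "." are not considered.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical operation classes, not FUSE opcodes. */
enum entry_op { OP_CREATE_LIKE, OP_LINK, OP_LOOKUP };

/*
 * related_ino is the parent directory's inode for create-like operations
 * and lookups, and the existing file's inode for a hard link.
 * Returns 0 if the server's reply is plausible, EIO (assumed) otherwise.
 */
static int
check_reply_ino(enum entry_op op, uint64_t related_ino, uint64_t reply_ino)
{
	switch (op) {
	case OP_CREATE_LIKE:
	case OP_LOOKUP:
		/* A child must not share its parent's inode number. */
		return (reply_ino == related_ino ? EIO : 0);
	case OP_LINK:
		/* A hard link must refer to the same inode as the old name. */
		return (reply_ino != related_ino ? EIO : 0);
	}
	assert(0 && "unknown op");
	return (EINVAL);
}
```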


# 45825a12 28-Apr-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: fix FUSE_CREATE with file handles and fuse protocol < 7.9

Prior to fuse protocol version 7.9, the fuse_entry_out structure had a
smaller size. But fuse_vnop_create did not take that into account when
working with servers that use older protocols. The bug does not matter
for servers which don't use file handles or open flags (the only fields
affected).

PR: 263625
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC after: 2 weeks


# 32273253 02-Apr-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: fix two bugs regarding VOP_RECLAIM of the root inode

* We never send FUSE_LOOKUP for the root inode, since its inode number
is hard-coded to 1. Therefore, we should not send FUSE_FORGET for it,
lest the server see its lookup count fall below 0.

* During VOP_RECLAIM, if we are reclaiming the root inode, we must clear
the file system's vroot pointer. Otherwise it will be left pointing
at a reclaimed vnode, which will cause future VOP_LOOKUP operations to
fail. Previously we only cleared that pointer during VFS_UMOUNT. I
don't know of any real-world way to trigger this bug.

MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D34753


# ef1534ca 02-Apr-2022 Gordon Bergling <gbe@FreeBSD.org>

fusefs(5): Fix a typo in a source code comment

- s/accomodate/accommodate/

MFC after: 3 days


# e8553be9 21-Feb-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a cached attributes bug during directory rename

When renaming a directory into a different parent directory, invalidate
the cached attributes of the new parent. Otherwise, stat will show the
wrong st_nlink value.

MFC after: 1 week
Reviewed by: ngie
Differential Revision: https://reviews.freebsd.org/D34336


# 18ed2ce7 04-Feb-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: fix the build without INVARIANTS after 00134a07898

MFC after: 2 weeks
MFC with: 00134a07898fa807b8a1fcb2596f0e3644143f69
Reported by: se


# 00134a07 02-Jan-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: require FUSE_NO_OPENDIR_SUPPORT for NFS exporting

FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness, which results in
unresolvable bugs when NFS reads large directories, if:

* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.

Rather than doing a poor job of exporting such file systems, it's better
just to refuse.

Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.

MFC after: 3 weeks
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33726


# 4a6526d8 02-Jan-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: optimize NFS readdir for FUSE_NO_OPENDIR_SUPPORT

In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:

* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.

Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT, d_off is
guaranteed to be valid for the lifetime of the directory entry, so there
is no need to read the directory from the start.

MFC after: 3 weeks
Reviewed by: rmacklem


# 3d856234 20-Jan-2022 Mark Johnston <markj@FreeBSD.org>

fusefs: Address -Wunused-but-set-variable warnings

Reviewed by: asomers
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33957


# 89d57b94 08-Jan-2022 Alan Somers <asomers@FreeBSD.org>

fusefs: implement VOP_DEALLOCATE

MFC after: Never
Reviewed by: khng
Differential Revision: https://reviews.freebsd.org/D33800


# 398c88c7 31-Dec-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: implement VOP_ALLOCATE

Now posix_fallocate will be correctly forwarded to fuse file system
servers, for those that support it.

MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33389


# 1613087a 01-Dec-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: fix .. lookups when the parent has been reclaimed.

By default, FUSE file systems are assumed not to support lookups for "."
and "..". They must opt-in to that. To cope with this limitation, the
fusefs kernel module caches every fuse vnode's parent's inode number,
and uses that during VOP_LOOKUP for "..". But if the parent's vnode has
been reclaimed, that won't be possible. Previously we panicked in this
situation. Now, we'll return ESTALE instead. Or, if the file system
has opted into ".." lookups, we'll just do that instead.

This commit also fixes VOP_LOOKUP to respect the cache timeout for ".."
lookups, if the FUSE file system specified a finite timeout.

PR: 259974
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33239


# 5169832c 28-Nov-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: copy_file_range must update file timestamps

If FUSE_COPY_FILE_RANGE returns successfully, update the atime of the
source and the mtime and ctime of the destination.

MFC after: 2 weeks
Reviewers: pfg
Differential Revision: https://reviews.freebsd.org/D33159


# 13d593a5 28-Nov-2021 Alan Somers <asomers@FreeBSD.org>

Fix a race in fusefs that can corrupt a file's size.

VOPs like VOP_SETATTR can change a file's size, with the vnode
exclusively locked. But VOPs like VOP_LOOKUP look up the file size from
the server without the vnode locked. So a race is possible. For
example:

1) One thread calls VOP_SETATTR to truncate a file. It locks the vnode
and sends FUSE_SETATTR to the server.
2) A second thread calls VOP_LOOKUP and fetches the file's attributes from
the server. Then it blocks trying to acquire the vnode lock.
3) FUSE_SETATTR returns and the first thread releases the vnode lock.
4) The second thread acquires the vnode lock and caches the file's
attributes, which are now out-of-date.

Fix this race by recording a timestamp in the vnode of the last time
that its filesize was modified. Check that timestamp during VOP_LOOKUP
and VFS_VGET. If it's newer than the time at which FUSE_LOOKUP was
issued to the server, ignore the attributes returned by FUSE_LOOKUP.

PR: 259071
Reported by: Agata <chogata@moosefs.pro>
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33158
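
The timestamp comparison behind this fix might look roughly like the following. It is a sketch, not the committed code: struct node_state, its field name, and both function names are hypothetical, and it assumes both timestamps come from the same monotonic clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-vnode state. */
struct node_state {
	struct timespec last_local_modify;	/* last local size change */
};

/* Return true if a is strictly later than b. */
static bool
ts_after(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return (a->tv_sec > b->tv_sec);
	return (a->tv_nsec > b->tv_nsec);
}

/*
 * Attributes from a lookup are usable only if the file size was not
 * modified locally after the lookup request was issued.
 */
static bool
lookup_attrs_usable(const struct node_state *ns,
    const struct timespec *lookup_issued)
{
	assert(ns != NULL && lookup_issued != NULL);
	return (!ts_after(&ns->last_local_modify, lookup_issued));
}
```

In the four-step race above, the second thread's lookup was issued before the truncation completed, so the check would return false and the stale attributes would be dropped.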


# b214fcce 13-Dec-2021 Alan Somers <asomers@FreeBSD.org>

Change VOP_READDIR's cookies argument to a uint64_t **

The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.

PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404


# 41ae9f9e 05-Dec-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: invalidate the cache during copy_file_range

FUSE_COPY_FILE_RANGE instructs the server to write data to a file.
fusefs must invalidate any cached data within the written range.

PR: 260242
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33280


# dc433e15 05-Dec-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: inline fuse_io_dispatch

This function was always confusing, because it created an H-shaped
callgraph: two functions called into it, and it left via different paths
based on which one had called.

MFC after: 2 weeks


# 91972cfc 28-Nov-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: update atime on reads when using cached attributes

When using cached attributes, whether or not the data cache is enabled,
fusefs must update a file's atime whenever it reads from it, so long as
it wasn't mounted with -o noatime. Update it in-kernel, and flush it to
the server on close or during the next setattr operation.

The downside is that close() will now frequently trigger a FUSE_SETATTR
upcall. But if you care about performance, you should be using
-o noatime anyway.

MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D33145


# 65d70b3b 28-Nov-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: fix copy_file_range when extending a file

When copy_file_range extends a file, it must update the cached file
size.

MFC after: 2 weeks
Reviewed by: rmacklem, pfg
Differential Revision: https://reviews.freebsd.org/D33151


# 8fbae6c7 28-Nov-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: delete a redundant getnanouptime

It's been redundant since SVN r346060 added another getnanouptime just
above.

MFC after: 2 weeks


# b4a58fbf 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove cn_thread

It is always curthread.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32453


# 7430017b 02-Oct-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a recurse-on-non-recursive lockmgr panic

fuse_vnop_bmap needs to know the file's size in order to calculate the
optimum amount of readahead. If the file's size is unknown, it must ask
the FUSE server. But if the file's data was previously cached and the
server reports that its size has shrunk, fusefs must invalidate the
cached data. That's not possible during VOP_BMAP because the buffer
object is already locked.

Fix the panic by not querying the FUSE server for the file's size during
VOP_BMAP if we don't need it. That's also a slight performance
optimization.

PR: 256937
Reported by: Agata <chogata@moosefs.pro>
Tested by: Agata <chogata@moosefs.pro>
MFC after: 2 weeks


# 5d94aaac 03-Oct-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: quiet some cache-related warnings

If the FUSE server does something that would make our cache incoherent,
we should print a warning to the user. However, we previously warned in
some situations when we shouldn't, such as if the file's size changed on
the server _after_ our own attribute cache had expired. This change
suppresses the warning in cases like that. It also moves the warning
logic to a single place within the code.

PR: 256936
Reported by: Agata <chogata@moosefs.pro>
Tested by: Agata <chogata@moosefs.pro>, jSML4ThWwBID69YC@protonmail.com
MFC after: 2 weeks


# 4f917847 16-Sep-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: don't panic if FUSE_GETATTR fails during VOP_GETPAGES

During VOP_GETPAGES, fusefs needs to determine the file's length, which
could require a FUSE_GETATTR operation. If that fails, it's better to
SIGBUS than panic.

MFC after: 1 week
Sponsored by: Axcient
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D31994


# 197a4f29 16-Sep-2021 Konstantin Belousov <kib@FreeBSD.org>

buffer pager: allow get_blksize method to return error

Reported and reviewed by: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31998


# 18b19f8c 19-May-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: correctly set lock owner during FUSE_SETLK

During FUSE_SETLK, the owner field should uniquely identify the calling
process. The fusefs module now sets it to the process's pid.
Previously, it expected the calling process to set it directly, which
was wrong.

libfuse also apparently expects the owner field to be set during
FUSE_GETLK, though I'm not sure why.

PR: 256005
Reported by: Agata <chogata@moosefs.pro>
MFC after: 2 weeks
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D30622


# 0b9a5c6f 15-Jun-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: improve warnings about buggy FUSE servers

The fusefs driver will print warning messages about FUSE servers that
commit protocol violations. Previously it would print those warnings on
every violation, but that could spam the console. Now it will print
each warning no more than once per lifetime of the mount. There is also
now a dtrace probe for each violation.

MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: emaste, pfg
Differential Revision: https://reviews.freebsd.org/D30780


# 9c5aac8f 19-Mar-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a dead store in fuse_vnop_advlock

kevans actually caught this in the original review and I fixed it, but
then I committed an older copy of the branch. Whoops.

Reported by: kevans
MFC after: 13 days
MFC with: 929acdb19acb67cc0e6ee5439df98e28a84d4772
Differential Revision: https://reviews.freebsd.org/D29031


# 929acdb1 18-Mar-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: fix two bugs regarding fcntl file locks

1) F_SETLKW (blocking) operations would be sent to the FUSE server as
F_SETLK (non-blocking).

2) Release operations, F_SETLK with lk_type = F_UNLCK, would simply
return EINVAL.

PR: 253500
Reported by: John Millikin <jmillikin@gmail.com>
MFC after: 2 weeks
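
The two bugs can be pictured at the fcntl level with the sketch below. It is an assumption-laden illustration: enum lock_op stands in for the FUSE lock opcodes without using their real values, all three function names are hypothetical, and the kernel's VOP_ADVLOCK interface may encode the blocking request differently from the plain fcntl commands used here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the FUSE lock operations. */
enum lock_op { LOCK_OP_INVALID, LOCK_OP_GETLK, LOCK_OP_SETLK, LOCK_OP_SETLKW };

/* Map an fcntl command to the lock operation sent to the server. */
static enum lock_op
lock_op_for(int fcntl_cmd)
{
	switch (fcntl_cmd) {
	case F_GETLK:
		return (LOCK_OP_GETLK);
	case F_SETLK:
		return (LOCK_OP_SETLK);
	case F_SETLKW:
		/* Bug 1: this must stay a blocking request. */
		return (LOCK_OP_SETLKW);
	default:
		return (LOCK_OP_INVALID);
	}
}

/* Return true if the operation is expected to wait for the lock. */
static bool
lock_op_blocks(enum lock_op op)
{
	assert(op != LOCK_OP_INVALID);
	return (op == LOCK_OP_SETLKW);
}

/* Bug 2: F_UNLCK is a valid lock type and must not be rejected. */
static bool
lock_type_ok(short l_type)
{
	return (l_type == F_RDLCK || l_type == F_WRLCK || l_type == F_UNLCK);
}
```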


# 17a82e6a 01-Jan-2021 Alan Somers <asomers@FreeBSD.org>

Fix vnode locking bug in fuse_vnop_copy_file_range

MFC-With: 92bbfe1f0d1f1c4436d1f064a16e5aaf682526ba
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D27938


# 34477e25 01-Jan-2021 Alan Somers <asomers@FreeBSD.org>

fusefs: only check vnode locks with DEBUG_VFS_LOCKS

MFC-With: 37df9d3bba8577fcdd63382ff5a4a5cbb4aa55b4
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D27939


# 542711e5 30-Dec-2020 Alan Somers <asomers@FreeBSD.org>

Fix a vnode locking bug in fuse_vnop_advlock.

Must lock the vnode before accessing the fufh table. Also, check for
invalid parameters earlier. Bug introduced by r346170.

MFC after: 2 weeks

Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D27936


# 92bbfe1f 28-Dec-2020 Alan Somers <asomers@gmail.com>

fusefs: implement FUSE_COPY_FILE_RANGE.

This updates the FUSE protocol to 7.28, though most of the new features
are optional and are not yet implemented.

MFC after: 2 weeks
Relnotes: yes
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D27818


# 37df9d3b 28-Dec-2020 Alan Somers <asomers@FreeBSD.org>

fusefs: update FUSE protocol to 7.24 and implement FUSE_LSEEK

FUSE_LSEEK reports holes on fuse file systems, and is used for example
by bsdtar.

MFC after: 2 weeks
Relnotes: yes
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D27804


# 85078b85 17-Nov-2020 Conrad Meyer <cem@FreeBSD.org>

Split out cwd/root/jail, cmask state from filedesc table

No functional change intended.

Tracking these structures separately for each proc enables future work to
correctly emulate clone(2) in linux(4).

__FreeBSD_version is bumped (to 1300130) for consumption by, e.g., lsof.

Reviewed by: kib
Discussed with: markj, mjg
Differential Revision: https://reviews.freebsd.org/D27037


# ab21ed17 20-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the de facto curthread argument from VOP_INACTIVE


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# 4961e997 26-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

fuse: unbreak after r364814

Reported by: kevans


# 8f226f4c 19-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the always-curthread td argument from VOP_RECLAIM


# bfcb817b 22-May-2020 Alan Somers <asomers@FreeBSD.org>

Fix issues with FUSE_ACCESS when default_permissions is disabled

This patch fixes two issues relating to FUSE_ACCESS when the
default_permissions mount option is disabled:

* VOP_ACCESS() calls with VADMIN set should never be sent to a fuse server
in the form of FUSE_ACCESS operations. The FUSE protocol has no equivalent
of VADMIN, so we must evaluate such things kernel-side, regardless of the
default_permissions setting.

* The FUSE protocol only requires FUSE_ACCESS to be sent for two purposes:
for the access(2) syscall and to check directory permissions for
searchability during lookup. FreeBSD sends it much more frequently, due to
differences between our VFS and Linux's, for which FUSE was designed. But
this patch does eliminate several cases not required by the FUSE protocol:

* for any FUSE_*XATTR operation
* when creating a new file
* when deleting a file
* when setting timestamps, such as by utimensat(2).

* Additionally, when default_permissions is disabled, this patch removes one
FUSE_GETATTR operation when deleting a file.

PR: 245689
Reported by: MooseFS FreeBSD Team <freebsd@moosefs.pro>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24777


# b0ecfb42 10-Mar-2020 Alan Somers <asomers@FreeBSD.org>

fusefs: avoid cache corruption with buggy fuse servers

The FUSE protocol allows the client (kernel) to cache a file's size, if the
server (userspace daemon) allows it. A well-behaved daemon obviously should
not change a file's size while a client has it cached. But a buggy daemon
might. If the kernel ever detects that that has happened, then it should
invalidate the entire cache for that file. Previously, we would not only
cache stale data, but in the case of a file extension while we had the size
cached, we accidentally extended the cache with zeros.

PR: 244178
Reported by: Ben RUBSON <ben.rubson@gmx.com>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24012


# 6a5abb1e 02-Feb-2020 Kyle Evans <kevans@FreeBSD.org>

Provide O_SEARCH

O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23247


# 6fa079fc 15-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: flatten vop vectors

This eliminates the following loop from all VOP calls:

	while(vop != NULL && \
	    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
		vop = vop->vop_default;

Reviewed by: jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22738


# 42767f76 16-Sep-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix some minor issues with fuse_vnode_setparent

* When unparenting a vnode, actually clear the flag. AFAIK this is basically
a no-op because we only unparent a vnode when reclaiming it or when
unlinking.

* There's no need to call fuse_vnode_setparent during reclaim, because we're
about to free the vnode data anyway.

Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21630


# 16f87834 06-Sep-2019 Alan Somers <asomers@FreeBSD.org>

Coverity fixes in fusefs(5)

CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap.
It could potentially have resulted in VOP_BMAP reporting too many
consecutive blocks.

CID 1404364 is much worse. It was an array access by an untrusted,
user-provided variable. It could potentially have resulted in a malicious
file system crashing the kernel or worse.

Reported by: Coverity
Reviewed by: emaste
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21466


# 9222b823 29-Aug-2019 Mark Johnston <markj@FreeBSD.org>

Remove unused VM page locking macros.

They were orphaned by r292373.

Reviewed by: asomers
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21469


# 6470c8d3 29-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Rework v_object lifecycle for vnodes.

Current implementation of vnode_create_vobject() and
vnode_destroy_vobject() is written so that it is prepared to handle the
vm object destruction for a live vnode. Practically, no filesystems use
this, except for some remnants that were present in UFS till today.
One of the consequences of that model is that each filesystem must
call vnode_destroy_vobject() in VOP_RECLAIM() or earlier; as a result,
all of them get rid of the v_object in reclaim.

Move the call to vnode_destroy_vobject() to vgonel() before
VOP_RECLAIM(). This makes v_object stable: either the object is NULL,
or it is a valid vm object till the vnode reclamation. Remove code from
vnode_create_vobject() to handle races with the parallel destruction.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21412


# 5e633330 27-Aug-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: Fix some bugs regarding the size of the LISTXATTR list

* A small error in r338152 led to the returned size always being exactly
eight bytes too large.

* The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the
caller does not provide enough space, then the server should return ERANGE
rather than return a truncated list. That's true even though in FUSE's
case the kernel doesn't provide space to the client at all; it simply
requests a maximum size for the list. We previously weren't handling the
case where the server returns ERANGE even though the kernel requested as
much size as the server had told us it needs; that can happen due to a
race.

* We also need to ensure that a pathological server that always returns
ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an
infinite loop in the kernel. As of this commit, it will instead cause an
infinite loop that exits and enters the kernel on each iteration, allowing
signals to be processed.

Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21287


# 3a79e8e7 15-Aug-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: don't send the namespace during listextattr

The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.

Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21280


# 91898857 29-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Avoid relying on header pollution from sys/refcount.h.

MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 8aafc8c3 27-Jun-2019 Alan Somers <asomers@FreeBSD.org>

[skip ci] update copyright headers in fusefs files

Sponsored by: The FreeBSD Foundation


# 435ecf40 27-Jun-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: recycle vnodes after their last unlink

Previously fusefs would never recycle vnodes. After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them. This commit
essentially activates and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.

Sponsored by: The FreeBSD Foundation


# 560a55d0 27-Jun-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: convert statistical sysctls to use counter(9)

counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.

Sponsored by: The FreeBSD Foundation


# caeea8b4 26-Jun-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix some memory leaks

Fix memory leaks relating to FUSE_BMAP and FUSE_CREATE. There are still
leaks relating to FUSE_INTERRUPT, but they'll be harder to fix since the
server is legally allowed to never respond to a FUSE_INTERRUPT operation.

Sponsored by: The FreeBSD Foundation


# b9e20197 25-Jun-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: rewrite vop_getpages and vop_putpages

Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache. This has
several corollaries:

* Change the way we handle short reads _again_. vfs_bio_getpages doesn't
provide any way to handle unexpected short reads. Plus, I found some more
lock-order problems. So now when the short read is detected we'll just
clear the vnode's attribute cache, forcing the file size to be requeried
the next time it's needed. VOP_GETPAGES doesn't have any way to indicate
a short read to the "caller", so we just bzero the rest of the page
whenever a short read happens.

* Change the way we decide when to set the FUSE_WRITE_CACHE bit. We now set
it for clustered writes even when the writeback cache is not in use.

Sponsored by: The FreeBSD Foundation
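
The short-read handling described in b9e20197 above can be illustrated in
userland. This is a minimal sketch, not the driver's code: zero_tail_sketch
and its parameters are hypothetical names, and a plain byte buffer stands in
for a VM page.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: a read filled only the first `valid` bytes of a
 * page-sized buffer, so zero the remainder to avoid exposing stale data.
 */
static void
zero_tail_sketch(char *page, size_t page_size, size_t valid)
{
	if (valid < page_size)
		memset(page + valid, 0, page_size - valid);
}
```

A full-length read (valid == page_size) leaves the buffer untouched.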


# a1c9f4ad 20-Jun-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: implement VOP_BMAP

If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap. Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.

The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.

Sponsored by: The FreeBSD Foundation


# 0d2bf489 31-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: check the vnode cache when looking up files for the NFS server

FUSE allows entries to be cached for a limited amount of time. fusefs's
vnop_lookup method already implements that using the timeout functionality
of cache_lookup/cache_enter_time. However, lookups for the NFS server go
through a separate path: vfs_vget. That path can't use the same timeout
functionality because cache_lookup/cache_enter_time only work on pathnames,
whereas vfs_vget works by inode number.

This commit adds entry timeout information to the fuse vnode structure, and
checks it during vfs_vget. This allows the NFS server to take advantage of
cached entries. It's also the same path that FUSE's asynchronous cache
invalidation operations will use.

Sponsored by: The FreeBSD Foundation


# a4856c96 29-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: raise protocol level to 7.12

This commit raises the protocol level and adds backwards-compatibility code
to handle structure size changes. It doesn't implement any new features.
The new features added in protocol 7.12 are:

* server-side umask processing (which FreeBSD won't do)
* asynchronous inode and directory entry invalidation (which I'll do next)

Sponsored by: The FreeBSD Foundation


# e039bafa 28-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: add comments explaining why 7.11 features aren't implemented

Protocol 7.11 adds two new features, but neither of them was defined
correctly. FUSE_IOCTL messages don't work for 32-bit daemons on a 64-bit
host (fixed in protocol 7.16). FUSE_POLL is basically unusable until 7.21.
Before 7.21, the client can't choose which events to register for; the
client registers for "something" and the server replies to say which events
the client is registered for. Also, before 7.21 there was no way for a
client to deregister a file handle.

Sponsored by: The FreeBSD Foundation


# 8aa24ed3 27-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: flock(2) locks must be implemented in-kernel

If a FUSE file system sets the FUSE_POSIX_LOCKS flag then it can support
fcntl(2)-style locks directly. However, the protocol does not adequately
support flock(2)-style locks until revision 7.17. They must be implemented
locally in-kernel instead. This unfortunately breaks the interoperability
of fcntl(2) and flock(2) locks for file systems that support the former.
C'est la vie.

Prior to this commit flock(2) would get sent to the server as a
fcntl(2)-style lock with the lock owner field set to stack garbage.

Sponsored by: The FreeBSD Foundation


# 65417f5e 24-May-2019 Alan Somers <asomers@FreeBSD.org>

Remove "struct ucred*" argument from vtruncbuf

vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it. Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.

Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20377


# e76986fd 23-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix exporting fuse filesystems with nfsd

A previous commit made fuse exportable via userland NFS servers.
Compatibility with the in-kernel nfsd required two more changes:

* During read and write operations, implicitly do a FUSE_OPEN if there isn't
already a valid file handle. That's because nfsd never calls VOP_OPEN.
* During VOP_READDIR, if an implicit open was necessary, directory offsets
from a previous VOP_READDIR may not be valid, so VOP_READDIR may have to
start from the beginning and read until it encounters the requested
offset.

I've done only limited testing over NFS, so there are probably still some
more bugs. Thanks to rmacklem for all of the readdir changes, which he had
made for his pnfs work.

Sponsored by: The FreeBSD Foundation


# e5b50fe7 22-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: Make fuse file systems NFS-exportable

This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3). More work is needed to make the in-kernel nfsd work, because of
its stateless nature. It doesn't open files prior to doing I/O. Also, the
NFS-related VOPs currently ignore the entry cache.

Sponsored by: The FreeBSD Foundation


# 2013b723 22-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: improve attribute caching

Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr. There are still a few
calls elsewhere that happen as a result of a write.

Sponsored by: The FreeBSD Foundation


# d311d6c4 20-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: eliminate a superfluous fuse_node_setparent

Sponsored by: The FreeBSD Foundation


# 96192dfc 15-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: diff reduction vs the upstream sources

fuse_kernel.h defines the structures used by the FUSE protocol. Originally
it came from libfuse, but the current source of truth is the Linux kernel.
This commit minimizes the diffs between our version and the Linux version as
of 21f3da95d (protocol version 7.8).

The flags field of struct fuse_listxattr_out and fuse_listxattr_in was an
error in our header. Those fields don't exist in Linux or libfuse, and
they've never been used in FreeBSD. In fact, those structs don't even exist
in Linux and libfuse; those projects confusingly overload the identical
fuse_getxattr_in and fuse_getxattr_out structs.

Sponsored by: The FreeBSD Foundation


# 3d15b234 14-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: don't track a file's size in two places

fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning. It was very confusing. This commit eliminates the former. It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.

Sponsored by: The FreeBSD Foundation


# 5940f822 13-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: remove the vfs.fusefs.data_cache_invalidate sysctl

This sysctl was added > 6.5 years ago and I don't know why. The description
seems at odds with the code. While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.

Sponsored by: The FreeBSD Foundation


# d5ff2688 09-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: create sockets with FUSE_MKNOD, not FUSE_CREATE

libfuse expects sockets to be created with FUSE_MKNOD, not FUSE_CREATE,
because that's how Linux does it. My first attempt at creating sockets
(r346894) used FUSE_CREATE because FreeBSD uses VOP_CREATE for this purpose.
There are no backwards-compatibility concerns with this change, because
socket support hasn't yet been merged to head.

Sponsored by: The FreeBSD Foundation


# 002e54b0 08-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: clear a dir's attr cache when its contents change

Any change to a directory's contents should cause its mtime and ctime to be
updated by the FUSE daemon. Clear its attribute cache so we'll get the new
attributes the next time that they're needed. This affects the following
VOPs: VOP_CREATE, VOP_LINK, VOP_MKDIR, VOP_MKNOD, VOP_REMOVE, VOP_RMDIR, and
VOP_SYMLINK

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# 8e45ec4e 08-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a permission handling bug during VOP_RENAME

If the file to be renamed is a directory and it's going to get a new parent,
then the user must have write permissions to that directory, because the
".." dirent must be changed.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# d943c93e 08-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: allow non-owners to set timestamps to UTIME_NOW

utimensat should allow anybody with write access to set atime and mtime to
UTIME_NOW.

PR: 237181
Sponsored by: The FreeBSD Foundation


# 4ae3a56c 08-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: update cached attributes during VOP_LINK

FUSE_LINK returns a new set of attributes. fusefs should cache them just
like it does during other VOPs. This is not only a matter of performance
but of correctness too; without caching the new attributes the vnode's nlink
value would be out-of-date.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# a2bdd737 07-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: drop suid after a successful chown by a non-root user

Drop sgid too. Also, drop them after a successful chgrp.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation
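
The rule in a2bdd737 above amounts to masking two mode bits. A minimal
sketch, assuming a hypothetical helper name and treating uid 0 as the only
privileged caller:

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the mode a file should have after a
 * successful chown or chgrp performed by caller_uid.
 */
static mode_t
mode_after_chown_sketch(mode_t mode, uid_t caller_uid)
{
	if (caller_uid != 0)
		mode &= ~(mode_t)(S_ISUID | S_ISGID);
	return (mode);
}
```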


# 4e83d655 06-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: allow the null chown and null chgrp

Even an unprivileged user should be able to chown a file to its current
owner, or chgrp it to its current group. Those are no-ops.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# 1c8a5f5e 06-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: disable posix_fallocate

fuse file systems have far too much variability for the standard
posix_fallocate implementation to work. A future protocol revision (7.19)
adds a FUSE_FALLOCATE operation, but we don't support that yet. Better to
simply return EINVAL until then.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# 3fa12789 06-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: allow ftruncate on files without write permission

ftruncate should succeed as long as the file descriptor is writable, even if
the file doesn't have write permission. This is important when combined
with O_CREAT.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# 8cfb4431 06-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: Fix another obscure permission handling bug

Don't allow unprivileged users to set SGID on files to whose group they
don't belong. This is slightly different than what POSIX says we should do
(clear sgid on return from a successful chmod), but it matches what UFS
currently does.

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# a90e32de 06-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: clear SUID & SGID after a successful write by a non-owner

Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# ac0a68e9 06-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: don't allow truncating irregular files on a read-only mount

The readonly mount check had a special case allowing the sizes of files to
be changed if they weren't regular files. I don't know why. Neither UFS,
ZFS, nor ext2 have such a special case, and I don't know when you would ever
change the size of a non-regular file anyway.

Sponsored by: The FreeBSD Foundation


# e5ff3a7e 04-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: only root may set the sticky bit on a non-directory

PR: 216391
Reported by: pjdfstest
Sponsored by: The FreeBSD Foundation


# 93198e64 01-May-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a memory leak from r346979

PR: 216391
Sponsored by: The FreeBSD Foundation


# 474ba6fa 30-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix some permission checks with -o default_permissions

When mounted with -o default_permissions fusefs is supposed to validate all
permissions in the kernel, not the file system. This commit fixes two
permissions that I had previously overlooked.

* Only root may chown a file
* Non-root users may only chgrp a file to a group to which they belong

PR: 216391
Sponsored by: The FreeBSD Foundation
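
The two rules listed in 474ba6fa above can be sketched as a standalone check.
This is illustrative only: cred_sketch, cred_in_group and chown_check_sketch
are hypothetical names, other checks (such as file ownership) are omitted,
and treating an unchanged uid or gid as a no-op follows the later 4e83d655
entry rather than this commit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified caller credentials. */
struct cred_sketch {
	uid_t		 uid;
	const gid_t	*groups;
	size_t		 ngroups;
};

static bool
cred_in_group(const struct cred_sketch *cr, gid_t gid)
{
	for (size_t i = 0; i < cr->ngroups; i++)
		if (cr->groups[i] == gid)
			return (true);
	return (false);
}

/* Returns 0 if the ownership change may proceed, EPERM otherwise. */
static int
chown_check_sketch(const struct cred_sketch *cr, uid_t cur_uid, gid_t cur_gid,
    uid_t new_uid, gid_t new_gid)
{
	if (cr->uid == 0)
		return (0);
	if (new_uid != cur_uid)
		return (EPERM);		/* only root may chown */
	if (new_gid != cur_gid && !cred_in_group(cr, new_gid))
		return (EPERM);		/* chgrp only to one's own groups */
	return (0);
}
```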


# ede571e4 29-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: support unix-domain sockets

Also, fix the teardown of the Fifo.read_write test

Sponsored by: The FreeBSD Foundation


# f9b0e30b 28-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: FIFO support

Sponsored by: The FreeBSD Foundation


# 9c7ec331 26-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a deadlock in VOP_PUTPAGES

As of r346162 fuse now invalidates the cache during writes. But it can't do
that when writing from VOP_PUTPAGES, because the write is coming _from_ the
cache. Trying to invalidate the cache in that situation causes a deadlock
in vm_object_page_remove, because the pages in question have already been
busied by the same thread.

PR: 235774
Sponsored by: The FreeBSD Foundation


# 419e7ff6 19-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: rename the SDT probes from "fuse" to "fusefs"

This matches the new name of the kld.

Sponsored by: The FreeBSD Foundation


# 268c28ed 19-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: give priority to FUSE_INTERRUPT operations

When interrupting a FUSE operation, send the FUSE_INTERRUPT op to the daemon
ASAP, ahead of other unrelated operations.

PR: 236530
Sponsored by: The FreeBSD Foundation


# f0f7fc1b 19-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix interrupting FUSE_SETXATTR

fusefs's VOP_SETEXTATTR calls uiomove(9) before blocking, so it can't be
restarted. It must be interrupted instead.

PR: 236530
Sponsored by: The FreeBSD Foundation


# f067b609 12-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: implement VOP_ADVLOCK

PR: 234581
Sponsored by: The FreeBSD Foundation


# 1f4a83f9 11-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: Handle ENOSYS for all remaining opcodes

For many FUSE opcodes, an error of ENOSYS has special meaning. fusefs
already handled some of those; this commit adds handling for the remainder:

* FUSE_FSYNC, FUSE_FSYNCDIR: ENOSYS means "success, and automatically return
success without calling the daemon from now on"
* All extattr operations: ENOSYS means "fail EOPNOTSUPP, and automatically
do it without calling the daemon from now on"

PR: 236557
Sponsored by: The FreeBSD Foundation
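
The "remember ENOSYS" behavior described in 1f4a83f9 above can be sketched as
follows. The names op_state and op_sketch are hypothetical, and the daemon's
reply is passed in as a plain errno value instead of coming from a real
upcall.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-mount, per-opcode state. */
struct op_state {
	bool	unsupported;	/* the daemon once answered ENOSYS */
	int	upcalls;	/* requests actually sent, for illustration */
};

/*
 * server_errno: what the daemon would answer if asked.
 * enosys_result: what ENOSYS translates to for this opcode
 * (0 for the fsync family, EOPNOTSUPP for extattr operations).
 */
static int
op_sketch(struct op_state *st, int server_errno, int enosys_result)
{
	if (st->unsupported)
		return (enosys_result);	/* answer without an upcall */
	st->upcalls++;
	if (server_errno == ENOSYS) {
		st->unsupported = true;
		return (enosys_result);
	}
	return (server_errno);
}
```

Once the flag is set, later calls never reach the daemon, whatever it might
have answered.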


# 4683b905 11-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: don't disappear a vnode on entry cache expiration

When the entry cache expires, it's only necessary to purge the cache.
Disappearing a vnode also purges the attribute cache, which is unnecessary,
and invalidates the data cache, which could be harmful.

Sponsored by: The FreeBSD Foundation


# 6124fd71 11-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: Finish supporting -o default_permissions

I got most of -o default_permissions working in r346088. This commit adds
sticky bit checks. One downside is that sometimes there will be an extra
FUSE_GETATTR call for the parent directory during unlink or rename. But in
actual use I think those attributes will almost always be cached.

PR: 216391
Sponsored by: The FreeBSD Foundation


# dc14d593 11-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: use vn_vget_ino_gen in fuse_vnop_lookup

vn_vget_ino_gen is a helper function added in r268606 to simplify cases just
like this.

Sponsored by: The FreeBSD Foundation


# 438b8a6f 10-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: eliminate a superfluous FUSE_GETATTR from VOP_LOOKUP

fuse_vnop_lookup was using a FUSE_GETATTR operation when looking up "." and
"..", even though the only information it needed was the file type and file
size. "." and ".." are obviously always going to be directories; there's no
need to double check.

Sponsored by: The FreeBSD Foundation


# 73825da3 10-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: remove "early permission check hack"

fuse_vnop_lookup contained an awkward hack meant to reduce daemon activity
during long lookup chains. However, the hack is no longer necessary now
that we properly cache file attributes. Also, I'm 99% certain that it
could've bypassed permission checks when using openat to open a file
relative to a directory that lacks execute permission.

Sponsored by: The FreeBSD Foundation


# 666f8543 10-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: various cleanups

* Eliminate fuse_access_param. Whatever it was supposed to do, it seems
like it was never complete. The only real function it ever seems to have
had was a minor performance optimization, which I've already eliminated.
* Make extended attribute operations obey the allow_other mount option.
* Allow unprivileged access to the SYSTEM extattr namespace when
-o default_permissions is not in use.
* Disallow setextattr and deleteextattr on read-only mounts.
* Add tests for a few more error cases.

Sponsored by: The FreeBSD Foundation


# ff4fbdf5 10-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: WIP supporting -o default_permissions

Normally all permission checking is done in the fuse server. But when -o
default_permissions is used, it should be done in the kernel instead. This
commit adds appropriate permission checks through fusefs when -o
default_permissions is used. However, sticky bit checks aren't working yet.
I'll handle those in a follow-up commit.

There are no checks for file flags, because those aren't supported by our
version of the FUSE protocol. Nor is there any support for ACLs, though
that could be added if there were any demand.

PR: 216391
Reported by: hiyorin@gmail.com
Sponsored by: The FreeBSD Foundation


# 44f10c6e 09-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: cache negative lookups

The FUSE protocol includes a way for a server to tell the client that a
negative lookup response is cacheable for a certain amount of time.

PR: 236226
Sponsored by: The FreeBSD Foundation


# ccb75e49 09-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: implement entry cache timeouts

Follow-up to r346046. These two commits implement fuse cache timeouts for
both entries and attributes. They also remove the vfs.fusefs.lookup_cache
enable sysctl, which is no longer needed now that cache timeouts are
honored.

PR: 235773
Sponsored by: The FreeBSD Foundation


# 3f2c630c 08-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: implement attribute cache timeouts

The FUSE protocol allows the server to specify the timeout period for the
client's attribute and entry caches. This commit implements the timeout
period for the attribute cache. The entry cache's timeout period is
currently disabled because it panics, and is guarded by the
vfs.fusefs.lookup_cache_expire sysctl.

PR: 235773
Reported by: cem
Sponsored by: The FreeBSD Foundation
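
An attribute cache with a server-supplied validity period, as described in
3f2c630c above, can be sketched like this. The structure and function names
are hypothetical, time is a caller-supplied nanosecond counter, and a zero
validity period is treated as "do not cache", matching the cad67791 entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached-attribute record; only the file size is kept. */
struct attr_cache_sketch {
	uint64_t expires_ns;	/* 0 means nothing is cached */
	uint64_t size;
};

/* Record attributes that the server says are valid for valid_ns. */
static void
attr_cache_store(struct attr_cache_sketch *c, uint64_t now_ns,
    uint64_t valid_ns, uint64_t size)
{
	c->size = size;
	c->expires_ns = (valid_ns == 0) ? 0 : now_ns + valid_ns;
}

/* True if the cached attributes may be used without asking the server. */
static bool
attr_cache_valid(const struct attr_cache_sketch *c, uint64_t now_ns)
{
	return (c->expires_ns != 0 && now_ns < c->expires_ns);
}
```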


# cad67791 08-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: cache file attributes

FUSE_LOOKUP, FUSE_GETATTR, FUSE_SETATTR, FUSE_MKDIR, FUSE_LINK,
FUSE_SYMLINK, FUSE_MKNOD, and FUSE_CREATE all return file attributes with a
cache validity period. fusefs will now cache the attributes, if the server
returns a non-zero cache validity period.

This change does _not_ implement finite attr cache timeouts. That will
follow as part of PR 235773.

PR: 235775
Reported by: cem
Sponsored by: The FreeBSD Foundation


# a7e81cb3 04-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: properly handle FOPEN_KEEP_CACHE

If a fuse file system returns FOPEN_KEEP_CACHE in the open or create
response, then the client is supposed to _not_ clear its caches for that
file. I don't know why clearing the caches would be the default given that
there's a separate flag to bypass the cache altogether, but that's the way
it is. fusefs(5) will now honor this flag.

Our behavior is slightly different than Linux's because we reuse file
handles. That means that open(2) won't clear the cache if there's a
reusable file handle, even if the file server wouldn't have sent
FOPEN_KEEP_CACHE had we opened a new file handle like Linux does.

PR: 236560
Sponsored by: The FreeBSD Foundation


# 12292a99 04-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: correctly handle short writes

If a FUSE daemon returns FOPEN_DIRECT_IO when a file is opened, then it's
allowed to write less data than was requested during a FUSE_WRITE operation
on that file handle. fusefs should simply return a short write to userland.

The old code attempted to resend the unsent data. Not only was that
incorrect behavior, but it did it in an ineffective way, by attempting to
"rewind" the uio and uiomove the unsent data again.

This commit correctly handles short writes by returning directly to
userland if FOPEN_DIRECT_IO was set. If it wasn't set (making the short
write technically a protocol violation), then we resend the unsent data.
But instead of rewinding the uio, just resend the data that's already in the
kernel.

That necessitated a few changes to fuse_ipc.c to reduce the amount of bzero
activity. fusefs may be marginally faster as a result.

PR: 236381
Sponsored by: The FreeBSD Foundation
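
The short-write policy described in 12292a99 above can be sketched in
userland. Everything here is illustrative: send_fn stands in for one
FUSE_WRITE round trip, send_capped is a toy server, and the guard against a
zero-byte reply is an added assumption to keep the loop finite.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns how many of len bytes were accepted. */
typedef size_t (*send_fn)(const char *buf, size_t len, void *arg);

/* Toy server: accepts at most *(size_t *)arg bytes per request. */
static size_t
send_capped(const char *buf, size_t len, void *arg)
{
	size_t cap = *(size_t *)arg;

	(void)buf;
	return (len < cap ? len : cap);
}

/* Returns the number of bytes written. */
static size_t
write_sketch(const char *buf, size_t len, bool direct_io, send_fn send,
    void *arg)
{
	size_t done = 0;

	while (done < len) {
		size_t n = send(buf + done, len - done, arg);

		done += n;
		if (direct_io || n == 0)
			break;	/* short write is legal, or no progress */
		/* Otherwise resend the rest from the same kernel buffer. */
	}
	return (done);
}
```

With direct I/O the caller sees the short count; without it the remaining
bytes are resent from the buffer already held, with no uio rewinding.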


# 35cf0e7e 03-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a panic in VOP_READDIR

The original fusefs import, r238402, contained a bug in fuse_vnop_close that
could close a directory's file handle while there were still other open file
descriptors. The code looks deliberate, but there is no explanation for it.
This necessitated a workaround in fuse_vnop_readdir that would open a new
file handle if, "for some mysterious reason", that vnode didn't have any
open file handles. r345781 had the effect of causing the workaround to
panic, making the problem more visible.

This commit removes the workaround and the original bug, which also fixes
the panic.

Sponsored by: The FreeBSD Foundation


# 9f10f423 03-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: send FUSE_FLUSH during VOP_CLOSE

The FUSE protocol says that FUSE_FLUSH should be sent every time a file
descriptor is closed. That's not quite possible in FreeBSD because multiple
file descriptors can share a single struct file, and closef doesn't call
fo_close until the last close. However, we can still send FUSE_FLUSH on
every VOP_CLOSE, which is probably good enough.

There are two purposes for FUSE_FLUSH. One is to allow file systems to
return EIO if they have an error when writing data that's cached
server-side. The other is to release POSIX file locks (which fusefs(5) does
not yet support).

PR: 236405, 236327
Sponsored by: The FreeBSD Foundation


# d3a8f2dd 02-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix a just-introduced panic in readdir

r345808 changed the interface of fuse_filehandle_open, but failed to update
one caller.

Sponsored by: The FreeBSD Foundation


# 9e444871 02-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: cleanup and refactor some recent commits

This commit cleans up after recent commits, especially 345766, 345768, and
345781. There is no functional change. The most important change is to add
comments documenting why we can't send flags like O_APPEND in
FUSE_WRITE_OPEN.

PR: 236340
Sponsored by: The FreeBSD Foundation


# f8d4af10 01-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: send FUSE_OPEN for every open(2) with unique credentials

By default, FUSE performs authorization in the server. That means that it's
insecure for the client to reuse FUSE file handles between different users,
groups, or processes. Linux handles this problem by creating a different
FUSE file handle for every file descriptor. FreeBSD can't, due to
differences in our VFS design.

This commit adds credential information to each fuse_filehandle. During
open(2), fusefs will now only reuse a file handle if it matches the exact
same access mode, pid, uid, and gid of the calling process.

PR: 236844
Sponsored by: The FreeBSD Foundation
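
The reuse rule described in f8d4af10 above, applied to the linked list of
handles introduced in 1cedd6df, can be sketched as a simple search. The
structure below is a hypothetical, simplified stand-in for the driver's
fuse_filehandle, not its real layout.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-handle record kept on a vnode's list. */
struct fh_sketch {
	struct fh_sketch *next;
	int	mode;	/* access mode the handle was opened with */
	pid_t	pid;
	uid_t	uid;
	gid_t	gid;
};

/*
 * Return a handle the caller may reuse, or NULL if a fresh FUSE_OPEN is
 * required. Every field must match exactly.
 */
static struct fh_sketch *
fh_find_reusable(struct fh_sketch *head, int mode, pid_t pid, uid_t uid,
    gid_t gid)
{
	for (struct fh_sketch *fh = head; fh != NULL; fh = fh->next) {
		if (fh->mode == mode && fh->pid == pid &&
		    fh->uid == uid && fh->gid == gid)
			return (fh);
	}
	return (NULL);
}
```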


# 363a7416 01-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: allow opening files O_EXEC

O_EXEC is useful for fexecve(2) and fchdir(2). Treat it as another fufh
type alongside the existing RDONLY, WRONLY, and RDWR. Prior to r345742 this
would've caused a memory and performance penalty.

PR: 236329
Sponsored by: The FreeBSD Foundation


# 4a6d5507 01-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix an inverted error check in my last commit

This should be merged alongside 345766

Sponsored by: The FreeBSD Foundation


# 5ec10aa5 01-Apr-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: replace obsolete array idioms

r345742 replaced fusefs's fufh array with a fufh list. But it left a few
array idioms in place. This commit replaces those idioms with more
efficient list idioms. One location is in fuse_filehandle_close, which now
takes a pointer argument. Three other locations are places that had to loop
over all of a vnode's fuse filehandles.

Sponsored by: The FreeBSD Foundation


# 1cedd6df 30-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: replace the fufh table with a linked list

The FUSE protocol allows each open file descriptor to have a unique file
handle. On FreeBSD, these file handles must all be stored in the vnode.
The old method (also used by OSX and OpenBSD) is to store them all in a
small array. But that limits the total number that can be stored. This
commit replaces the array with a linked list (a technique also used by
Illumos). There is not yet any change in functionality, but this is the
first step to fixing several bugs.

PR: 236329, 236340, 236381, 236560, 236844
Discussed with: cem
Sponsored by: The FreeBSD Foundation


# 5fccbf31 29-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: don't force direct io for files opened O_WRONLY

Previously fusefs would treat any file opened O_WRONLY as though the
FOPEN_DIRECT_IO flag were set, in an attempt to avoid issuing reads as part
of a RMW write operation on a cached part of the file. However, the FUSE
protocol explicitly allows reads of write-only files for precisely that
reason.

Sponsored by: The FreeBSD Foundation


# 98852a32 28-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fix error handling in fuse_vnop_strategy

Reported by: cem
Sponsored by: The FreeBSD Foundation


# f203d173 27-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: don't ignore errors in fuse_vnode_refreshsize

Reported by: Coverity
Coverity CID: 1368622
Sponsored by: The FreeBSD Foundation


# 019dca01 27-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: delete dead code in fuse_vnop_setattr

The dead code in question was a broken and incomplete attempt to support the
default_permissions mount option during VOP_SETATTR. There wasn't anything
there worth saving; I'll have to rewrite it later.

Reported by: Coverity
Coverity CID: 1008668
Sponsored by: The FreeBSD Foundation


# e0bec057 26-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: correctly set fuse_release_in.flags in an error path

fuse_vnop_create must close the newly created file if it can't allocate a
vnode. When it does so, it must use the same file flags for FUSE_RELEASE as
it used for FUSE_OPEN or FUSE_CREATE.

Reported by: Coverity
Coverity CID: 1066204
Sponsored by: The FreeBSD Foundation


# fd2749f2 25-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: delete dead code

This change also inlines several previously #define'd symbols that didn't
really have the meanings indicated by the comments.

Sponsored by: The FreeBSD Foundation


# 19ef317d 22-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: fallback to MKNOD/OPEN if a filesystem doesn't support CREATE

If a FUSE filesystem returns ENOSYS for FUSE_CREATE, then fallback to
FUSE_MKNOD/FUSE_OPEN.

Also, fix a memory leak in the error path of fuse_vnop_create. And do a
little cleanup in fuse_vnop_open.

PR: 199934
Reported by: samm@os2.kiev.ua
Sponsored by: The FreeBSD Foundation
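
The fallback order described in 19ef317d above can be sketched as below. The
server_sketch structure is hypothetical: it simply records the errno each
opcode would return, so the control flow can be shown without a real daemon.

```c
#include <errno.h>

/* Hypothetical server model: the errno each opcode would return. */
struct server_sketch {
	int	create_err;
	int	mknod_err;
	int	open_err;
};

/* Returns 0 on success or the first error that ends the attempt. */
static int
create_sketch(const struct server_sketch *srv)
{
	int err = srv->create_err;	/* try FUSE_CREATE first */

	if (err != ENOSYS)
		return (err);
	err = srv->mknod_err;		/* fall back to FUSE_MKNOD ... */
	if (err != 0)
		return (err);
	return (srv->open_err);		/* ... followed by FUSE_OPEN */
}
```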


# bf4d7084 22-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: support VOP_MKNOD

PR: 236236
Sponsored by: The FreeBSD Foundation


# 6248288e 21-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: correctly handle cacheable negative LOOKUP responses

The FUSE protocol allows for LOOKUP to return a cacheable negative response,
which means that the file doesn't exist and the kernel can cache its
nonexistence. As of this commit fusefs doesn't cache the nonexistence, but
it does correctly handle such responses. Prior to this commit attempting to
create a file, even with O_CREAT, would fail with ENOENT if the daemon
returned a cacheable negative response.

PR: 236231
Sponsored by: The FreeBSD Foundation


# 915012e0 21-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: Don't treat fsync the same as fdatasync

For an unknown reason, fusefs was _always_ sending the fdatasync operation
instead of fsync. Now it correctly sends one or the other.

Also, remove the Fsync.fsync_metadata_only test, along with the recently
removed Fsync.nop. They should never have been added. The kernel shouldn't
keep track of which files have dirty data; that's the daemon's job.

PR: 236473
Sponsored by: The FreeBSD Foundation


# 90612f3c 21-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: VOP_FSYNC should be synchronous -- sometimes

I committed too hastily in r345390. There are cases, not directly reachable
from userland, where VOP_FSYNC ought to be asynchronous. This commit fixes
fusefs to handle VOP_FSYNC synchronously if and only if the VFS requests it.

PR: 236474
X-MFC-With: 345390
Sponsored by: The FreeBSD Foundation


# cc34f2f6 21-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fusefs: VOP_FSYNC should be synchronous

returning asynchronously pretty much defeats the point of fsync

PR: 236474
Sponsored by: The FreeBSD Foundation


# 123af6ec 20-Mar-2019 Alan Somers <asomers@FreeBSD.org>

Rename fuse(4) to fusefs(4)

This makes it more consistent with other filesystems, which all end in "fs",
and more consistent with its mount helper, which is already named
"mount_fusefs".

Reviewed by: cem, rgrimes
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19649


# 7e4844f7 19-Mar-2019 Alan Somers <asomers@FreeBSD.org>

fuse(4): remove more debugging printfs

I missed these in r344664. They're basically useless because they can only
be controlled at compile-time. Also, de-inline fuse_internal_cache_attrs.
It's big enough to be a regular function, and this way it gets a dtrace FBT
probe.

Sponsored by: The FreeBSD Foundation


# e7df9886 06-Mar-2019 Conrad Meyer <cem@FreeBSD.org>

FUSE: Prevent trivial panic

When open(2) was invoked against a FUSE filesystem with an unexpected flags
value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic.

For now, prevent the panic by rejecting such VOP_OPENs with EINVAL.

This is not considered the correct long term fix, but does prevent an
unprivileged denial-of-service.

PR: 236329
Reported by: asomers
Reviewed by: asomers
Sponsored by: Dell EMC Isilon


# cf169498 28-Feb-2019 Alan Somers <asomers@FreeBSD.org>

fuse(4): convert debug printfs into dtrace probes

fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:

1) Totally redundant with dtrace FBT probes. These I deleted.
2) Print textual information, usually error messages. These I converted to
SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
printf statements except they can be enabled at runtime with dtrace.
They can be filtered by FILE and/or by priority.
3) More complicated probes that print detailed information. These I
converted into ad-hoc SDT probes.

Sponsored by: The FreeBSD Foundation


# 02295caf 19-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

Fuse: whitespace and style(9) cleanup

Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but
somewhat less painful to look at and edit.

No functional change.


# bd4cb2a4 19-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

fuse: add descriptions for remaining sysctls

(Except reclaim revoked; I don't know what the goal of that one is.)


# 3c324b94 15-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

FUSE: Refresh cached file size when it changes (lookup)

The cached fvdat->filesize is independent of the (mostly unused)
cached_attrs, and we failed to update it when a cached (but perhaps
inactive) vnode was found during VOP_LOOKUP to have a different size than
cached.

As noted in the code comment, this can occur in distributed filesystems or
with other kinds of irregular file behavior (anything is possible in FUSE).

We do something similar in fuse_vnop_getattr already.

PR: 230258 (as reported in description; other issues explored in
comments are not all resolved)
Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com>
Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)


# 194e691a 15-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

FUSE: Only "dirty" cached file size when data is dirty

Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).

PR: 230258 (related)


# 09176f09 15-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

FUSE: Respect userspace FS "do-not-cache" of path components

The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely. This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.

PR: 230258 (inspired by; does not fix)


# 78a7722f 15-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

FUSE: Respect userspace FS "do-not-cache" of file attributes

The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely. This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.

One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy. It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
dead code and can be eliminated?

Some minor APIs changed to facilitate this:

1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").

2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0. It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).

3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.

4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache). The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.

PR: 230258 (inspired by; does not fix)
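
The caching policy in items 1 to 3 can be sketched as follows. Everything here is an illustrative assumption: struct ttl, struct attr_cache, cache_attrs_sketch and cached_attrs are hypothetical stand-ins, not the fusefs structures or functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the TTL carried in a FUSE reply. */
struct ttl {
	uint64_t sec;	/* whole seconds */
	uint32_t nsec;	/* additional nanoseconds */
};

/* Hypothetical per-file attribute cache; tracks validity and one field. */
struct attr_cache {
	bool valid;
	uint64_t size;
};

/*
 * Policy as described in the message: a zero TTL means "do not cache";
 * any non-zero TTL is treated as "cache indefinitely".
 */
static void
cache_attrs_sketch(struct attr_cache *c, uint64_t size, struct ttl t)
{
	if (t.sec == 0 && t.nsec == 0) {
		c->valid = false;
		return;
	}
	c->size = size;
	c->valid = true;
}

/* Accessor that yields NULL for an invalid cache, so misuse fails fast. */
static const struct attr_cache *
cached_attrs(const struct attr_cache *c)
{
	return (c->valid ? c : NULL);
}
```

Having the accessor return NULL forces every caller to handle the "no valid cache" case explicitly, which is the motivation given for the VTOVA() change.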


# 756a5412 14-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Allocate pager bufs from UMA instead of an 80-ish mutex-protected linked list.

o In vm_pager_bufferinit() create pbuf_zone and start accounting for how many
pbufs we are going to have set.
In various subsystems that are going to utilize pbufs create private zones
via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
and sets a limit on created zone. After startup preallocate pbufs according
to requirements of all pbuf zones.

Subsystems that used to have a private limit with old allocator now have
private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
swap, vnode pager.

The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
aio(4). They should have their private limits, but changing that is out of
scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
this option.
Default values aren't touched by this commit, but they probably should be
reviewed with respect to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with: gallatin
Tested by: pho


# cc426dd3 11-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove unused argument to priv_check_cred.

Patch mostly generated with coccinelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by: The FreeBSD Foundation


# 1493c2ee 02-Nov-2018 Brooks Davis <brooks@FreeBSD.org>

Make vop_symlink take a const target path.

This will enable callers to take const paths as part of syscall
declaration improvements.

Where doing so is easy and non-disruptive, carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17805


# 28f4f623 21-Aug-2018 Fedor Uporov <fsu@FreeBSD.org>

FUSE extattrs: fix issue when neither uio nor size was passed to VOP_* (cosmetic only).

Reviewed by: cem, pfg
MFC after: 2 weeks

Differential Revision: https://reviews.freebsd.org/D13737


# 493b4a8c 21-Aug-2018 Fedor Uporov <fsu@FreeBSD.org>

FUSE extattrs: fix issue when neither uio nor size was passed to VOP_*.

The requested size was returned incorrectly in case uio == NULL from listextattr because the
nameprefix/name conversion was not applied.
Also, make a_size/uio returning logic more unified with other filesystems.

Reviewed by: cem, pfg
MFC after: 2 weeks

Differential Revision: https://reviews.freebsd.org/D13528
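
A rough sketch of why the size must be computed after the name conversion. It assumes the two list layouts as commonly documented: a FUSE-style list of NUL-terminated names that carry a namespace prefix such as "user.", and a FreeBSD-style list in which each entry is one length byte followed by the unprefixed name. The function converted_list_size is hypothetical and is not the fusefs implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a list of NUL-terminated, prefixed names,
 * return the number of bytes the same names occupy once the prefix is
 * stripped and each name is stored as <length byte><name>.
 * Entries that do not start with the prefix are skipped.
 */
static size_t
converted_list_size(const char *list, size_t list_len, const char *prefix)
{
	size_t plen = strlen(prefix);
	size_t total = 0;
	size_t off = 0;

	while (off < list_len) {
		const char *entry = list + off;
		const char *end = memchr(entry, '\0', list_len - off);
		size_t elen = (end != NULL) ? (size_t)(end - entry) :
		    list_len - off;

		if (elen > plen && strncmp(entry, prefix, plen) == 0)
			total += 1 + (elen - plen);
		off += elen + 1;
	}
	return (total);
}
```

For a raw list holding "user.foo", "user.ab" and "system.x", the raw length is 26 bytes, while the converted size for the "user." prefix is 7, so reporting the raw length to a caller that passed no uio would overstate the buffer it needs.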


# 3dc1c7d6 07-Aug-2018 Conrad Meyer <cem@FreeBSD.org>

FUSE: Remove some set-but-not-used variables

No functional change.


# f83f3d79 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Update link count handling in fuse for post-ino64.

Set FUSE_LINK_MAX to UINT32_MAX instead of LINK_MAX to match the maximum
link count possible in the 'nlink' field of 'struct fuse_attr'.

Sponsored by: Chelsio Communications
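
As an illustration only, a link-count guard built on this limit might look like the sketch below. SKETCH_LINK_MAX and link_allowed are hypothetical names; the NULL case models the weakened check mentioned elsewhere in this log, where the test is skipped when no valid attribute cache is available.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the limit: the largest value a 32-bit nlink holds. */
#define SKETCH_LINK_MAX	UINT32_MAX

/*
 * Return EMLINK if the cached link count is already at the maximum.
 * A NULL pointer means "no valid cached attributes", in which case the
 * decision is left to the userspace filesystem.
 */
static int
link_allowed(const uint32_t *cached_nlink)
{
	if (cached_nlink != NULL && *cached_nlink >= SKETCH_LINK_MAX)
		return (EMLINK);
	return (0);
}
```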


# a74da9fb 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Use FUSE_LINK_MAX for LINK_MAX in fuse's VOP_PATHCONF().

Should have included this in r326993.

MFC after: 1 month
Sponsored by: Chelsio Communications


# 599afe53 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().

Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations. Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to
indicate hard links aren't supported.

Requested by: bde (though perhaps not this exact implementation)
Reviewed by: kib (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications


# 746c92e0 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Add a custom VOP_PATHCONF method for fuse.

This method handles _PC_FILESIZEBITS, _PC_SYMLINK_MAX, and _PC_NO_TRUNC.
For other values it defers to vop_stdpathconf().

MFC after: 1 month
Sponsored by: Chelsio Communications
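
The dispatch shape described here, handling a few names locally and deferring the rest, can be sketched in userland C. The returned values, SKETCH_SYMLINK_MAX, and both function names are assumptions for illustration; std_pathconf_sketch merely stands in for the generic fallback and is not vop_stdpathconf().

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical limit used only for this illustration. */
#define SKETCH_SYMLINK_MAX	1024

/* Stand-in for the generic fallback; this sketch knows no other names. */
static int
std_pathconf_sketch(int name, long *retval)
{
	(void)name;
	(void)retval;
	return (EINVAL);
}

/* Handle three names locally; defer everything else to the fallback. */
static int
pathconf_sketch(int name, long *retval)
{
	switch (name) {
	case _PC_FILESIZEBITS:
		*retval = 64;			/* assumed value */
		return (0);
	case _PC_SYMLINK_MAX:
		*retval = SKETCH_SYMLINK_MAX;	/* assumed value */
		return (0);
	case _PC_NO_TRUNC:
		*retval = 1;			/* assumed value */
		return (0);
	default:
		return (std_pathconf_sketch(name, retval));
	}
}
```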


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 04660064 14-Oct-2017 Fedor Uporov <fsu@FreeBSD.org>

Add extended attributes support to fuse kernel module.

Author: kem
Reviewed by: cem, pfg (mentor)
Approved by: pfg (mentor)
MFC after: 2 weeks

Differential Revision: https://reviews.freebsd.org/D12485


# 83c9dea1 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exception.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156


# ca148cda 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Two more files missed in r317055: these files need sys/vmmeter.h, but now
they got it implicitly included via sys/pcpu.h.


# 372b97d0 18-May-2016 Rick Macklem <rmacklem@FreeBSD.org>

If a local (AF_LOCAL, AF_UNIX) socket creation (bind) is attempted
on a fuse mounted file system, it will crash. Although it may be
possible to make this work correctly, this patch avoids the crash
in the meantime.
I removed the MPASS(), since panicking for the FIFO case didn't make
a lot of sense when it returns an error for the others.

PR: 195000
Submitted by: henry.hu.sh@gmail.com (earlier version)
MFC after: 2 weeks


# e6e24456 15-May-2016 Rick Macklem <rmacklem@FreeBSD.org>

Fix fuse for "cp" of a mode 0444 file to the file system.

When "cp" of a read-only (mode 0444) file to a fuse mounted
file system was attempted it would fail with EACCES. This was because
fuse would attempt to open the file WRONLY and the open would fail.
This patch changes the fuse_vnop_open() to test for an extant read-write
open and use that, if it is available.
This makes the "cp" of a read-only file to the fuse mounted file system
work ok.
There are simpler ways to fix this than adding the fuse_filehandle_validrw()
function, but this function is useful for future patches related to
exporting a fuse filesystem via NFS.

MFC after: 2 weeks


# 1390cca2 14-May-2016 Rick Macklem <rmacklem@FreeBSD.org>

Fix fuse to use DIRECT_IO when required.

When a file is opened write-only and a partial block was written,
buffered I/O would try and read the whole block in. This would
result in a hung thread, since there was no open (fuse filehandle)
that allowed reading. This patch avoids the problem by forcing
DIRECT_IO for this case.
It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO
flag in its reply to the open.

Tested by: nishida@asusa.net, freebsd@moosefs.com
PR: 194293, 206238
MFC after: 2 weeks


# b3a15ddd 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/fs: spelling fixes in comments.

No functional change.


# f17f88d3 16-Dec-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Fix breakage caused by r292373 in ZFS/FUSE/NFS/SMBFS.

With the new VOP_GETPAGES() KPI the "count" argument counts pages already,
and doesn't need to be translated from bytes to pages.

While here, make it consistent that *rbehind and *rahead are updated only
if we don't return an error.

Pointy hat to: glebius
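
A small sketch of the unit mix-up: applying a bytes-to-pages conversion to a value that is already a page count silently shrinks the request. SKETCH_PAGE_SIZE and bytes_to_pages are hypothetical; the real page size and conversion macros live in the kernel headers.

```c
#include <stddef.h>

/* Hypothetical page size for the illustration. */
#define SKETCH_PAGE_SIZE	4096u

/* Round a byte count up to whole pages. */
static size_t
bytes_to_pages(size_t nbytes)
{
	return ((nbytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

With these definitions, a byte count of 8 * 4096 correctly converts to 8 pages, but feeding the page count 8 through the same conversion yields 1, so a caller that asked for eight pages would be treated as asking for one.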


# b0cd2017 16-Dec-2015 Gleb Smirnoff <glebius@FreeBSD.org>

A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().

o With new KPI consumers can request contiguous ranges of pages, and
unlike before, all pages will be kept busied on return, like it was
done before with the 'reqpage' only. Now the reqpage goes away. With
new interface it is easier to implement code protected from race
conditions.

Such arrayed requests for now should be preceded by a call to
vm_pager_haspage() to make sure that request is possible. This
could be improved later, making vm_pager_haspage() obsolete.

Strengthening the promises on the busy state of the array of pages
allows us to remove such hacks as swp_pager_free_nrpage() and
vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
values for read ahead and read behind, that a pager may do, if it
can. These pages are completely owned by pager, and not controlled
by the caller.

This shifts the UFS-specific readahead logic from vm_fault.c, which
should be file system agnostic, into vnode_pager.c. It also removes
one VOP_BMAP() request per hard fault.

Discussed with: kib, alc, jeff, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# ead063e0 02-Mar-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make fuse(4) respect FOPEN_DIRECT_IO. This is required for correct
operation of GlusterFS.

PR: 192701
Submitted by: harsha at harshavardhana.net
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# 6c21f6ed 18-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache thrashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitly knows about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantics is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
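
The before and after decision logic can be sketched as below. All names and flag values are hypothetical, and the way the central layer withholds SK_MAKEENTRY for CREATE lookups is an assumed model of the NOCACHE handling, not the actual namei() code.

```c
#include <stdbool.h>

/* Hypothetical operation codes and flag bit for the illustration. */
enum sk_nameiop { SK_LOOKUP, SK_CREATE, SK_DELETE, SK_RENAME };
#define SK_MAKEENTRY	0x1u

/* Old scheme: every filesystem repeats the "not for CREATE" quirk. */
static bool
should_cache_old(enum sk_nameiop op, unsigned int flags)
{
	return ((flags & SK_MAKEENTRY) != 0 && op != SK_CREATE);
}

/* New scheme, central part (assumed shape): CREATE never gets MAKEENTRY. */
static unsigned int
vfs_prepare_flags(enum sk_nameiop op, unsigned int flags)
{
	if (op == SK_CREATE)
		flags &= ~SK_MAKEENTRY;
	return (flags);
}

/* New scheme, filesystem part: consult the flag and nothing else. */
static bool
should_cache_new(unsigned int flags)
{
	return ((flags & SK_MAKEENTRY) != 0);
}
```

Both schemes give the same answer for every operation, which matches the claim that the change is backward-compatible for filesystems that already honor MAKEENTRY.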


# 27ad26d8 09-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().


# c7aebda8 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The soft and hard busy mechanisms rely on the vm object lock to work.
Unify the two concepts into a real, minimal sxlock where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavily changed, which is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical, but a few notes are reported:
* The KPI changes as follows:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# d9454fab 07-Nov-2012 Attilio Rao <attilio@FreeBSD.org>

- Current caching mode is completely broken because it simply relies
on timing of the operations and not real lookup, bringing too many
false positives. Remove the whole mechanism. If it needs to be
implemented, next time it should really be done in the proper way.
- Fix VOP_GETATTR() in order to cope with userland bugs that would
change the type of file and not panic. Instead, it treats the entry as
if it did not exist.

Reported and tested by: flo
MFC after: 2 months
X-MFC: 241519, 242536,242616


# 6de3b00d 03-Nov-2012 Attilio Rao <attilio@FreeBSD.org>

Fix a bug where operations were carried on even if not implemented,
leading to handling of an invalid fdip object.

Reported and tested by: flo
MFC after: 2 months
X-MFC: 241519


# 4cff153b 13-Oct-2012 Attilio Rao <attilio@FreeBSD.org>

Rename s/DEBUG()/FS_DEBUG() and s/DEBUG2G()/FS_DEBUG2G() in order to
avoid a name clash in sparc64.

MFC after: 2 months
X-MFC: r241519


# 5fe58019 13-Oct-2012 Attilio Rao <attilio@FreeBSD.org>

Import a FreeBSD port of the FUSE Linux module.
This was developed during two Summer of Code mandates and was recently
revived by gnn.
The functionality in this commit entirely mirrors the content of the
fusefs-kmod port, which no longer needs to be installed on -CURRENT setups.

In order to get some sparse technical notes, please refer to:
http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html

or to the project branch:
svn://svn.freebsd.org/base/projects/fuse/

which also contains a granular history of the changes made during port
refinements. This commit does not come from the branch reintegration
itself because svn does not seem to behave properly for this functionality
at the moment.

Partly Sponsored by: Google, Summer of Code program 2005, 2011
Originally submitted by: ilya, Csaba Henk <csaba-ml AT creo DOT hu>
In collaboration with: pho
Tested by: flo, gnn, Gustau Perez,
Kevin Oberman <rkoberman AT gmail DOT com>
MFC after: 2 months