History log of /linux-master/fs/fuse/readdir.c
Revision Date Author Comments
# cdf6ac2a 06-Mar-2024 Miklos Szeredi <mszeredi@redhat.com>

fuse: get rid of ff->readdir.lock

The same protection is provided by file->f_pos_lock.

Note, this relies on the fact that file->f_mode has FMODE_ATOMIC_POS.
This flag is cleared by stream_open(), which would prevent locking of
f_pos_lock.

Prior to commit 7de64d521bf9 ("fuse: break up fuse_open_common()")
FOPEN_STREAM on a directory would cause stream_open() to be called.
After this commit this is not done anymore, so f_pos_lock will always
be locked.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 3c0d5df2 04-Oct-2023 Jeff Layton <jlayton@kernel.org>

fuse: convert to new timestamp accessors

Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-37-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 972f4c46 09-Aug-2023 Miklos Szeredi <mszeredi@redhat.com>

fuse: cache btime

Not all inode attributes are supported by all filesystems, but for the
basic stats (which are returned by stat(2) and friends) all of them will
have some value, even if that doesn't reflect a real attribute of the file.

Btime is different, in that filesystems are free to report or not report a
value in statx. If the value is available, then STATX_BTIME bit is set in
stx_mask.

When caching the value of btime, remember the availability of the attribute
as well as the value (if available). This is done by using the
FUSE_I_BTIME bit in fuse_inode->state to indicate availability, while using
fuse_inode->inval_mask & STATX_BTIME to indicate the state of the cache
itself (i.e. set if cache is invalid, and cleared if cache is valid).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 9dc10a54 09-Aug-2023 Miklos Szeredi <mszeredi@redhat.com>

fuse: add ATTR_TIMEOUT macro

Next patch will introduce yet another type attribute reply. Add a macro
that can handle attribute timeouts for all of the structs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# b8bd342d 25-Apr-2023 ruanmeisi <ruan.meisi@zte.com.cn>

fuse: nlookup missing decrement in fuse_direntplus_link

During our debugging of glusterfs, we found an Assertion failed error:
inode_lookup >= nlookup, which was caused by the nlookup value in the
kernel being greater than that in the FUSE file system.

The issue was introduced by fuse_direntplus_link, where in the function,
fuse_iget increments nlookup, and if d_splice_alias returns failure,
fuse_direntplus_link returns failure without decrementing nlookup
https://github.com/gluster/glusterfs/pull/4081

Signed-off-by: ruanmeisi <ruan.meisi@zte.com.cn>
Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support")
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# a1db2f7e 12-Oct-2022 Fabio M. De Francesco <fmdefrancesco@gmail.com>

fs/fuse: Replace kmap() with kmap_local_page()

The use of kmap() is being deprecated in favor of kmap_local_page().

There are two main problems with kmap(): (1) It comes with an overhead as
the mapping space is restricted and protected by a global lock for
synchronization and (2) it also requires global TLB invalidation when the
kmap’s pool wraps and it might block when the mapping space is fully
utilized until a slot becomes available.

With kmap_local_page() the mappings are per thread, CPU local, can take
page faults, and can be called from any context (including interrupts).
It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
the tasks can be preempted and, when they are scheduled to run again, the
kernel virtual addresses are restored and still valid.

Therefore, replace kmap() with kmap_local_page() in fuse_readdir_cached(),
it being the only call site of kmap() currently left in fs/fuse.

Cc: "Venkataramanan, Anirudh" <anirudh.venkataramanan@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 9fa248c6 20-Oct-2022 Miklos Szeredi <mszeredi@redhat.com>

fuse: fix readdir cache race

There's a race in fuse's readdir cache that can result in an uninitilized
page being read. The page lock is supposed to prevent this from happening
but in the following case it doesn't:

Two fuse_add_dirent_to_cache() start out and get the same parameters
(size=0,offset=0). One of them wins the race to create and lock the page,
after which it fills in data, sets rdc.size and unlocks the page.

In the meantime the page gets evicted from the cache before the other
instance gets to run. That one also creates the page, but finds the
size to be mismatched, bails out and leaves the uninitialized page in the
cache.

Fix by marking a filled page uptodate and ignoring non-uptodate pages.

Reported-by: Frank Sorenson <fsorenso@redhat.com>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# c6c745b8 22-Oct-2021 Miklos Szeredi <mszeredi@redhat.com>

fuse: only update necessary attributes

fuse_update_attributes() refreshes metadata for internal use.

Each use needs a particular set of attributes to be refreshed, but
currently that cannot be expressed and all but atime are refreshed.

Add a mask argument, which lets fuse_update_get_attr() to decide based on
the cache_mask and the inval_mask whether a GETATTR call is needed or not.

Reported-by: Yongji Xie <xieyongji@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 5fe0fc9f 08-Sep-2021 Peng Hao <flyingpeng@tencent.com>

fuse: use kmap_local_page()

Due to the introduction of kmap_local_*, the storage of slots used for
short-term mapping has changed from per-CPU to per-thread. kmap_atomic()
disable preemption, while kmap_local_*() only disable migration.

There is no need to disable preemption in several kamp_atomic places used
in fuse.

Link: https://lwn.net/Articles/836144/
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 15db1683 21-Jun-2021 Amir Goldstein <amir73il@gmail.com>

fuse: fix illegal access to inode with reused nodeid

Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
with ourarg containing nodeid and generation.

If a fuse inode is found in inode cache with the same nodeid but different
generation, the existing fuse inode should be unhashed and marked "bad" and
a new inode with the new generation should be hashed instead.

This can happen, for example, with passhrough fuse filesystem that returns
the real filesystem ino/generation on lookup and where real inode numbers
can get recycled due to real files being unlinked not via the fuse
passthrough filesystem.

With current code, this situation will not be detected and an old fuse
dentry that used to point to an older generation real inode, can be used to
access a completely new inode, which should be accessed only via the new
dentry.

Note that because the FORGET message carries the nodeid w/o generation, the
server should wait to get FORGET counts for the nlookup counts of the old
and reused inodes combined, before it can free the resources associated to
that nodeid.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 6e3e2c43 01-Mar-2021 Al Viro <viro@zeniv.linux.org.uk>

new helper: inode_wrong_type()

inode_wrong_type(inode, mode) returns true if setting inode->i_mode
to given value would've changed the inode type. We have enough of
those checks open-coded to make a helper worthwhile.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5d069dbe 10-Dec-2020 Miklos Szeredi <mszeredi@redhat.com>

fuse: fix bad inode

Jan Kara's analysis of the syzbot report (edited):

The reproducer opens a directory on FUSE filesystem, it then attaches
dnotify mark to the open directory. After that a fuse_do_getattr() call
finds that attributes returned by the server are inconsistent, and calls
make_bad_inode() which, among other things does:

inode->i_mode = S_IFREG;

This then confuses dnotify which doesn't tear down its structures
properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode. Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>


# fcee216b 06-May-2020 Max Reitz <mreitz@redhat.com>

fuse: split fuse_mount off of fuse_conn

We want to allow submounts for the same fuse_conn, but with different
superblocks so that each of the submounts has its own device ID. To do
so, we need to split all mount-specific information off of fuse_conn
into a new fuse_mount structure, so that multiple mounts can share a
single fuse_conn.

We need to take care only to perform connection-level actions once (i.e.
when the fuse_conn and thus the first fuse_mount are established, or
when the last fuse_mount and thus the fuse_conn are destroyed). For
example, fuse_sb_destroy() must invoke fuse_send_destroy() until the
last superblock is released.

To do so, we keep track of which fuse_mount is the root mount and
perform all fuse_conn-level actions only when this fuse_mount is
involved.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# cabdb4fa 14-Jan-2020 zhengbin <zhengbin13@huawei.com>

fuse: use true,false for bool variable

Fixes coccicheck warning:

fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# eb59bd17 12-Nov-2019 Miklos Szeredi <mszeredi@redhat.com>

fuse: verify attributes

If a filesystem returns negative inode sizes, future reads on the file were
causing the cpu to spin on truncate_pagecache.

Create a helper to validate the attributes. This now does two things:

- check the file mode
- check if the file size fits in i_size without overflowing

Reported-by: Arijit Banerjee <arijit@rubrik.com>
Fixes: d8a5ba45457e ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# e5854b1c 22-Sep-2019 Tejun Heo <tj@kernel.org>

fuse: fix beyond-end-of-page access in fuse_parse_cache()

With DEBUG_PAGEALLOC on, the following triggers.

BUG: unable to handle page fault for address: ffff88859367c000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3001067 P4D 3001067 PUD 406d3a8067 PMD 406d30c067 PTE 800ffffa6c983060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 38 PID: 3110657 Comm: python2.7
RIP: 0010:fuse_readdir+0x88f/0xe7a [fuse]
Code: 49 8b 4d 08 49 39 4e 60 0f 84 44 04 00 00 48 8b 43 08 43 8d 1c 3c 4d 01 7e 68 49 89 dc 48 03 5c 24 38 49 89 46 60 8b 44 24 30 <8b> 4b 10 44 29 e0 48 89 ca 48 83 c1 1f 48 83 e1 f8 83 f8 17 49 89
RSP: 0018:ffffc90035edbde0 EFLAGS: 00010286
RAX: 0000000000001000 RBX: ffff88859367bff0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88859367bfed RDI: 0000000000920907
RBP: ffffc90035edbe90 R08: 000000000000014b R09: 0000000000000004
R10: ffff88859367b000 R11: 0000000000000000 R12: 0000000000000ff0
R13: ffffc90035edbee0 R14: ffff889fb8546180 R15: 0000000000000020
FS: 00007f80b5f4a740(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88859367c000 CR3: 0000001c170c2001 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
iterate_dir+0x122/0x180
__x64_sys_getdents+0xa6/0x140
do_syscall_64+0x42/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9

It's in fuse_parse_cache(). %rbx (ffff88859367bff0) is fuse_dirent
pointer - addr + offset. FUSE_DIRENT_SIZE() is trying to dereference
namelen off of it but that derefs into the next page which is disabled
by pagealloc debug causing a PF.

This is caused by dirent->namelen being accessed before ensuring that
there's enough bytes in the page for the dirent. Fix it by pushing
down reclen calculation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 43f5098e 10-Sep-2019 Miklos Szeredi <mszeredi@redhat.com>

fuse: convert readdir to simple api

The old fuse_read_fill() helper can be deleted, now that the last user is
gone.

The fuse_io_args struct is moved to fuse_i.h so it can be shared between
readdir/read code.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 3545fe21 10-Sep-2019 Miklos Szeredi <mszeredi@redhat.com>

fuse: convert fuse_force_forget() to simple api

Move this function to the readdir.c where its only caller resides.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# c9d8f5f0 09-Nov-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

fuse: Protect fi->nlookup with fi->lock

This continues previous patch and introduces the same protection for
nlookup field.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 261aaba7 01-Oct-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: use iversion for readdir cache verification

Use the internal iversion counter to make sure modifications of the
directory through this filesystem are not missed by the mtime check (due to
mtime granularity).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 7118883b 01-Oct-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: use mtime for readdir cache verification

Store the modification time of the directory in the cache, obtained before
starting to fill the cache.

When reading the cache, verify that the directory hasn't changed, by
checking if current modification time is the same as the one stored in the
cache.

This only needs to be done when the current file position is at the
beginning of the directory, as mandated by POSIX.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 3494927e 01-Oct-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: add readdir cache version

Allow the cache to be invalidated when page(s) have gone missing. In this
case increment the version of the cache and reset to an empty state.

Add a version number to the directory stream in struct fuse_file as well,
indicating the version of the cache it's supposed to be reading. If the
cache version doesn't match the stream's version, then reset the stream to
the beginning of the cache.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 5d7bc7e8 01-Oct-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: allow using readdir cache

The cache is only used if it's completed, not while it's still being
filled; this constraint could be lifted later, if it turns out to be
useful.

Introduce state in struct fuse_file that indicates the position within the
cache. After a seek, reset the position to the beginning of the cache and
search the cache for the current position. If the current position is not
found in the cache, then fall back to uncached readdir.

It can also happen that page(s) disappear from the cache, in which case we
must also fall back to uncached readdir.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 69e34551 01-Oct-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: allow caching readdir

This patch just adds the cache filling functions, which are invoked if
FOPEN_CACHE_DIR flag is set in the OPENDIR reply.

Cache reading and cache invalidation are added by subsequent patches.

The directory cache uses the page cache. Directory entries are packed into
a page in the same format as in the READDIR reply. A page only contains
whole entries, the space at the end of the page is cleared. The page is
locked while being modified.

Multiple parallel readdirs on the same directory can fill the cache; the
only constraint is that continuity must be maintained (d_off of last entry
points to position of current entry).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 18172b10 28-Sep-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: extract fuse_emit() helper

Prepare for cache filling by introducing a helper for emitting a single
directory entry.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# d123d8e1 28-Sep-2018 Miklos Szeredi <mszeredi@redhat.com>

fuse: split out readdir.c

Directory reading code is about to grow larger, so split it out from dir.c
into a new source file.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>