Searched +hist:50 +hist:fa53ec (Results 1 - 6 of 6) sorted by relevance

/linux-master/fs/f2fs/
H A Dnode.cdiff 86d7d57a Mon Dec 11 19:15:27 MST 2023 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to check return value of f2fs_recover_xattr_data

Should check return value of f2fs_recover_xattr_data in
__f2fs_setxattr rather than doing invalid retry if error happen.

Also just do set_page_dirty in f2fs_recover_xattr_data when
page is changed really.

Fixes: 50a472bbc79f ("f2fs: do not return EFSCORRUPTED, but try to run online repair")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 50a472bb Thu Oct 19 16:51:08 MDT 2023 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: do not return EFSCORRUPTED, but try to run online repair

If we return the error, there's no way to recover the status as of now, since
fsck does not fix the xattr boundary issue.

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff a6ec8378 Thu Jun 29 05:11:44 MDT 2023 Chao Yu <chao@kernel.org> f2fs: fix to do sanity check on direct node in truncate_dnode()

syzbot reports below bug:

BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000

CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
f2fs_truncate include/linux/fs.h:825 [inline]
f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
notify_change+0xb2c/0x1180 fs/attr.c:483
do_truncate+0x143/0x200 fs/open.c:66
handle_truncate fs/namei.c:3295 [inline]
do_open fs/namei.c:3640 [inline]
path_openat+0x2083/0x2750 fs/namei.c:3791
do_filp_open+0x1ba/0x410 fs/namei.c:3818
do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1448 [inline]
__se_sys_creat fs/open.c:1442 [inline]
__x64_sys_creat+0xcd/0x120 fs/open.c:1442
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.

This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.

Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.

Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 71644dff Thu Dec 01 18:37:15 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add block_age-based extent cache

This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 141170b7 Sun Jul 24 10:03:23 MDT 2022 Chao Yu <chao.yu@oppo.com> f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()

As Dipanjan Das <mail.dipanjan.das@gmail.com> reported, syzkaller
found a f2fs bug as below:

RIP: 0010:f2fs_new_node_page+0x19ac/0x1fc0 fs/f2fs/node.c:1295
Call Trace:
write_all_xattrs fs/f2fs/xattr.c:487 [inline]
__f2fs_setxattr+0xe76/0x2e10 fs/f2fs/xattr.c:743
f2fs_setxattr+0x233/0xab0 fs/f2fs/xattr.c:790
f2fs_xattr_generic_set+0x133/0x170 fs/f2fs/xattr.c:86
__vfs_setxattr+0x115/0x180 fs/xattr.c:182
__vfs_setxattr_noperm+0x125/0x5f0 fs/xattr.c:216
__vfs_setxattr_locked+0x1cf/0x260 fs/xattr.c:277
vfs_setxattr+0x13f/0x330 fs/xattr.c:303
setxattr+0x146/0x160 fs/xattr.c:611
path_setxattr+0x1a7/0x1d0 fs/xattr.c:630
__do_sys_lsetxattr fs/xattr.c:653 [inline]
__se_sys_lsetxattr fs/xattr.c:649 [inline]
__x64_sys_lsetxattr+0xbd/0x150 fs/xattr.c:649
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

NAT entry and nat bitmap can be inconsistent, e.g. one nid is free
in nat bitmap, and blkaddr in its NAT entry is not NULL_ADDR, it
may trigger BUG_ON() in f2fs_new_node_page(), fix it.

Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 6ce19aff Thu May 20 05:51:50 MDT 2021 Chao Yu <chao@kernel.org> f2fs: compress: add compress_inode to cache compressed blocks

Support to use address space of inner inode to cache compressed block,
in order to improve cache hit ratio of random read.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 042be373 Fri May 08 03:50:20 MDT 2020 Chao Yu <chao@kernel.org> f2fs: shrink spinlock coverage

In f2fs_try_to_free_nids(), .nid_list_lock spinlock critical region will
increase as expected shrink number increase, to avoid spining other CPUs
for long time, we change to release nid caches with small batch each time
under .nid_list_lock coverage.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
H A Dcheckpoint.cdiff 27275f18 Tue Mar 29 12:28:07 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: fix wrong condition check when failing metapage read

This patch fixes wrong initialization.

Fixes: 50c63009f6ab ("f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 50c63009 Wed Mar 02 12:39:08 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes

If one read IO is always failing, we can fall into an infinite loop in
f2fs_sync_dirty_inodes. This happens during xfstests/generic/475.

[ 142.803335] Buffer I/O error on dev dm-1, logical block 8388592, async page read
...
[ 382.887210] submit_bio_noacct+0xdd/0x2a0
[ 382.887213] submit_bio+0x80/0x110
[ 382.887223] __submit_bio+0x4d/0x300 [f2fs]
[ 382.887282] f2fs_submit_page_bio+0x125/0x200 [f2fs]
[ 382.887299] __get_meta_page+0xc9/0x280 [f2fs]
[ 382.887315] f2fs_get_meta_page+0x13/0x20 [f2fs]
[ 382.887331] f2fs_get_node_info+0x317/0x3c0 [f2fs]
[ 382.887350] f2fs_do_write_data_page+0x327/0x6f0 [f2fs]
[ 382.887367] f2fs_write_single_data_page+0x5b7/0x960 [f2fs]
[ 382.887386] f2fs_write_cache_pages+0x302/0x890 [f2fs]
[ 382.887405] ? preempt_count_add+0x7a/0xc0
[ 382.887408] f2fs_write_data_pages+0xfd/0x320 [f2fs]
[ 382.887425] ? _raw_spin_unlock+0x1a/0x30
[ 382.887428] do_writepages+0xd3/0x1d0
[ 382.887432] filemap_fdatawrite_wbc+0x69/0x90
[ 382.887434] filemap_fdatawrite+0x50/0x70
[ 382.887437] f2fs_sync_dirty_inodes+0xa4/0x270 [f2fs]
[ 382.887453] f2fs_write_checkpoint+0x189/0x1640 [f2fs]
[ 382.887469] ? schedule_timeout+0x114/0x150
[ 382.887471] ? ttwu_do_activate+0x6d/0xb0
[ 382.887473] ? preempt_count_add+0x7a/0xc0
[ 382.887476] kill_f2fs_super+0xca/0x100 [f2fs]
[ 382.887491] deactivate_locked_super+0x35/0xa0
[ 382.887494] deactivate_super+0x40/0x50
[ 382.887497] cleanup_mnt+0x139/0x190
[ 382.887499] __cleanup_mnt+0x12/0x20
[ 382.887501] task_work_run+0x64/0xa0
[ 382.887505] exit_to_user_mode_prepare+0x1b7/0x1c0
[ 382.887508] syscall_exit_to_user_mode+0x27/0x50
[ 382.887510] do_syscall_64+0x48/0xc0
[ 382.887513] entry_SYSCALL_64_after_hwframe+0x44/0xae

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 50fa53ec Thu Aug 02 09:03:19 MDT 2018 Chao Yu <chao@kernel.org> f2fs: fix to avoid broken of dnode block list

f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.

By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.

Sheng Yong helps to do the test with this patch:

Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8

1 / PERSIST / Index

Base:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08
2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7
3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48
Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333

After:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 798.81 202.5 41143 40613.87 602.71 838.08 913.83
2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27
3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91
Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333

Patched/Original:
0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189

It looks like atomic write will suffer performance regression.

I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.

BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:

- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
- preflush + fua;
- power-cut

If we don't wait dnode writeback for atomic_write:

SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85
2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77
3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92
Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18

Patched/Original:
0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294

SQLite's performance recovers.

Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."

Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 50fa53ec Thu Aug 02 09:03:19 MDT 2018 Chao Yu <chao@kernel.org> f2fs: fix to avoid broken of dnode block list

f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.

By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.

Sheng Yong helps to do the test with this patch:

Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8

1 / PERSIST / Index

Base:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08
2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7
3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48
Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333

After:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 798.81 202.5 41143 40613.87 602.71 838.08 913.83
2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27
3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91
Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333

Patched/Original:
0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189

It looks like atomic write will suffer performance regression.

I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.

BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:

- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
- preflush + fua;
- power-cut

If we don't wait dnode writeback for atomic_write:

SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85
2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77
3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92
Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18

Patched/Original:
0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294

SQLite's performance recovers.

Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."

Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
H A Dsuper.cdiff 11bec96a Thu Mar 07 20:50:57 MST 2024 Chao Yu <chao@kernel.org> f2fs: zone: fix to remove pow2 check condition for zoned block device

Commit 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned
device") missed to remove pow2 check condition in init_blkz_info(),
fix it.

Fixes: 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned device")
Signed-off-by: Feng Song <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7a0263dc Sun Sep 24 23:54:50 MDT 2023 Eric Biggers <ebiggers@google.com> fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes

Now that fs/crypto/ computes the filesystem's lblk_bits from its maximum
file size, it is no longer necessary for filesystems to provide
lblk_bits via fscrypt_operations::get_ino_and_lblk_bits.

It is still necessary for fs/crypto/ to retrieve ino_bits from the
filesystem. However, this is used only to decide whether inode numbers
fit in 32 bits. Also, ino_bits is static for all relevant filesystems,
i.e. it doesn't depend on the filesystem instance.

Therefore, in the interest of keeping things as simple as possible,
replace 'get_ino_and_lblk_bits' with a flag 'has_32bit_inodes'. This
can always be changed back to a function if a filesystem needs it to be
dynamic, but for now a static flag is all that's needed.

Link: https://lore.kernel.org/r/20230925055451.59499-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
diff 458c15df Mon May 22 21:58:22 MDT 2023 Chao Yu <chao@kernel.org> f2fs: don't reset unchangable mount option in f2fs_remount()

syzbot reports a bug as below:

general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942
Call Trace:
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691
__raw_write_lock include/linux/rwlock_api_smp.h:209 [inline]
_raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300
__drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100
f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116
f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664
f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838
vfs_fallocate+0x54b/0x6b0 fs/open.c:324
ksys_fallocate fs/open.c:347 [inline]
__do_sys_fallocate fs/open.c:355 [inline]
__se_sys_fallocate fs/open.c:353 [inline]
__x64_sys_fallocate+0xbd/0x100 fs/open.c:353
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is race condition as below:
- since it tries to remount rw filesystem, so that do_remount won't
call sb_prepare_remount_readonly to block fallocate, there may be race
condition in between remount and fallocate.
- in f2fs_remount(), default_options() will reset mount option to default
one, and then update it based on result of parse_options(), so there is
a hole which race condition can happen.

Thread A Thread B
- f2fs_fill_super
- parse_options
- clear_opt(READ_EXTENT_CACHE)

- f2fs_remount
- default_options
- set_opt(READ_EXTENT_CACHE)
- f2fs_fallocate
- f2fs_insert_range
- f2fs_drop_extent_tree
- __drop_extent_tree
- __may_extent_tree
- test_opt(READ_EXTENT_CACHE) return true
- write_lock(&et->lock) access NULL pointer
- parse_options
- clear_opt(READ_EXTENT_CACHE)

Cc: <stable@vger.kernel.org>
Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 71644dff Thu Dec 01 18:37:15 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add block_age-based extent cache

This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff cc249e4c Sun Nov 06 06:25:44 MST 2022 Chao Yu <chao@kernel.org> f2fs: fix to avoid accessing uninitialized spinlock

syzbot reports a kernel bug:

__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
assign_lock_key+0x22a/0x240 kernel/locking/lockdep.c:981
register_lock_class+0x287/0x9b0 kernel/locking/lockdep.c:1294
__lock_acquire+0xe4/0x1f60 kernel/locking/lockdep.c:4934
lock_acquire+0x1a7/0x400 kernel/locking/lockdep.c:5668
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:350 [inline]
f2fs_save_errors fs/f2fs/super.c:3868 [inline]
f2fs_handle_error+0x29/0x230 fs/f2fs/super.c:3896
f2fs_iget+0x215/0x4bb0 fs/f2fs/inode.c:516
f2fs_fill_super+0x47d3/0x7b50 fs/f2fs/super.c:4222
mount_bdev+0x26c/0x3a0 fs/super.c:1401
legacy_get_tree+0xea/0x180 fs/fs_context.c:610
vfs_get_tree+0x88/0x270 fs/super.c:1531
do_new_mount+0x289/0xad0 fs/namespace.c:3040
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount+0x2e3/0x3d0 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

F2FS-fs (loop1): Failed to read F2FS meta data inode

The root cause is if sbi->error_lock may be accessed before
its initialization, fix it.

Link: https://lore.kernel.org/linux-f2fs-devel/0000000000007edb6605ecbb6442@google.com/T/#u
Reported-by: syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff e5a0db6a Tue Oct 25 00:50:25 MDT 2022 Yangtao Li <frank.li@vivo.com> f2fs: replace gc_urgent_high_remaining with gc_remaining_trials

The user can set the trial count limit for GC urgent and
idle mode with replaced gc_remaining_trials.. If GC thread gets
to the limit, the mode will turn back to GC normal mode finally.

It was applied only to GC_URGENT, while this patch expands it for
GC_IDLE.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 930e2607 Tue Apr 12 15:45:50 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: remove obsolete whint_mode

This patch removes obsolete whint_mode.

Fixes: 41d36a9f3e53 ("fs: remove kiocb.ki_hint")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 6ce19aff Thu May 20 05:51:50 MDT 2021 Chao Yu <chao@kernel.org> f2fs: compress: add compress_inode to cache compressed blocks

Support to use address space of inner inode to cache compressed block,
in order to improve cache hit ratio of random read.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 6d1349c7 Fri Sep 18 14:45:50 MDT 2020 Al Viro <viro@zeniv.linux.org.uk> [PATCH] reduce boilerplate in fsid handling

Get rid of boilerplate in most of ->statfs()
instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff ed318a6c Tue May 12 17:32:50 MDT 2020 Eric Biggers <ebiggers@google.com> fscrypt: support test_dummy_encryption=v2

v1 encryption policies are deprecated in favor of v2, and some new
features (e.g. encryption+casefolding) are only being added for v2.

Therefore, the "test_dummy_encryption" mount option (which is used for
encryption I/O testing with xfstests) needs to support v2 policies.

To do this, extend its syntax to be "test_dummy_encryption=v1" or
"test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
argument) also continues to be accepted, to specify the default setting
-- currently v1, but the next patch changes it to v2.

To cleanly support both v1 and v2 while also making it easy to support
specifying other encryption settings in the future (say, accepting
"$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
pointer to the dummy fscrypt_context rather than using mount flags.

To avoid concurrency issues, don't allow test_dummy_encryption to be set
or changed during a remount. (The former restriction is new, but
xfstests doesn't run into it, so no one should notice.)

Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
there are two regressions, both of which are test bugs: ext4/023 and
ext4/028 fail because they set an xattr and expect it to be stored
inline, but the increase in size of the fscrypt_context from
24 to 40 bytes causes this xattr to be spilled into an external block.

Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
H A Ddata.cdiff 7d009e04 Thu Feb 22 05:18:50 MST 2024 Chao Yu <chao@kernel.org> f2fs: fix to handle segment allocation failure correctly

If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.

Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 14db0b3c Mon Aug 15 17:50:51 MDT 2022 Eric Biggers <ebiggers@google.com> fscrypt: stop using PG_error to track error status

As a step towards freeing the PG_error flag for other uses, change ext4
and f2fs to stop using PG_error to track decryption errors. Instead, if
a decryption error occurs, just mark the whole bio as failed. The
coarser granularity isn't really a problem since it isn't any worse than
what the block layer provides, and errors from a multi-page readahead
aren't reported to applications unless a single-page read fails too.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org> # for f2fs part
Link: https://lore.kernel.org/r/20220815235052.86545-2-ebiggers@kernel.org
diff bbe1da7e Thu Aug 05 18:02:50 MDT 2021 Chao Yu <chao@kernel.org> f2fs: compress: do sanity check on cluster

This patch adds f2fs_sanity_check_cluster() to support doing
sanity check on cluster of compressed file, it will be triggered
from below two paths:

- __f2fs_cluster_blocks()
- f2fs_map_blocks(F2FS_GET_BLOCK_FIEMAP)

And it can detect below three kind of cluster insanity status.

C: COMPRESS_ADDR
N: NULL_ADDR or NEW_ADDR
V: valid blkaddr
*: any value

1. [*|C|*|*]
2. [C|*|C|*]
3. [C|N|N|V]

Signed-off-by: Chao Yu <chao@kernel.org>
[Nathan Chancellor: fix missing inline warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7eab7a69 Tue Jun 22 05:50:59 MDT 2021 Fengnan Chang <changfengnan@vivo.com> f2fs: compress: remove unneeded read when rewrite whole cluster

when we overwrite the whole page in cluster, we don't need read original
data before write, because after write_end(), writepages() can help to
load left data in that cluster.

Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 6ce19aff Thu May 20 05:51:50 MDT 2021 Chao Yu <chao@kernel.org> f2fs: compress: add compress_inode to cache compressed blocks

Support to use address space of inner inode to cache compressed block,
in order to improve cache hit ratio of random read.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff adfc6943 Wed Sep 23 01:54:50 MDT 2020 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: fix slab leak of rpages pointer

This fixes the below mem leak.

[ 130.157600] =============================================================================
[ 130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G W O ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown()
[ 130.162742] -----------------------------------------------------------------------------
[ 130.162742]
[ 130.164979] Disabling lock debugging due to kernel taint
[ 130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200
[ 130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G B W O 5.9.0-rc4+ #35
[ 130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 130.171941] Call Trace:
[ 130.172528] dump_stack+0x74/0x9a
[ 130.173298] slab_err+0xb7/0xdc
[ 130.174044] ? kernel_poison_pages+0xc0/0xc0
[ 130.175065] ? on_each_cpu_cond_mask+0x48/0x90
[ 130.176096] __kmem_cache_shutdown.cold+0x34/0x141
[ 130.177190] kmem_cache_destroy+0x59/0x100
[ 130.178223] f2fs_destroy_page_array_cache+0x15/0x20 [f2fs]
[ 130.179527] f2fs_put_super+0x1bc/0x380 [f2fs]
[ 130.180538] generic_shutdown_super+0x72/0x110
[ 130.181547] kill_block_super+0x27/0x50
[ 130.182438] kill_f2fs_super+0x76/0xe0 [f2fs]
[ 130.183448] deactivate_locked_super+0x3b/0x80
[ 130.184456] deactivate_super+0x3e/0x50
[ 130.185363] cleanup_mnt+0x109/0x160
[ 130.186179] __cleanup_mnt+0x12/0x20
[ 130.187003] task_work_run+0x70/0xb0
[ 130.187841] exit_to_user_mode_prepare+0x18f/0x1b0
[ 130.188917] syscall_exit_to_user_mode+0x31/0x170
[ 130.189989] do_syscall_64+0x45/0x90
[ 130.190828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 130.191986] RIP: 0033:0x7faf868ea2eb
[ 130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01
[ 130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb
[ 130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50
[ 130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0
[ 130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50
[ 130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000
[ 130.210515] INFO: Object 0x00000000a980843a @offset=744
[ 130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297
[ 130.215030] __slab_alloc+0x20/0x40
[ 130.216566] kmem_cache_alloc+0x2a0/0x2e0
[ 130.218217] page_array_alloc+0x3d/0xe0 [f2fs]
[ 130.219940] f2fs_init_compress_ctx+0x1f/0x40 [f2fs]
[ 130.221736] f2fs_write_cache_pages+0x3db/0x860 [f2fs]
[ 130.223591] f2fs_write_data_pages+0x2c9/0x300 [f2fs]
[ 130.225414] do_writepages+0x43/0xd0
[ 130.226907] __filemap_fdatawrite_range+0xd5/0x110
[ 130.228632] filemap_write_and_wait_range+0x48/0xb0
[ 130.230336] __generic_file_write_iter+0x18a/0x1d0
[ 130.232035] f2fs_file_write_iter+0x226/0x550 [f2fs]
[ 130.233737] new_sync_write+0x113/0x1a0
[ 130.235204] vfs_write+0x1a6/0x200
[ 130.236579] ksys_write+0x67/0xe0
[ 130.237898] __x64_sys_write+0x1a/0x20
[ 130.239309] do_syscall_64+0x38/0x90

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5df7731f Mon Feb 17 02:45:44 MST 2020 Chao Yu <chao@kernel.org> f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
H A Dfile.cdiff 7d009e04 Thu Feb 22 05:18:50 MST 2024 Chao Yu <chao@kernel.org> f2fs: fix to handle segment allocation failure correctly

If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.

Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff a6ec8378 Thu Jun 29 05:11:44 MDT 2023 Chao Yu <chao@kernel.org> f2fs: fix to do sanity check on direct node in truncate_dnode()

syzbot reports below bug:

BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000

CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
f2fs_truncate include/linux/fs.h:825 [inline]
f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
notify_change+0xb2c/0x1180 fs/attr.c:483
do_truncate+0x143/0x200 fs/open.c:66
handle_truncate fs/namei.c:3295 [inline]
do_open fs/namei.c:3640 [inline]
path_openat+0x2083/0x2750 fs/namei.c:3791
do_filp_open+0x1ba/0x410 fs/namei.c:3818
do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1448 [inline]
__se_sys_creat fs/open.c:1442 [inline]
__x64_sys_creat+0xcd/0x120 fs/open.c:1442
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.

This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.

Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.

Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff ceb11d0e Mon May 22 07:50:06 MDT 2023 David Howells <dhowells@redhat.com> f2fs: Provide a splice-read wrapper

Provide a splice_read wrapper for f2fs. This does some checks and tracing
before calling filemap_splice_read() and will update the iostats
afterwards. Direct I/O is handled by the caller.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Jaegeuk Kim <jaegeuk@kernel.org>
cc: Chao Yu <chao@kernel.org>
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20230522135018.2742245-20-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 50aa6f44 Thu Apr 13 04:17:11 MDT 2023 Wu Bo <bo.wu@vivo.com> f2fs: support iopoll method

Wire up the iopoll method to the common implementation.
As f2fs use common dio infrastructure:
commit a1e09b03e6f5 ("f2fs: use iomap for direct I/O")

Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 8051692f Tue Feb 21 07:45:50 MST 2023 Yangtao Li <frank.li@vivo.com> f2fs: handle dqget error in f2fs_transfer_project_quota()

We should set the error code when dqget() failed.

Fixes: 2c1d03056991 ("f2fs: support F2FS_IOC_FS{GET,SET}XATTR")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5eaac835 Fri Dec 16 08:50:00 MST 2022 Chao Yu <chao@kernel.org> f2fs: fix to avoid potential deadlock

There is a potential deadlock reported by syzbot as below:

F2FS-fs (loop2): invalid crc value
F2FS-fs (loop2): Found nat_bits in checkpoint
F2FS-fs (loop2): Mounted with checkpoint version = 48b305e4
======================================================
WARNING: possible circular locking dependency detected
6.1.0-rc8-syzkaller-33330-ga5541c0811a0 #0 Not tainted
------------------------------------------------------
syz-executor.2/32123 is trying to acquire lock:
ffff0000c0e1a608 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x54/0xb4 mm/memory.c:5644

but task is already holding lock:
ffff0001317c6088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2205 [inline]
ffff0001317c6088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_ioc_get_encryption_pwsalt fs/f2fs/file.c:2334 [inline]
ffff0001317c6088 (&sbi->sb_lock){++++}-{3:3}, at: __f2fs_ioctl+0x1370/0x3318 fs/f2fs/file.c:4151

which lock already depends on the new lock.

Chain exists of:
&mm->mmap_lock --> &nm_i->nat_tree_lock --> &sbi->sb_lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&sbi->sb_lock);
lock(&nm_i->nat_tree_lock);
lock(&sbi->sb_lock);
lock(&mm->mmap_lock);

Let's try to avoid above deadlock condition by moving __might_fault()
out of sbi->sb_lock coverage.

Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Link: https://lore.kernel.org/linux-f2fs-devel/000000000000cd5fe305ef617fe2@google.com/T/#u
Reported-by: syzbot+4793f6096d174c90b4f7@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 71644dff Thu Dec 01 18:37:15 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add block_age-based extent cache

This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff c8c02272 Sat Aug 27 00:58:50 MDT 2022 Eric Biggers <ebiggers@google.com> f2fs: support STATX_DIOALIGN

Add support for STATX_DIOALIGN to f2fs, so that direct I/O alignment
restrictions are exposed to userspace in a generic way.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Link: https://lore.kernel.org/r/20220827065851.135710-8-ebiggers@kernel.org
diff ee446e1a Sat Mar 14 14:50:51 MDT 2020 Eric Biggers <ebiggers@google.com> f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE

This new ioctl retrieves a file's encryption nonce, which is useful for
testing. See the corresponding fs/crypto/ patch for more details.

Link: https://lore.kernel.org/r/20200314205052.93294-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
diff 2536ac68 Tue Mar 10 06:50:09 MDT 2020 Chao Yu <chao@kernel.org> f2fs: allow to clear F2FS_COMPR_FL flag

If regular inode has no compressed cluster, allow using 'chattr -c'
to remove its compress flag, recovering it to a non-compressed file.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
H A Df2fs.hdiff 7d009e04 Thu Feb 22 05:18:50 MST 2024 Chao Yu <chao@kernel.org> f2fs: fix to handle segment allocation failure correctly

If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.

Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5c13e238 Fri Aug 18 12:34:32 MDT 2023 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: avoid false alarm of circular locking

======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted
------------------------------------------------------
syz-executor273/5027 is trying to acquire lock:
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644

but task is already holding lock:
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&fi->i_xattr_sem){.+.+}-{3:3}:
down_read+0x9c/0x470 kernel/locking/rwsem.c:1520
f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532
__f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:377 [inline]
f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420
f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558
f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740
f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
do_mkdirat+0x2a9/0x330 fs/namei.c:4140
__do_sys_mkdir fs/namei.c:4160 [inline]
__se_sys_mkdir fs/namei.c:4158 [inline]
__x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (&fi->i_sem){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
lock_acquire kernel/locking/lockdep.c:5761 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
down_write+0x93/0x200 kernel/locking/rwsem.c:1573
f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline]
ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146
ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309
ovl_make_workdir fs/overlayfs/super.c:711 [inline]
ovl_get_workdir fs/overlayfs/super.c:864 [inline]
ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400
vfs_get_super+0xf9/0x290 fs/super.c:1152
vfs_get_tree+0x88/0x350 fs/super.c:1519
do_new_mount fs/namespace.c:3335 [inline]
path_mount+0x1492/0x1ed0 fs/namespace.c:3662
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount fs/namespace.c:3861 [inline]
__x64_sys_mount+0x293/0x310 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
rlock(&fi->i_xattr_sem);
lock(&fi->i_sem);
lock(&fi->i_xattr_sem);
lock(&fi->i_sem);

Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com
Fixes: 5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock"
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 5c13e238 Fri Aug 18 12:34:32 MDT 2023 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: avoid false alarm of circular locking

======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc5-syzkaller-00353-gae545c3283dc #0 Not tainted
------------------------------------------------------
syz-executor273/5027 is trying to acquire lock:
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
ffff888077fe1fb0 (&fi->i_sem){+.+.}-{3:3}, at: f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644

but task is already holding lock:
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
ffff888077fe07c8 (&fi->i_xattr_sem){.+.+}-{3:3}, at: f2fs_add_dentry+0x92/0x230 fs/f2fs/dir.c:783

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&fi->i_xattr_sem){.+.+}-{3:3}:
down_read+0x9c/0x470 kernel/locking/rwsem.c:1520
f2fs_down_read fs/f2fs/f2fs.h:2108 [inline]
f2fs_getxattr+0xb1e/0x12c0 fs/f2fs/xattr.c:532
__f2fs_get_acl+0x5a/0x900 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:377 [inline]
f2fs_init_acl+0x15c/0xb30 fs/f2fs/acl.c:420
f2fs_init_inode_metadata+0x159/0x1290 fs/f2fs/dir.c:558
f2fs_add_regular_entry+0x79e/0xb90 fs/f2fs/dir.c:740
f2fs_add_dentry+0x1de/0x230 fs/f2fs/dir.c:788
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
do_mkdirat+0x2a9/0x330 fs/namei.c:4140
__do_sys_mkdir fs/namei.c:4160 [inline]
__se_sys_mkdir fs/namei.c:4158 [inline]
__x64_sys_mkdir+0xf2/0x140 fs/namei.c:4158
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #0 (&fi->i_sem){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3142 [inline]
check_prevs_add kernel/locking/lockdep.c:3261 [inline]
validate_chain kernel/locking/lockdep.c:3876 [inline]
__lock_acquire+0x2e3d/0x5de0 kernel/locking/lockdep.c:5144
lock_acquire kernel/locking/lockdep.c:5761 [inline]
lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726
down_write+0x93/0x200 kernel/locking/rwsem.c:1573
f2fs_down_write fs/f2fs/f2fs.h:2133 [inline]
f2fs_add_inline_entry+0x300/0x6f0 fs/f2fs/inline.c:644
f2fs_add_dentry+0xa6/0x230 fs/f2fs/dir.c:784
f2fs_do_add_link+0x190/0x280 fs/f2fs/dir.c:827
f2fs_add_link fs/f2fs/f2fs.h:3554 [inline]
f2fs_mkdir+0x377/0x620 fs/f2fs/namei.c:781
vfs_mkdir+0x532/0x7e0 fs/namei.c:4117
ovl_do_mkdir fs/overlayfs/overlayfs.h:196 [inline]
ovl_mkdir_real+0xb5/0x370 fs/overlayfs/dir.c:146
ovl_workdir_create+0x3de/0x820 fs/overlayfs/super.c:309
ovl_make_workdir fs/overlayfs/super.c:711 [inline]
ovl_get_workdir fs/overlayfs/super.c:864 [inline]
ovl_fill_super+0xdab/0x6180 fs/overlayfs/super.c:1400
vfs_get_super+0xf9/0x290 fs/super.c:1152
vfs_get_tree+0x88/0x350 fs/super.c:1519
do_new_mount fs/namespace.c:3335 [inline]
path_mount+0x1492/0x1ed0 fs/namespace.c:3662
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount fs/namespace.c:3861 [inline]
__x64_sys_mount+0x293/0x310 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
rlock(&fi->i_xattr_sem);
lock(&fi->i_sem);
lock(&fi->i_xattr_sem);
lock(&fi->i_sem);

Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e5600587fa9cbf8e3826@syzkaller.appspotmail.com
Fixes: 5eda1ad1aaff "f2fs: fix deadlock in i_xattr_sem and inode page lock"
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff a6ec8378 Thu Jun 29 05:11:44 MDT 2023 Chao Yu <chao@kernel.org> f2fs: fix to do sanity check on direct node in truncate_dnode()

syzbot reports below bug:

BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000

CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
f2fs_truncate include/linux/fs.h:825 [inline]
f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
notify_change+0xb2c/0x1180 fs/attr.c:483
do_truncate+0x143/0x200 fs/open.c:66
handle_truncate fs/namei.c:3295 [inline]
do_open fs/namei.c:3640 [inline]
path_openat+0x2083/0x2750 fs/namei.c:3791
do_filp_open+0x1ba/0x410 fs/namei.c:3818
do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1448 [inline]
__se_sys_creat fs/open.c:1442 [inline]
__x64_sys_creat+0xcd/0x120 fs/open.c:1442
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.

This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.

Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.

Reported-and-tested-by: syzbot+12cb4425b22169b52036@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000f3038a05fef867f8@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 77e820ea Wed May 17 03:42:50 MDT 2023 Chao Yu <chao@kernel.org> f2fs: renew value of F2FS_FEATURE_*

Define F2FS_FEATURE_* macro w/ 32-bits value rather than 16-bits value.

No logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 71644dff Thu Dec 01 18:37:15 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add block_age-based extent cache

This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff e5a0db6a Tue Oct 25 00:50:25 MDT 2022 Yangtao Li <frank.li@vivo.com> f2fs: replace gc_urgent_high_remaining with gc_remaining_trials

The user can set the trial count limit for GC urgent and
idle mode with replaced gc_remaining_trials.. If GC thread gets
to the limit, the mode will turn back to GC normal mode finally.

It was applied only to GC_URGENT, while this patch expands it for
GC_IDLE.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 930e2607 Tue Apr 12 15:45:50 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: remove obsolete whint_mode

This patch removes obsolete whint_mode.

Fixes: 41d36a9f3e53 ("fs: remove kiocb.ki_hint")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 50c63009 Wed Mar 02 12:39:08 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes

If one read IO is always failing, we can fall into an infinite loop in
f2fs_sync_dirty_inodes. This happens during xfstests/generic/475.

[ 142.803335] Buffer I/O error on dev dm-1, logical block 8388592, async page read
...
[ 382.887210] submit_bio_noacct+0xdd/0x2a0
[ 382.887213] submit_bio+0x80/0x110
[ 382.887223] __submit_bio+0x4d/0x300 [f2fs]
[ 382.887282] f2fs_submit_page_bio+0x125/0x200 [f2fs]
[ 382.887299] __get_meta_page+0xc9/0x280 [f2fs]
[ 382.887315] f2fs_get_meta_page+0x13/0x20 [f2fs]
[ 382.887331] f2fs_get_node_info+0x317/0x3c0 [f2fs]
[ 382.887350] f2fs_do_write_data_page+0x327/0x6f0 [f2fs]
[ 382.887367] f2fs_write_single_data_page+0x5b7/0x960 [f2fs]
[ 382.887386] f2fs_write_cache_pages+0x302/0x890 [f2fs]
[ 382.887405] ? preempt_count_add+0x7a/0xc0
[ 382.887408] f2fs_write_data_pages+0xfd/0x320 [f2fs]
[ 382.887425] ? _raw_spin_unlock+0x1a/0x30
[ 382.887428] do_writepages+0xd3/0x1d0
[ 382.887432] filemap_fdatawrite_wbc+0x69/0x90
[ 382.887434] filemap_fdatawrite+0x50/0x70
[ 382.887437] f2fs_sync_dirty_inodes+0xa4/0x270 [f2fs]
[ 382.887453] f2fs_write_checkpoint+0x189/0x1640 [f2fs]
[ 382.887469] ? schedule_timeout+0x114/0x150
[ 382.887471] ? ttwu_do_activate+0x6d/0xb0
[ 382.887473] ? preempt_count_add+0x7a/0xc0
[ 382.887476] kill_f2fs_super+0xca/0x100 [f2fs]
[ 382.887491] deactivate_locked_super+0x35/0xa0
[ 382.887494] deactivate_super+0x40/0x50
[ 382.887497] cleanup_mnt+0x139/0x190
[ 382.887499] __cleanup_mnt+0x12/0x20
[ 382.887501] task_work_run+0x64/0xa0
[ 382.887505] exit_to_user_mode_prepare+0x1b7/0x1c0
[ 382.887508] syscall_exit_to_user_mode+0x27/0x50
[ 382.887510] do_syscall_64+0x48/0xc0
[ 382.887513] entry_SYSCALL_64_after_hwframe+0x44/0xae

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff bbe1da7e Thu Aug 05 18:02:50 MDT 2021 Chao Yu <chao@kernel.org> f2fs: compress: do sanity check on cluster

This patch adds f2fs_sanity_check_cluster() to support doing
sanity check on cluster of compressed file, it will be triggered
from below two paths:

- __f2fs_cluster_blocks()
- f2fs_map_blocks(F2FS_GET_BLOCK_FIEMAP)

And it can detect below three kind of cluster insanity status.

C: COMPRESS_ADDR
N: NULL_ADDR or NEW_ADDR
V: valid blkaddr
*: any value

1. [*|C|*|*]
2. [C|*|C|*]
3. [C|N|N|V]

Signed-off-by: Chao Yu <chao@kernel.org>
[Nathan Chancellor: fix missing inline warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Completed in 1048 milliseconds