History log of /linux-master/fs/f2fs/extent_cache.c
Revision Date Author Comments
# 31f85ccc 07-Mar-2024 Zhiguo Niu <zhiguo.niu@unisoc.com>

f2fs: unify the error handling of f2fs_is_valid_blkaddr

There are some cases of f2fs_is_valid_blkaddr not handled as
ERROR_INVALID_BLKADDR,so unify the error handling about all of
f2fs_is_valid_blkaddr.
Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f8039821 07-Sep-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: split initial and dynamic conditions for extent_cache

Let's allocate the extent_cache tree without dynamic conditions to avoid a
missing condition causing a panic as below.

# create a file w/ a compressed flag
# disable the compression
# panic while updating extent_cache

F2FS-fs (dm-64): Swapfile: last extent is not aligned to section
F2FS-fs (dm-64): Swapfile (3) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * N)
Adding 124996k swap on ./swap-file. Priority:0 extents:2 across:17179494468k
==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read_write out/common/include/linux/instrumented.h:101 [inline]
BUG: KASAN: null-ptr-deref in atomic_try_cmpxchg_acquire out/common/include/asm-generic/atomic-instrumented.h:705 [inline]
BUG: KASAN: null-ptr-deref in queued_write_lock out/common/include/asm-generic/qrwlock.h:92 [inline]
BUG: KASAN: null-ptr-deref in __raw_write_lock out/common/include/linux/rwlock_api_smp.h:211 [inline]
BUG: KASAN: null-ptr-deref in _raw_write_lock+0x5a/0x110 out/common/kernel/locking/spinlock.c:295
Write of size 4 at addr 0000000000000030 by task syz-executor154/3327

CPU: 0 PID: 3327 Comm: syz-executor154 Tainted: G O 5.10.185 #1
Hardware name: emulation qemu-x86/qemu-x86, BIOS 2023.01-21885-gb3cc1cd24d 01/01/2023
Call Trace:
__dump_stack out/common/lib/dump_stack.c:77 [inline]
dump_stack_lvl+0x17e/0x1c4 out/common/lib/dump_stack.c:118
__kasan_report+0x16c/0x260 out/common/mm/kasan/report.c:415
kasan_report+0x51/0x70 out/common/mm/kasan/report.c:428
kasan_check_range+0x2f3/0x340 out/common/mm/kasan/generic.c:186
__kasan_check_write+0x14/0x20 out/common/mm/kasan/shadow.c:37
instrument_atomic_read_write out/common/include/linux/instrumented.h:101 [inline]
atomic_try_cmpxchg_acquire out/common/include/asm-generic/atomic-instrumented.h:705 [inline]
queued_write_lock out/common/include/asm-generic/qrwlock.h:92 [inline]
__raw_write_lock out/common/include/linux/rwlock_api_smp.h:211 [inline]
_raw_write_lock+0x5a/0x110 out/common/kernel/locking/spinlock.c:295
__drop_extent_tree+0xdf/0x2f0 out/common/fs/f2fs/extent_cache.c:1155
f2fs_drop_extent_tree+0x17/0x30 out/common/fs/f2fs/extent_cache.c:1172
f2fs_insert_range out/common/fs/f2fs/file.c:1600 [inline]
f2fs_fallocate+0x19fd/0x1f40 out/common/fs/f2fs/file.c:1764
vfs_fallocate+0x514/0x9b0 out/common/fs/open.c:310
ksys_fallocate out/common/fs/open.c:333 [inline]
__do_sys_fallocate out/common/fs/open.c:341 [inline]
__se_sys_fallocate out/common/fs/open.c:339 [inline]
__x64_sys_fallocate+0xb8/0x100 out/common/fs/open.c:339
do_syscall_64+0x35/0x50 out/common/arch/x86/entry/common.c:46

Cc: stable@vger.kernel.org
Fixes: 72840cccc0a1 ("f2fs: allocate the extent_cache by default")
Reported-and-tested-by: syzbot+d342e330a37b48c094b7@syzkaller.appspotmail.com
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8375be2b 18-Apr-2023 Qi Han <hanqi@vivo.com>

f2fs: remove unnessary comment in __may_age_extent_tree

This comment make no sense and is in the wrong place, so let's
remove it.

Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bd90c5cd 06-Apr-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: relax sanity check if checkpoint is corrupted

1. extent_cache
- let's drop the largest extent_cache
2. invalidate_block
- don't show the warnings

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bf21acf9 10-Mar-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove entire rb_entry sharing

This is a last part to remove the memory sharing for rb_tree in extent_cache.

This should also fix arm32 memory alignment issue.

[struct extent_node] [struct rb_entry]
[0] struct rb_node rb_node; [0] struct rb_node rb_node;
union { union {
struct { struct {
[16] unsigned int fofs; [12] unsigned int ofs;
unsigned int len; unsigned int len;
};
unsigned long long key;
} __packed;

Cc: <stable@vger.kernel.org>
Fixes: 13054c548a1c ("f2fs: introduce infra macro and data structure of rb-tree extent cache")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f69475dd 10-Mar-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: factor out discard_cmd usage from general rb_tree use

This is a second part to remove the mixed use of rb_tree in discard_cmd from
extent_cache.

This should also fix arm32 memory alignment issue caused by shared rb_entry.

[struct discard_cmd] [struct rb_entry]
[0] struct rb_node rb_node; [0] struct rb_node rb_node;
union { union {
struct { struct {
[16] block_t lstart; [12] unsigned int ofs;
block_t len; unsigned int len;
};
unsigned long long key;
} __packed;

Cc: <stable@vger.kernel.org>
Fixes: 004b68621897 ("f2fs: use rb-tree to track pending discard commands")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 043d2d00 10-Mar-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: factor out victim_entry usage from general rb_tree use

Let's reduce the complexity of mixed use of rb_tree in victim_entry from
extent_cache and discard_cmd.

This should fix arm32 memory alignment issue caused by shared rb_entry.

[struct victim_entry] [struct rb_entry]
[0] struct rb_node rb_node; [0] struct rb_node rb_node;
union {
struct {
unsigned int ofs;
unsigned int len;
};
[16] unsigned long long mtime; [12] unsigned long long key;
} __packed;

Cc: <stable@vger.kernel.org>
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 269d1194 07-Feb-2023 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check on extent cache correctly

In do_read_inode(), sanity check for extent cache should be called after
f2fs_init_read_extent_tree(), fix it.

Fixes: 72840cccc0a1 ("f2fs: allocate the extent_cache by default")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 146949de 06-Feb-2023 Jinyoung CHOI <j-young.choi@samsung.com>

f2fs: fix typos in comments

This patch is to fix typos in f2fs files.

Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d23be468 04-Feb-2023 qixiaoyu1 <qxy65535@gmail.com>

f2fs: add sysfs nodes to set last_age_weight

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b03a41a4 02-Feb-2023 qixiaoyu1 <qxy65535@gmail.com>

f2fs: fix wrong calculation of block age

Currently we wrongly calculate the new block age to
old * LAST_AGE_WEIGHT / 100.

Fix it to new * (100 - LAST_AGE_WEIGHT) / 100
+ old * LAST_AGE_WEIGHT / 100.

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 04a91ab0 28-Nov-2022 Christoph Hellwig <hch@lst.de>

f2fs: add a f2fs_lookup_extent_cache_block helper

All but three callers of f2fs_lookup_extent_cache just want the block
address. Add a small helper to simplify them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# df9d44b6 21-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: let's avoid panic if extent_tree is not created

This patch avoids the below panic.

pc : __lookup_extent_tree+0xd8/0x760
lr : f2fs_do_write_data_page+0x104/0x87c
sp : ffffffc010cbb3c0
x29: ffffffc010cbb3e0 x28: 0000000000000000
x27: ffffff8803e7f020 x26: ffffff8803e7ed40
x25: ffffff8803e7f020 x24: ffffffc010cbb460
x23: ffffffc010cbb480 x22: 0000000000000000
x21: 0000000000000000 x20: ffffffff22e90900
x19: 0000000000000000 x18: ffffffc010c5d080
x17: 0000000000000000 x16: 0000000000000020
x15: ffffffdb1acdbb88 x14: ffffff888759e2b0
x13: 0000000000000000 x12: ffffff802da49000
x11: 000000000a001200 x10: ffffff8803e7ed40
x9 : ffffff8023195800 x8 : ffffff802da49078
x7 : 0000000000000001 x6 : 0000000000000000
x5 : 0000000000000006 x4 : ffffffc010cbba28
x3 : 0000000000000000 x2 : ffffffc010cbb480
x1 : 0000000000000000 x0 : ffffff8803e7ed40
Call trace:
__lookup_extent_tree+0xd8/0x760
f2fs_do_write_data_page+0x104/0x87c
f2fs_write_single_data_page+0x420/0xb60
f2fs_write_cache_pages+0x418/0xb1c
__f2fs_write_data_pages+0x428/0x58c
f2fs_write_data_pages+0x30/0x40
do_writepages+0x88/0x190
__writeback_single_inode+0x48/0x448
writeback_sb_inodes+0x468/0x9e8
__writeback_inodes_wb+0xb8/0x2a4
wb_writeback+0x33c/0x740
wb_do_writeback+0x2b4/0x400
wb_workfn+0xe4/0x34c
process_one_work+0x24c/0x5bc
worker_thread+0x3e8/0xa50
kthread+0x150/0x1b4

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 22a341b4 16-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: should use a temp extent_info for lookup

Otherwise, __lookup_extent_tree() will override the given extent_info which will
be used by caller.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed272476 16-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't mix to use union values in extent_info

Let's explicitly use the defined values in block_age case only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fe59109a 16-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: initialize extent_cache parameter

This can avoid confusing tracepoint values.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 71644dff 01-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add block_age-based extent cache

This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 72840ccc 02-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: allocate the extent_cache by default

Let's allocate it to remove the runtime complexity.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e7547dac 30-Nov-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: refactor extent_cache to support for read and more

This patch prepares extent_cache to be ready for addition.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 749d543c 30-Nov-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unnecessary __init_extent_tree

Added into the caller.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3bac20a8 30-Nov-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: move internal functions into extent_cache.c

No functional change.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 12607c1b 30-Nov-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: specify extent cache for read explicitly

Let's descrbie it's read extent cache.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 544b53da 13-Sep-2022 Zhang Qilong <zhangqilong3@huawei.com>

f2fs: code clean and fix a type error

ERROR: code indent should use tabs where possible
ERROR: spaces required around that ':'
ERROR: incorrect tab

Found serveral code type errors when review the code and fix it.
There is no function change.

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a834aa3e 19-Sep-2022 Zhang Qilong <zhangqilong3@huawei.com>

f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file

The trace_f2fs_update_extent_tree_range could not record compressed
block length in the cluster of compress file and we just add it.

Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 07725adc 04-Sep-2022 Zhang Qilong <zhangqilong3@huawei.com>

f2fs: fix race condition on setting FI_NO_EXTENT flag

The following scenarios exist.
process A: process B:
->f2fs_drop_extent_tree ->f2fs_update_extent_cache_range
->f2fs_update_extent_tree_range
->write_lock
->set_inode_flag
->is_inode_flag_set
->__free_extent_tree // Shouldn't
// have been
// cleaned up
// here
->write_lock

In this case, the "FI_NO_EXTENT" flag is set between
f2fs_update_extent_tree_range and is_inode_flag_set
by other process. it leads to clearing the whole exten
tree which should not have happened. And we fix it by
move the setting it to the range of write_lock.

Fixes:5f281fab9b9a3 ("f2fs: disable extent_cache for fcollapse/finsert inodes")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 32410577 08-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: support fault injection for f2fs_kmem_cache_alloc()

This patch supports to inject fault into f2fs_kmem_cache_alloc().

Usage:
a) echo 32768 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=32768 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 94afd6d6 03-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: extent cache: support unaligned extent

Compressed inode may suffer read performance issue due to it can not
use extent cache, so I propose to add this unaligned extent support
to improve it.

Currently, it only works in readonly format f2fs image.

Unaligned extent: in one compressed cluster, physical block number
will be less than logical block number, so we add an extra physical
block length in extent info in order to indicate such extent status.

The idea is if one whole cluster blocks are contiguous physically,
once its mapping info was readed at first time, we will cache an
unaligned (or aligned) extent info entry in extent cache, it expects
that the mapping info will be hitted when rereading cluster.

Merge policy:
- Aligned extents can be merged.
- Aligned extent and unaligned extent can not be merged.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2e9b2bb2 04-Aug-2020 Chao Yu <chao@kernel.org>

f2fs: support 64-bits key in f2fs rb-tree node entry

then, we can add specified entry into rb-tree with 64-bits segment time
as key.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a6d601f3 27-Jun-2020 Chao Yu <chao@kernel.org>

f2fs: fix to wait page writeback before update

Filesystem including f2fs should support stable page for special
device like software raid, however there is one missing path that
page could be updated while it is writeback state as below, fix
this.

- gc_node_segment
- f2fs_move_node_page
- __write_node_page
- set_page_writeback

- do_read_inode
- f2fs_init_extent_tree
- __f2fs_init_extent_tree
i_ext->len = 0;

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dcbb4c10 18-Jun-2019 Joe Perches <joe@perches.com>

f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()

- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f9aa52a8 15-Jan-2019 Chao Yu <chao@kernel.org>

f2fs: fix to initialize variable to avoid UBSAN/smatch warning

As Dan Carpenter as below:

The patch df634f444ee9: "f2fs: use rb_*_cached friends" from Oct 4,
2018, leads to the following static checker warning:

fs/f2fs/extent_cache.c:606 f2fs_update_extent_tree_range()
error: uninitialized symbol 'leftmost'.

And also Eric Biggers, and Kyungtae Kim reported, there is an UBSAN
warning described as below:

We report a bug in linux-4.20.2: "UBSAN: Undefined behaviour in
fs/f2fs/extent_cache.c"

kernel config: https://kt0755.github.io/etc/config_v4.20_stable
repro: https://kt0755.github.io/etc/repro.4a3e7.c (f2fs is mounted on
/mnt/f2fs/)

This arose in f2fs_update_extent_tree_range (fs/f2fs/extent_cache.c:605).
It seems that, for some reason, its last argument became "24"
although that was supposed to be bool type.

=========================================
UBSAN: Undefined behaviour in fs/f2fs/extent_cache.c:605:4
load of value 24 is not a valid value for type '_Bool'
CPU: 0 PID: 6774 Comm: syz-executor5 Not tainted 4.20.2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xb1/0x118 lib/dump_stack.c:113
ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
__ubsan_handle_load_invalid_value+0x17a/0x1be lib/ubsan.c:457
f2fs_update_extent_tree_range+0x1d4a/0x1d50 fs/f2fs/extent_cache.c:605
f2fs_update_extent_cache+0x2b6/0x350 fs/f2fs/extent_cache.c:804
f2fs_update_data_blkaddr+0x61/0x70 fs/f2fs/data.c:656
f2fs_outplace_write_data+0x1d6/0x4b0 fs/f2fs/segment.c:3140
f2fs_convert_inline_page+0x86d/0x2060 fs/f2fs/inline.c:163
f2fs_convert_inline_inode+0x6b5/0xad0 fs/f2fs/inline.c:208
f2fs_preallocate_blocks+0x78b/0xb00 fs/f2fs/data.c:982
f2fs_file_write_iter+0x31b/0xf40 fs/f2fs/file.c:3062
call_write_iter include/linux/fs.h:1857 [inline]
new_sync_write fs/read_write.c:474 [inline]
__vfs_write+0x538/0x6e0 fs/read_write.c:487
vfs_write+0x1b3/0x520 fs/read_write.c:549
ksys_write+0xde/0x1c0 fs/read_write.c:598
__do_sys_write fs/read_write.c:610 [inline]
__se_sys_write fs/read_write.c:607 [inline]
__x64_sys_write+0x7e/0xc0 fs/read_write.c:607
do_syscall_64+0xbe/0x4f0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f1ea15edc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f1ea15ee6cc RCX: 00000000004497b9
RDX: 0000000000001000 RSI: 0000000020000140 RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000bb50 R14: 00000000006f4bf0 R15: 00007f1ea15ee700
=========================================

As I checked, this uninitialized variable won't cause extent cache
corruption, but in order to avoid such kind of warning of both UBSAN
and smatch, fix to initialize related variable.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Eric Biggers <ebiggers@google.com>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4dada3fd 03-Oct-2018 Chao Yu <chao@kernel.org>

f2fs: use rb_*_cached friends

As rbtree supports caching leftmost node natively, update f2fs codes
to use rb_*_cached helpers to speed up leftmost node visiting.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c1a000d 11-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: add SPDX license identifiers

Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b430f726 10-Sep-2018 Zhikang Zhang <zhangzhikang1@huawei.com>

f2fs: avoid sleeping under spin_lock

In the call trace below, we might sleep in function dput().

So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
from __try_update_largest_extent && __drop_largest_extent.

BUG: sleeping function called from invalid context at fs/dcache.c:796
Call trace:
dump_backtrace+0x0/0x3f4
show_stack+0x24/0x30
dump_stack+0xe0/0x138
___might_sleep+0x2a8/0x2c8
__might_sleep+0x78/0x10c
dput+0x7c/0x750
block_dump___mark_inode_dirty+0x120/0x17c
__mark_inode_dirty+0x344/0x11f0
f2fs_mark_inode_dirty_sync+0x40/0x50
__insert_extent_tree+0x2e0/0x2f4
f2fs_update_extent_tree_range+0xcf4/0xde8
f2fs_update_extent_cache+0x114/0x12c
f2fs_update_data_blkaddr+0x40/0x50
write_data_page+0x150/0x314
do_write_data_page+0x648/0x2318
__write_data_page+0xdb4/0x1640
f2fs_write_cache_pages+0x768/0xafc
__f2fs_write_data_pages+0x590/0x1218
f2fs_write_data_pages+0x64/0x74
do_writepages+0x74/0xe4
__writeback_single_inode+0xdc/0x15f0
writeback_sb_inodes+0x574/0xc98
__writeback_inodes_wb+0x190/0x204
wb_writeback+0x730/0xf14
wb_check_old_data_flush+0x1bc/0x1c8
wb_workfn+0x554/0xf74
process_one_work+0x440/0x118c
worker_thread+0xac/0x974
kthread+0x1a0/0x1c8
ret_from_fork+0x10/0x1c

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d57b86d 29-May-2018 Chao Yu <chao@kernel.org>

f2fs: clean up symbol namespace

As Ted reported:

"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).

As one example, in fs/f2fs/dir.c there is:

unsigned char get_de_type(struct f2fs_dir_entry *de)

This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.

You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.

acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."

This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;

Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8fe326cb 21-Feb-2018 Colin Ian King <colin.king@canonical.com>

f2fs: remove redundant initialization of pointer 'p'

Pointer p is initialized with a value that is never read and is later
re-assigned a new value, hence the initialization is redundant and can
be removed.

Cleans up clang warning:
fs/f2fs/extent_cache.c:463:19: warning: Value stored to 'p' during
its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bf617f7a 27-Jan-2018 Chao Yu <chao@kernel.org>

f2fs: fix to check extent cache in f2fs_drop_extent_tree

If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: _raw_write_lock+0xc/0x30
Call Trace:
? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
f2fs_fallocate+0x5a0/0xdd0 [f2fs]
? common_file_perm+0x47/0xc0
? apparmor_file_permission+0x1a/0x20
vfs_fallocate+0x15b/0x290
SyS_fallocate+0x44/0x70
do_syscall_64+0x6e/0x160
entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dad48e73 19-May-2017 Yunlei He <heyunlei@huawei.com>

f2fs: fix a bug caused by NULL extent tree

Thread A: Thread B:

-f2fs_remount
-sbi->mount_opt.opt = 0;
<--- -f2fs_iget
-do_read_inode
-f2fs_init_extent_tree
-F2FS_I(inode)->extent_tree is NULL
-default_options && parse_options
-remount return
<--- -f2fs_map_blocks
-f2fs_lookup_extent_tree
-f2fs_bug_on(sbi, !et);

The same problem with f2fs_new_inode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# df0f6b44 17-Apr-2017 Chao Yu <chao@kernel.org>

f2fs: introduce __check_rb_tree_consistence

Introduce __check_rb_tree_consistence to check consistence of rb-tree
based discard cache in runtime.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 004b6862 14-Apr-2017 Chao Yu <chao@kernel.org>

f2fs: use rb-tree to track pending discard commands

Introduce rb-tree based discard cache infrastructure to speed up lookup and
merge operation of discard entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: initialize dc to avoid build warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 54c2258c 10-Apr-2017 Chao Yu <chao@kernel.org>

f2fs: extract rb-tree operation infrastructure

rb-tree lookup/update functions are deeply coupled into extent cache
codes, it's very hard to reuse these basic functions, this patch
extracts common rb-tree operation infrastructure for latter reusing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 317e1300 25-Feb-2017 Chao Yu <chao@kernel.org>

f2fs: kill __is_extent_same

Since commit ee6d182f2a19 ("f2fs: remove syncing inode page in all the
cases") delayed inode element updating from inode cache to node page
cache, so once largest cached extent is updated, we can make inode dirty
immediately instead of checking and updating it in the end of extent
cache update.

The above commit didn't clean up unneeded codes in extent_cache.c, let's
finish the job in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5e8256ac 23-Feb-2017 Yunlei He <heyunlei@huawei.com>

f2fs: replace rw semaphore extent_tree_lock with mutex lock

This patch replace rw semaphore extent_tree_lock with mutex lock
for no read cases with this lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed0b5620 20-Dec-2016 Geliang Tang <geliangtang@gmail.com>

f2fs: use rb_entry_safe

Use rb_entry_safe() instead of open-coding it.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7855eba4 19-Dec-2016 Yunlei He <heyunlei@huawei.com>

f2fs: fix a problem of using memory after free

This patch fix a problem of using memory after free
in function __try_merge_extent_node.

Fixes: 0f825ee6e873 ("f2fs: add new interfaces for extent tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c45729a 14-Oct-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: keep dirty inodes selectively for checkpoint

This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5f281fab 12-Jul-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: disable extent_cache for fcollapse/finsert inodes

This reduces the elapsed time to do xfstests/generic/017.

Before: 458 s
After: 390 s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b56ab837 30-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid mark_inode_dirty

Let's check inode's dirtiness before calling mark_inode_dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ee6d182f 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove syncing inode page in all the cases

This patch reduces to call them across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that, this is doable, since we call mark_inode_dirty_sync() for all
inode's field change which needs to update on-disk inode as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 205b9822 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: call mark_inode_dirty_sync for i_field changes

This patch calls mark_inode_dirty_sync() for the following on-disk inode
changes.

-> largest
-> ctime/mtime/atime
-> i_current_depth
-> i_xattr_nid
-> i_pino
-> i_advise
-> i_flags
-> i_mode

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 91942321 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use inode pointer for {set, clear}_inode_flag

This patch refactors to use inode pointer for set_inode_flag and
clear_inode_flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bd933d4f 04-May-2016 Chao Yu <chao@kernel.org>

f2fs: reuse get_extent_info

Reuse get_extent_info for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f28b3434 24-Feb-2016 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_update_data_blkaddr for cleanup

Add a new help f2fs_update_data_blkaddr to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 81ca7350 26-Jan-2016 Chao Yu <chao@kernel.org>

f2fs: remove unneeded pointer conversion

There are redundant pointer conversion in following call stack:
- at position a, inode was been converted to f2fs_file_info.
- at position b, f2fs_file_info was been converted to inode again.

- truncate_blocks(inode,..)
- fi = F2FS_I(inode) ---a
- ADDRS_PER_PAGE(node_page, fi)
- addrs_per_inode(fi)
- inode = &fi->vfs_inode ---b
- f2fs_has_inline_xattr(inode)
- fi = F2FS_I(inode)
- is_inode_flag_set(fi,..)

In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to acept parameter with type of inode pointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6fe2bc95 19-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give scheduling point in shrinking path

It needs to give a chance to be rescheduled while shrinking slab entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 201ef5e0 25-Jan-2016 Hou Pengyang <houpengyang@huawei.com>

f2fs: improve shrink performance of extent nodes

On the worst case, we need to scan the whole radix tree and every rb-tree to
free the victimed extent_nodes when shrinking.

Pengyang initially introduced a victim_list to record the victimed extent_nodes,
and free these extent_nodes by just scanning a list.

Later, Chao Yu enhances the original patch to improve memory footprint by
removing victim list.

The policy of lru list shrinking becomes:
1) lock lru list's lock
2) trylock extent tree's lock
3) remove extent node from lru list
4) unlock lru list's lock
5) do shrink
6) repeat 1) to 5)

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 42926744 26-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't set cached_en if it will be freed

If en has empty list pointer, it will be freed sooner, so we don't need to
set cached_en with it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 43a2fa18 26-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: move extent_node list operations being coupled with rbtree operation

This patch moves extent_node list operations to be handled together with
its rbtree operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a03f01f2 25-Jan-2016 Hou Pengyang <houpengyang@huawei.com>

f2fs: reconstruct the code to free an extent_node

There are three steps to free an extent node:
1) list_del_init, 2)__detach_extent_node, 3) kmem_cache_free

In path f2fs_destroy_extent_tree, 1->2->3 to free a node,
But in path f2fs_update_extent_tree_range, it is 2->1->3.

This patch makes all the order to be: 1->2->3
It makes sense, since in the next patch, we import a victim list in the
path shrink_extent_tree, we could check if the extent_node is in the victim
list by checking the list_empty(). So it is necessary to put 1) first.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9b72a388 08-Jan-2016 Chao Yu <chao@kernel.org>

f2fs: skip releasing nodes in chindless extent tree

If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 68e35385 08-Jan-2016 Chao Yu <chao@kernel.org>

f2fs: use atomic type for node count in extent tree

1. rename field in struct extent_tree from count to node_cnt for
readability.
2. alter to use atomic type for node_cnt.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 137d09f0 31-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce zombie list for fast shrinking extent trees

This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed3d1256 28-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: load largest extent all the time

Otherwise, we can get mismatched largest extent information.

One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 74fd8d99 21-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: speed up shrinking extent tree entries

If there is no candidates for shrinking slab entries, we don't need to traverse
any trees at all.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7441ccef 21-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use atomic variable for total_extent_tree

It would be better to use atomic variable for total_extent_tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9006f2c9 30-Nov-2015 Chao Yu <chao@kernel.org>

f2fs: kill f2fs_drop_largest_extent

For direct IO, f2fs only allocate new address for the block which is not
exist in the disk before, its mapping info should not exist in extent
cache previously, so here we do not need to call f2fs_drop_largest_extent
to drop related cache.

Due to no more callers for f2fs_drop_largest_extent now, kill it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 760de791 30-Nov-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid deadlock in f2fs_shrink_extent_tree

While handling extent trees, we can enter into a reclaiming path anytime.
If it tries to release some extent nodes in the same extent tree,
write_lock(&et->lock) would be hanged.
In order to avoid the deadlock, we can just skip it.

Note that, if it is an unreferenced tree, we should get write_lock(&et->lock)
successfully and release all of therein nodes.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# beaa57dd 22-Oct-2015 Chao Yu <chao@kernel.org>

f2fs: fix to skip shrinking extent nodes

In f2fs_shrink_extent_tree we should stop shrink flow if we have already
shrunk enough nodes in extent cache.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4abd3f5a 22-Sep-2015 Chao Yu <chao@kernel.org>

f2fs: introduce __try_update_largest_extent

This patch adds a new helper __try_update_largest_extent for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d1fa815 17-Sep-2015 Fan Li <fanofcode.li@samsung.com>

f2fs: optimize code of f2fs_update_extent_tree_range

Fix 2 potential problems:
1. when largest extent needs to be invalidated, it will be reset in
__drop_largest_extent, which makes __is_extent_same after always
return false, and largest extent unchanged. Now we update it properly.

2. when extent is split and the latter part remains in tree, next_en
should be the latter part instead of next extent of original extent.
It will cause merge failure if there is in-place update, although
there is not, I think this fix will still makes codes less ambiguous.

This patch also simplifies codes of invalidating extents, and optimizes the
procedues that split extent into two.
There are a few modifications after last patch:
1. prev_en now is updated properly.
2. more codes and branches are simplified.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 41a099de 17-Sep-2015 Fan Li <fanofcode.li@samsung.com>

f2fs: drop largest extent by range

now we update extent by range, fofs may not be on the largest
extent if the new extent overlaps with it. so add a new function
to drop largest extent properly.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 514053e4 15-Sep-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: declare f2fs_update_extent_tree_range as static

This function should be static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 100136ac 11-Sep-2015 Chao Yu <chao@kernel.org>

f2fs: fix incorrect searching position when shrinking extent cache

When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.

In step 1, if we haven't shrunk enough number of objects, we will try
step 2, but before that we didn't update the searching position which
may point to last inode index in global extent tree, result in failing
to shrink objects by traversing the all inodes' extent tree.

In this patch, we reset searching position to beginning of global extent
tree for fixing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 744288c7 06-Sep-2015 Chao Yu <chao@kernel.org>

f2fs: trace in batches extent info update

Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace in batches extent info updates.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 54d71856 28-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent

If extent cache is disable, we will encounter oops when triggering direct
IO as below:

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
*pdpt = 000000002bb9a001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd
sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache
usbhid hid e1000
CPU: 3 PID: 3608 Comm: dd Tainted: G O 4.2.0-rc4 #12
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ef161600 ti: ebd5e000 task.ti: ebd5e000
EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000
ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0
Stack:
ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
Call Trace:
[<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
[<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
[<c160bb00>] ? down_read+0x30/0x50
[<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
[<c113e115>] generic_file_direct_write+0xa5/0x260
[<c10b53f8>] ? current_fs_time+0x18/0x50
[<c113e38b>] __generic_file_write_iter+0xbb/0x210
[<c113e50f>] ? generic_file_write_iter+0x2f/0x320
[<c113e63c>] generic_file_write_iter+0x15c/0x320
[<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
[<c11984d9>] __vfs_write+0xa9/0xe0
[<c1199227>] vfs_write+0x97/0x180
[<c119955b>] SyS_write+0x5b/0xd0
[<c160dcd0>] sysenter_do_call+0x12/0x12
Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00
00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58
CR2: 000000000000000c
---[ end trace a38c07026a1afffd ]---

This is because when extent cache is disable, extent_tree pointer in struct
f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access
this NULL pointer directly without checking state of extent cache, then,
the oops occurs. Let's fix it by checking state of extent cache before
accessing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 19b2c30d 26-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: update extent tree in batches

This patch introduce a new helper f2fs_update_extent_tree_range which can
do extent mapping update at a specified range.

The main idea is:
1) punch all mapping info in extent node(s) which are at a specified range;
2) try to merge new extent mapping with adjacent node, or failing that,
insert the mapping into extent tree as a new node.

In order to see the benefit, I add a function for stating time stamping
count as below:

uint64_t rdtsc(void)
{
uint32_t lo, hi;
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return (uint64_t)hi << 32 | lo;
}

My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd.

truncation path: update extent cache from truncate_data_blocks_range
non-truncataion path: update extent cache from other paths
total: all update paths

a) Removing 128MB file which has one extent node mapping whole range of
file:
1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128
2. sync
3. rm /mnt/f2fs/128M

Before:
total count average
truncation: 7651022 32768 233.49

Patched:
total count average
truncation: 3321 33 100.64

b) fsstress:
fsstress -d /mnt/f2fs -l 5 -n 100 -p 20
Test times: 5 times.

Before:
total count average
truncation: 5812480.6 20911.6 277.95
non-truncation: 7783845.6 13440.8 579.12
total: 13596326.2 34352.4 395.79

Patched:
total count average
truncation: 1281283.0 3041.6 421.25
non-truncation: 7355844.4 13662.8 538.38
total: 8637127.4 16704.4 517.06

1) For the updates in truncation path:
- we can see updating in batches leads total tsc and update count reducing
explicitly;
- besides, for a single batched updating, punching multiple extent nodes
in a loop, result in executing more operations, so our average tsc
increase intensively.
2) For the updates in non-truncation path:
- there is a little improvement, that is because for the scenario that we
just need to update in the head or tail of extent node, new interface
optimize to update info in extent node directly, rather than removing
original extent node for updating and then inserting that updated one
into cache as new node.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dac2ddef 19-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: lookup neighbor extent nodes for merging later

In __lookup_extent_tree_ret we will not try to find neighbor nodes if
we find the target node, in this condition, we will lost the chance to
merge the new mapping with exist extent node later.

So our extent cache of inode will be fragmented after overwrite exist
file, we can see the number of extent node increases intensively in
following test case:

dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024

Extent Cache:
- Hit Count: L1-1:0 L1-2:0 L2:0
- Hit Ratio: 0% (0 / 3072)
- Inner Struct Count: tree: 1, node: 1

dd if=/dev/zero of=/mnt/f2fs/4m bs=4K count=1024 conv=notrunc

Extent Cache:
- Hit Count: L1-1:2048 L1-2:0 L2:0
- Hit Ratio: 33% (2048 / 6144)
- Inner Struct Count: tree: 1, node: 961

This patch fixes to lookup neighbors of target node for further
merging.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ef05e221 19-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: split __insert_extent_tree_ret for readability

This patch splits __insert_extent_tree_ret into __try_merge_extent_node &
__insert_extent_tree for code readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a6f78345 19-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: kill dead code in __insert_extent_tree

After commit 0f825ee6e873 ("f2fs: add new interfaces for extent tree"),
f2fs_init_extent_tree becomes the only caller of __insert_extent_tree, and
in f2fs_init_extent_tree, we will only insert extent node in an empty tree,
so __try_{back,front}_merge in __insert_extent_tree will never be called.

This patch removes these dead codes, besides, rename __insert_extent_tree
to __init_extent_tree for readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 029e13cc 19-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: adjust showing of extent cache stat

This patch alters to replace total hit stat with rbtree hit stat,
and then adjust showing of extent cache stat:

Hit Count:
L1-1: for largest node hit count;
L1-2: for last cached node hit count;
L2: for extent node hit after lookuping in rbtree.

Hit Ratio:
ratio (hit count / total lookup count)

Inner Struct Count:
tree count, node count.

Before:
Extent Hit Ratio: 0 / 2

Extent Tree Count: 3

Extent Node Count: 2

Patched:
Exten Cacache:
- Hit Count: L1-1:4871 L1-2:2074 L2:208
- Hit Ratio: 1% (7153 / 550751)
- Inner Struct Count: tree: 26560, node: 11824

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 91c481ff 19-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: add largest/cached stat in extent cache

This patch adds to stat the hit count of largest/cached node for showing
in debugfs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f8b703da 18-Aug-2015 Fan Li <fanofcode.li@samsung.com>

f2fs: fix to update cached_en of extent tree properly

In f2fs_lookup_extent_tree, et->cached_en was read and updated with only
read lock held,
it could cause __lookup_extent_tree within return entirely wrong
extent_node, if other
thread update et->cached_en just before __lookup_extent_tree return.

However, there are two things about this patch that need to be noticed:
1. It does no good to arrange the order of concurrent read/write, the result
would still
be random in such case.
2. It's built on this assumption: the mix up of reads and writes on a single
pointer would
not make the pointer partially wrong at any time. Please let me know if I'm
wrong, thx.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0f825ee6 15-Jul-2015 Fan Li <fanofcode.li@samsung.com>

f2fs: add new interfaces for extent tree

Add a lookup and a insertion interface for extent tree.
The new lookup return the insert position and the prev/next
extents closest to the offset we lookup when find no match.
The new insertion uses above parameters to improve performance.

There are three possible insertions after the lookup in
f2fs_update_extent_tree, two of them insert parts of removed extent
back to tree, since no merge happens during this process, new insertion
skips the merge check in this scanario; the another insertion inserts a
new extent to tree, new insertion uses prev/next extent and insert
position to insert this extent directly, and save the time of searching
down the tree.

As long as tree remains unchanged between lookup and insertion, this
would work fine. And the new lookup would be useful when add
multi-blocks extent support for insertion interface.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 727edac5 15-Jul-2015 Chao Yu <chao@kernel.org>

f2fs: use atomic_t to record hit ratio info of extent cache

Variables for recording extent cache ratio info were updated without
protection, this patch tries to alter them to atomic_t type for more
accurate stat.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a28ef1f5 08-Jul-2015 Chao Yu <chao@kernel.org>

f2fs: maintain extent cache in separated file

This patch moves extent cache related code from data.c into extent_cache.c
since extent cache is independent feature, and its codes are not relate to
others in data.c, it's better for us to maintain them in separated place.

There is no functionality change, but several small coding style fixes
including:
* rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
* rename misspelled word 'untill' to 'until';
* remove unneeded 'return' in the end of f2fs_destroy_extent_tree().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>