Searched +hist:01 +hist:eccef7 (Results 1 - 5 of 5) sorted by relevance
/linux-master/fs/f2fs/ | ||
H A D | gc.c | diff 24593061 Mon Mar 11 01:48:54 MDT 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to handle error paths of {new,change}_curseg() {new,change}_curseg() may return error in some special cases, error handling should be did in their callers, and this will also facilitate subsequent error path expansion in {new,change}_curseg(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 22af1b8c Fri Mar 01 01:25:55 MST 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to check return value of f2fs_gc_range f2fs_gc_range may return error, so its caller f2fs_allocate_pinning_section should determine whether to do retry based on ist return value. Also just do f2fs_gc_range when f2fs_allocate_new_section return -EAGAIN, and check cp error case in f2fs_gc_range. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 22af1b8c Fri Mar 01 01:25:55 MST 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to check return value of f2fs_gc_range f2fs_gc_range may return error, so its caller f2fs_allocate_pinning_section should determine whether to do retry based on ist return value. Also just do f2fs_gc_range when f2fs_allocate_new_section return -EAGAIN, and check cp error case in f2fs_gc_range. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 4e0197f9 Tue Feb 20 01:48:44 MST 2024 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: kill heap-based allocation No one uses this feature. Let's kill it. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 19ec1d31 Thu Dec 21 20:29:01 MST 2023 Yongpeng Yang <yangyongpeng1@oppo.com> f2fs: Add error handling for negative returns from do_garbage_collect The function do_garbage_collect can return a value less than 0 due to f2fs_cp_error being true or page allocation failure, as a result of calling f2fs_get_sum_page. However, f2fs_gc does not account for such cases, which could potentially lead to an abnormal total_freed and thus cause subsequent code to behave unexpectedly. Given that an f2fs_cp_error is irrecoverable, and considering that do_garbage_collect already retries page allocation errors through its call to f2fs_get_sum_page->f2fs_get_meta_page_retry, any error reported by do_garbage_collect should immediately terminate the current GC. Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 8bec7dd1 Tue Jun 06 00:19:01 MDT 2023 Chao Yu <chao@kernel.org> f2fs: check return value of freeze_super() freeze_super() can fail, it needs to check its return value and do error handling in f2fs_resize_fs(). Fixes: 04f0b2eaa3b3 ("f2fs: ioctl for removing a range from F2FS") Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff d11cef14 Mon Mar 20 18:12:51 MDT 2023 Yonggil Song <yonggil.song@samsung.com> f2fs: Fix system crash due to lack of free space in LFS When f2fs tries to checkpoint during foreground gc in LFS mode, system crash occurs due to lack of free space if the amount of dirty node and dentry pages generated by data migration exceeds free space. The reproduction sequence is as follows. - 20GiB capacity block device (null_blk) - format and mount with LFS mode - create a file and write 20,000MiB - 4k random write on full range of the file RIP: 0010:new_curseg+0x48a/0x510 [f2fs] Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff RSP: 0018:ffff977bc397b218 EFLAGS: 00010246 RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0 RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8 RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40 R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000 R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000 FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> allocate_segment_by_default+0x9c/0x110 [f2fs] f2fs_allocate_data_block+0x243/0xa30 [f2fs] ? __mod_lruvec_page_state+0xa0/0x150 do_write_page+0x80/0x160 [f2fs] f2fs_do_write_node_page+0x32/0x50 [f2fs] __write_node_page+0x339/0x730 [f2fs] f2fs_sync_node_pages+0x5a6/0x780 [f2fs] block_operations+0x257/0x340 [f2fs] f2fs_write_checkpoint+0x102/0x1050 [f2fs] f2fs_gc+0x27c/0x630 [f2fs] ? folio_mark_dirty+0x36/0x70 f2fs_balance_fs+0x16f/0x180 [f2fs] This patch adds checking whether free sections are enough before checkpoint during gc. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> [Jaegeuk Kim: code clean-up] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff c6ad7fd1 Wed Sep 14 05:51:51 MDT 2022 Chao Yu <chao@kernel.org> f2fs: fix to do sanity check on summary info As Wenqing Liu reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216456 BUG: KASAN: use-after-free in recover_data+0x63ae/0x6ae0 [f2fs] Read of size 4 at addr ffff8881464dcd80 by task mount/1013 CPU: 3 PID: 1013 Comm: mount Tainted: G W 6.0.0-rc4 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x5e print_report.cold+0xf3/0x68d kasan_report+0xa8/0x130 recover_data+0x63ae/0x6ae0 [f2fs] f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs] f2fs_fill_super+0x4665/0x61e0 [f2fs] mount_bdev+0x2cf/0x3b0 legacy_get_tree+0xed/0x1d0 vfs_get_tree+0x81/0x2b0 path_mount+0x47e/0x19d0 do_mount+0xce/0xf0 __x64_sys_mount+0x12c/0x1a0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is: in fuzzed image, SSA table is corrupted: ofs_in_node is larger than ADDRS_PER_PAGE(), result in out-of-range access on 4k-size page. - recover_data - do_recover_data - check_index_in_prev_nodes - f2fs_data_blkaddr This patch adds sanity check on summary info in recovery and GC flow in where the flows rely on them. After patch: [ 29.310883] F2FS-fs (loop0): Inconsistent ofs_in_node:65286 in summary, ino:0, nid:6, max:1018 Cc: stable@vger.kernel.org Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff e4544b63 Fri Jan 07 01:48:44 MST 2022 Tim Murray <timmurray@google.com> f2fs: move f2fs to use reader-unfair rwsems f2fs rw_semaphores work better if writers can starve readers, especially for the checkpoint thread, because writers are strictly more important than reader threads. This prevents significant priority inversion between low-priority readers that blocked while trying to acquire the read lock and a second acquisition of the write lock that might be blocking high priority work. Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 10d0786b Wed Jul 14 01:46:06 MDT 2021 Jia Yang <jiayang5@huawei.com> f2fs: Revert "f2fs: Fix indefinite loop in f2fs_gc() v1" This reverts commit 957fa47823dfe449c5a15a944e4e7a299a6601db. The patch "f2fs: Fix indefinite loop in f2fs_gc()" v1 and v4 are all merged. Patch v4 is test info for patch v1. Patch v1 doesn't work and may cause that sbi->cur_victim_sec can't be resetted to NULL_SEGNO, which makes SSR unable to get segment of sbi->cur_victim_sec. So it should be reverted. The mails record: [1] https://lore.kernel.org/linux-f2fs-devel/7288dcd4-b168-7656-d1af-7e2cafa4f720@huawei.com/T/ [2] https://lore.kernel.org/linux-f2fs-devel/20190809153653.GD93481@jaegeuk-macbookpro.roam.corp.google.com/T/ Signed-off-by: Jia Yang <jiayang5@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
H A D | node.c | diff 71644dff Thu Dec 01 18:37:15 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add block_age-based extent cache This patch introduces a runtime hot/cold data separation method for f2fs, in order to improve the accuracy for data temperature classification, reduce the garbage collection overhead after long-term data updates. Enhanced hot/cold data separation can record data block update frequency as "age" of the extent per inode, and take use of the age info to indicate better temperature type for data block allocation: - It records total data blocks allocated since mount; - When file extent has been updated, it calculate the count of data blocks allocated since last update as the age of the extent; - Before the data block allocated, it searches for the age info and chooses the suitable segment for allocation. Test and result: - Prepare: create about 30000 files * 3% for cold files (with cold file extension like .apk, from 3M to 10M) * 50% for warm files (with random file extension like .FcDxq, from 1K to 4M) * 47% for hot files (with hot file extension like .db, from 1K to 256K) - create(5%)/random update(90%)/delete(5%) the files * total write amount is about 70G * fsync will be called for .db files, and buffered write will be used for other files The storage of test device is large enough(128G) so that it will not switch to SSR mode during the test. Benefit: dirty segment count increment reduce about 14% - before: Dirty +21110 - after: Dirty +18286 Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 9b7eadd9 Tue Aug 30 20:24:40 MDT 2022 Shuqi Zhang <zhangshuqi3@huawei.com> f2fs: fix wrong dirty page count when race between mmap and fallocate. This is a BUG_ON issue as follows when running xfstest-generic-503: WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0 Modules linked in: CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Call Trace: evict+0x129/0x2d0 dispose_list+0x4f/0xb0 evict_inodes+0x204/0x230 generic_shutdown_super+0x5b/0x1e0 kill_block_super+0x29/0x80 kill_f2fs_super+0xe6/0x140 deactivate_locked_super+0x44/0xc0 deactivate_super+0x79/0x90 cleanup_mnt+0x114/0x1a0 __cleanup_mnt+0x16/0x20 task_work_run+0x98/0x100 exit_to_user_mode_prepare+0x3d0/0x3e0 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Function flow analysis when BUG occurs: f2fs_fallocate mmap do_page_fault pte_spinlock // ---lock_pte do_wp_page wp_page_shared pte_unmap_unlock // unlock_pte do_page_mkwrite f2fs_vm_page_mkwrite down_read(invalidate_lock) lock_page if (PageMappedToDisk(page)) goto out; // set_page_dirty --NOT RUN out: up_read(invalidate_lock); finish_mkwrite_fault // unlock_pte f2fs_collapse_range down_write(i_mmap_sem) truncate_pagecache unmap_mapping_pages i_mmap_lock_write // down_write(i_mmap_rwsem) ...... zap_pte_range pte_offset_map_lock // ---lock_pte set_page_dirty f2fs_dirty_data_folio if (!folio_test_dirty(folio)) { fault_dirty_shared_page set_page_dirty f2fs_dirty_data_folio if (!folio_test_dirty(folio)) { filemap_dirty_folio f2fs_update_dirty_folio // ++ } unlock_page filemap_dirty_folio f2fs_update_dirty_folio // page count++ } pte_unmap_unlock // --unlock_pte i_mmap_unlock_write // up_write(i_mmap_rwsem) truncate_inode_pages up_write(i_mmap_sem) When race happens between mmap-do_page_fault-wp_page_shared and fallocate-truncate_pagecache-zap_pte_range, the zap_pte_range calls function set_page_dirty without page lock. Besides, though truncate_pagecache has immap and pte lock, wp_page_shared calls fault_dirty_shared_page without any. In this case, two threads race in f2fs_dirty_data_folio function. Page is set to dirty only ONCE, but the count is added TWICE by calling filemap_dirty_folio. Thus the count of dirty page cannot accord with the real dirty pages. Following is the solution to in case of race happens without any lock. Since folio_test_set_dirty in filemap_dirty_folio is atomic, judge return value will not be at risk of race. Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 82c7863e Sat Jun 18 01:42:24 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: do not count ENOENT for error case Otherwise, we can get a wrong cp_error mark. Cc: <stable@vger.kernel.org> Fixes: a7b8618aa2f0 ("f2fs: avoid infinite loop to flush node pages") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff e4544b63 Fri Jan 07 01:48:44 MST 2022 Tim Murray <timmurray@google.com> f2fs: move f2fs to use reader-unfair rwsems f2fs rw_semaphores work better if writers can starve readers, especially for the checkpoint thread, because writers are strictly more important than reader threads. This prevents significant priority inversion between low-priority readers that blocked while trying to acquire the read lock and a second acquisition of the write lock that might be blocking high priority work. Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff c45d6002 Fri Nov 01 03:53:23 MDT 2019 Chao Yu <chao@kernel.org> f2fs: show f2fs instance in printk_ratelimited As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these message is coming from, this patch tries to show fs instance with sb->s_id field in all places we missed before. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff bc005a4d Fri Nov 01 10:34:21 MDT 2019 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: avoid kernel panic on corruption test xfstests/generic/475 complains kernel warn/panic while testing corrupted disk. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 93770ab7 Mon Apr 15 01:26:32 MDT 2019 Chao Yu <chao@kernel.org> f2fs: introduce DATA_GENERIC_ENHANCE Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 626bcf2b Mon Apr 15 01:28:36 MDT 2019 Chao Yu <chao@kernel.org> f2fs: fix to do sanity check on free nid As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203225 - Overview When mounting the attached crafted image and unmounting it, following errors are reported. Additionally, it hangs on sync after unmounting. The image is intentionally fuzzed from a normal f2fs image for testing. Compile options for F2FS are as follows. CONFIG_F2FS_FS=y CONFIG_F2FS_STAT_FS=y CONFIG_F2FS_FS_XATTR=y CONFIG_F2FS_FS_POSIX_ACL=y CONFIG_F2FS_CHECK_FS=y - Reproduces mkdir test mount -t f2fs tmp.img test touch test/t umount test sync - Messages kernel BUG at fs/f2fs/node.c:3073! RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300 Call Trace: f2fs_put_super+0xf4/0x270 generic_shutdown_super+0x62/0x110 kill_block_super+0x1c/0x50 kill_f2fs_super+0xad/0xd0 deactivate_locked_super+0x35/0x60 cleanup_mnt+0x36/0x70 task_work_run+0x75/0x90 exit_to_usermode_loop+0x93/0xa0 do_syscall_64+0xba/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300 NAT table is corrupted, so reserved meta/node inode ids were added into free list incorrectly, during file creation, since reserved id has cached in inode hash, so it fails the creation and preallocated nid can not be released later, result in kernel panic. To fix this issue, let's do nid boundary check during free nid loading. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff b42b179b Mon Apr 15 01:28:35 MDT 2019 Chao Yu <chao@kernel.org> f2fs: fix to do checksum even if inode page is uptodate As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203221 - Overview When mounting the attached crafted image and running program, this error is reported. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces cc poc_07.c mkdir test mount -t f2fs tmp.img test cp a.out test cd test sudo ./a.out - Messages kernel BUG at fs/f2fs/node.c:1279! RIP: 0010:read_node_page+0xcf/0xf0 Call Trace: __get_node_page+0x6b/0x2f0 f2fs_iget+0x8f/0xdf0 f2fs_lookup+0x136/0x320 __lookup_slow+0x92/0x140 lookup_slow+0x30/0x50 walk_component+0x1c1/0x350 path_lookupat+0x62/0x200 filename_lookup+0xb3/0x1a0 do_fchmodat+0x3e/0xa0 __x64_sys_chmod+0x12/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 On below paths, we can have opportunity to readahead inode page - gc_node_segment -> f2fs_ra_node_page - gc_data_segment -> f2fs_ra_node_page - f2fs_fill_dentries -> f2fs_ra_node_page Unlike synchronized read, on readahead path, we can set page uptodate before verifying page's checksum, then read_node_page() will trigger kernel panic once it encounters a uptodated page w/ incorrect checksum. So considering readahead scenario, we have to do checksum each time when loading inode page even if it is uptodated. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 8b6810f8 Mon Apr 15 01:28:34 MDT 2019 Chao Yu <chao@kernel.org> f2fs: fix to avoid panic in f2fs_remove_inode_page() As Jungyeon reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203219 - Overview When mounting the attached crafted image and running program, I got this error. Additionally, it hangs on sync after running the program. The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on. - Reproduces cc poc_06.c mkdir test mount -t f2fs tmp.img test cp a.out test cd test sudo ./a.out sync - Messages kernel BUG at fs/f2fs/node.c:1183! RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0 Call Trace: f2fs_evict_inode+0x2a3/0x3a0 evict+0xba/0x180 __dentry_kill+0xbe/0x160 dentry_kill+0x46/0x180 dput+0xbb/0x100 do_renameat2+0x3c9/0x550 __x64_sys_rename+0x17/0x20 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is f2fs_remove_inode_page() will trigger kernel panic due to inconsistent i_blocks value of inode. To avoid panic, let's just print debug message and set SBI_NEED_FSCK to give a hint to fsck for latter repairing of potential image corruption. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix build warning and add unlikely] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
H A D | data.c | diff a842a909 Mon Jul 17 01:11:09 MDT 2023 Minjie Du <duminjie@vivo.com> f2fs: increase usage of folio_next_index() helper Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff d9bac032 Wed Feb 01 03:47:02 MST 2023 Yangtao Li <frank.li@vivo.com> f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx() Convert to use iostat_lat_type as parameter instead of raw number. BTW, move NUM_PREALLOC_IOSTAT_CTXS to the header file, adjust iostat_lat[{0,1,2}] to iostat_lat[{READ_IO,WRITE_SYNC_IO,WRITE_ASYNC_IO}] in tracepoint function, and rename iotype to page_type to match the definition. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff feb0576a Fri Dec 23 01:36:36 MST 2022 Eric Biggers <ebiggers@google.com> f2fs: simplify f2fs_readpage_limit() Now that the implementation of FS_IOC_ENABLE_VERITY has changed to not involve reading back Merkle tree blocks that were previously written, there is no need for f2fs_readpage_limit() to allow for this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-10-ebiggers@kernel.org diff 98dc08ba Tue Nov 29 00:04:01 MST 2022 Eric Biggers <ebiggers@google.com> fsverity: stop using PG_error to track error status As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track verity errors. Instead, if a verity error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. f2fs supports compression, which makes the f2fs changes a bit more complicated than desired, but the basic premise still works. Note: there are still a few uses of PageError in f2fs, but they are on the write path, so they are unrelated and this patch doesn't touch them. Reviewed-by: Chao Yu <chao@kernel.org> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org diff ca7efd71 Fri Sep 23 01:17:55 MDT 2022 Zhang Qilong <zhangqilong3@huawei.com> f2fs: remove the unnecessary check in f2fs_xattr_fiemap Whehter or not error occurs, checking "err == 1" is unnecessary in f2fs_xattr_fiemap(), and just remove it here. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 9b7eadd9 Tue Aug 30 20:24:40 MDT 2022 Shuqi Zhang <zhangshuqi3@huawei.com> f2fs: fix wrong dirty page count when race between mmap and fallocate. This is a BUG_ON issue as follows when running xfstest-generic-503: WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0 Modules linked in: CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Call Trace: evict+0x129/0x2d0 dispose_list+0x4f/0xb0 evict_inodes+0x204/0x230 generic_shutdown_super+0x5b/0x1e0 kill_block_super+0x29/0x80 kill_f2fs_super+0xe6/0x140 deactivate_locked_super+0x44/0xc0 deactivate_super+0x79/0x90 cleanup_mnt+0x114/0x1a0 __cleanup_mnt+0x16/0x20 task_work_run+0x98/0x100 exit_to_user_mode_prepare+0x3d0/0x3e0 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Function flow analysis when BUG occurs: f2fs_fallocate mmap do_page_fault pte_spinlock // ---lock_pte do_wp_page wp_page_shared pte_unmap_unlock // unlock_pte do_page_mkwrite f2fs_vm_page_mkwrite down_read(invalidate_lock) lock_page if (PageMappedToDisk(page)) goto out; // set_page_dirty --NOT RUN out: up_read(invalidate_lock); finish_mkwrite_fault // unlock_pte f2fs_collapse_range down_write(i_mmap_sem) truncate_pagecache unmap_mapping_pages i_mmap_lock_write // down_write(i_mmap_rwsem) ...... zap_pte_range pte_offset_map_lock // ---lock_pte set_page_dirty f2fs_dirty_data_folio if (!folio_test_dirty(folio)) { fault_dirty_shared_page set_page_dirty f2fs_dirty_data_folio if (!folio_test_dirty(folio)) { filemap_dirty_folio f2fs_update_dirty_folio // ++ } unlock_page filemap_dirty_folio f2fs_update_dirty_folio // page count++ } pte_unmap_unlock // --unlock_pte i_mmap_unlock_write // up_write(i_mmap_rwsem) truncate_inode_pages up_write(i_mmap_sem) When race happens between mmap-do_page_fault-wp_page_shared and fallocate-truncate_pagecache-zap_pte_range, the zap_pte_range calls function set_page_dirty without page lock. Besides, though truncate_pagecache has immap and pte lock, wp_page_shared calls fault_dirty_shared_page without any. In this case, two threads race in f2fs_dirty_data_folio function. Page is set to dirty only ONCE, but the count is added TWICE by calling filemap_dirty_folio. Thus the count of dirty page cannot accord with the real dirty pages. Following is the solution to in case of race happens without any lock. Since folio_test_set_dirty in filemap_dirty_folio is atomic, judge return value will not be at risk of race. Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 01fc4b9a Sat Jul 30 21:33:46 MDT 2022 Fengnan Chang <fengnanchang@gmail.com> f2fs: use onstack pages instead of pvec Since pvec have 15 pages, it not a multiple of 4, when write compressed pages, write in 64K as a unit, it will call pagevec_lookup_range_tag agagin, sometimes this will take a lot of time. Use onstack pages instead of pvec to mitigate this problem. Signed-off-by: Fengnan Chang <fengnanchang@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff bff139b4 Tue Aug 02 01:24:37 MDT 2022 Daeho Jeong <daehojeong@google.com> f2fs: handle decompress only post processing in softirq Now decompression is being handled in workqueue and it makes read I/O latency non-deterministic, because of the non-deterministic scheduling nature of workqueues. So, I made it handled in softirq context only if possible, not in low memory devices, since this modification will maintain decompresion related memory a little longer. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 0adc2ab0 Tue Apr 12 16:01:58 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders Let's attach io_flags to bio only, so that we can merge IOs given original io_flags only. Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff e4544b63 Fri Jan 07 01:48:44 MST 2022 Tim Murray <timmurray@google.com> f2fs: move f2fs to use reader-unfair rwsems f2fs rw_semaphores work better if writers can starve readers, especially for the checkpoint thread, because writers are strictly more important than reader threads. This prevents significant priority inversion between low-priority readers that blocked while trying to acquire the read lock and a second acquisition of the write lock that might be blocking high priority work. Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
H A D | super.c | diff 24593061 Mon Mar 11 01:48:54 MDT 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to handle error paths of {new,change}_curseg() {new,change}_curseg() may return error in some special cases, error handling should be did in their callers, and this will also facilitate subsequent error path expansion in {new,change}_curseg(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 4e0197f9 Tue Feb 20 01:48:44 MST 2024 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: kill heap-based allocation No one uses this feature. Let's kill it. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff a6a010f5 Mon Dec 04 19:38:01 MST 2023 Daniel Rosenberg <drosen@google.com> f2fs: Restrict max filesize for 16K f2fs Blocks are tracked by u32, so the max permitted filesize is (U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must further restrict max filesize to (U32_MAX + 1) * 4096. This does not affect 4K blocksize f2fs as the natural limit for files are well below that. Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size") Signed-off-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 29215a7d Fri Dec 01 11:38:35 MST 2023 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: allow checkpoint=disable for zoned block device Let's allow checkpoint=disable back for zoned block device. It's very risky as the feature relies on fsck or runtime recovery which matches the write pointers again if the device rebooted while disabling the checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff c62ebd35 Wed Jul 05 13:01:08 MDT 2023 Jeff Layton <jlayton@kernel.org> f2fs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-41-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> diff dde38c03 Sun Jun 11 21:01:17 MDT 2023 Sheng Yong <shengyong@oppo.com> f2fs: cleanup MIN_INLINE_XATTR_SIZE Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff c571fbb5 Sun Jun 11 21:01:16 MDT 2023 Sheng Yong <shengyong@oppo.com> f2fs: add helper to check compression level This patch adds a helper function to check if compression level is valid. Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 00e120b5 Mon Jun 12 01:58:34 MDT 2023 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: assign default compression level Let's avoid any confusion from assigning compress_level=0 for LZ4HC and ZSTD. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 20872584 Sun May 28 01:47:12 MDT 2023 Chao Yu <chao@kernel.org> f2fs: fix to drop all dirty meta/node pages during umount() For cp error case, there will be dirty meta/node pages remained after f2fs_write_checkpoint() in f2fs_put_super(), drop them explicitly, and do sanity check on reference count of dirty pages and inflight IOs. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 0718afd4 Thu Jun 01 03:44:52 MDT 2023 Christoph Hellwig <hch@lst.de> block: introduce holder ops Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
H A D | f2fs.h | diff 24593061 Mon Mar 11 01:48:54 MDT 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to handle error paths of {new,change}_curseg() {new,change}_curseg() may return error in some special cases, error handling should be did in their callers, and this will also facilitate subsequent error path expansion in {new,change}_curseg(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 28f66cc6 Fri Mar 01 01:25:54 MST 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to check return value __allocate_new_segment __allocate_new_segment may return error when get_new_segment fails, so its caller should check its return value. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 28f66cc6 Fri Mar 01 01:25:54 MST 2024 Zhiguo Niu <zhiguo.niu@unisoc.com> f2fs: fix to check return value __allocate_new_segment __allocate_new_segment may return error when get_new_segment fails, so its caller should check its return value. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff fe3944fb Fri Feb 02 01:39:23 MST 2024 Bart Van Assche <bvanassche@acm.org> fs: Move enum rw_hint into a new header file Move enum rw_hint into a new header file to prepare for using this data type in the block layer. Add the attribute __packed to reduce the space occupied by instances of this data type from four bytes to one byte. Change the data type of i_write_hint from u8 into enum rw_hint. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240202203926.2478590-5-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org> diff c62ebd35 Wed Jul 05 13:01:08 MDT 2023 Jeff Layton <jlayton@kernel.org> f2fs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-41-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> diff c571fbb5 Sun Jun 11 21:01:16 MDT 2023 Sheng Yong <shengyong@oppo.com> f2fs: add helper to check compression level This patch adds a helper function to check if compression level is valid. Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 00e120b5 Mon Jun 12 01:58:34 MDT 2023 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: assign default compression level Let's avoid any confusion from assigning compress_level=0 for LZ4HC and ZSTD. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 193a639f Wed Dec 21 12:20:01 MST 2022 Yangtao Li <frank.li@vivo.com> f2fs: add iostat support for flush In this patch, it adds to account flush count. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 71644dff Thu Dec 01 18:37:15 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: add block_age-based extent cache This patch introduces a runtime hot/cold data separation method for f2fs, in order to improve the accuracy for data temperature classification, reduce the garbage collection overhead after long-term data updates. Enhanced hot/cold data separation can record data block update frequency as "age" of the extent per inode, and take use of the age info to indicate better temperature type for data block allocation: - It records total data blocks allocated since mount; - When file extent has been updated, it calculate the count of data blocks allocated since last update as the age of the extent; - Before the data block allocated, it searches for the age info and chooses the suitable segment for allocation. Test and result: - Prepare: create about 30000 files * 3% for cold files (with cold file extension like .apk, from 3M to 10M) * 50% for warm files (with random file extension like .FcDxq, from 1K to 4M) * 47% for hot files (with hot file extension like .db, from 1K to 256K) - create(5%)/random update(90%)/delete(5%) the files * total write amount is about 70G * fsync will be called for .db files, and buffered write will be used for other files The storage of test device is large enough(128G) so that it will not switch to SSR mode during the test. Benefit: dirty segment count increment reduce about 14% - before: Dirty +21110 - after: Dirty +18286 Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> diff 1cd2e6d5 Wed Nov 23 09:44:01 MST 2022 Yangtao Li <frank.li@vivo.com> f2fs: define MIN_DISCARD_GRANULARITY macro Do cleanup in f2fs_tuning_parameters() and __init_discard_policy(), let's use macro instead of number. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
Completed in 1068 milliseconds