Cross Reference: /linux-master/fs/f2fs/data.c

History log of /linux-master/fs/f2fs/data.c
Revision	Date	Author	Comments
# 31f85ccc	07-Mar-2024	Zhiguo Niu <zhiguo.niu@unisoc.com>	f2fs: unify the error handling of f2fs_is_valid_blkaddr There are some cases of f2fs_is_valid_blkaddr not handled as ERROR_INVALID_BLKADDR,so unify the error handling about all of f2fs_is_valid_blkaddr. Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f1e7646a	25-Feb-2024	Chao Yu <chao@kernel.org>	f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate() This patch exchangs position of f2fs_precache_extents() and filemap_fdatawrite(), so that f2fs_precache_extents() can load extent info after physical addresses of all data are fixed. Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8249aac1	25-Feb-2024	Chao Yu <chao@kernel.org>	f2fs: fix blkofs_end correctly in f2fs_migrate_blocks() In f2fs_migrate_blocks(), when traversing blocks in last section, blkofs_end should be (start_blk + blkcnt - 1) % blk_per_sec, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7d009e04	22-Feb-2024	Chao Yu <chao@kernel.org>	f2fs: fix to handle segment allocation failure correctly If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that we run out of free segment, we should not panic kernel, instead, let's handle such error correctly in its caller. Signed-off-by: Chao Yu <chao@kernel.org> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9703d69d	13-Feb-2024	Daeho Jeong <daehojeong@google.com>	f2fs: support file pinning for zoned devices Support file pinning with conventional storage area for zoned devices Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 87161a2b	06-Feb-2024	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: deprecate io_bits Let's deprecate an unused io_bits feature to save CPU cycles and memory. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c2034ef6	16-Jan-2024	Wenjie Qi <qwjhust@gmail.com>	f2fs: fix NULL pointer dereference in f2fs_submit_page_write() BUG: kernel NULL pointer dereference, address: 0000000000000014 RIP: 0010:f2fs_submit_page_write+0x6cf/0x780 [f2fs] Call Trace: <TASK> ? show_regs+0x6e/0x80 ? __die+0x29/0x70 ? page_fault_oops+0x154/0x4a0 ? prb_read_valid+0x20/0x30 ? __irq_work_queue_local+0x39/0xd0 ? irq_work_queue+0x36/0x70 ? do_user_addr_fault+0x314/0x6c0 ? exc_page_fault+0x7d/0x190 ? asm_exc_page_fault+0x2b/0x30 ? f2fs_submit_page_write+0x6cf/0x780 [f2fs] ? f2fs_submit_page_write+0x736/0x780 [f2fs] do_write_page+0x50/0x170 [f2fs] f2fs_outplace_write_data+0x61/0xb0 [f2fs] f2fs_do_write_data_page+0x3f8/0x660 [f2fs] f2fs_write_single_data_page+0x5bb/0x7a0 [f2fs] f2fs_write_cache_pages+0x3da/0xbe0 [f2fs] ... It is possible that other threads have added this fio to io->bio and submitted the io->bio before entering f2fs_submit_page_write(). At this point io->bio = NULL. If is_end_zone_blkaddr(sbi, fio->new_blkaddr) of this fio is true, then an NULL pointer dereference error occurs at bio_get(io->bio). The original code for determining zone end was after "out:", which would have missed some fio who is zone end. I've moved this code before "skip:" to make sure it's done for each fio. Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices") Signed-off-by: Wenjie Qi <qwjhust@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 536af821	29-Jan-2024	Chao Yu <chao@kernel.org>	f2fs: zone: fix to wait completion of last bio in zone correctly It needs to check last zone_pending_bio and wait IO completion before traverse next fio in io->io_list, otherwise, bio in next zone may be submitted before all IO completion in current zone. Fixes: e067dc3c6b9c ("f2fs: maintain six open zones for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 54607494	12-Jan-2024	Chao Yu <chao@kernel.org>	f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode In reserve_compress_blocks(), we update blkaddrs of dnode in prior to inc_valid_block_count(), it may cause inconsistent status bewteen i_blocks and blkaddrs once inc_valid_block_count() fails. To fix this issue, it needs to reverse their invoking order. Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS") Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fd244524	12-Jan-2024	Chao Yu <chao@kernel.org>	f2fs: compress: fix to cover normal cluster write with cp_rwsem When we overwrite compressed cluster w/ normal cluster, we should not unlock cp_rwsem during f2fs_write_raw_pages(), otherwise data will be corrupted if partial blocks were persisted before CP & SPOR, due to cluster metadata wasn't updated atomically. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8a430dd4	12-Jan-2024	Chao Yu <chao@kernel.org>	f2fs: compress: fix to guarantee persisting compressed blocks by CP If data block in compressed cluster is not persisted with metadata during checkpoint, after SPOR, the data may be corrupted, let's guarantee to write compressed page by checkpoint. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7437bb73	17-Dec-2023	Christoph Hellwig <hch@lst.de>	block: remove support for the host aware zone model When zones were first added the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed where non-conventional zones there is strict requirement to write at the write pointer, or else an error is returned - host aware where a write point is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to to the easily disappearing write pointer host software could never rely on the write pointer to actually be useful for say recovery. Fortunately only a few HDD prototypes shipped using this model which never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 55fdc1c2	10-Dec-2023	Chao Yu <chao@kernel.org>	f2fs: fix to wait on block writeback for post_read case If inode is compressed, but not encrypted, it missed to call f2fs_wait_on_block_writeback() to wait for GCed page writeback in IPU write path. Thread A GC-Thread - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_block - f2fs_submit_page_write migrate normal cluster's block via meta_inode's page cache - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_inplace_write_data - f2fs_submit_page_bio IRQ - f2fs_read_end_io IRQ old data overrides new data due to out-of-order GC and common IO. - f2fs_read_end_io Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4e4f1eb9	10-Dec-2023	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_invalidate_internal_cache() for cleanup Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 59d0d4c3	10-Dec-2023	Chao Yu <chao@kernel.org>	f2fs: update blkaddr in __set_data_blkaddr() for cleanup This patch allows caller to pass blkaddr to f2fs_set_data_blkaddr() and let __set_data_blkaddr() inside f2fs_set_data_blkaddr() to update dn->data_blkaddr w/ last value of blkaddr. Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2020cd48	10-Dec-2023	Chao Yu <chao@kernel.org>	f2fs: introduce get_dnode_addr() to clean up codes Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bb6e1c8f	10-Dec-2023	Chao Yu <chao@kernel.org>	f2fs: delete obsolete FI_DROP_CACHE FI_DROP_CACHE was introduced in commit 1e84371ffeef ("f2fs: change atomic and volatile write policies") for volatile write feature, after commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we won't support volatile write, let's delete related codes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a5393636	10-Dec-2023	Chao Yu <chao@kernel.org>	f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN Commit 3c6c2bebef79 ("f2fs: avoid punch_hole overhead when releasing volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason: This patch is to avoid some punch_hole overhead when releasing volatile data. If volatile data was not written yet, we just can make the first page as zero. After commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we won't support volatile write, but it missed to remove obsolete FI_FIRST_BLOCK_WRITTEN, delete it in this patch. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 94589150	15-Nov-2023	Chao Yu <chao@kernel.org>	f2fs: use shared inode lock during f2fs_fiemap() f2fs_fiemap() will only traverse metadata of inode, let's use shared inode lock for it to avoid unnecessary race on inode lock. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d7e9a903	02-Oct-2023	Daniel Rosenberg <drosen@google.com>	f2fs: Support Block Size == Page Size This allows f2fs to support cases where the block size = page size for both 4K and 16K block sizes. Other sizes should work as well, should the need arise. This does not currently support 4K Block size filesystems if the page size is 16K. Signed-off-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 943f7c6f	04-Sep-2023	Chao Yu <chao@kernel.org>	f2fs: compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file() If file has both cold and compress flag, during f2fs_ioc_compress_file(), f2fs will trigger IPU for non-compress cluster and OPU for compress cluster, so that, data of the file may be fragmented. Fix it by always triggering OPU for IOs from user mode compression. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2aaea533	28-Aug-2023	Chao Yu <chao@kernel.org>	f2fs: compress: do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on This patch covers sanity check logic on cluster w/ CONFIG_F2FS_CHECK_FS, otherwise, there will be performance regression while querying cluster mapping info. Callers of f2fs_is_compressed_cluster() only care about whether cluster is compressed or not, rather than # of valid blocks in compressed cluster, so, let's adjust f2fs_is_compressed_cluster()'s logic according to caller's requirement. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b0327c84	28-Aug-2023	Chao Yu <chao@kernel.org>	f2fs: compress: fix to avoid use-after-free on dic Call trace: __memcpy+0x128/0x250 f2fs_read_multi_pages+0x940/0xf7c f2fs_mpage_readpages+0x5a8/0x624 f2fs_readahead+0x5c/0x110 page_cache_ra_unbounded+0x1b8/0x590 do_sync_mmap_readahead+0x1dc/0x2e4 filemap_fault+0x254/0xa8c f2fs_filemap_fault+0x2c/0x104 __do_fault+0x7c/0x238 do_handle_mm_fault+0x11bc/0x2d14 do_mem_abort+0x3a8/0x1004 el0_da+0x3c/0xa0 el0t_64_sync_handler+0xc4/0xec el0t_64_sync+0x1b4/0x1b8 In f2fs_read_multi_pages(), once f2fs_decompress_cluster() was called if we hit cached page in compress_inode's cache, dic may be released, it needs break the loop rather than continuing it, in order to avoid accessing invalid dic pointer. Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c5d3f9b7	28-Aug-2023	Chao Yu <chao@kernel.org>	f2fs: compress: fix deadloop in f2fs_write_cache_pages() With below mount option and testcase, it hangs kernel. 1. mount -t f2fs -o compress_log_size=5 /dev/vdb /mnt/f2fs 2. touch /mnt/f2fs/file 3. chattr +c /mnt/f2fs/file 4. dd if=/dev/zero of=/mnt/f2fs/file bs=1MB count=1 5. sync 6. dd if=/dev/zero of=/mnt/f2fs/file bs=111 count=11 conv=notrunc 7. sync INFO: task sync:4788 blocked for more than 120 seconds. Not tainted 6.5.0-rc1+ #322 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:sync state:D stack:0 pid:4788 ppid:509 flags:0x00000002 Call Trace: <TASK> __schedule+0x335/0xf80 schedule+0x6f/0xf0 wb_wait_for_completion+0x5e/0x90 sync_inodes_sb+0xd8/0x2a0 sync_inodes_one_sb+0x1d/0x30 iterate_supers+0x99/0xf0 ksys_sync+0x46/0xb0 __do_sys_sync+0x12/0x20 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 The reason is f2fs_all_cluster_page_ready() assumes that pages array should cover at least one cluster, otherwise, it will always return false, result in deadloop. By default, pages array size is 16, and it can cover the case cluster_size is equal or less than 16, for the case cluster_size is larger than 16, let's allocate memory of pages array dynamically. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5118697f	21-Aug-2023	Chao Yu <chao@kernel.org>	f2fs: fix error path of f2fs_submit_page_read() In error path of f2fs_submit_page_read(), it missed to call iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a842a909	17-Jul-2023	Minjie Du <duminjie@vivo.com>	f2fs: increase usage of folio_next_index() helper Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using the existing helper folio_next_index(). Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d2d9bb3b	19-Jan-2023	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: get out of a repeat loop when getting a locked data page https://bugzilla.kernel.org/show_bug.cgi?id=216050 Somehow we're getting a page which has a different mapping. Let's avoid the infinite loop. Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 94c8431f	11-Jun-2023	Christoph Hellwig <hch@lst.de>	f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Do that for f2fs so that noop_direct_IO can eventually be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f082c6b2	14-May-2023	Chao Yu <chao@kernel.org>	f2fs: fix potential deadlock due to unpaired node_write lock use If S_NOQUOTA is cleared from inode during data page writeback of quota file, it may miss to unlock node_write lock, result in potential deadlock, fix to use the lock in paired. Kworker Thread - writepage if (IS_NOQUOTA()) f2fs_down_read(&sbi->node_write); - vfs_cleanup_quota_inode - inode->i_flags &= ~S_NOQUOTA; if (IS_NOQUOTA()) f2fs_up_read(&sbi->node_write); Fixes: 79963d967b49 ("f2fs: shrink node_write lock coverage") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e067dc3c	03-May-2023	Daeho Jeong <daehojeong@google.com>	f2fs: maintain six open zones for zoned devices To keep six open zone constraints, make them not to be open over six open zones. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1223e432	24-Apr-2023	Li Zetao <lizetao1@huawei.com>	f2fs: remove redundant goto statement in f2fs_read_single_page() After the commit "0a4ee518185", this "goto" statement was redundant, remote it for clean code. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b62e71be	23-Apr-2023	Chao Yu <chao@kernel.org>	f2fs: support errors=remount-ro\|continue\|panic mountoption This patch supports errors=remount-ro\|continue\|panic mount option for f2fs. f2fs behaves as below in three different modes: mode continue remount-ro panic access ops normal noraml N/A syscall errors -EIO -EROFS N/A mount option rw ro N/A pending dir write keep keep N/A pending non-dir write drop keep N/A pending node write drop keep N/A pending meta write keep keep N/A By default it uses "continue" mode. [Yangtao helps to clean up function's name] Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 591fc34e	18-Apr-2023	Daeho Jeong <daehojeong@google.com>	f2fs: use cow inode data when updating atomic write Need to use cow inode data content instead of the one in the original inode, when we try to write the already updated atomic write files. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bd90c5cd	06-Apr-2023	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: relax sanity check if checkpoint is corrupted 1. extent_cache - let's drop the largest extent_cache 2. invalidate_block - don't show the warnings Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 635a52da	09-Apr-2023	Chao Yu <chao@kernel.org>	f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio We have maintain PagePrivate and page_private and page reference w/ {set,clear}_page_private_*, it doesn't need to call folio_detach_private() in the end of .invalidate_folio and .release_folio, remove it and use f2fs_bug_on instead. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c9b3649a	09-Apr-2023	Chao Yu <chao@kernel.org>	f2fs: fix to drop all dirty pages during umount() if cp_error is set xfstest generic/361 reports a bug as below: f2fs_bug_on(sbi, sbi->fsync_node_num); kernel BUG at fs/f2fs/super.c:1627! RIP: 0010:f2fs_put_super+0x3a8/0x3b0 Call Trace: generic_shutdown_super+0x8c/0x1b0 kill_block_super+0x2b/0x60 kill_f2fs_super+0x87/0x110 deactivate_locked_super+0x39/0x80 deactivate_super+0x46/0x50 cleanup_mnt+0x109/0x170 __cleanup_mnt+0x16/0x20 task_work_run+0x65/0xa0 exit_to_user_mode_prepare+0x175/0x190 syscall_exit_to_user_mode+0x25/0x50 do_syscall_64+0x4c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc During umount(), if cp_error is set, f2fs_wait_on_all_pages() should not stop waiting all F2FS_WB_CP_DATA pages to be writebacked, otherwise, fsync_node_num can be non-zero after f2fs_wait_on_all_pages() causing this bug. In this case, to avoid deadloop in f2fs_wait_on_all_pages(), it needs to drop all dirty pages rather than redirtying them. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5cdb422c	09-Apr-2023	Chao Yu <chao@kernel.org>	f2fs: fix to avoid use-after-free for cached IPU bio xfstest generic/019 reports a bug: kernel BUG at mm/filemap.c:1619! RIP: 0010:folio_end_writeback+0x8a/0x90 Call Trace: end_page_writeback+0x1c/0x60 f2fs_write_end_io+0x199/0x420 bio_endio+0x104/0x180 submit_bio_noacct+0xa5/0x510 submit_bio+0x48/0x80 f2fs_submit_write_bio+0x35/0x300 f2fs_submit_merged_ipu_write+0x2a0/0x2b0 f2fs_write_single_data_page+0x838/0x8b0 f2fs_write_cache_pages+0x379/0xa30 f2fs_write_data_pages+0x30c/0x340 do_writepages+0xd8/0x1b0 __writeback_single_inode+0x44/0x370 writeback_sb_inodes+0x233/0x4d0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x1dd/0x2d0 wb_workfn+0x367/0x4a0 process_one_work+0x21d/0x430 worker_thread+0x4e/0x3c0 kthread+0x103/0x130 ret_from_fork+0x2c/0x50 The root cause is: after cp_error is set, f2fs_submit_merged_ipu_write() in f2fs_write_single_data_page() tries to flush IPU bio in cache, however f2fs_submit_merged_ipu_write() missed to check validity of @bio parameter, result in submitting random cached bio which belong to other IO context, then it will cause use-after-free issue, fix it by adding additional validity check. Fixes: 0b20fcec8651 ("f2fs: cache global IPU bio") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c948be79	20-Mar-2023	Yangtao Li <frank.li@vivo.com>	f2fs: remove else in f2fs_write_cache_pages() As Christoph Hellwig point out: Please avoid the else by doing the goto in the branch. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 447286eb	16-Feb-2023	Yangtao Li <frank.li@vivo.com>	f2fs: convert to use bitmap API Let's use BIT() and GENMASK() instead of open it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 146949de	06-Feb-2023	Jinyoung CHOI <j-young.choi@samsung.com>	f2fs: fix typos in comments This patch is to fix typos in f2fs files. Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 267c159f	05-Feb-2023	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix kernel crash due to null io->bio We should return when io->bio is null before doing anything. Otherwise, panic. BUG: kernel NULL pointer dereference, address: 0000000000000010 RIP: 0010:__submit_merged_write_cond+0x164/0x240 [f2fs] Call Trace: <TASK> f2fs_submit_merged_write+0x1d/0x30 [f2fs] commit_checkpoint+0x110/0x1e0 [f2fs] f2fs_write_checkpoint+0x9f7/0xf00 [f2fs] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs] __checkpoint_and_complete_reqs+0x84/0x190 [f2fs] ? preempt_count_add+0x82/0xc0 ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs] issue_checkpoint_thread+0x4c/0xf0 [f2fs] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0xff/0x130 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Cc: stable@vger.kernel.org # v5.18+ Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d9bac032	01-Feb-2023	Yangtao Li <frank.li@vivo.com>	f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx() Convert to use iostat_lat_type as parameter instead of raw number. BTW, move NUM_PREALLOC_IOSTAT_CTXS to the header file, adjust iostat_lat[{0,1,2}] to iostat_lat[{READ_IO,WRITE_SYNC_IO,WRITE_ASYNC_IO}] in tracepoint function, and rename iotype to page_type to match the definition. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 844545c5	02-Feb-2023	Eric Biggers <ebiggers@google.com>	f2fs: fix cgroup writeback accounting with fs-layer encryption When writing a page from an encrypted file that is using filesystem-layer encryption (not inline encryption), f2fs encrypts the pagecache page into a bounce page, then writes the bounce page. It also passes the bounce page to wbc_account_cgroup_owner(). That's incorrect, because the bounce page is a newly allocated temporary page that doesn't have the memory cgroup of the original pagecache page. This makes wbc_account_cgroup_owner() not account the I/O to the owner of the pagecache page as it should. Fix this by always passing the pagecache page to wbc_account_cgroup_owner(). Fixes: 578c647879f7 ("f2fs: implement cgroup writeback support") Cc: stable@vger.kernel.org Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2eae077e	02-Feb-2023	Chao Yu <chao@kernel.org>	f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_info This patch tries to use bitfield in struct f2fs_io_info to improve memory usage. struct f2fs_io_info { ... unsigned int need_lock:8; /* indicate we need to lock cp_rwsem / unsigned int version:8; / version of the node / unsigned int submitted:1; / indicate IO submission / unsigned int in_list:1; / indicate fio is in io_list / unsigned int is_por:1; / indicate IO is from recovery or not / unsigned int retry:1; / need to reallocate block address / unsigned int encrypted:1; / indicate file is encrypted / unsigned int post_read:1; / require post read */ ... }; After this patch, size of struct f2fs_io_info reduces from 136 to 120. [Nathan: fix a compile warning (single-bit-bitfield-constant-conversion)] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c40e15a9	20-Dec-2022	Yangtao Li <frank.li@vivo.com>	f2fs: merge f2fs_show_injection_info() into time_to_inject() There is no need to additionally use f2fs_show_injection_info() to output information. Concatenate time_to_inject() and __time_to_inject() via a macro. In the new __time_to_inject() function, pass in the caller function name and parent function. In this way, we no longer need the f2fs_show_injection_info() function, and let's remove it. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8358014d	14-Dec-2022	Chao Yu <chao@kernel.org>	f2fs: avoid to check PG_error flag After below changes: commit 14db0b3c7b83 ("fscrypt: stop using PG_error to track error status") commit 98dc08bae678 ("fsverity: stop using PG_error to track error status") There is no place in f2fs we will set PG_error flag in page, let's remove other PG_error usage in f2fs, as a step towards freeing the PG_error flag for other uses. Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fdb7ccc3	18-Nov-2022	Yangtao Li <frank.li@vivo.com>	f2fs: introduce IS_F2FS_IPU_* macro IS_F2FS_IPU_* macro can be used to identify whether f2fs ipu related policies are enabled. BTW, convert to use BIT() instead of open code. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fdbf69a7	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: refactor the hole reporting and allocation logic in f2fs_map_blocks Add a is_hole local variable to figure out if the block number might need allocation, and untangle to logic to report the hole or fill it with a block allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 817c968b	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: factor out a f2fs_map_no_dnode Factor out a helper to return a hole when no dnode was found. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0094e98b	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: factor a f2fs_map_blocks_cached helper Add a helper to deal with everything needed to return a f2fs_map_blocks structure based on a lookup in the extent cache. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cd8fc522	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: remove the create argument to f2fs_map_blocks The create argument is always identicaly to map->m_may_create, so use that consistently. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ffdeab71	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: remove f2fs_get_block Fold f2fs_get_block into the two remaining callers to simplify the call chain a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3cf684f2	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: simplify __allocate_data_block Just use a simple if block for the conditional call to inc_valid_block_count. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 44b0dfeb	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: reflow prepare_write_begin Reflow prepare_write_begin so that it reads more straight forward, and so that there is one place that does an extent cache lookup instead of three, two of which are hidden in f2fs_get_block calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2f51ade9	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: f2fs_do_map_lock Split f2fs_do_map_lock into a lock and unlock helper to make the code using it easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cf342d3b	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: add a f2fs_get_block_locked helper This allows to keep the f2fs_do_map_lock based locking scheme private to data.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 04a91ab0	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: add a f2fs_lookup_extent_cache_block helper All but three callers of f2fs_lookup_extent_cache just want the block address. Add a small helper to simplify them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bc29835a	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: split __submit_bio Split __submit_bio into one function each for reads and writes, and a helper for aligning writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# da8c7fec	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: rename F2FS_MAP_UNWRITTEN to F2FS_MAP_DELALLOC NEW_ADDR blocks are purely in-memory preallocated blocks, and thus equivalent to what the core FS code calls delayed allocations, and not unwritten extents which do have on-disk blocks allocated from which reads always return zeroes until they are converted to written status. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8d3c1fa3	28-Nov-2022	Christoph Hellwig <hch@lst.de>	f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin When testing with a mixed zoned / convention device combination, there are regular but not 100% reproducible failures in xfstests generic/113 where the __is_valid_data_blkaddr assert hits due to finding a hole. This seems to be because f2fs_map_blocks can set this flag on a hole when it was found in the extent cache. Rework f2fs_iomap_begin to just check the special block numbers directly. This has the added benefits of the WARN_ON showing which invalid block address we found, and being properly error out on delalloc blocks that are confusingly called unwritten but not actually suitable for direct I/O. Fixes: 1517c1a7a445 ("f2fs: implement iomap operations") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6779b5db	21-Dec-2022	Chao Yu <chao@kernel.org>	f2fs: fix to call clear_page_private_reference in .{release,invalid}_folio b763f3bedc2d ("f2fs: restructure f2fs page.private layout") missed to call clear_page_private_reference() in .{release,invalid}_folio, fix it, though it's not a big deal since folio_detach_private() was called to clear all privae info and reference count in the page. BTW, remove page_private_reference() definition as it never be used. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1cd98ee7	04-Jan-2023	Vishal Moola (Oracle) <vishal.moola@gmail.com>	f2fs: convert f2fs_write_cache_pages() to use filemap_get_folios_tag() Convert the function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Also modified f2fs_all_cluster_page_ready to take in a folio_batch instead of pagevec. This does NOT support large folios. The function currently only utilizes folios of size 1 so this shouldn't cause any issues right now. This version of the patch limits the number of pages fetched to F2FS_ONSTACK_PAGES. If that ever happens, update the start index here since filemap_get_folios_tag() updates the index to be after the last found folio, not necessarily the last used page. Link: https://lkml.kernel.org/r/20230104211448.4804-15-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
# feb0576a	23-Dec-2022	Eric Biggers <ebiggers@google.com>	f2fs: simplify f2fs_readpage_limit() Now that the implementation of FS_IOC_ENABLE_VERITY has changed to not involve reading back Merkle tree blocks that were previously written, there is no need for f2fs_readpage_limit() to allow for this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-10-ebiggers@kernel.org
# fe59109a	16-Dec-2022	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: initialize extent_cache parameter This can avoid confusing tracepoint values. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e7547dac	30-Nov-2022	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: refactor extent_cache to support for read and more This patch prepares extent_cache to be ready for addition. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 870af777	25-Nov-2022	Yangtao Li <frank.li@vivo.com>	f2fs: do some cleanup for f2fs module init Just for cleanup, no functional changes. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 41e8f85a	11-Nov-2022	Daeho Jeong <daehojeong@google.com>	f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACE introduce a new ioctl to replace the whole content of a file atomically, which means it induces truncate and content update at the same time. We can start it with F2FS_IOC_START_ATOMIC_REPLACE and complete it with F2FS_IOC_COMMIT_ATOMIC_WRITE. Or abort it with F2FS_IOC_ABORT_ATOMIC_WRITE. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 59237a21	08-Nov-2022	Chao Yu <chao@kernel.org>	f2fs: optimize iteration over sparse directories Wei Chen reports a kernel bug as blew: INFO: task syz-executor.0:29056 blocked for more than 143 seconds. Not tainted 5.15.0-rc5 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor.0 state:D stack:14632 pid:29056 ppid: 6574 flags:0x00000004 Call Trace: __schedule+0x4a1/0x1720 schedule+0x36/0xe0 rwsem_down_write_slowpath+0x322/0x7a0 fscrypt_ioctl_set_policy+0x11f/0x2a0 __f2fs_ioctl+0x1a9f/0x5780 f2fs_ioctl+0x89/0x3a0 __x64_sys_ioctl+0xe8/0x140 do_syscall_64+0x34/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Eric did some investigation on this issue, quoted from reply of Eric: "Well, the quality of this bug report has a lot to be desired (not on upstream kernel, reproducer is full of totally irrelevant stuff, not sent to the mailing list of the filesystem whose disk image is being fuzzed, etc.). But what is going on is that f2fs_empty_dir() doesn't consider the case of a directory with an extremely large i_size on a malicious disk image. Specifically, the reproducer mounts an f2fs image with a directory that has an i_size of 14814520042850357248, then calls FS_IOC_SET_ENCRYPTION_POLICY on it. That results in a call to f2fs_empty_dir() to check whether the directory is empty. f2fs_empty_dir() then iterates through all 3616826182336513 blocks the directory allegedly contains to check whether any contain anything. i_rwsem is held during this, so anything else that tries to take it will hang." In order to solve this issue, let's use f2fs_get_next_page_offset() to speed up iteration by skipping holes for all below functions: - f2fs_empty_dir - f2fs_readdir - find_in_level The way why we can speed up iteration was described in 'commit 3cf4574705b4 ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA")'. Meanwhile, in f2fs_empty_dir(), let's use f2fs_find_data_page() instead f2fs_get_lock_data_page(), due to i_rwsem was held in caller of f2fs_empty_dir(), there shouldn't be any races, so it's fine to not lock dentry page during lookuping dirents in the page. Link: https://lore.kernel.org/lkml/536944df-a0ae-1dd8-148f-510b476e1347@kernel.org/T/ Reported-by: Wei Chen <harperchen1110@gmail.com> Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 98dc08ba	29-Nov-2022	Eric Biggers <ebiggers@google.com>	fsverity: stop using PG_error to track error status As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track verity errors. Instead, if a verity error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. f2fs supports compression, which makes the f2fs changes a bit more complicated than desired, but the basic premise still works. Note: there are still a few uses of PageError in f2fs, but they are on the write path, so they are unrelated and this patch doesn't touch them. Reviewed-by: Chao Yu <chao@kernel.org> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
# 8ec071c3	03-Oct-2022	Chao Yu <chao.yu@oppo.com>	f2fs: account swapfile inodes In order to check count of opened swapfile inodes. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 95fa90c9	28-Sep-2022	Chao Yu <chao@kernel.org>	f2fs: support recording errors into superblock This patch supports to record detail reason of FSCORRUPTED error into f2fs_super_block.s_errors[]. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a9cfee0e	28-Sep-2022	Chao Yu <chao@kernel.org>	f2fs: support recording stop_checkpoint reason into super_block This patch supports to record stop_checkpoint error into f2fs_super_block.s_stop_reason[]. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ca7efd71	23-Sep-2022	Zhang Qilong <zhangqilong3@huawei.com>	f2fs: remove the unnecessary check in f2fs_xattr_fiemap Whehter or not error occurs, checking "err == 1" is unnecessary in f2fs_xattr_fiemap(), and just remove it here. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d80afefb	14-Sep-2022	Chao Yu <chao@kernel.org>	f2fs: fix to account FS_CP_DATA_IO correctly f2fs_inode_info.cp_task was introduced for FS_CP_DATA_IO accounting since commit b0af6d491a6b ("f2fs: add app/fs io stat"). However, cp_task usage coverage has been increased due to below commits: commit 040d2bb318d1 ("f2fs: fix to avoid deadloop if data_flush is on") commit 186857c5a14a ("f2fs: fix potential recursive call when enabling data_flush") So that, if data_flush mountoption is on, when data flush was triggered from background, the IO from data flush will be accounted as checkpoint IO type incorrectly. In order to fix this issue, this patch splits cp_task into two: a) cp_task: used for IO accounting b) wb_task: used to avoid deadlock Fixes: 040d2bb318d1 ("f2fs: fix to avoid deadloop if data_flush is on") Fixes: 186857c5a14a ("f2fs: fix potential recursive call when enabling data_flush") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 544b53da	13-Sep-2022	Zhang Qilong <zhangqilong3@huawei.com>	f2fs: code clean and fix a type error ERROR: code indent should use tabs where possible ERROR: spaces required around that ':' ERROR: incorrect tab Found serveral code type errors when review the code and fix it. There is no function change. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f3b23c78	06-Sep-2022	Weichao Guo <guoweichao@oppo.com>	f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT Cold files may be fragmented due to SSR, defragment is needed as sequential reads are dominant scenarios of these files. FI_OPU_WRITE should override FADVISE_COLD_BIT to avoid defragment fails. Signed-off-by: Weichao Guo <guoweichao@oppo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9b7eadd9	30-Aug-2022	Shuqi Zhang <zhangshuqi3@huawei.com>	f2fs: fix wrong dirty page count when race between mmap and fallocate. This is a BUG_ON issue as follows when running xfstest-generic-503: WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0 Modules linked in: CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Call Trace: evict+0x129/0x2d0 dispose_list+0x4f/0xb0 evict_inodes+0x204/0x230 generic_shutdown_super+0x5b/0x1e0 kill_block_super+0x29/0x80 kill_f2fs_super+0xe6/0x140 deactivate_locked_super+0x44/0xc0 deactivate_super+0x79/0x90 cleanup_mnt+0x114/0x1a0 __cleanup_mnt+0x16/0x20 task_work_run+0x98/0x100 exit_to_user_mode_prepare+0x3d0/0x3e0 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Function flow analysis when BUG occurs: f2fs_fallocate mmap do_page_fault pte_spinlock // ---lock_pte do_wp_page wp_page_shared pte_unmap_unlock // unlock_pte do_page_mkwrite f2fs_vm_page_mkwrite down_read(invalidate_lock) lock_page if (PageMappedToDisk(page)) goto out; // set_page_dirty --NOT RUN out: up_read(invalidate_lock); finish_mkwrite_fault // unlock_pte f2fs_collapse_range down_write(i_mmap_sem) truncate_pagecache unmap_mapping_pages i_mmap_lock_write // down_write(i_mmap_rwsem) ...... zap_pte_range pte_offset_map_lock // ---lock_pte set_page_dirty f2fs_dirty_data_folio if (!folio_test_dirty(folio)) { fault_dirty_shared_page set_page_dirty f2fs_dirty_data_folio if (!folio_test_dirty(folio)) { filemap_dirty_folio f2fs_update_dirty_folio // ++ } unlock_page filemap_dirty_folio f2fs_update_dirty_folio // page count++ } pte_unmap_unlock // --unlock_pte i_mmap_unlock_write // up_write(i_mmap_rwsem) truncate_inode_pages up_write(i_mmap_sem) When race happens between mmap-do_page_fault-wp_page_shared and fallocate-truncate_pagecache-zap_pte_range, the zap_pte_range calls function set_page_dirty without page lock. Besides, though truncate_pagecache has immap and pte lock, wp_page_shared calls fault_dirty_shared_page without any. In this case, two threads race in f2fs_dirty_data_folio function. Page is set to dirty only ONCE, but the count is added TWICE by calling filemap_dirty_folio. Thus the count of dirty page cannot accord with the real dirty pages. Following is the solution to in case of race happens without any lock. Since folio_test_set_dirty in filemap_dirty_folio is atomic, judge return value will not be at risk of race. Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 34a23525	19-Aug-2022	Chao Yu <chao.yu@oppo.com>	f2fs: iostat: support accounting compressed IO Previously, we supported to account FS_CDATA_READ_IO type IO only, in this patch, it adds to account more type IO for compressed file: - APP_BUFFERED_CDATA_IO - APP_MAPPED_CDATA_IO - FS_CDATA_IO - APP_BUFFERED_CDATA_READ_IO - APP_MAPPED_CDATA_READ_IO Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 14db0b3c	15-Aug-2022	Eric Biggers <ebiggers@google.com>	fscrypt: stop using PG_error to track error status As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track decryption errors. Instead, if a decryption error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> # for f2fs part Link: https://lore.kernel.org/r/20220815235052.86545-2-ebiggers@kernel.org
# 01fc4b9a	30-Jul-2022	Fengnan Chang <fengnanchang@gmail.com>	f2fs: use onstack pages instead of pvec Since pvec have 15 pages, it not a multiple of 4, when write compressed pages, write in 64K as a unit, it will call pagevec_lookup_range_tag agagin, sometimes this will take a lot of time. Use onstack pages instead of pvec to mitigate this problem. Signed-off-by: Fengnan Chang <fengnanchang@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4f8219f8	30-Jul-2022	Fengnan Chang <changfengnan@vivo.com>	f2fs: intorduce f2fs_all_cluster_page_ready When write total cluster, all pages is uptodate, there is not need to call f2fs_prepare_compress_overwrite, intorduce f2fs_all_cluster_page_ready to avoid this. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bff139b4	02-Aug-2022	Daeho Jeong <daehojeong@google.com>	f2fs: handle decompress only post processing in softirq Now decompression is being handled in workqueue and it makes read I/O latency non-deterministic, because of the non-deterministic scheduling nature of workqueues. So, I made it handled in softirq context only if possible, not in low memory devices, since this modification will maintain decompresion related memory a little longer. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f8e2f32b	18-Jul-2022	Daeho Jeong <daehojeong@google.com>	f2fs: introduce sysfs atomic write statistics introduce the below 4 new sysfs node for atomic write statistics. - current_atomic_write: the total current atomic write block count, which is not committed yet. - peak_atomic_write: the peak value of total current atomic write block count after boot. - committed_atomic_block: the accumulated total committed atomic write block count after boot. - revoked_atomic_block: the accumulated total revoked atomic write block count after boot. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0d5b9d81	12-Jul-2022	Chao Yu <chao@kernel.org>	f2fs: invalidate meta pages only for post_read required inode After commit e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write"), invalidate_mapping_pages() will be called to avoid race condition in between IPU/DIO and readahead for GC. However, readahead flow is only used for post_read required inode, so this patch adds check condition to avoids unnecessary page cache invalidating for non-post_read inode. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 67ca0687	06-Jul-2022	Chao Yu <chao@kernel.org>	f2fs: fix to invalidate META_MAPPING before DIO write Quoted from commit e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write") " Encrypted pages during GC are read and cached in META_MAPPING. However, due to cached pages in META_MAPPING, there is an issue where newly written pages are lost by IPU or DIO writes. Thread A - f2fs_gc() Thread B /* phase 3 / down_write(i_gc_rwsem) ra_data_block() ---- (a) up_write(i_gc_rwsem) f2fs_direct_IO() : - down_read(i_gc_rwsem) - __blockdev_direct_io() - get_data_block_dio_write() - f2fs_dio_submit_bio() ---- (b) - up_read(i_gc_rwsem) / phase 4 */ down_write(i_gc_rwsem) move_data_block() ---- (c) up_write(i_gc_rwsem) (a) In phase 3 of f2fs_gc(), up-to-date page is read from storage and cached in META_MAPPING. (b) In thread B, writing new data by IPU or DIO write on same blkaddr as read in (a). cached page in META_MAPPING become out-dated. (c) In phase 4 of f2fs_gc(), out-dated page in META_MAPPING is copied to new blkaddr. In conclusion, the newly written data in (b) is lost. To address this issue, invalidating pages in META_MAPPING before IPU or DIO write. " In previous commit, we missed to cover extent cache hit case, and passed wrong value for parameter @end of invalidate_mapping_pages(), fix both issues. Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Fixes: e3b49ea36802 ("f2fs: invalidate META_MAPPING before IPU/DIO write") Cc: Hyeong-Jun Kim <hj514.kim@samsung.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1d5b9bd6	06-Jun-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Convert to filemap_migrate_folio() filemap_migrate_folio() fits f2fs's needs perfectly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Chao Yu <chao@kernel.org>
# 7649c873	14-Jul-2022	Bart Van Assche <bvanassche@acm.org>	fs/f2fs: Use the enum req_op and blk_opf_t types Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-53-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 908ea654	25-May-2022	Yufen Yu <yuyufen@huawei.com>	f2fs: add f2fs_init_write_merge_io function Almost all other initialization of variables in f2fs_fill_super are extraced to a single function. Also do it for write_io[], which can make code more clean. This patch just refactors the code, theres no functional change. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7bc155fe	05-May-2022	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: kill volatile write support There's no user, since all can use atomic writes simply. Let's kill it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3db1de0e	28-Apr-2022	Daeho Jeong <daehojeong@google.com>	f2fs: change the current atomic write way Current atomic write has three major issues like below. - keeps the updates in non-reclaimable memory space and they are even hard to be migrated, which is not good for contiguous memory allocation. - disk spaces used for atomic files cannot be garbage collected, so this makes it difficult for the filesystem to be defragmented. - If atomic write operations hit the threshold of either memory usage or garbage collection failure count, All the atomic write operations will fail immediately. To resolve the issues, I will keep a COW inode internally for all the updates to be flushed from memory, when we need to flush them out in a situation like high memory pressure. These COW inodes will be tagged as orphan inodes to be reclaimed in case of sudden power-cut or system failure during atomic writes. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c26cd045	30-Apr-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Convert to release_folio While converting f2fs_release_page() to f2fs_release_folio(), cache the sb_info so we don't need to retrieve it twice, and remove the redundant call to set_page_private(). The use of folios should be pushed further into f2fs from here. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
# be05584f	29-Apr-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Convert f2fs to read_folio This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
# 9d6b0cd7	22-Feb-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	fs: Remove flags parameter from aops->write_begin There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 0adc2ab0	12-Apr-2022	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders Let's attach io_flags to bio only, so that we can merge IOs given original io_flags only. Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0fb5b2eb	29-Mar-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Correct f2fs_dirty_data_folio() conversion I got the return value wrong. Very little checks the return value from set_page_dirty(), so nobody noticed during testing. Fixes: 4f5e34f71318 ("f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
# 704528d8	23-Mar-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	fs: Remove ->readpages address space operation All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
# c75e707f	04-Mar-2022	Christoph Hellwig <hch@lst.de>	block: remove the per-bio/request write hint With the NVMe support for this gone, there are no consumers of these hints left, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# a64239d0	22-Mar-2022	NeilBrown <neilb@suse.de>	f2fs: replace congestion_wait() calls with io_schedule_timeout() As congestion is no longer tracked, congestion_wait() is effectively equivalent to io_schedule_timeout(). So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE and call that instead. Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 9b56adcf	17-Mar-2022	Fengnan Chang <changfengnan@vivo.com>	f2fs: fix compressed file start atomic write may cause data corruption When compressed file has blocks, f2fs_ioc_start_atomic_write will succeed, but compressed flag will be remained in inode. If write partial compreseed cluster and commit atomic write will cause data corruption. This is the reproduction process: Step 1: create a compressed file ,write 64K data , call fsync(), then the blocks are write as compressed cluster. Step2: iotcl(F2FS_IOC_START_ATOMIC_WRITE) --- this should be fail, but not. write page 0 and page 3. iotcl(F2FS_IOC_COMMIT_ATOMIC_WRITE) -- page 0 and 3 write as normal file, Step3: drop cache. read page 0-4 -- Since page 0 has a valid block address, read as non-compressed cluster, page 1 and 2 will be filled with compressed data or zero. The root cause is, after commit 7eab7a696827 ("f2fs: compress: remove unneeded read when rewrite whole cluster"), in step 2, f2fs_write_begin() only set target page dirty, and in f2fs_commit_inmem_pages(), we will write partial raw pages into compressed cluster, result in corrupting compressed cluster layout. Fixes: 4c8ff7095bef ("f2fs: support data compression") Fixes: 7eab7a696827 ("f2fs: compress: remove unneeded read when rewrite whole cluster") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4f5e34f7	09-Feb-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio Removes several calls to __set_page_dirty_nobuffers(). Also turn the PageSwapCache() case into a BUG() as there's no way for a swapcache page to make it to a filesystem that doesn't use SWP_FS_OPS. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
# 91503996	09-Feb-2022	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Convert invalidatepage to invalidate_folio This is a minimal change which just accepts the new arguments and passes the single struct page to the functions which do the work. There is very little progress here toards making f2fs support large folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
# 34415099	26-Jan-2022	Chao Yu <chao@kernel.org>	f2fs: fix to avoid potential deadlock Quoted from Jing Xia's report, there is a potential deadlock may happen between kworker and checkpoint as below: [T:writeback] [T:checkpoint] - wb_writeback - blk_start_plug bio contains NodeA was plugged in writeback threads - do_writepages -- sync write inodeB, inc wb_sync_req[DATA] - f2fs_write_data_pages - f2fs_write_single_data_page -- write last dirty page - f2fs_do_write_data_page - set_page_writeback -- clear page dirty flag and PAGECACHE_TAG_DIRTY tag in radix tree - f2fs_outplace_write_data - f2fs_update_data_blkaddr - f2fs_wait_on_page_writeback -- wait NodeA to writeback here - inode_dec_dirty_pages - writeback_sb_inodes - writeback_single_inode - do_writepages - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA] - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped - blk_finish_plug Let's try to avoid deadlock condition by forcing unplugging previous bio via blk_finish_plug(current->plug) once we'v skipped writeback in writepages() due to valid sbi->wb_sync_req[DATA/NODE]. Fixes: 687de7f1010c ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8a2c77bc	28-Jan-2022	Eric Biggers <ebiggers@google.com>	f2fs: support direct I/O with fscrypt using blk-crypto Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Therefore, make f2fs support DIO on files that are using inline encryption. Since f2fs uses iomap for DIO, and fscrypt support was already added to iomap DIO, this just requires two small changes: - Let DIO proceed when supported, by checking fscrypt_dio_supported() instead of assuming that encrypted files never support DIO. - In f2fs_iomap_begin(), use fscrypt_limit_io_blocks() to limit the length of the mapping in the rare case where a DUN discontiguity occurs in the middle of an extent. The iomap DIO implementation requires this, since it assumes that it can submit a bio covering (up to) the whole mapping, without checking fscrypt constraints itself. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20220128233940.79464-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
# 1018a546	04-Feb-2022	Chao Yu <chao@kernel.org>	f2fs: introduce F2FS_IPU_HONOR_OPU_WRITE ipu policy Once F2FS_IPU_FORCE policy is enabled in some cases: a) f2fs forces to use F2FS_IPU_FORCE in a small-sized volume b) user sets F2FS_IPU_FORCE policy via sysfs Then we may fail to defragment file due to IPU policy check, it doesn't make sense, let's introduce a new IPU policy to allow OPU during file defragmentation. In small-sized volume, let's enable F2FS_IPU_HONOR_OPU_WRITE policy by default. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e4544b63	07-Jan-2022	Tim Murray <timmurray@google.com>	f2fs: move f2fs to use reader-unfair rwsems f2fs rw_semaphores work better if writers can starve readers, especially for the checkpoint thread, because writers are strictly more important than reader threads. This prevents significant priority inversion between low-priority readers that blocked while trying to acquire the read lock and a second acquisition of the write lock that might be blocking high priority work. Signed-off-by: Tim Murray <timmurray@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 64bf0eef	28-Feb-2022	Christoph Hellwig <hch@lst.de>	f2fs: pass the bio operation to bio_alloc_bioset Refactor block I/O code so that the bio operation and known flags are set at bio allocation time. Only the later updated flags are updated on the fly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220228124123.856027-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 5189810a	28-Feb-2022	Christoph Hellwig <hch@lst.de>	f2fs: don't pass a bio to f2fs_target_device Set the bdev at bio allocation time by changing the f2fs_target_device calling conventions, so that no bio needs to be passed in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220228124123.856027-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 609be106	24-Jan-2022	Christoph Hellwig <hch@lst.de>	block: pass a block_device and opf to bio_alloc_bioset Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 0a4ee518	21-Jan-2022	Christoph Hellwig <hch@lst.de>	mm: remove cleancache Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit 814bbf49dcd0 ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 2a64e303	16-Dec-2021	Chao Yu <chao@kernel.org>	f2fs: don't drop compressed page cache in .{invalidate,release}page For compressed inode, in .{invalidate,release}page, we will call f2fs_invalidate_compress_pages() to drop all compressed page cache of current inode. But we don't need to drop compressed page cache synchronously in .invalidatepage, because, all trancation paths of compressed physical block has been covered with f2fs_invalidate_compress_page(). And also we don't need to drop compressed page cache synchronously in .releasepage, because, if there is out-of-memory, we can count on page cache reclaim on sbi->compress_inode. BTW, this patch may fix the issue reported below: https://lore.kernel.org/linux-f2fs-devel/20211202092812.197647-1-changfengnan@vivo.com/T/#u Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a9419b63	13-Dec-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not bother checkpoint by f2fs_get_node_info This patch tries to mitigate lock contention between f2fs_write_checkpoint and f2fs_get_node_info along with nat_tree_lock. The idea is, if checkpoint is currently running, other threads that try to grab nat_tree_lock would be better to wait for checkpoint. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 19bdba52	09-Dec-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file Android OTA failed due to SBI_NEED_FSCK flag when pinning the file. Let's avoid it since we can do in-place-updates. Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a1e09b03	23-Jul-2021	Eric Biggers <ebiggers@google.com>	f2fs: use iomap for direct I/O Make f2fs_file_read_iter() and f2fs_file_write_iter() use the iomap direct I/O implementation instead of the fs/direct-io.c one. The iomap implementation is more efficient, and it also avoids the need to add new features and optimizations to the old implementation. This new implementation also eliminates the need for f2fs to hook bio submission and completion and to allocate memory per-bio. This is because it's possible to correctly update f2fs's in-flight DIO counters using __iomap_dio_rw() in combination with an implementation of iomap_dio_ops::end_io() (as suggested by Christoph Hellwig). When possible, this new implementation preserves existing f2fs behavior such as the conditions for falling back to buffered I/O. This patch has been tested with xfstests by running 'gce-xfstests -c f2fs -g auto -X generic/017' with and without this patch; no regressions were seen. (Some tests fail both before and after. generic/017 hangs both before and after, so it had to be excluded.) Signed-off-by: Eric Biggers <ebiggers@google.com> [Jaegeuk Kim: use spin_lock_bh for f2fs_update_iostat in softirq] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1517c1a7	23-Jul-2021	Eric Biggers <ebiggers@google.com>	f2fs: implement iomap operations Implement 'struct iomap_ops' for f2fs, in preparation for making f2fs use iomap for direct I/O. Note that this may be used for other things besides direct I/O in the future; however, for now I've only tested it for direct I/O. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d4dd19ec	12-Nov-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not expose unwritten blocks to user by DIO DIO preallocates physical blocks before writing data, but if an error occurrs or power-cut happens, we can see block contents from the disk. This patch tries to fix it by 1) turning to buffered writes for DIO into holes, 2) truncating unwritten blocks from error or power-cut. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3d697a4a	16-Jul-2021	Eric Biggers <ebiggers@google.com>	f2fs: rework write preallocations f2fs_write_begin() assumes that all blocks were preallocated by default unless FI_NO_PREALLOC is explicitly set. This invites data corruption, as there are cases in which not all blocks are preallocated. Commit 47501f87c61a ("f2fs: preallocate DIO blocks when forcing buffered_io") fixed one case, but there are others remaining. Fix up this logic by replacing this flag with FI_PREALLOCATED_ALL, which only gets set if all blocks for the current write were preallocated. Also clean up f2fs_preallocate_blocks(), move it to file.c, and make it handle some of the logic that was previously in write_iter() directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3271d7eb	09-Nov-2021	Fengnan Chang <changfengnan@vivo.com>	f2fs: compress: reduce one page array alloc and free when write compressed page Don't alloc new page pointers array to replace old, just use old, introduce valid_nr_cpages to indicate valid number of page pointers in array, try to reduce one page array alloc and free when write compress page. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4034247a	14-Jan-2022	NeilBrown <neilb@suse.de>	mm: introduce memalloc_retry_wait() Various places in the kernel - largely in filesystems - respond to a memory allocation failure by looping around and re-trying. Some of these cannot conveniently use __GFP_NOFAIL, for reasons such as: - a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on - a need to check for the process being signalled between failures - the possibility that other recovery actions could be performed - the allocation is quite deep in support code, and passing down an extra flag to say if __GFP_NOFAIL is wanted would be clumsy. Many of these currently use congestion_wait() which (in almost all cases) simply waits the given timeout - congestion isn't tracked for most devices. It isn't clear what the best delay is for loops, but it is clear that the various filesystems shouldn't be responsible for choosing a timeout. This patch introduces memalloc_retry_wait() with takes on that responsibility. Code that wants to retry a memory allocation can call this function passing the GFP flags that were used. It will wait however is appropriate. For now, it only considers __GFP_NORETRY and whatever gfpflags_allow_blocking() tests. If blocking is allowed without __GFP_NORETRY, then alloc_page either made some reclaim progress, or waited for a while, before failing. So there is no need for much further waiting. memalloc_retry_wait() will wait until the current jiffie ends. If this condition is not met, then alloc_page() won't have waited much if at all. In that case memalloc_retry_wait() waits about 200ms. This is the delay that most current loops uses. linux/sched/mm.h needs to be included in some files now, but linux/backing-dev.h does not. Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# e3b49ea3	02-Nov-2021	Hyeong-Jun Kim <hj514.kim@samsung.com>	f2fs: invalidate META_MAPPING before IPU/DIO write Encrypted pages during GC are read and cached in META_MAPPING. However, due to cached pages in META_MAPPING, there is an issue where newly written pages are lost by IPU or DIO writes. Thread A - f2fs_gc() Thread B /* phase 3 / down_write(i_gc_rwsem) ra_data_block() ---- (a) up_write(i_gc_rwsem) f2fs_direct_IO() : - down_read(i_gc_rwsem) - __blockdev_direct_io() - get_data_block_dio_write() - f2fs_dio_submit_bio() ---- (b) - up_read(i_gc_rwsem) / phase 4 */ down_write(i_gc_rwsem) move_data_block() ---- (c) up_write(i_gc_rwsem) (a) In phase 3 of f2fs_gc(), up-to-date page is read from storage and cached in META_MAPPING. (b) In thread B, writing new data by IPU or DIO write on same blkaddr as read in (a). cached page in META_MAPPING become out-dated. (c) In phase 4 of f2fs_gc(), out-dated page in META_MAPPING is copied to new blkaddr. In conclusion, the newly written data in (b) is lost. To address this issue, invalidating pages in META_MAPPING before IPU or DIO write. Fixes: 6aa58d8ad20a ("f2fs: readahead encrypted block during GC") Signed-off-by: Hyeong-Jun Kim <hj514.kim@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b368cc5e	22-Oct-2021	Fengnan Chang <changfengnan@vivo.com>	f2fs: compress: fix overwrite may reduce compress ratio unproperly when overwrite only first block of cluster, since cluster is not full, it will call f2fs_write_raw_pages when f2fs_write_multi_pages, and cause the whole cluster become uncompressed eventhough data can be compressed. this may will make random write bench score reduce a lot. root# dd if=/dev/zero of=./fio-test bs=1M count=1 root# sync root# echo 3 > /proc/sys/vm/drop_caches root# f2fs_io get_cblocks ./fio-test root# dd if=/dev/zero of=./fio-test bs=4K count=1 oflag=direct conv=notrunc w/o patch: root# f2fs_io get_cblocks ./fio-test 189 w/ patch: root# f2fs_io get_cblocks ./fio-test 192 Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 71f2c820	01-Sep-2021	Chao Yu <chao@kernel.org>	f2fs: multidevice: support direct IO Commit 3c62be17d4f5 ("f2fs: support multiple devices") missed to support direct IO for multiple device feature, this patch adds to support the missing part of multidevice feature. In addition, for multiple device image, we should be aware of any issued direct write IO rather than just buffered write IO, so that fsync and syncfs can issue a preflush command to the device where direct write IO goes, to persist user data for posix compliant. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9605f75c	30-Aug-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: should put a page beyond EOF when preparing a write The prepare_compress_overwrite() gets/locks a page to prepare a read, and calls f2fs_read_multi_pages() which checks EOF first. If there's any page beyond EOF, we unlock the page and set cc->rpages[i] = NULL, which we can't put the page anymore. This makes page leak, so let's fix by putting that page. Fixes: a949dc5f2c5c ("f2fs: compress: fix race condition of overwrite vs truncate") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# adf9ea89	25-Aug-2021	Chao Yu <chao@kernel.org>	f2fs: fix unexpected ENOENT comes from f2fs_map_blocks() In below path, it will return ENOENT if filesystem is shutdown: - f2fs_map_blocks - f2fs_get_dnode_of_data - f2fs_get_node_page - __get_node_page - read_node_page - is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) return -ENOENT - force return value from ENOENT to 0 It should be fine for read case, since it indicates a hole condition, and caller could use .m_next_pgofs to skip the hole and continue the lookup. However it may cause confusing for write case, since leaving a hole there, and said nothing was wrong doesn't help. There is at least one case from dax_iomap_actor() will complain that, so fix this in prior to supporting dax in f2fs. xfstest generic/388 reports below warning: ubuntu godown: xfstests-induced forced shutdown of /mnt/scratch_f2fs: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 485833 at fs/dax.c:1127 dax_iomap_actor+0x339/0x370 Call Trace: iomap_apply+0x1c4/0x7b0 ? dax_iomap_rw+0x1c0/0x1c0 dax_iomap_rw+0xad/0x1c0 ? dax_iomap_rw+0x1c0/0x1c0 f2fs_file_write_iter+0x5ab/0x970 [f2fs] do_iter_readv_writev+0x273/0x2e0 do_iter_write+0xab/0x1f0 vfs_iter_write+0x21/0x40 iter_file_splice_write+0x287/0x540 do_splice+0x37c/0xa60 __x64_sys_splice+0x15f/0x3a0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae ubuntu godown: xfstests-induced forced shutdown of /mnt/scratch_f2fs: ------------[ cut here ]------------ RIP: 0010:dax_iomap_pte_fault.isra.0+0x72e/0x14a0 Call Trace: dax_iomap_fault+0x44/0x70 f2fs_dax_huge_fault+0x155/0x400 [f2fs] f2fs_dax_fault+0x18/0x30 [f2fs] __do_fault+0x4e/0x120 do_fault+0x3cf/0x7a0 __handle_mm_fault+0xa8c/0xf20 ? find_held_lock+0x39/0xd0 handle_mm_fault+0x1b6/0x480 do_user_addr_fault+0x320/0xcd0 ? rcu_read_lock_sched_held+0x67/0xc0 exc_page_fault+0x77/0x3f0 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a4b68176	20-Aug-2021	Daeho Jeong <daehojeong@google.com>	f2fs: introduce periodic iostat io latency traces Whenever we notice some sluggish issues on our machines, we are always curious about how well all types of I/O in the f2fs filesystem are handled. But, it's hard to get this kind of real data. First of all, we need to reproduce the issue while turning on the profiling tool like blktrace, but the issue doesn't happen again easily. Second, with the intervention of any tools, the overall timing of the issue will be slightly changed and it sometimes makes us hard to figure it out. So, I added the feature printing out IO latency statistics tracepoint events, which are minimal things to understand filesystem's I/O related behaviors, into F2FS_IOSTAT kernel config. With "iostat_enable" sysfs node on, we can get this statistics info in a periodic way and it would cause the least overhead. [samples] f2fs_ckpt-254:1-507 [003] .... 2842.439683: f2fs_iostat_latency: dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count], rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4], wr_sync_data [164/16/3331], wr_sync_node [152/3/648], wr_sync_meta [160/2/4243], wr_async_data [24/13/15], wr_async_node [0/0/0], wr_async_meta [0/0/0] f2fs_ckpt-254:1-507 [002] .... 2845.450514: f2fs_iostat_latency: dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count], rd_data [60/3/456], rd_node [60/3/1258], rd_meta [0/0/1], wr_sync_data [120/12/2285], wr_sync_node [88/5/428], wr_sync_meta [52/6/2990], wr_async_data [4/1/3], wr_async_node [0/0/0], wr_async_meta [0/0/0] Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 52118743	19-Aug-2021	Daeho Jeong <daehojeong@google.com>	f2fs: separate out iostat feature Added F2FS_IOSTAT config option to support getting IO statistics through sysfs and printing out periodic IO statistics tracepoint events and moved I/O statistics related codes into separate files for better maintenance. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: set default=y] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bbe1da7e	05-Aug-2021	Chao Yu <chao@kernel.org>	f2fs: compress: do sanity check on cluster This patch adds f2fs_sanity_check_cluster() to support doing sanity check on cluster of compressed file, it will be triggered from below two paths: - __f2fs_cluster_blocks() - f2fs_map_blocks(F2FS_GET_BLOCK_FIEMAP) And it can detect below three kind of cluster insanity status. C: COMPRESS_ADDR N: NULL_ADDR or NEW_ADDR V: valid blkaddr : any value 1. [\|C\|\|] 2. [C\|\|C\|] 3. [C\|N\|N\|V] Signed-off-by: Chao Yu <chao@kernel.org> [Nathan Chancellor: fix missing inline warning] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 32410577	08-Aug-2021	Chao Yu <chao@kernel.org>	f2fs: support fault injection for f2fs_kmem_cache_alloc() This patch supports to inject fault into f2fs_kmem_cache_alloc(). Usage: a) echo 32768 > /sys/fs/f2fs/<dev>/inject_type or b) mount -o fault_type=32768 <dev> <mountpoint> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a2649315	12-Aug-2021	Fengnan Chang <changfengnan@vivo.com>	f2fs: compress: avoid duplicate counting of valid blocks when read compressed file Since cluster is basic unit of compression, one cluster is compressed or not, so we can calculate valid blocks only for first page in cluster, the other pages just skip. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 94afd6d6	03-Aug-2021	Chao Yu <chao@kernel.org>	f2fs: extent cache: support unaligned extent Compressed inode may suffer read performance issue due to it can not use extent cache, so I propose to add this unaligned extent support to improve it. Currently, it only works in readonly format f2fs image. Unaligned extent: in one compressed cluster, physical block number will be less than logical block number, so we add an extra physical block length in extent info in order to indicate such extent status. The idea is if one whole cluster blocks are contiguous physically, once its mapping info was readed at first time, we will cache an unaligned (or aligned) extent info entry in extent cache, it expects that the mapping info will be hitted when rereading cluster. Merge policy: - Aligned extents can be merged. - Aligned extent and unaligned extent can not be merged. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4931e0c9	28-Jul-2021	Daeho Jeong <daehojeong@google.com>	f2fs: turn back remapped address in compressed page endio Turned back the remmaped sector address to the address in the partition, when ending io, for compress cache to work properly. Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Hyeong Jun Kim <hj514.kim@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 093f0bac	25-Jul-2021	Daeho Jeong <daehojeong@google.com>	f2fs: change fiemap way in printing compression chunk When we print out a discontinuous compression chunk, it shows like a continuous chunk now. To show it more correctly, I've changed the way of printing fiemap info like below. Plus, eliminated NEW_ADDR(-1) in fiemap info, since it is not in fiemap user api manual. Let's assume 16KB compression cluster. <before> Logical Physical Length Flags 0: 0000000000000000 00000002c091f000 0000000000004000 1008 1: 0000000000004000 00000002c0920000 0000000000004000 1008 ... 9: 0000000000034000 0000000f8c623000 0000000000004000 1008 10: 0000000000038000 000000101a6eb000 0000000000004000 1008 <after> 0: 0000000000000000 00000002c091f000 0000000000004000 1008 1: 0000000000004000 00000002c0920000 0000000000004000 1008 ... 9: 0000000000034000 0000000f8c623000 0000000000001000 1008 10: 0000000000035000 000000101a6ea000 0000000000003000 1008 11: 0000000000038000 000000101a6eb000 0000000000002000 1008 12: 000000000003a000 00000002c3544000 0000000000002000 1008 Flags 0x1000 => FIEMAP_EXTENT_MERGED 0x0008 => FIEMAP_EXTENT_ENCODED Signed-off-by: Daeho Jeong <daehojeong@google.com> Tested-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7eab7a69	22-Jun-2021	Fengnan Chang <changfengnan@vivo.com>	f2fs: compress: remove unneeded read when rewrite whole cluster when we overwrite the whole page in cluster, we don't need read original data before write, because after write_end(), writepages() can help to load left data in that cluster. Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6de8687c	16-Jul-2021	Eric Biggers <ebiggers@google.com>	f2fs: remove allow_outplace_dio() We can just check f2fs_lfs_mode() directly. The block_unaligned_IO() check is redundant because in LFS mode, f2fs doesn't do direct I/O writes that aren't block-aligned (due to f2fs_force_buffered_io() returning true in this case, triggering the fallback to buffered I/O). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3e679dc7	16-Jul-2021	Eric Biggers <ebiggers@google.com>	f2fs: make f2fs_write_failed() take struct inode Make f2fs_write_failed() take a 'struct inode' directly rather than a 'struct address_space', as this simplifies it slightly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1ffc8f5f	14-Jul-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: let's keep writing IOs on SBI_NEED_FSCK SBI_NEED_FSCK is an indicator that fsck.f2fs needs to be triggered, so it is not fully critical to stop any IO writes. So, let's allow to write data instead of reporting EIO forever given SBI_NEED_FSCK, but do keep OPU. Fixes: 955772787667 ("f2fs: drop inplace IO if fs status is abnormal") Cc: <stable@kernel.org> # v5.13+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# edc6d01b	13-Apr-2021	Jan Kara <jack@suse.cz>	f2fs: Convert to using invalidate_lock Use invalidate_lock instead of f2fs' private i_mmap_sem. The intended purpose is exactly the same. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: Chao Yu <yuchao0@huawei.com> CC: linux-f2fs-devel@lists.sourceforge.net Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
# c9ebd3df	04-Jul-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: initialize page->private when using for our internal use We need to guarantee it's initially zero. Otherwise, it'll hurt entire flag operations. Fixes: b763f3bedc2d ("f2fs: restructure f2fs page.private layout") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 859fca6b	26-May-2021	Chao Yu <chao@kernel.org>	f2fs: swap: support migrating swapfile in aligned write mode This patch supports to migrate swapfile in aligned write mode during swapon in order to keep swapfile being aligned to section as much as possible, then pinned swapfile will locates fully filled section which may not affected by GC. However, for the case that swapfile's size is not aligned to section size, it will still leave last extent in file's tail as unaligned due to its size is smaller than section size, like case #2. case #1 xfs_io -f /mnt/f2fs/file -c "pwrite 0 4M" -c "fsync" Before swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..3047]: 1123352..1126399 3048 0x1000 1: [3048..7143]: 237568..241663 4096 0x1000 2: [7144..8191]: 245760..246807 1048 0x1001 After swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..8191]: 249856..258047 8192 0x1001 Kmsg: F2FS-fs (zram0): Swapfile (2) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * n) case #2 xfs_io -f /mnt/f2fs/file -c "pwrite 0 3M" -c "fsync" Before swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..3047]: 246808..249855 3048 0x1000 1: [3048..6143]: 237568..240663 3096 0x1001 After swapon: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..4095]: 258048..262143 4096 0x1000 1: [4096..6143]: 238616..240663 2048 0x1001 Kmsg: F2FS-fs (zram0): Swapfile: last extent is not aligned to section F2FS-fs (zram0): Swapfile (2) is not align to section: 1) creat(), 2) ioctl(F2FS_IOC_SET_PIN_FILE), 3) fallocate(2097152 * n) Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0b8fc006	26-May-2021	Chao Yu <chao@kernel.org>	f2fs: swap: remove dead codes After commit af4b6b8edf6a ("f2fs: introduce check_swap_activate_fast()"), we will never run into original logic of check_swap_activate() before f2fs supports non 4k-sized page, so let's delete those dead codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6ce19aff	20-May-2021	Chao Yu <chao@kernel.org>	f2fs: compress: add compress_inode to cache compressed blocks Support to use address space of inner inode to cache compressed block, in order to improve cache hit ratio of random read. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 833dcd35	26-May-2021	Joe Perches <joe@perches.com>	f2fs: logging neatening Update the logging uses that have unnecessary newlines as the f2fs_printk function and so its f2fs_<level> macro callers already adds one. This allows searching single line logging entries with an easier grep and also avoids unnecessary blank lines in the logging. Miscellanea: o Coalesce formats o Align to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d927ccfc	10-May-2021	Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>	f2fs: Prevent swap file in LFS mode The kernel writes to swap files on f2fs directly without the assistance of the filesystem. This direct write by kernel can be non-sequential even when the f2fs is in LFS mode. Such non-sequential write conflicts with the LFS semantics. Especially when f2fs is set up on zoned block devices, the non-sequential write causes unaligned write command errors. To avoid the non-sequential writes to swap files, prevent swap file activation when the filesystem is in LFS mode. Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.10+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b763f3be	28-Apr-2021	Chao Yu <chao@kernel.org>	f2fs: restructure f2fs page.private layout Restruct f2fs page private layout for below reasons: There are some cases that f2fs wants to set a flag in a page to indicate a specified status of page: a) page is in transaction list for atomic write b) page contains dummy data for aligned write c) page is migrating for GC d) page contains inline data for inline inode flush e) page belongs to merkle tree, and is verified for fsverity f) page is dirty and has filesystem/inode reference count for writeback g) page is temporary and has decompress io context reference for compression There are existed places in page structure we can use to store f2fs private status/data: - page.flags: PG_checked, PG_private - page.private However it was a mess when we using them, which may cause potential confliction: page.private PG_private PG_checked page._refcount (+1 at most) a) -1 set +1 b) -2 set c), d), e) set f) 0 set +1 g) pointer set The other problem is page.flags has no free slot, if we can avoid set zero to page.private and set PG_private flag, then we use non-zero value to indicate PG_private status, so that we may have chance to reclaim PG_private slot for other usage. [1] The other concern is f2fs has bad scalability in aspect of indicating more page status. So in this patch, let's restructure f2fs' page.private as below to solve above issues: Layout A: lowest bit should be 1 \| bit0 = 1 \| bit1 \| bit2 \| ... \| bit MAX \| private data .... \| bit 0 PAGE_PRIVATE_NOT_POINTER bit 1 PAGE_PRIVATE_ATOMIC_WRITE bit 2 PAGE_PRIVATE_DUMMY_WRITE bit 3 PAGE_PRIVATE_ONGOING_MIGRATION bit 4 PAGE_PRIVATE_INLINE_INODE bit 5 PAGE_PRIVATE_REF_RESOURCE bit 6- f2fs private data Layout B: lowest bit should be 0 page.private is a wrapped pointer. After the change: page.private PG_private PG_checked page._refcount (+1 at most) a) 11 set +1 b) 101 set +1 c) 1001 set +1 d) 10001 set +1 e) set f) 100001 set +1 g) pointer set +1 [1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f395183f	12-May-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: return EINVAL for hole cases in swap file This tries to fix xfstests/generic/495. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ca298241	11-May-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid swapon failure by giving a warning first The final solution can be migrating blocks to form a section-aligned file internally. Meanwhile, let's ask users to do that when preparing the swap file initially like: 1) create() 2) ioctl(F2FS_IOC_SET_PIN_FILE) 3) fallocate() Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 36e4d95891ed ("f2fs: check if swapfile is section-alligned") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8bfbfb0d	10-May-2021	Chao Yu <chao@kernel.org>	f2fs: compress: fix to assign cc.cluster_idx correctly In f2fs_destroy_compress_ctx(), after f2fs_destroy_compress_ctx(), cc.cluster_idx will be cleared w/ NULL_CLUSTER, f2fs_cluster_blocks() may check wrong cluster metadata, fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5f029c04	05-Apr-2021	Yi Zhuang <zhuangyi1@huawei.com>	f2fs: clean up build warnings This patch combined the below three clean-up patches. - modify open brace '{' following function definitions - ERROR: spaces required around that ':' - ERROR: spaces required before the open parenthesis '(' - ERROR: spaces prohibited before that ',' - Made suggested modifications from checkpatch in reference to WARNING: Missing a blank line after declarations Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com> Signed-off-by: Jia Yang <jiayang5@huawei.com> Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0bb2045c	08-Mar-2021	Chengguang Xu <cgxu519@mykernel.net>	f2fs: fix to use per-inode maxbytes in f2fs_fiemap F2FS inode may have different max size, so change to use per-inode maxbytes. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 36e4d958	28-Feb-2021	huangjianan@oppo.com <huangjianan@oppo.com>	f2fs: check if swapfile is section-alligned If the swapfile isn't created by pin and fallocate, it can't be guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchronized to swap_extent, so swap will write to wrong address, which will cause data corruption. Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Guo Weichao <guoweichao@oppo.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1da66103	27-Feb-2021	huangjianan@oppo.com <huangjianan@oppo.com>	f2fs: fix last_lblock check in check_swap_activate_fast Because page_no < sis->max guarantees that the while loop break out normally, the wrong check contidion here doesn't cause a problem. Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Guo Weichao <guoweichao@oppo.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ebc29b62	27-Feb-2021	huangjianan@oppo.com <huangjianan@oppo.com>	f2fs: remove unnecessary IS_SWAPFILE check Now swapfile in f2fs directly submit IO to blockdev according to swapfile extents reported by f2fs when swapon, therefore there is no need to check IS_SWAPFILE when exec filesystem operation. Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Guo Weichao <guoweichao@oppo.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a8affc03	10-Mar-2021	Christoph Hellwig <hch@lst.de>	block: rename BIO_MAX_PAGES to BIO_MAX_VECS Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 5f7136db	28-Jan-2021	Matthew Wilcox (Oracle) <willy@infradead.org>	block: Add bio_max_segs It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 39f71b7e	02-Feb-2021	Dehe Gu <gudehe@huawei.com>	f2fs: fix a wrong condition in __submit_bio We should use !F2FS_IO_ALIGNED() to check and submit_io directly. Fixes: 8223ecc456d0 ("f2fs: fix to add missing F2FS_IO_ALIGNED() condition") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Dehe Gu <gudehe@huawei.com> Signed-off-by: Ge Qiu <qiuge@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d5f7bc00	14-Jan-2021	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: deprecate f2fs_trace_io This patch deprecates f2fs_trace_io, since f2fs uses page->private more broadly, resulting in more buggy cases. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 12699fb7	14-Jan-2021	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: Remove readahead collision detection With the new ->readahead operation, locked pages are added to the page cache, preventing two threads from racing with each other to read the same chunk of file, so this is dead code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6d1451bf	12-Jan-2021	Chengguang Xu <cgxu519@mykernel.net>	f2fs: fix to use per-inode maxbytes F2FS inode may have different max size, e.g. compressed file have less blkaddr entries in all its direct-node blocks, result in being with less max filesize. So change to use per-inode maxbytes. Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3afae09f	11-Jan-2021	Chao Yu <chao@kernel.org>	f2fs: compress: fix potential deadlock generic/269 reports a hangtask issue, the root cause is ABBA deadlock described as below: Thread A Thread B - down_write(&sbi->gc_lock) -- A - f2fs_write_data_pages - lock all pages in cluster -- B - f2fs_write_multi_pages - f2fs_write_raw_pages - f2fs_write_single_data_page - f2fs_balance_fs - down_write(&sbi->gc_lock) -- A - f2fs_gc - do_garbage_collect - ra_data_block - pagecache_get_page -- B To fix this, it needs to avoid calling f2fs_balance_fs() if there is still cluster pages been locked in context of cluster writeback, so instead, let's call f2fs_balance_fs() in the end of f2fs_write_raw_pages() when all cluster pages were unlocked. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7f59b277	04-Jan-2021	Eric Biggers <ebiggers@google.com>	f2fs: clean up post-read processing Rework the post-read processing logic to be much easier to understand. At least one bug is fixed by this: if an I/O error occurred when reading from disk, decryption and verity would be performed on the uninitialized data, causing misleading messages in the kernel log. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0953fe86	14-Dec-2020	Chao Yu <chao@kernel.org>	f2fs: fix to tag FIEMAP_EXTENT_MERGED in f2fs_fiemap() f2fs does not natively support extents in metadata, 'extent' in f2fs is used as a virtual concept, so in f2fs_fiemap() interface, it needs to tag FIEMAP_EXTENT_MERGED flag to indicated the extent status is a result of merging. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0b979f1b	26-Dec-2020	Chao Yu <chao@kernel.org>	f2fs: relocate f2fs_precache_extents() Relocate f2fs_precache_extents() in prior to check_swap_activate(), then extent cache can be enabled before its use. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 67883ade	26-Jan-2021	Christoph Hellwig <hch@lst.de>	f2fs: remove FAULT_ALLOC_BIO Sleeping bio allocations do not fail, which means that injecting an error into sleeping bio allocations is a little silly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 25ac8426	26-Jan-2021	Christoph Hellwig <hch@lst.de>	f2fs: use blkdev_issue_flush in __submit_flush_wait Use the blkdev_issue_flush helper instead of duplicating it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 309dca30	24-Jan-2021	Christoph Hellwig <hch@lst.de>	block: store a block_device pointer in struct bio Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 6422a71e	04-Dec-2020	Daeho Jeong <daehojeong@google.com>	f2fs: fix race of pending_pages in decompression I found out f2fs_free_dic() is invoked in a wrong timing, but f2fs_verify_bio() still needed the dic info and it triggered the below kernel panic. It has been caused by the race condition of pending_pages value between decompression and verity logic, when the same compression cluster had been split in different bios. By split bios, f2fs_verify_bio() ended up with decreasing pending_pages value before it is reset to nr_cpages by f2fs_decompress_pages() and caused the kernel panic. [ 4416.564763] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [ 4416.896016] Workqueue: fsverity_read_queue f2fs_verity_work [ 4416.908515] pc : fsverity_verify_page+0x20/0x78 [ 4416.913721] lr : f2fs_verify_bio+0x11c/0x29c [ 4416.913722] sp : ffffffc019533cd0 [ 4416.913723] x29: ffffffc019533cd0 x28: 0000000000000402 [ 4416.913724] x27: 0000000000000001 x26: 0000000000000100 [ 4416.913726] x25: 0000000000000001 x24: 0000000000000004 [ 4416.913727] x23: 0000000000001000 x22: 0000000000000000 [ 4416.913728] x21: 0000000000000000 x20: ffffffff2076f9c0 [ 4416.913729] x19: ffffffff2076f9c0 x18: ffffff8a32380c30 [ 4416.913731] x17: ffffffc01f966d97 x16: 0000000000000298 [ 4416.913732] x15: 0000000000000000 x14: 0000000000000000 [ 4416.913733] x13: f074faec89ffffff x12: 0000000000000000 [ 4416.913734] x11: 0000000000001000 x10: 0000000000001000 [ 4416.929176] x9 : ffffffff20d1f5c7 x8 : 0000000000000000 [ 4416.929178] x7 : 626d7464ff286b6b x6 : ffffffc019533ade [ 4416.929179] x5 : 000000008049000e x4 : ffffffff2793e9e0 [ 4416.929180] x3 : 000000008049000e x2 : ffffff89ecfa74d0 [ 4416.929181] x1 : 0000000000000c40 x0 : ffffffff2076f9c0 [ 4416.929184] Call trace: [ 4416.929187] fsverity_verify_page+0x20/0x78 [ 4416.929189] f2fs_verify_bio+0x11c/0x29c [ 4416.929192] f2fs_verity_work+0x58/0x84 [ 4417.050667] process_one_work+0x270/0x47c [ 4417.055354] worker_thread+0x27c/0x4d8 [ 4417.059784] kthread+0x13c/0x320 [ 4417.063693] ret_from_fork+0x10/0x18 Chao pointed this can happen by the below race condition. Thread A f2fs_post_read_wq fsverity_wq - f2fs_read_multi_pages() - f2fs_alloc_dic - dic->pending_pages = 2 - submit_bio() - submit_bio() - f2fs_post_read_work() handle first bio - f2fs_decompress_work() - __read_end_io() - f2fs_decompress_pages() - dic->pending_pages-- - enqueue f2fs_verity_work() - f2fs_verity_work() handle first bio - f2fs_verify_bio() - dic->pending_pages-- - f2fs_post_read_work() handle second bio - f2fs_decompress_work() - enqueue f2fs_verity_work() - f2fs_verify_pages() - f2fs_free_dic() - f2fs_verity_work() handle second bio - f2fs_verfy_bio() - use-after-free on dic Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 10208567	03-Dec-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size This patch adds max_io_bytes to limit bio size when f2fs tries to merge consecutive IOs. This can give a testing point to split out bios and check end_io handles those bios correctly. This is used to capture a recent bug on the decompression and fsverity flow. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 602a16d5	30-Nov-2020	Daeho Jeong <daehojeong@google.com>	f2fs: add compress_mode mount option We will add a new "compress_mode" mount option to control file compression mode. This supports "fs" and "user". In "fs" mode (default), f2fs does automatic compression on the compression enabled files. In "user" mode, f2fs disables the automaic compression and gives the user discretion of choosing the target file and the timing. It means the user can do manual compression/decompression on the compression enabled files using ioctls. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b876f4c9	24-Nov-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove buffer_head which has 32bits limit This patch removes buffer_head dependency when getting block addresses. Light reported there's a 32bit issue in f2fs_fiemap where map_bh.b_size is 32bits while len is 64bits given by user. This will give wrong length to f2fs_map_block. Reported-by: Light Hsieh <Light.Hsieh@mediatek.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 963ba7f9	24-Nov-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix wrong block count instead of bytes We should convert cur_lblock, a block count, to bytes for len. Fixes: af4b6b8edf6a ("f2fs: introduce check_swap_activate_fast()") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 43b9d4b4	24-Nov-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use new conversion functions between blks and bytes This patch cleans up blks and bytes conversions. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6cbfcab5	24-Nov-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: rename logical_to_blk and blk_to_logical This patch renames two functions like below having u64. - logical_to_blk to bytes_to_blks - blk_to_logical to blks_to_bytes Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# af4b6b8e	12-Oct-2020	Chao Yu <chao@kernel.org>	f2fs: introduce check_swap_activate_fast() check_swap_activate() will lookup block mapping via bmap() one by one, so its performance is very bad, this patch introduces check_swap_activate_fast() to use f2fs_fiemap() to boost this process, since f2fs_fiemap() will lookup block mappings in batch, therefore, it can improve swapon()'s performance significantly. Note that this enhancement only works when page size is equal to f2fs' block size. Testcase: (backend device: zram) - touch file - pin & fallocate file to 8GB - mkswap file - swapon file Before: real 0m2.999s user 0m0.000s sys 0m2.980s After: real 0m0.081s user 0m0.000s sys 0m0.064s Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# adfc6943	23-Sep-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix slab leak of rpages pointer This fixes the below mem leak. [ 130.157600] ============================================================================= [ 130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G W O ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown() [ 130.162742] ----------------------------------------------------------------------------- [ 130.162742] [ 130.164979] Disabling lock debugging due to kernel taint [ 130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200 [ 130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G B W O 5.9.0-rc4+ #35 [ 130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 130.171941] Call Trace: [ 130.172528] dump_stack+0x74/0x9a [ 130.173298] slab_err+0xb7/0xdc [ 130.174044] ? kernel_poison_pages+0xc0/0xc0 [ 130.175065] ? on_each_cpu_cond_mask+0x48/0x90 [ 130.176096] __kmem_cache_shutdown.cold+0x34/0x141 [ 130.177190] kmem_cache_destroy+0x59/0x100 [ 130.178223] f2fs_destroy_page_array_cache+0x15/0x20 [f2fs] [ 130.179527] f2fs_put_super+0x1bc/0x380 [f2fs] [ 130.180538] generic_shutdown_super+0x72/0x110 [ 130.181547] kill_block_super+0x27/0x50 [ 130.182438] kill_f2fs_super+0x76/0xe0 [f2fs] [ 130.183448] deactivate_locked_super+0x3b/0x80 [ 130.184456] deactivate_super+0x3e/0x50 [ 130.185363] cleanup_mnt+0x109/0x160 [ 130.186179] __cleanup_mnt+0x12/0x20 [ 130.187003] task_work_run+0x70/0xb0 [ 130.187841] exit_to_user_mode_prepare+0x18f/0x1b0 [ 130.188917] syscall_exit_to_user_mode+0x31/0x170 [ 130.189989] do_syscall_64+0x45/0x90 [ 130.190828] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 130.191986] RIP: 0033:0x7faf868ea2eb [ 130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01 [ 130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb [ 130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50 [ 130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0 [ 130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50 [ 130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000 [ 130.210515] INFO: Object 0x00000000a980843a @offset=744 [ 130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297 [ 130.215030] __slab_alloc+0x20/0x40 [ 130.216566] kmem_cache_alloc+0x2a0/0x2e0 [ 130.218217] page_array_alloc+0x3d/0xe0 [f2fs] [ 130.219940] f2fs_init_compress_ctx+0x1f/0x40 [f2fs] [ 130.221736] f2fs_write_cache_pages+0x3db/0x860 [f2fs] [ 130.223591] f2fs_write_data_pages+0x2c9/0x300 [f2fs] [ 130.225414] do_writepages+0x43/0xd0 [ 130.226907] __filemap_fdatawrite_range+0xd5/0x110 [ 130.228632] filemap_write_and_wait_range+0x48/0xb0 [ 130.230336] __generic_file_write_iter+0x18a/0x1d0 [ 130.232035] f2fs_file_write_iter+0x226/0x550 [f2fs] [ 130.233737] new_sync_write+0x113/0x1a0 [ 130.235204] vfs_write+0x1a6/0x200 [ 130.236579] ksys_write+0x67/0xe0 [ 130.237898] __x64_sys_write+0x1a/0x20 [ 130.239309] do_syscall_64+0x38/0x90 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c8eb7024	14-Sep-2020	Chao Yu <chao@kernel.org>	f2fs: clean up kvfree After commit 0b6d4ca04a86 ("f2fs: don't return vmalloc() memory from f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed memory, so clean up to use kfree() instead of kvfree() to free vmalloc()'ed memory. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 78134d03	07-Sep-2020	Daeho Jeong <daehojeong@google.com>	f2fs: change return value of f2fs_disable_compressed_file to bool The returned integer is not required anywhere. So we need to change the return value to bool type. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4eda1682	30-Aug-2020	Daeho Jeong <daehojeong@google.com>	f2fs: add block address limit check to compressed file Need to add block address range check to compressed file case and avoid calling get_data_block_bmap() for compressed file. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 335cac8b	30-Aug-2020	Jack Qiu <jack.qiu@huawei.com>	f2fs: correct statistic of APP_DIRECT_IO/APP_DIRECT_READ_IO Miss to update APP_DIRECT_IO/APP_DIRECT_READ_IO when receiving async DIO. For example: fio -filename=/data/test.0 -bs=1m -ioengine=libaio -direct=1 -name=fill -size=10m -numjobs=1 -iodepth=32 -rw=write Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 093749e2	04-Aug-2020	Chao Yu <chao@kernel.org>	f2fs: support age threshold based garbage collection There are several issues in current background GC algorithm: - valid blocks is one of key factors during cost overhead calculation, so if segment has less valid block, however even its age is young or it locates hot segment, CB algorithm will still choose the segment as victim, it's not appropriate. - GCed data/node will go to existing logs, no matter in-there datas' update frequency is the same or not, it may mix hot and cold data again. - GC alloctor mainly use LFS type segment, it will cost free segment more quickly. This patch introduces a new algorithm named age threshold based garbage collection to solve above issues, there are three steps mainly: 1. select a source victim: - set an age threshold, and select candidates beased threshold: e.g. 0 means youngest, 100 means oldest, if we set age threshold to 80 then select dirty segments which has age in range of [80, 100] as candiddates; - set candidate_ratio threshold, and select candidates based the ratio, so that we can shrink candidates to those oldest segments; - select target segment with fewest valid blocks in order to migrate blocks with minimum cost; 2. select a target victim: - select candidates beased age threshold; - set candidate_radius threshold, search candidates whose age is around source victims, searching radius should less than the radius threshold. - select target segment with most valid blocks in order to avoid migrating current target segment. 3. merge valid blocks from source victim into target victim with SSR alloctor. Test steps: - create 160 dirty segments: * half of them have 128 valid blocks per segment * left of them have 384 valid blocks per segment - run background GC Benefit: GC count and block movement count both decrease obviously: - Before: - Valid: 86 - Dirty: 1 - Prefree: 11 - Free: 6001 (6001) GC calls: 162 (BG: 220) - data segments : 160 (160) - node segments : 2 (2) Try to move 41454 blocks (BG: 41454) - data blocks : 40960 (40960) - node blocks : 494 (494) IPU: 0 blocks SSR: 0 blocks in 0 segments LFS: 41364 blocks in 81 segments - After: - Valid: 87 - Dirty: 0 - Prefree: 4 - Free: 6008 (6008) GC calls: 75 (BG: 76) - data segments : 74 (74) - node segments : 1 (1) Try to move 12813 blocks (BG: 12813) - data blocks : 12544 (12544) - node blocks : 269 (269) IPU: 0 blocks SSR: 12032 blocks in 77 segments LFS: 855 blocks in 2 segments Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e6c3948d	10-Aug-2020	Chao Yu <chao@kernel.org>	f2fs: compress: use more readable atomic_t type for {cic,dic}.ref refcount_t type variable should never be less than one, so it's a little bit hard to understand when we use it to indicate pending compressed page count, let's change to use atomic_t for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c5d02785	04-Aug-2020	Chao Yu <chao@kernel.org>	f2fs: inherit mtime of original block during GC Don't let f2fs inner GC ruins original aging degree of segment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e90027d2	04-Aug-2020	Xiaojun Wang <wangxiaojun11@huawei.com>	f2fs: remove duplicated type casting Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been converted as unsigned long type, we don't need do type casting again. Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reported-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 20d0a107	19-Aug-2020	Gabriel Krisman Bertazi <krisman@collabora.com>	f2fs: Return EOF on unaligned end of file DIO read Reading past end of file returns EOF for aligned reads but -EINVAL for unaligned reads on f2fs. While documentation is not strict about this corner case, most filesystem returns EOF on this case, like iomap filesystems. This patch consolidates the behavior for f2fs, by making it return EOF(0). it can be verified by a read loop on a file that does a partial read before EOF (A file that doesn't end at an aligned address). The following code fails on an unaligned file on f2fs, but not on btrfs, ext4, and xfs. while (done < total) { ssize_t delta = pread(fd, buf + done, total - done, off + done); if (!delta) break; ... } It is arguable whether filesystems should actually return EOF or -EINVAL, but since iomap filesystems support it, and so does the original DIO code, it seems reasonable to consolidate on that. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a86d27dd	29-Jul-2020	Chao Yu <chao@kernel.org>	f2fs: compress: add sanity check during compressed cluster read In f2fs_read_multi_pages(), we don't have to check cluster's type again, since overwrite or partial truncation need page lock in cluster which has already been held by reader, so cluster's type is stable, let's change check condition to sanity check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 944dd22e	24-Jul-2020	Chao Yu <chao@kernel.org>	f2fs: compress: fix to update isize when overwriting compressed file We missed to update isize of compressed file in write_end() with below case: cluster size is 16KB - write 14KB data from offset 0 - overwrite 16KB data from offset 0 Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a87aff1d	24-Jul-2020	Jack Qiu <jack.qiu@huawei.com>	f2fs: space related cleanup Just for code style, no logic change 1. delete useless space 2. change spaces into tab Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4df7a75f	15-Jun-2020	Jason Yan <yanaijie@huawei.com>	f2fs: Eliminate usage of uninitialized_var() macro This is an effort to eliminate the uninitialized_var() macro[1]. The use of this macro is the wrong solution because it forces off ANY analysis by the compiler for a given variable. It even masks "unused variable" warnings. Quoted from Linus[2]: "It's a horrible thing to use, in that it adds extra cruft to the source code, and then shuts up a compiler warning (even the _reliable_ warnings from gcc)." Fix it by remove this variable since it is not needed at all. [1] https://github.com/KSPP/linux/issues/81 [2] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200615085132.166470-1-yanaijie@huawei.com Signed-off-by: Kees Cook <keescook@chromium.org>
# 27aacd28	01-Jul-2020	Satya Tangirala <satyat@google.com>	f2fs: add inline encryption support Wire up f2fs to support inline encryption via the helper functions which fs/crypto/ now provides. This includes: - Adding a mount option 'inlinecrypt' which enables inline encryption on encrypted files where it can be used. - Setting the bio_crypt_ctx on bios that will be submitted to an inline-encrypted file. - Not adding logically discontiguous data to bios that will be submitted to an inline-encrypted file. - Not doing filesystem-layer crypto on inline-encrypted files. This patch includes a fix for a race during IPU by Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
# 6b12367d	23-Jun-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid readahead race condition If two readahead threads having same offset enter in readpages, every read IOs are split and issued to the disk which giving lower bandwidth. This patch tries to avoid redundant readahead calls. Fixes one build error reported by Randy. Fix build error when F2FS_FS_COMPRESSION is not set/enabled. This label is needed in either case. ../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’: ../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined goto next_page; Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b7973091	30-Jun-2020	Jia Yang <jiayang5@huawei.com>	f2fs: add parameter op_flag in f2fs_submit_page_read() The parameter op_flag is not used in f2fs_get_read_data_page(), but it is used in f2fs_grab_read_bio(). Obviously, op_flag is not passed to f2fs_grab_read_bio() successfully. We need to add parameter in f2fs_submit_page_read() to pass it. The case: - gc_data_segment - f2fs_get_read_data_page(.., op_flag = REQ_RAHEAD,..) - f2fs_submit_page_read - f2fs_grab_read_bio(.., op_flag = 0, ..) Signed-off-by: Jia Yang <jiayang5@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# dd5a09bd	29-Jun-2020	Chao Yu <chao@kernel.org>	f2fs: support to trace f2fs_fiemap() to show f2fs_fiemap()'s result as below: f2fs_fiemap: dev = (251,0), ino = 7, lblock:0, pblock:1625292800, len:2097152, flags:0, ret:0 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b79b0a31	29-Jun-2020	Chao Yu <chao@kernel.org>	f2fs: support to trace f2fs_bmap() to show f2fs_bmap()'s result as below: f2fs_bmap: dev = (251,0), ino = 7, lblock:0, pblock:396800 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 250e84d7	28-Jun-2020	Chao Yu <chao@kernel.org>	f2fs: fix wrong return value of f2fs_bmap_compress() If compression is disable, we should return zero rather than -EOPNOTSUPP to indicate f2fs_bmap() is not supported. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f608c38c	18-Jun-2020	Chao Yu <chao@kernel.org>	f2fs: clean up parameter of f2fs_allocate_data_block() Use validation of @fio to inidcate whether caller want to serialize IOs in io.io_list or not, then @add_list will be redundant, remove it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 79963d96	18-Jun-2020	Chao Yu <chao@kernel.org>	f2fs: shrink node_write lock coverage - to avoid race between checkpoint and quota file writeback, it just needs to hold read lock of node_write in writeback path. - node_write lock has covered all LFS data write paths, it's not necessary, we only need to hold node_write lock at write path of quota file. This refactors commit ca7f76e68074 ("f2fs: fix wrong discard space"). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0ef81833	18-Jun-2020	Chao Yu <chao@kernel.org>	f2fs: add prefix for exported symbols to avoid polluting global symbol namespace. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b7b911d5	04-Jun-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: attach IO flags to the missing cases This adds more IOs to attach flags. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 32b6aba8	04-Jun-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add node_io_flag for bio flags likewise data_io_flag This patch adds another way to attach bio flags to node writes. Description: Give a way to attach REQ_META\|FUA to node writes given temperature-based bits. Now the bits indicate: * REQ_META \| REQ_FUA \| * 5 \| 4 \| 3 \| 2 \| 1 \| 0 \| * Cold \| Warm \| Hot \| Cold \| Warm \| Hot \| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e78790f8	02-Jun-2020	Sahitya Tummala <stummala@codeaurora.org>	f2fs: fix retry logic in f2fs_write_cache_pages() In case a compressed file is getting overwritten, the current retry logic doesn't include the current page to be retried now as it sets the new start index as 0 and new end index as writeback_index - 1. This causes the corresponding cluster to be uncompressed and written as normal pages without compression. Fix this by allowing writeback to be retried for the current page as well (in case of compressed page getting retried due to index mismatch with cluster index). So that this cluster can be written compressed in case of overwrite. Also, align f2fs_write_cache_pages() according to the change - <64081362e8ff>("mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock"). Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 45dd052e	23-May-2020	Christoph Hellwig <hch@lst.de>	fs: handle FIEMAP_FLAG_SYNC in fiemap_prep By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is handled once instead of duplicated, but can still be done under fs locks, like xfs/iomap intended with its duplicate handling. Also make sure the error value of filemap_write_and_wait is propagated to user space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
# cddf8a2c	23-May-2020	Christoph Hellwig <hch@lst.de>	fs: move fiemap range validation into the file systems instances Replace fiemap_check_flags with a fiemap_prep helper that also takes the inode and mapped range, and performs the sanity check and truncation previously done in fiemap_check_range. This way the validation is inside the file system itself and thus properly works for the stacked overlayfs case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
# 10c5db28	23-May-2020	Christoph Hellwig <hch@lst.de>	fs: move the fiemap definitions out of fs.h No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
# e20a7693	01-Jun-2020	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: pass the inode to f2fs_mpage_readpages This function now only uses the mapping argument to look up the inode, and both callers already have the inode, so just pass the inode instead of the mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-24-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 23323196	01-Jun-2020	Matthew Wilcox (Oracle) <willy@infradead.org>	f2fs: convert from readpages to readahead Use the new readahead operation in f2fs Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-23-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 2c684234	01-Jun-2020	Matthew Wilcox (Oracle) <willy@infradead.org>	mm: add page_cache_readahead_unbounded ext4 and f2fs have duplicated the guts of the readahead code so they can read past i_size. Instead, separate out the guts of the readahead code so they can call it directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 9c122384	23-Apr-2020	Chao Yu <chao@kernel.org>	f2fs: add compressed/gc data read IO stat in order to account data read IOs more accurately. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f3494345	23-Apr-2020	Chao Yu <chao@kernel.org>	f2fs: fix potential use-after-free issue In error path of f2fs_read_multi_pages(), it should let last referrer release decompress io context memory, otherwise, other referrer will cause use-after-free issue. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 03382f1a	21-Apr-2020	Chao Yu <chao@kernel.org>	f2fs: compress: don't handle non-compressed data in workqueue If bio has no compressed data, we don't need to handle end_io work in workqueue, instead, it should just let interrupter handle it directly to speed up IO response. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c1c63387	30-Mar-2020	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_bmap_compress() to support bmap() on compressed inode: if queried block locates in non-compressed cluster, return its physical block address. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bf38fbad	28-Mar-2020	Chao Yu <chao@kernel.org>	f2fs: support fiemap on compressed inode Map normal/compressed cluster of compressed inode correctly, and give the right fiemap flag FIEMAP_EXTENT_ENCODED on mapped compressed extent. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 435cbab9	09-Apr-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix quota_sync failure due to f2fs_lock_op f2fs_quota_sync() uses f2fs_lock_op() before flushing dirty pages, but f2fs_write_data_page() returns EAGAIN. Likewise dentry blocks, we can just bypass getting the lock, since quota blocks are also maintained by checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8b83ac81	16-Apr-2020	Chao Yu <chao@kernel.org>	f2fs: support read iostat Adds to support accounting read IOs from userspace/kernel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# da9953b7	02-Apr-2020	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce sysfs/data_io_flag to attach REQ_META/FUA This patch introduces a way to attach REQ_META/FUA explicitly to all the data writes given temperature. -> attach REQ_FUA to Hot Data writes -> attach REQ_FUA to Hot\|Warm Data writes -> attach REQ_FUA to Hot\|Warm\|Cold Data writes -> attach REQ_FUA to Hot\|Warm\|Cold Data writes as well as REQ_META to Hot Data writes Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7496affa	30-Mar-2020	Chao Yu <chao@kernel.org>	f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages() Multipage read flow should consider fsverity, so it needs to use f2fs_readpage_limit() instead of i_size_read() to check EOF condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 74878565	24-Mar-2020	Chao Yu <chao@kernel.org>	f2fs: fix to avoid double unlock On image that has verity and compression feature, if compressed pages and non-compressed pages are mixed in one bio, we may double unlock non-compressed page in below flow: - f2fs_post_read_work - f2fs_decompress_work - f2fs_decompress_bio - __read_end_io - unlock_page - fsverity_enqueue_verify_work - f2fs_verity_work - f2fs_verify_bio - unlock_page So it should skip handling non-compressed page in f2fs_decompress_work() if verity is on. Besides, add missing dec_page_count() in f2fs_verify_bio(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 79bbefb1	23-Mar-2020	Chao Yu <chao@kernel.org>	f2fs: fix NULL pointer dereference in f2fs_verity_work() If both compression and fsverity feature is on, generic/572 will report below NULL pointer dereference bug. BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs] #PF: supervisor read access in kernel mode Workqueue: fsverity_read_queue f2fs_verity_work [f2fs] RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs] Call Trace: process_one_work+0x16c/0x3f0 worker_thread+0x4c/0x440 ? rescuer_thread+0x350/0x350 kthread+0xf8/0x130 ? kthread_unpark+0x70/0x70 ret_from_fork+0x35/0x40 There are two issue in f2fs_verity_work(): - it needs to traverse and verify all pages in bio. - if pages in bio belong to non-compressed cluster, accessing decompress IO context stored in page private will cause NULL pointer dereference. Fix them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b13f67ff	19-Mar-2020	Chao Yu <chao@kernel.org>	f2fs: fix to avoid potential deadlock We should always check F2FS_I(inode)->cp_task condition in prior to other conditions in __should_serialize_io() to avoid deadloop described in commit 040d2bb318d1 ("f2fs: fix to avoid deadloop if data_flush is on"), however we break this rule when we support compression, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ad8d6a02	20-Mar-2020	DongDongJu <commisori28@gmail.com>	f2fs: delete DIO read lock This lock can be a contention with multi 4k random read IO with single inode. example) fio --output=test --name=test --numjobs=60 --filename=/media/samsung960pro/file_test --rw=randread --bs=4k --direct=1 --time_based --runtime=7 --ioengine=libaio --iodepth=256 --group_reporting --size=10G With this commit, it remove that possible lock contention. Signed-off-by: Dongjoo Seo <commisori28@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ca9e968a	18-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: avoid __GFP_NOFAIL in f2fs_bio_alloc __f2fs_bio_alloc() won't fail due to memory pool backend, remove unneeded __GFP_NOFAIL flag in __f2fs_bio_alloc(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0683728a	18-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: fix to avoid triggering IO in write path If we are in write IO path, we need to avoid using GFP_KERNEL. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 98510003	17-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: add prefix for f2fs slab cache name In order to avoid polluting global slab cache namespace. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5df7731f	17-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: introduce DEFAULT_IO_TIMEOUT As Geert Uytterhoeven reported: for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50); On some platforms, HZ can be less than 50, then unexpected 0 timeout jiffies will be set in congestion_wait(). This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue. Quoted from Geert Uytterhoeven: "A timeout of HZ means 1 second. HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50. If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20), as that takes care of the special cases, and never returns 0." Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b0332a0f	14-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: clean up lfs/adaptive mount option This patch removes F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options, and add F2FS_OPTION.fs_mode with below two status to indicate filesystem mode. enum { FS_MODE_ADAPTIVE, /* use both lfs/ssr allocation / FS_MODE_LFS, / use lfs allocation only */ }; It can enhance code readability and fs mode's scalability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a2ced1ce	14-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: clean up codes with {f2fs_,}data_blkaddr() - rename datablock_addr() to data_blkaddr(). - wrap data_blkaddr() with f2fs_data_blkaddr() to clean up parameters. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7a88ddb5	27-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: fix inconsistent comments Lack of maintenance on comments may mislead developers, fix them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c10c9820	27-Feb-2020	Chao Yu <chao@kernel.org>	f2fs: cover last_disk_size update with spinlock This change solves below hangtask issue: INFO: task kworker/u16:1:58 blocked for more than 122 seconds. Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u16:1 D 0 58 2 0x00000000 Workqueue: writeback wb_workfn (flush-179:0) Backtrace: (__schedule) from [<c0913234>] (schedule+0x78/0xf4) (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0) (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70) (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac) (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4) (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c) (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4) (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454) (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0) (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4) (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338) (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c) (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544) (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574) (worker_thread) from [<c01564fc>] (kthread+0x144/0x170) (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) Reported-and-tested-by: Ondřej Jirman <megi@xff.cz> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 30460e1e	09-Jan-2020	Carlos Maiolino <cmaiolino@redhat.com>	fs: Enable bmap() function to properly return errors By now, bmap() will either return the physical block number related to the requested file offset or 0 in case of error or the requested offset maps into a hole. This patch makes the needed changes to enable bmap() to proper return errors, using the return value as an error return, and now, a pointer must be passed to bmap() to be filled with the mapped physical block. It will change the behavior of bmap() on return: - negative value in case of error - zero on success or map fell into a hole In case of a hole, the *block will be zero too Since this is a prep patch, by now, the only error return is -EINVAL if ->bmap doesn't exist. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 644c8c92	30-Dec-2019	Eric Biggers <ebiggers@google.com>	f2fs: fix deadlock allocating bio_post_read_ctx from mempool Without any form of coordination, any case where multiple allocations from the same mempool are needed at a time to make forward progress can deadlock under memory pressure. This is the case for struct bio_post_read_ctx, as one can be allocated to decrypt a Merkle tree page during fsverity_verify_bio(), which itself is running from a post-read callback for a data bio which has its own struct bio_post_read_ctx. Fix this by freeing first bio_post_read_ctx before calling fsverity_verify_bio(). This works because verity (if enabled) is always the last post-read step. This deadlock can be reproduced by trying to read from an encrypted verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching mempool_alloc() to pretend that pool->alloc() always fails. Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually hit this bug in practice would require reading from lots of encrypted verity files at the same time. But it's theoretically possible, as N available objects doesn't guarantee forward progress when > N/2 threads each need 2 objects at a time. Fixes: 95ae251fe828 ("f2fs: add fs-verity support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e8ce5749	30-Dec-2019	Eric Biggers <ebiggers@google.com>	f2fs: remove unneeded check for error allocating bio_post_read_ctx Since allocating an object from a mempool never fails when __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check for failure to allocate a bio_post_read_ctx is unnecessary. Remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3e5e479a	27-Dec-2019	Chao Yu <chao@kernel.org>	f2fs: fix to add swap extent correctly As Youling reported in mailing list: https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/ https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/ There is a test case can corrupt f2fs image: - dd if=/dev/zero of=/swapfile bs=1M count=4096 - chmod 600 /swapfile - mkswap /swapfile - swapon --discard /swapfile The root cause is f2fs_swap_activate() intends to return zero value to setup_swap_extents() to enable SWP_FS mode (swap file goes through fs), in this flow, setup_swap_extents() setups swap extent with wrong block address range, result in discard_swap() erasing incorrect address. Because f2fs_swap_activate() has pinned swapfile, its data block address will not change, it's safe to let swap to handle IO through raw device, so we can get rid of SWAP_FS mode and initial swap extents inside f2fs_swap_activate(), by this way, later discard_swap() can trim in right address range. Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4c8ff709	01-Nov-2019	Chao Yu <chao@kernel.org>	f2fs: support data compression This patch tries to support compression in f2fs. - New term named cluster is defined as basic unit of compression, file can be divided into multiple clusters logically. One cluster includes 4 << n (n >= 0) logical pages, compression size is also cluster size, each of cluster can be compressed or not. - In cluster metadata layout, one special flag is used to indicate cluster is compressed one or normal one, for compressed cluster, following metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores data including compress header and compressed data. - In order to eliminate write amplification during overwrite, F2FS only support compression on write-once file, data can be compressed only when all logical blocks in file are valid and cluster compress ratio is lower than specified threshold. - To enable compression on regular inode, there are three ways: * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext Compress metadata layout: [Dnode Structure] +-----------------------------------------------+ \| cluster 1 \| cluster 2 \| ......... \| cluster N \| +-----------------------------------------------+ . . . . . . . . . Compressed Cluster . . Normal Cluster . +----------+---------+---------+---------+ +---------+---------+---------+---------+ \|compr flag\| block 1 \| block 2 \| block 3 \| \| block 1 \| block 2 \| block 3 \| block 4 \| +----------+---------+---------+---------+ +---------+---------+---------+---------+ . . . . . . +-------------+-------------+----------+----------------------------+ \| data length \| data chksum \| reserved \| compressed data \| +-------------+-------------+----------+----------------------------+ Changelog: 20190326: - fix error handling of read_end_io(). - remove unneeded comments in f2fs_encrypt_one_page(). 20190327: - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages(). - don't jump into loop directly to avoid uninitialized variables. - add TODO tag in error path of f2fs_write_cache_pages(). 20190328: - fix wrong merge condition in f2fs_read_multi_pages(). - check compressed file in f2fs_post_read_required(). 20190401 - allow overwrite on non-compressed cluster. - check cluster meta before writing compressed data. 20190402 - don't preallocate blocks for compressed file. - add lz4 compress algorithm - process multiple post read works in one workqueue Now f2fs supports processing post read work in multiple workqueue, it shows low performance due to schedule overhead of multiple workqueue executing orderly. 20190921 - compress: support buffered overwrite C: compress cluster flag V: valid block address N: NEW_ADDR One cluster contain 4 blocks before overwrite after overwrite - VVVV -> CVNN - CVNN -> VVVV - CVNN -> CVNN - CVNN -> CVVV - CVVV -> CVNN - CVVV -> CVVV 20191029 - add kconfig F2FS_FS_COMPRESSION to isolate compression related codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm. note that: will remove lzo backend if Jaegeuk agreed that too. - update codes according to Eric's comments. 20191101 - apply fixes from Jaegeuk 20191113 - apply fixes from Jaegeuk - split workqueue for fsverity 20191216 - apply fixes from Jaegeuk 20200117 - fix to avoid NULL pointer dereference [Jaegeuk Kim] - add tracepoint for f2fs_{,de}compress_pages() - fix many bugs and add some compression stats - fix overwrite/mmap bugs - address 32bit build error, reported by Geert. - bug fixes when handling errors and i_compressed_blocks Reported-by: <noreply@ellerman.id.au> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f543805f	03-Dec-2019	Chao Yu <chao@kernel.org>	f2fs: introduce private bioset In low memory scenario, we can allocate multiple bios without submitting any of them. - f2fs_write_checkpoint() - block_operations() - f2fs_sync_node_pages() step 1) flush cold nodes, allocate new bio from mempool - bio_alloc() - mempool_alloc() step 2) flush hot nodes, allocate a bio from mempool - bio_alloc() - mempool_alloc() step 3) flush warm nodes, be stuck in below call path - bio_alloc() - mempool_alloc() - loop to wait mempool element release, as we only reserved memory for two bio allocation, however above allocated two bios may never be submitted. So we need avoid using default bioset, in this patch we introduce a private bioset, in where we enlarg mempool element count to total number of log header, so that we can make sure we have enough backuped memory pool in scenario of allocating/holding multiple bios. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fd39073d	06-Jan-2020	Eric Biggers <ebiggers@google.com>	fs-verity: implement readahead of Merkle tree pages When fs-verity verifies data pages, currently it reads each Merkle tree page synchronously using read_mapping_page(). Therefore, when the Merkle tree pages aren't already cached, fs-verity causes an extra 4 KiB I/O request for every 512 KiB of data (assuming that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in more I/O requests and performance loss than is strictly necessary. Therefore, implement readahead of the Merkle tree pages. For simplicity, we take advantage of the fact that the kernel already does readahead of the file's data, just like it does for any other file. Due to this, we don't really need a separate readahead state (struct file_ra_state) just for the Merkle tree, but rather we just need to piggy-back on the existing data readahead requests. We also only really need to bother with the first level of the Merkle tree, since the usual fan-out factor is 128, so normally over 99% of Merkle tree I/O requests are for the first level. Therefore, make fsverity_verify_bio() enable readahead of the first Merkle tree level, for up to 1/4 the number of pages in the bio, when it sees that the REQ_RAHEAD flag is set on the bio. The readahead size is then passed down to ->read_merkle_tree_page() for the filesystem to (optionally) implement if it sees that the requested page is uncached. While we're at it, also make build_merkle_tree_level() set the Merkle tree readahead size, since it's easy to do there. However, for now don't set the readahead size in fsverity_verify_page(), since currently it's only used to verify holes on ext4 and f2fs, and it would need parameters added to know how much to read ahead. This patch significantly improves fs-verity sequential read performance. Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches: On an ARM64 phone (using sha256-ce): Before: 217 MB/s After: 263 MB/s (compare to sha256sum of non-verity file: 357 MB/s) In an x86_64 VM (using sha256-avx2): Before: 173 MB/s After: 215 MB/s (compare to sha256sum of non-verity file: 223 MB/s) Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
# 3f188c23	03-Dec-2019	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: keep quota data on write_begin failure This patch avoids some unnecessary locks for quota files when write_begin fails. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 47501f87	26-Nov-2019	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: preallocate DIO blocks when forcing buffered_io The previous preallocation and DIO decision like below. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io () No_Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO But, Javier reported Case () where zoned device bypassed preallocation but fell back to buffered writes in f2fs_direct_IO(), resulting in stale data being read. In order to fix the issue, actually we need to preallocate blocks whenever we fall back to buffered IO like this. No change is made in the other cases. allow_outplace_dio !allow_outplace_dio f2fs_force_buffered_io (*) Prealloc / Buffered_IO Prealloc / Buffered_IO !f2fs_force_buffered_io No_Prealloc / DIO Prealloc / DIO Reported-and-tested-by: Javier Gonzalez <javier@javigon.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c45d6002	01-Nov-2019	Chao Yu <chao@kernel.org>	f2fs: show f2fs instance in printk_ratelimited As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these message is coming from, this patch tries to show fs instance with sb->s_id field in all places we missed before. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1f0d5c91	07-Nov-2019	Chao Yu <chao@kernel.org>	f2fs: fix potential overflow We expect 64-bit calculation result from below statement, however in 32-bit machine, looped left shift operation on pgoff_t type variable may cause overflow issue, fix it by forcing type cast. page->index << PAGE_SHIFT; Fixes: 26de9b117130 ("f2fs: avoid unnecessary updating inode during fsync") Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0b20fcec	30-Sep-2019	Chao Yu <chao@kernel.org>	f2fs: cache global IPU bio In commit 8648de2c581e ("f2fs: add bio cache for IPU"), we added f2fs_submit_ipu_bio() in __write_data_page() as below: __write_data_page() if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) { f2fs_submit_ipu_bio(sbi, bio, page); .... } in order to avoid below deadlock: Thread A Thread B - __write_data_page (inode x, page y) - f2fs_do_write_data_page - set_page_writeback ---- set writeback flag in page y - f2fs_inplace_write_data - f2fs_balance_fs - lock gc_mutex - lock gc_mutex - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - f2fs_wait_on_page_writeback - wait_on_page_writeback --- wait writeback of page y However, the bio submission breaks the merge of IPU IOs. So in this patch let's add a global bio cache for merged IPU pages, then f2fs_wait_on_page_writeback() is able to submit bio if a writebacked page is cached in global bio cache. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8223ecc4	28-Aug-2019	Chao Yu <chao@kernel.org>	f2fs: fix to add missing F2FS_IO_ALIGNED() condition In f2fs_allocate_data_block(), we will reset fio.retry for IO alignment feature instead of IO serialization feature. In addition, spread F2FS_IO_ALIGNED() to check IO alignment feature status explicitly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 05e360061	28-Aug-2019	Chao Yu <chao@kernel.org>	f2fs: fix to handle error path correctly in f2fs_map_blocks In f2fs_map_blocks(), we should bail out once __allocate_data_block() failed. Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 86f35dc3	28-Aug-2019	Chao Yu <chao@kernel.org>	f2fs: fix extent corrupotion during directIO in LFS mode In LFS mode, por_fsstress testcase reports a bug as below: [ASSERT] (fsck_chk_inode_blk: 931) --> ino: 0x12fe has wrong ext: [pgofs:142, blk:215424, len:16] Since commit f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode"), we start to allow OPU mode for direct IO, however, we missed to update extent cache in __allocate_data_block(), finally, it cause extent field being inconsistent with physical block address, fix it. Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 00e09c0b	23-Aug-2019	Chao Yu <chao@kernel.org>	f2fs: enhance f2fs_is_checkpoint_ready()'s readability This patch changes sematics of f2fs_is_checkpoint_ready()'s return value as: return true when checkpoint is ready, other return false, it can improve readability of below conditions. f2fs_submit_page_write() ... if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) \|\| !f2fs_is_checkpoint_ready(sbi)) __submit_merged_bio(io); f2fs_balance_fs() ... if (!f2fs_is_checkpoint_ready(sbi)) return; Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b757f6ed	23-Aug-2019	Chao Yu <chao@kernel.org>	f2fs: clean up __bio_alloc()'s parameter Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7975f349	22-Jul-2019	Chao Yu <chao@kernel.org>	f2fs: support fiemap() for directory inode Adjust f2fs_fiemap() to support fiemap() on directory inode. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c72db71e	12-Jul-2019	Chao Yu <chao@kernel.org>	f2fs: fix panic of IO alignment feature Since 07173c3ec276 ("block: enable multipage bvecs"), one bio vector can store multi pages, so that we can not calculate max IO size of bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of f2fs always has that assumption, so finally, it may cause panic during IO submission as below stack. kernel BUG at fs/f2fs/data.c:317! RIP: 0010:__submit_merged_bio+0x8b0/0x8c0 Call Trace: f2fs_submit_page_write+0x3cd/0xdd0 do_write_page+0x15d/0x360 f2fs_outplace_write_data+0xd7/0x210 f2fs_do_write_data_page+0x43b/0xf30 __write_data_page+0xcf6/0x1140 f2fs_write_cache_pages+0x3ba/0xb40 f2fs_write_data_pages+0x3dd/0x8b0 do_writepages+0xbb/0x1e0 __writeback_single_inode+0xb6/0x800 writeback_sb_inodes+0x441/0x910 wb_writeback+0x261/0x650 wb_workfn+0x1f9/0x7a0 process_one_work+0x503/0x970 worker_thread+0x7d/0x820 kthread+0x1ad/0x210 ret_from_fork+0x35/0x40 This patch adds one extra condition to check left space in bio while trying merging page to bio, to avoid panic. This bug was reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204043 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8896cbdf	12-Jul-2019	Chao Yu <chao@kernel.org>	f2fs: introduce {page,io}_is_mergeable() for readability Wrap merge condition into function for readability, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 75a037f3	31-Jul-2019	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix livelock in swapfile writes This patch fixes livelock in the below call path when writing swap pages. [46374.617256] c2 701 __switch_to+0xe4/0x100 [46374.617265] c2 701 __schedule+0x80c/0xbc4 [46374.617273] c2 701 schedule+0x74/0x98 [46374.617281] c2 701 rwsem_down_read_failed+0x190/0x234 [46374.617291] c2 701 down_read+0x58/0x5c [46374.617300] c2 701 f2fs_map_blocks+0x138/0x9a8 [46374.617310] c2 701 get_data_block_dio_write+0x74/0x104 [46374.617320] c2 701 __blockdev_direct_IO+0x1350/0x3930 [46374.617331] c2 701 f2fs_direct_IO+0x55c/0x8bc [46374.617341] c2 701 __swap_writepage+0x1d0/0x3e8 [46374.617351] c2 701 swap_writepage+0x44/0x54 [46374.617360] c2 701 shrink_page_list+0x140/0xe80 [46374.617371] c2 701 shrink_inactive_list+0x510/0x918 [46374.617381] c2 701 shrink_node_memcg+0x2d4/0x804 [46374.617391] c2 701 shrink_node+0x10c/0x2f8 [46374.617400] c2 701 do_try_to_free_pages+0x178/0x38c [46374.617410] c2 701 try_to_free_pages+0x348/0x4b8 [46374.617419] c2 701 __alloc_pages_nodemask+0x7f8/0x1014 [46374.617429] c2 701 pagecache_get_page+0x184/0x2cc [46374.617438] c2 701 f2fs_new_node_page+0x60/0x41c [46374.617449] c2 701 f2fs_new_inode_page+0x50/0x7c [46374.617460] c2 701 f2fs_init_inode_metadata+0x128/0x530 [46374.617472] c2 701 f2fs_add_inline_entry+0x138/0xd64 [46374.617480] c2 701 f2fs_do_add_link+0xf4/0x178 [46374.617488] c2 701 f2fs_create+0x1e4/0x3ac [46374.617497] c2 701 path_openat+0xdc0/0x1308 [46374.617507] c2 701 do_filp_open+0x78/0x124 [46374.617516] c2 701 do_sys_open+0x134/0x248 [46374.617525] c2 701 SyS_openat+0x14/0x20 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 95ae251f	22-Jul-2019	Eric Biggers <ebiggers@google.com>	f2fs: add fs-verity support Add fs-verity support to f2fs. fs-verity is a filesystem feature that enables transparent integrity protection and authentication of read-only files. It uses a dm-verity like mechanism at the file level: a Merkle tree is used to verify any block in the file in log(filesize) time. It is implemented mainly by helper functions in fs/verity/. See Documentation/filesystems/fsverity.rst for the full documentation. The f2fs support for fs-verity consists of: - Adding a filesystem feature flag and an inode flag for fs-verity. - Implementing the fsverity_operations to support enabling verity on an inode and reading/writing the verity metadata. - Updating ->readpages() to verify data as it's read from verity files and to support reading verity metadata pages. - Updating ->write_begin(), ->write_end(), and ->writepages() to support writing verity metadata pages. - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl(). Like ext4, f2fs stores the verity metadata (Merkle tree and fsverity_descriptor) past the end of the file, starting at the first 64K boundary beyond i_size. This approach works because (a) verity files are readonly, and (b) pages fully beyond i_size aren't visible to userspace but can be read/written internally by f2fs with only some relatively small changes to f2fs. Extended attributes cannot be used because (a) f2fs limits the total size of an inode's xattr entries to 4096 bytes, which wouldn't be enough for even a single Merkle tree block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity metadata must be encrypted when the file is because it contains hashes of the plaintext data. Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
# 37109694	18-Jul-2019	Keith Busch <kbusch@kernel.org>	mm: migrate: remove unused mode argument migrate_page_move_mapping() doesn't use the mode argument. Remove it and update callers accordingly. Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.com Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 34e51a5e	27-Jun-2019	Tejun Heo <tj@kernel.org>	blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() wbc_account_io() does a very specific job - try to see which cgroup is actually dirtying an inode and transfer its ownership to the majority dirtier if needed. The name is too generic and confusing. Let's rename it to something more specific. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 4969c06a	01-Jul-2019	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support swap file w/ DIO Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 10f966bb	19-Jun-2019	Chao Yu <chao@kernel.org>	f2fs: use generic EFSBADCRC/EFSCORRUPTED f2fs uses EFAULT as error number to indicate filesystem is corrupted all the time, but generic filesystems use EUCLEAN for such condition, we need to change to follow others. This patch adds two new macros as below to wrap more generic error code macros, and spread them in code. EFSBADCRC EBADMSG /* Bad CRC detected / EFSCORRUPTED EUCLEAN / Filesystem is corrupted */ Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chao Yu <yuchao0@huawei.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 53bc1d85	20-May-2019	Eric Biggers <ebiggers@google.com>	fscrypt: support encrypting multiple filesystem blocks per page Rename fscrypt_encrypt_page() to fscrypt_encrypt_pagecache_blocks() and redefine its behavior to encrypt all filesystem blocks from the given region of the given page, rather than assuming that the region consists of just one filesystem block. Also remove the 'inode' and 'lblk_num' parameters, since they can be retrieved from the page as it's already assumed to be a pagecache page. This is in preparation for allowing encryption on ext4 filesystems with blocksize != PAGE_SIZE. This is based on work by Chandan Rajendra. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
# d2d0727b	20-May-2019	Eric Biggers <ebiggers@google.com>	fscrypt: simplify bounce page handling Currently, bounce page handling for writes to encrypted files is unnecessarily complicated. A fscrypt_ctx is allocated along with each bounce page, page_private(bounce_page) points to this fscrypt_ctx, and fscrypt_ctx::w::control_page points to the original pagecache page. However, because writes don't use the fscrypt_ctx for anything else, there's no reason why page_private(bounce_page) can't just point to the original pagecache page directly. Therefore, this patch makes this change. In the process, it also cleans up the API exposed to filesystems that allows testing whether a page is a bounce page, getting the pagecache page from a bounce page, and freeing a bounce page. Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
# 040d2bb3	20-May-2019	Chao Yu <chao@kernel.org>	f2fs: fix to avoid deadloop if data_flush is on As Hagbard Celine reported: [ 615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds. [ 615.697825] Not tainted 5.0.15-gentoo-f2fslog #4 [ 615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 615.697827] kworker/u16:5 D 0 344 2 0x80000000 [ 615.697831] Workqueue: writeback wb_workfn (flush-259:0) [ 615.697832] Call Trace: [ 615.697836] ? __schedule+0x2c5/0x8b0 [ 615.697839] schedule+0x32/0x80 [ 615.697841] schedule_preempt_disabled+0x14/0x20 [ 615.697842] __mutex_lock.isra.8+0x2ba/0x4d0 [ 615.697845] ? log_store+0xf5/0x260 [ 615.697848] f2fs_write_data_pages+0x133/0x320 [ 615.697851] ? trace_hardirqs_on+0x2c/0xe0 [ 615.697854] do_writepages+0x41/0xd0 [ 615.697857] __filemap_fdatawrite_range+0x81/0xb0 [ 615.697859] f2fs_sync_dirty_inodes+0x1dd/0x200 [ 615.697861] f2fs_balance_fs_bg+0x2a7/0x2c0 [ 615.697863] ? up_read+0x5/0x20 [ 615.697865] ? f2fs_do_write_data_page+0x2cb/0x940 [ 615.697867] f2fs_balance_fs+0xe5/0x2c0 [ 615.697869] __write_data_page+0x1c8/0x6e0 [ 615.697873] f2fs_write_cache_pages+0x1e0/0x450 [ 615.697878] f2fs_write_data_pages+0x14b/0x320 [ 615.697880] ? trace_hardirqs_on+0x2c/0xe0 [ 615.697883] do_writepages+0x41/0xd0 [ 615.697885] __filemap_fdatawrite_range+0x81/0xb0 [ 615.697887] f2fs_sync_dirty_inodes+0x1dd/0x200 [ 615.697889] f2fs_balance_fs_bg+0x2a7/0x2c0 [ 615.697891] f2fs_write_node_pages+0x51/0x220 [ 615.697894] do_writepages+0x41/0xd0 [ 615.697897] __writeback_single_inode+0x3d/0x3d0 [ 615.697899] writeback_sb_inodes+0x1e8/0x410 [ 615.697902] __writeback_inodes_wb+0x5d/0xb0 [ 615.697904] wb_writeback+0x28f/0x340 [ 615.697906] ? cpumask_next+0x16/0x20 [ 615.697908] wb_workfn+0x33e/0x420 [ 615.697911] process_one_work+0x1a1/0x3d0 [ 615.697913] worker_thread+0x30/0x380 [ 615.697915] ? process_one_work+0x3d0/0x3d0 [ 615.697916] kthread+0x116/0x130 [ 615.697918] ? kthread_create_worker_on_cpu+0x70/0x70 [ 615.697921] ret_from_fork+0x3a/0x50 There is still deadloop in below condition: d A - do_writepages - f2fs_write_node_pages - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - f2fs_write_cache_pages - mutex_lock(&sbi->writepages) -- lock once - __write_data_page - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - f2fs_write_data_pages - mutex_lock(&sbi->writepages) -- lock again Thread A Thread B - do_writepages - f2fs_write_node_pages - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - .cp_task = current - f2fs_sync_dirty_inodes - .cp_task = current - filemap_fdatawrite - .cp_task = NULL - filemap_fdatawrite - f2fs_write_cache_pages - enter f2fs_balance_fs_bg since .cp_task is NULL - .cp_task = NULL Change as below to avoid this: - add condition to avoid holding .writepages mutex lock in path of data flush - introduce mutex lock sbi.flush_lock to exclude concurrent data flush in background. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8648de2c	19-Feb-2019	Chao Yu <chao@kernel.org>	f2fs: add bio cache for IPU SQLite in Wal mode may trigger sequential IPU write in db-wal file, after commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we lost the chance of merging page in inner managed bio cache, result in submitting more small-sized IO. So let's add temporary bio in writepages() to cache mergeable write IO as much as possible. Test case: 1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync" 2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync" Before: f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096 After: f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536 f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 93770ab7	15-Apr-2019	Chao Yu <chao@kernel.org>	f2fs: introduce DATA_GENERIC_ENHANCE Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2df0ab04	25-Mar-2019	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_read_single_page() for cleanup This patch introduces f2fs_read_single_page() to wrap core operations of reading one page in f2fs_mpage_readpages(). In addition, if we failed in f2fs_mpage_readpages(), propagate error number to f2fs_read_data_page(), for f2fs_read_data_pages() path, always return success. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cd23ffa9	15-Apr-2019	Chao Yu <chao@kernel.org>	f2fs: fix to set FI_UPDATE_WRITE correctly This patch fixes to set FI_UPDATE_WRITE only if in-place IO was issued. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6dc3a126	15-Apr-2019	Chao Yu <chao@kernel.org>	f2fs: fix wrong __is_meta_io() macro This patch changes codes as below: - don't use is_read_io() as a condition to judge the meta IO. - use .is_por to replace .is_meta to indicate IO is from recovery explicitly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2b070cfe	25-Apr-2019	Christoph Hellwig <hch@lst.de>	block: remove the i argument to bio_for_each_segment_all We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# adcc00f7	06-Apr-2019	Hariprasad Kelam <hariprasad.kelam@gmail.com>	f2fs: data: fix warning Using plain integer as NULL pointer changed passing function argument "0 to NULL" to fix below sparse warning fs/f2fs/data.c:426:47: warning: Using plain integer as NULL pointer Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 186857c5	02-Apr-2019	Chao Yu <chao@kernel.org>	f2fs: fix potential recursive call when enabling data_flush As Hagbard Celine reported: Hi, this is a long standing bug that I've hit before on older kernels, but I was not able to get the syslog saved because of the nature of the bug. This time I had booted form a pen-drive, and was able to save the log to it's efi-partition. What i did to trigger it was to create a partition and format it f2fs, then mount it with options: "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict". Then I unpacked a big .tar.xz to the partition (I used a gentoo-stage3-tarball as I was in process of installing Gentoo). Same options just without data_flush gives no problems. Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not properly unmounted. Some data may be corrupt. Please run fsck. Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest stack depth: 12064 bytes left Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at 00000000a4b0733c (stack is 0000000056016422..0000000096e7463f) Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow ...... Mar 20 21:06:40 usbgentoo kernel: Call Trace: Mar 20 21:06:40 usbgentoo kernel: read_node_page+0x71/0xf0 Mar 20 21:06:40 usbgentoo kernel: ? xas_load+0x8/0x50 Mar 20 21:06:40 usbgentoo kernel: __get_node_page+0x73/0x2a0 Mar 20 21:06:40 usbgentoo kernel: f2fs_get_dnode_of_data+0x34e/0x580 Mar 20 21:06:40 usbgentoo kernel: f2fs_write_inline_data+0x5e/0x2a0 Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x421/0x690 Mar 20 21:06:40 usbgentoo kernel: f2fs_write_cache_pages+0x1cf/0x460 Mar 20 21:06:40 usbgentoo kernel: f2fs_write_data_pages+0x2b3/0x2e0 Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_chksum_verify+0x1d/0xc0 Mar 20 21:06:40 usbgentoo kernel: ? read_node_page+0x71/0xf0 Mar 20 21:06:40 usbgentoo kernel: do_writepages+0x3c/0xd0 Mar 20 21:06:40 usbgentoo kernel: __filemap_fdatawrite_range+0x7c/0xb0 Mar 20 21:06:40 usbgentoo kernel: f2fs_sync_dirty_inodes+0xf2/0x200 Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs_bg+0x2a3/0x2c0 Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_dirtied+0x21/0xc0 Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs+0xd6/0x2b0 Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x4fb/0x690 ...... Mar 20 21:06:40 usbgentoo kernel: __writeback_single_inode+0x2a1/0x340 Mar 20 21:06:40 usbgentoo kernel: ? soft_cursor+0x1b4/0x220 Mar 20 21:06:40 usbgentoo kernel: writeback_sb_inodes+0x1d5/0x3e0 Mar 20 21:06:40 usbgentoo kernel: __writeback_inodes_wb+0x58/0xa0 Mar 20 21:06:40 usbgentoo kernel: wb_writeback+0x250/0x2e0 Mar 20 21:06:40 usbgentoo kernel: ? 0xffffffff8c000000 Mar 20 21:06:40 usbgentoo kernel: ? cpumask_next+0x16/0x20 Mar 20 21:06:40 usbgentoo kernel: wb_workfn+0x2f6/0x3b0 Mar 20 21:06:40 usbgentoo kernel: ? __switch_to_asm+0x40/0x70 Mar 20 21:06:40 usbgentoo kernel: process_one_work+0x1f5/0x3f0 Mar 20 21:06:40 usbgentoo kernel: worker_thread+0x28/0x3c0 Mar 20 21:06:40 usbgentoo kernel: ? rescuer_thread+0x330/0x330 Mar 20 21:06:40 usbgentoo kernel: kthread+0x10e/0x130 Mar 20 21:06:40 usbgentoo kernel: ? kthread_create_on_node+0x60/0x60 Mar 20 21:06:40 usbgentoo kernel: ret_from_fork+0x35/0x40 The root cause is that we run into an infinite recursive calling in between f2fs_balance_fs_bg and writepage() as described below: - f2fs_write_data_pages --- A - __write_data_page - f2fs_balance_fs - f2fs_balance_fs_bg --- B - f2fs_sync_dirty_inodes - filemap_fdatawrite - f2fs_write_data_pages --- A ... - f2fs_balance_fs_bg --- B ... In order to fix this issue, let's detect such condition in __write_data_page() and just skip calling f2fs_balance_fs() recursively. Reported-by: Hagbard Celine <hagbardcelin@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0916878d	15-Mar-2019	Damien Le Moal <damien.lemoal@wdc.com>	f2fs: Fix use of number of devices For a single device mount using a zoned block device, the zone information for the device is stored in the sbi->devs single entry array and sbi->s_ndevs is set to 1. This differs from a single device mount using a regular block device which does not allocate sbi->devs and sets sbi->s_ndevs to 0. However, sbi->s_devs == 0 condition is used throughout the code to differentiate a single device mount from a multi-device mount where sbi->s_ndevs is always larger than 1. This results in problems with single zoned block device volumes as these are treated as multi-device mounts but do not have the start_blk and end_blk information set. One of the problem observed is skipping of zone discard issuing resulting in write commands being issued to full zones or unaligned to a zone write pointer. Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single regular block device mount) and sbi->s_ndevs == 1 (single zoned block device mount) in the same manner. This is done by introducing the helper function f2fs_is_multi_device() and using this helper in place of direct tests of sbi->s_ndevs value, improving code readability. Fixes: 7bb3a371d199 ("f2fs: Fix zoned block device support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 86109c90	07-Mar-2019	Chao Yu <chao@kernel.org>	f2fs: don't trigger read IO for beyond EOF page In f2fs_mpage_readpages(), if page is beyond EOF, we should just zero out it, but previously, before checking previous mapping info, we missed to check filesize boundary, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 240a5915	06-Mar-2019	Chao Yu <chao@kernel.org>	f2fs: fix to add refcount once page is tagged PG_private As Gao Xiang reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202749 f2fs may skip pageout() due to incorrect page reference count. The problem here is that MM defined the rule [1] very clearly that once page was set with PG_private flag, we should increment the refcount in that page, also main flows like pageout(), migrate_page() will assume there is one additional page reference count if page_has_private() returns true. But currently, f2fs won't add/del refcount when changing PG_private flag. Anyway, f2fs should follow MM's rule to make MM's related flows running as expected. [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/ Reported-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 25720cc0	06-Mar-2019	Chao Yu <chao@kernel.org>	f2fs: remove wrong comment in f2fs_invalidate_page() Since 8c242db9b8c0 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"), we've started to not skip clear private flag for atomic_write page truncation, so removing old wrong comment in f2fs_invalidate_page(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6492a335	21-Feb-2019	Chao Yu <chao@kernel.org>	f2fs: fix encrypted page memory leak For IPU path of f2fs_do_write_data_page(), in its error path, we need to release encrypted page and fscrypt context, otherwise it will cause memory leak. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bc73a4b2	18-Feb-2019	Gao Xiang <xiang@kernel.org>	f2fs: silence VM_WARN_ON_ONCE in mempool_alloc Note that __GFP_ZERO is not supported for mempool_alloc, which also documented in the mempool_alloc comments. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c42d28ce	02-Feb-2019	Chao Yu <chao@kernel.org>	f2fs: fix potential data inconsistence of checkpoint Previously, we changed lock from cp_rwsem to node_change, it solved the deadlock issue which was caused by below race condition: Thread A Thread B - f2fs_setattr - f2fs_lock_op -- read_lock - dquot_transfer - __dquot_transfer - dquot_acquire - commit_dqblk - f2fs_quota_write - f2fs_write_begin - f2fs_write_failed - write_checkpoint - block_operations - f2fs_lock_all -- write_lock - f2fs_truncate_blocks - f2fs_lock_op -- read_lock But it breaks the sematics of cp_rwsem, in other callers like: - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed - f2fs_direct_IO -> f2fs_write_failed We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit bitmap update, which can cause further data corruption. So this patch reverts previous fix implementation, and try to fix deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed() only for quota file, and keep the preallocated data/node in the tail of quota file, we can expecte that the preallocated space can be used to store quota info latter soon. Fixes: af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint") Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6dc4f100	15-Feb-2019	Ming Lei <ming.lei@redhat.com>	block: allow bio_for_each_segment_all() to iterate over multi-page bvec This patch introduces one extra iterator variable to bio_for_each_segment_all(), then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. Given it is just one mechannical & simple change on all bio_for_each_segment_all() users, this patch does tree-wide change in one single patch, so that we can avoid to use a temporary helper for this conversion. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 62230e0d	12-Dec-2018	Chandan Rajendra <chandan@linux.vnet.ibm.com>	f2fs: use IS_ENCRYPTED() to check encryption status This commit removes the f2fs specific f2fs_encrypted_inode() and makes use of the generic IS_ENCRYPTED() macro to check for the encryption status of an inode. Acked-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
# 8e114038	03-Jan-2019	YueHaibing <yuehaibing@huawei.com>	f2fs: remove set but not used variable 'err' Fixes gcc '-Wunused-but-set-variable' warning: fs/f2fs/data.c: In function 'f2fs_dio_submit_bio': fs/f2fs/data.c:2585:6: warning: variable 'err' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ab41ee68	28-Dec-2018	Jan Kara <jack@suse.cz>	mm: migrate: drop unused argument of migrate_page_move_mapping() All callers of migrate_page_move_mapping() now pass NULL for 'head' argument. Drop it. Link: http://lkml.kernel.org/r/20181211172143.7358-7-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# bae0ee7a	25-Dec-2018	Chao Yu <chao@kernel.org>	f2fs: check PageWriteback flag for ordered case For all ordered cases in f2fs_wait_on_page_writeback(), we need to check PageWriteback status, so let's clean up to relocate the check into f2fs_wait_on_page_writeback(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5222595d	13-Dec-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use kvmalloc, if kmalloc is failed One report says memalloc failure during mount. (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14) (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0) (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160) (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0) (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120) (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688) (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc) (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188) (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20) (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c) (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134) (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4) (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc) (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2062e0c3	04-Dec-2018	Sheng Yong <shengyong1@huawei.com>	f2fs: clear PG_writeback if IPU failed If IPU failed, nothing is commited, we should end page writeback. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f4f0b677	19-Nov-2018	Jia Zhu <zhujia13@huawei.com>	f2fs: fix m_may_create to make OPU DIO write correctly Previously, we added a parameter @map.m_may_create to trigger OPU allocation and call f2fs_balance_fs() correctly. But in get_more_blocks(), @create has been overwritten by below code. So the function f2fs_map_blocks() will not allocate new block address but directly go out. Meanwile,there are several functions calling f2fs_map_blocks() directly and @map.m_may_create not initialized. CODE: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } This patch fixes it. Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 73c0a927	26-Nov-2018	Jia Zhu <zhujia13@huawei.com>	f2fs: fix to update new block address correctly for OPU Previously, we allocated a new block address for OPU mode in direct_IO. But the new address couldn't be assigned to @map->m_pblk correctly. This patch fix it. Cc: <stable@vger.kernel.org> Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode") Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2866fb16	14-Nov-2018	Sheng Yong <shengyong1@huawei.com>	f2fs: fix race between write_checkpoint and write_begin The following race could lead to inconsistent SIT bitmap: Task A Task B ====== ====== f2fs_write_checkpoint block_operations f2fs_lock_all down_write(node_change) down_write(node_write) ... sync ... up_write(node_change) f2fs_file_write_iter set_inode_flag(FI_NO_PREALLOC) ...... f2fs_write_begin(index=0, has inline data) prepare_write_begin __do_map_lock(AIO) => down_read(node_change) f2fs_convert_inline_page => update SIT __do_map_lock(AIO) => up_read(node_change) f2fs_flush_sit_entries <= inconsistent SIT finish write checkpoint sudden-power-off If SPO occurs after checkpoint is finished, SIT bitmap will be set incorrectly. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1e771e83	12-Nov-2018	Yunlong Song <yunlong.song@huawei.com>	f2fs: only flush the single temp bio cache which owns the target page Previously, when f2fs finds which temp bio cache owns the target page, it will flush all the three temp bio caches, but we only need to flush one single bio cache indeed, which can help to keep bio merged. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f9d6d059	12-Nov-2018	Chao Yu <chao@kernel.org>	f2fs: fix out-place-update DIO write In get_more_blocks(), we may override @create as below code: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create is 1, so in LFS mode, dio overwrite under LFS mode can easily run out of free segments, result in below panic. Call Trace: allocate_segment_by_default+0xa8/0x270 [f2fs] f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs] __allocate_data_block+0x306/0x480 [f2fs] f2fs_map_blocks+0x6f6/0x920 [f2fs] __get_data_block+0x4f/0xb0 [f2fs] get_data_block_dio_write+0x50/0x60 [f2fs] do_blockdev_direct_IO+0xcd5/0x21e0 __blockdev_direct_IO+0x3a/0x3c f2fs_direct_IO+0x1ff/0x4a0 [f2fs] generic_file_direct_write+0xd9/0x160 __generic_file_write_iter+0xbb/0x1e0 f2fs_file_write_iter+0xaf/0x220 [f2fs] __vfs_write+0xd0/0x130 vfs_write+0xb2/0x1b0 SyS_pwrite64+0x69/0xa0 ? vtime_user_exit+0x29/0x70 do_syscall_64+0x6e/0x160 entry_SYSCALL64_slow_path+0x25/0x25 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8 So this patch introduces a parameter map.m_may_create to indicate that f2fs_map_blocks() is called from write or read path, which can give the right hint to let f2fs_map_blocks() trigger OPU allocation and call f2fs_balanc_fs() correctly. BTW, it disables physical address preallocation for direct IO in f2fs_preallocate_blocks, which is redundant to OPU allocation of f2fs_map_blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 02b16d0a	11-Nov-2018	Chao Yu <chao@kernel.org>	f2fs: add to account direct IO This patch adds f2fs_dio_submit_bio() to hook submit_io/end_io functions in direct IO path, in order to account DIO. Later, we will add this count into is_idle() to let background GC/Discard thread be aware of DIO. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# af033b2a	20-Sep-2018	Chao Yu <chao@kernel.org>	f2fs: guarantee journalled quota data by checkpoint For journalled quota mode, let checkpoint to flush dquot dirty data and quota file data to guarntee persistence of all quota sysfile in last checkpoint, by this way, we can avoid corrupting quota sysfile when encountering SPO. The implementation is as below: 1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is cached dquot metadata changes in quota subsystem, and later checkpoint should: a) flush dquot metadata into quota file. b) flush quota file to storage to keep file usage be consistent. 2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota operation failed due to -EIO or -ENOSPC, so later, a) checkpoint will skip syncing dquot metadata. b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a hint for fsck repairing. 3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota data updating is very heavy, it may cause hungtask in block_operation(). To avoid this, if our retry time exceed threshold, let's just skip flushing and retry in next checkpoint(). Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: avoid warnings and set fsck flag] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1e78e8bd	09-Oct-2018	Sahitya Tummala <stummala@codeaurora.org>	f2fs: fix data corruption issue with hardware encryption Direct IO can be used in case of hardware encryption. The following scenario results into data corruption issue in this path - Thread A - Thread B- -> write file#1 in direct IO -> GC gets kicked in -> GC submitted bio on meta mapping for file#1, but pending completion -> write file#1 again with new data in direct IO -> GC bio gets completed now -> GC writes old data to the new location and thus file#1 is corrupted. Fix this by submitting and waiting for pending io on meta mapping for direct IO case in f2fs_map_blocks(). Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2baf0781	27-Jul-2018	Chao Yu <chao@kernel.org>	f2fs: fix to spread clear_cold_data() We need to drop PG_checked flag on page as well when we clear PG_uptodate flag, in order to avoid treating the page as GCing one later. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 164a63fa	16-Oct-2018	Jaegeuk Kim <jaegeuk@kernel.org>	Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()" This reverts commit 66110abc4c931f879d70e83e1281f891699364bf. If we clear the cold data flag out of the writeback flow, we can miscount -1 by end_io, which incurs a deadlock caused by all I/Os being blocked during heavy GC. Balancing F2FS Async: - IO (CP: 1, Data: -1, Flush: ( 0 0 1), Discard: ( ... GC thread: IRQ - move_data_page() - set_page_dirty() - clear_cold_data() - f2fs_write_end_io() - type = WB_DATA_TYPE(page); here, we get wrong type - dec_page_count(sbi, type); - f2fs_wait_on_page_writeback() Cc: <stable@vger.kernel.org> Reported-and-Tested-by: Park Ju Hyung <qkrwngud825@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5f9abab4	16-Oct-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: account read IOs and use IO counts for is_idle This patch adds issued read IO counts which is under block layer. Chao modified a bit, since: Below race can cause reversed reference on F2FS_RD_DATA, there is the same issue in f2fs_submit_page_bio(), fix them by relocate __submit_bio() and inc_page_count. Thread A Thread B - f2fs_write_begin - f2fs_submit_page_read - __submit_bio - f2fs_read_end_io - __read_end_io - dec_page_count(, F2FS_RD_DATA) - inc_page_count(, F2FS_RD_DATA) Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 78efac53	22-Oct-2018	Chao Yu <chao@kernel.org>	f2fs: fix to account IO correctly for cgroup writeback Now, we have supported cgroup writeback, it depends on correctly IO account of specified filesystem. But in commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we split write paths from f2fs_submit_page_mbio() to two: - f2fs_submit_page_bio() for IPU path - f2fs_submit_page_bio() for OPU path But still we account write IO only in f2fs_submit_page_mbio(), result in incorrect IO account, fix it by adding missing IO account in IPU path. Fixes: d1b3e72d5490 ("f2fs: submit bio of in-place-update pages") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4c58ed07	21-Oct-2018	Chao Yu <chao@kernel.org>	f2fs: fix to account IO correctly Below race can cause reversed reference on dirty count, fix it by relocating __submit_bio() and inc_page_count(). Thread A Thread B - f2fs_inplace_write_data - f2fs_submit_page_bio - __submit_bio - f2fs_write_end_io - dec_page_count - inc_page_count Cc: <stable@vger.kernel.org> Fixes: d1b3e72d5490 ("f2fs: submit bio of in-place-update pages") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5ec2d99d	04-Dec-2017	Matthew Wilcox <willy@infradead.org>	f2fs: Convert to XArray This is a straightforward conversion. Signed-off-by: Matthew Wilcox <willy@infradead.org>
# 10bbd235	05-Dec-2017	Matthew Wilcox <willy@infradead.org>	pagevec: Use xa_mark_t Removes sparse warnings. Signed-off-by: Matthew Wilcox <willy@infradead.org>
# 4354994f	20-Aug-2018	Daniel Rosenberg <drosen@google.com>	f2fs: checkpoint disabling Note that, it requires "f2fs: return correct errno in f2fs_gc". This adds a lightweight non-persistent snapshotting scheme to f2fs. To use, mount with the option checkpoint=disable, and to return to normal operation, remount with checkpoint=enable. If the filesystem is shut down before remounting with checkpoint=enable, it will revert back to its apparent state when it was first mounted with checkpoint=disable. This is useful for situations where you wish to be able to roll back the state of the disk in case of some critical failure. Signed-off-by: Daniel Rosenberg <drosen@google.com> [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fb7d70db	25-Sep-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clear PageError on the read path When running fault injection test, I hit somewhat wrong behavior in f2fs_gc -> gc_data_segment(): 0. fault injection generated some PageError'ed pages 1. gc_data_segment -> f2fs_get_read_data_page(REQ_RAHEAD) 2. move_data_page -> f2fs_get_lock_data_page() -> f2f_get_read_data_page() -> f2fs_submit_page_read() -> submit_bio(READ) -> return EIO due to PageError -> fail to move data Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f847c699	27-Sep-2018	Chao Yu <chao@kernel.org>	f2fs: allow out-place-update for direct IO in LFS mode Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't allow triggering any in-place-update writes, so we fallback direct write to buffered write, result in bad performance of large size write. This patch adds to support triggering out-place-update for direct IO to enhance its performance. Note that it needs to exclude direct read IO during direct write, since new data writing to new block address will no be valid until write finished. storage: zram time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync" Before: real 0m13.061s user 0m0.327s sys 0m12.486s After: real 0m6.448s user 0m0.228s sys 0m6.212s Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 39a86958	27-Sep-2018	Chao Yu <chao@kernel.org>	f2fs: refactor ->page_mkwrite() flow Thread A Thread B - f2fs_vm_page_mkwrite - f2fs_setattr - down_write(i_mmap_sem) - truncate_setsize - f2fs_truncate - up_write(i_mmap_sem) - f2fs_reserve_block reserve NEW_ADDR - skip dirty page due to truncation 1. we don't need to rserve new block address for a truncated page. 2. dn.data_blkaddr is used out of node page lock coverage. Refactor ->page_mkwrite() flow to fix above issues: - use __do_map_lock() to avoid racing checkpoint() - lock data page in prior to dnode page - cover f2fs_reserve_block with i_mmap_sem lock - wait page writeback before zeroing page Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bab475c5	27-Sep-2018	Chao Yu <chao@kernel.org>	Revert: "f2fs: check last page index in cached bio to decide submission" There is one case that we can leave bio in f2fs, result in hanging page writeback waiter. Thread A Thread B - f2fs_write_cache_pages - f2fs_submit_page_write page #0 cached in bio #0 of cold log - f2fs_submit_page_write page #1 cached in bio #1 of warm log - f2fs_write_cache_pages - f2fs_submit_page_write bio is full, submit bio #1 contain page #1 - f2fs_submit_merged_write_cond(, page #1) fail to submit bio #0 due to page #1 is not in any cached bios. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0a4daae5	19-Sep-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: update i_size after DIO completion This is related to ee70daaba82d ("xfs: update i_size after unwritten conversion in dio completion") If we update i_size during dio_write, dio_read can read out stale data, which breaks xfstests/465. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6f5c2ed0	11-Sep-2018	Chao Yu <chao@kernel.org>	f2fs: split IO error injection according to RW This patch adds to support injecting error for write IO, this can simulate IO error like fail_make_request or dm_flakey does. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7c1a000d	11-Sep-2018	Chao Yu <chao@kernel.org>	f2fs: add SPDX license identifiers Remove the verbose license text from f2fs files and replace them with SPDX tags. This does not change the license of any of the code. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5ce80586	06-Sep-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: submit bio after shutdown Sometimes, some merged IOs could get a chance to be submitted, resulting in system hang in shutdown test. This issues IOs all the time after shutdown. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0ded69f6	22-Aug-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid wrong decrypted data from disk 1. Create a file in an encrypted directory 2. Do GC & drop caches 3. Read stale data before its bio for metapage was not issued yet Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6aa58d8a	14-Aug-2018	Chao Yu <chao@kernel.org>	f2fs: readahead encrypted block during GC During GC, for each encrypted block, we will read block synchronously into meta page, and then submit it into current cold data log area. So this block read model with 4k granularity can make poor performance, like migrating non-encrypted block, let's readahead encrypted block as well to improve migration performance. To implement this, we choose meta page that its index is old block address of the encrypted block, and readahead ciphertext into this page, later, if readaheaded page is still updated, we will load its data into target meta page, and submit the write IO. Note that for OPU, truncation, deletion, we need to invalid meta page after we invalid old block address, to make sure we won't load invalid data from target meta page during encrypted block migration. for ((i = 0; i < 1000; i++)) do { xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync"; } done for ((i = 0; i < 1000; i+=2)) do { rm /mnt/f2fs/dir/$i; } done ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0); Before: gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc] gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1 gc-6549 [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc] gc-6549 [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc] gc-6549 [001] .... 214682.213902: block_plug: [gc] gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc] gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1 gc-6549 [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc] gc-6549 [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc] gc-6549 [001] .... 214682.226414: block_plug: [gc] gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc] gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1 gc-6549 [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc] gc-6549 [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc] gc-6549 [001] .... 214682.226911: block_plug: [gc] gc-6549 [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc] gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1 After: gc-5678 [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc] gc-5678 [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc] gc-5678 [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc] gc-5678 [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc] gc-5678 [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc] gc-5678 [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc] gc-5678 [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc] gc-5678 [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc] gc-5678 [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc] gc-5678 [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc] gc-5678 [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc] gc-5678 [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc] gc-5678 [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc] gc-5678 [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc] gc-5678 [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc] gc-5678 [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc] gc-5678 [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1 gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] .... 214327.026046: block_plug: [gc] Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6f8d4455	24-Jul-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop. If it hits the miximum retrials in GC, let's give a chance to release gc_mutex for a short time in order not to go into live lock in the worst case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 853137ce	09-Aug-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix performance issue observed with multi-thread sequential read This reverts the commit - "b93f771 - f2fs: remove writepages lock" to fix the drop in sequential read throughput. Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L device: UFS Before - read throughput: 185 MB/s total read requests: 85177 (of these ~80000 are 4KB size requests). total write requests: 2546 (of these ~2208 requests are written in 512KB). After - read throughput: 758 MB/s total read requests: 2417 (of these ~2042 are 512KB reads). total write requests: 2701 (of these ~2034 requests are written in 512KB). Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 74c8164e	17-Aug-2018	Jens Axboe <axboe@kernel.dk>	mpage: mpage_readpages() should submit IO as read-ahead a_ops->readpages() is only ever used for read-ahead, yet we don't flag the IO being submitted as such. Fix that up. Any file system that uses mpage_readpages() as its ->readpages() implementation will now get this right. Since we're passing in whether the IO is read-ahead or not, we don't need to pass in the 'gfp' separately, as it is dependent on the IO being read-ahead. Kill off that member. Add some documentation notes on ->readpages() being purely for read-ahead. Link: http://lkml.kernel.org/r/20180621010725.17813-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 7fa750a1	13-Aug-2018	Arnd Bergmann <arnd@arndb.de>	f2fs: rework fault injection handling to avoid a warning When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an unused label: fs/f2fs/segment.c: In function '__submit_discard_cmd': fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label] This could be fixed by adding another #ifdef around it, but the more reliable way of doing this seems to be to remove the other #ifdefs where that is easily possible. By defining time_to_inject() as a trivial stub, most of the checks for CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer formatting of the code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a33c1502	05-Aug-2018	Chao Yu <chao@kernel.org>	f2fs: fix avoid race between truncate and background GC Thread A Background GC - f2fs_setattr isize to 0 - truncate_setsize - gc_data_segment - f2fs_get_read_data_page page #0 - set_page_dirty - set_cold_data - f2fs_truncate - f2fs_setattr isize to 4k - read 4k <--- hit data in cached page #0 Above race condition can cause read out invalid data in a truncated page, fix it by i_gc_rwsem[WRITE] lock. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 91291e99	10-Jul-2018	Chao Yu <chao@kernel.org>	f2fs: fix to do sanity check with block address in main area v2 This patch adds f2fs_is_valid_blkaddr() in below functions to do sanity check with block address to avoid pentential panic: - f2fs_grab_read_bio() - __written_first_block() https://bugzilla.kernel.org/show_bug.cgi?id=200465 - Reproduce - POC (poc.c) #define _GNU_SOURCE #include <sys/types.h> #include <sys/mount.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/xattr.h> #include <dirent.h> #include <errno.h> #include <error.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <linux/falloc.h> #include <linux/loop.h> static void activity(char mpoint) { char xattr; int err; err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint); char buf2[113]; memset(buf2, 0, sizeof(buf2)); listxattr(xattr, buf2, sizeof(buf2)); } int main(int argc, char argv[]) { activity(argv[1]); return 0; } - kernel message [ 844.718738] F2FS-fs (loop0): Mounted with checkpoint version = 2 [ 846.430929] F2FS-fs (loop0): access invalid blkaddr:1024 [ 846.431058] WARNING: CPU: 1 PID: 1249 at fs/f2fs/checkpoint.c:154 f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.431059] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.431310] CPU: 1 PID: 1249 Comm: a.out Not tainted 4.18.0-rc3+ #1 [ 846.431312] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.431315] RIP: 0010:f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.431316] Code: 00 eb ed 31 c0 83 fa 05 75 ae 48 83 ec 08 48 8b 3f 89 f1 48 c7 c2 fc 0b 0f 8b 48 c7 c6 8b d7 09 8b 88 44 24 07 e8 61 8b ff ff <0f> 0b 0f b6 44 24 07 48 83 c4 08 eb 81 4c 8b 47 10 8b 8f 38 04 00 [ 846.431347] RSP: 0018:ffff961c414a7bc0 EFLAGS: 00010282 [ 846.431349] RAX: 0000000000000000 RBX: ffffc5f787b8ea80 RCX: 0000000000000000 [ 846.431350] RDX: 0000000000000000 RSI: ffff89dfffd165d8 RDI: ffff89dfffd165d8 [ 846.431351] RBP: ffff961c414a7c20 R08: 0000000000000001 R09: 0000000000000248 [ 846.431353] R10: 0000000000000000 R11: 0000000000000248 R12: 0000000000000007 [ 846.431369] R13: ffff89dff5492800 R14: ffff89dfae3aa000 R15: ffff89dff4ff88d0 [ 846.431372] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.431373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.431374] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.431384] Call Trace: [ 846.431426] f2fs_iget+0x6f4/0xe70 [ 846.431430] ? f2fs_find_entry+0x71/0x90 [ 846.431432] f2fs_lookup+0x1aa/0x390 [ 846.431452] __lookup_slow+0x97/0x150 [ 846.431459] lookup_slow+0x35/0x50 [ 846.431462] walk_component+0x1c6/0x470 [ 846.431479] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.431488] ? page_add_file_rmap+0x13/0x200 [ 846.431491] path_lookupat+0x76/0x230 [ 846.431501] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.431504] filename_lookup+0xb8/0x1a0 [ 846.431534] ? _cond_resched+0x16/0x40 [ 846.431541] ? kmem_cache_alloc+0x160/0x1d0 [ 846.431549] ? path_listxattr+0x41/0xa0 [ 846.431551] path_listxattr+0x41/0xa0 [ 846.431570] do_syscall_64+0x55/0x100 [ 846.431583] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.431607] RIP: 0033:0x7f882de1c0d7 [ 846.431607] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.431639] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.431641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.431642] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.431643] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.431645] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.431646] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.431648] ---[ end trace abca54df39d14f5c ]--- [ 846.431651] F2FS-fs (loop0): invalid blkaddr: 1024, type: 5, run fsck to fix. [ 846.431762] WARNING: CPU: 1 PID: 1249 at fs/f2fs/f2fs.h:2697 f2fs_iget+0xd17/0xe70 [ 846.431763] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.431797] CPU: 1 PID: 1249 Comm: a.out Tainted: G W 4.18.0-rc3+ #1 [ 846.431798] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.431800] RIP: 0010:f2fs_iget+0xd17/0xe70 [ 846.431801] Code: ff ff 48 63 d8 e9 e1 f6 ff ff 48 8b 45 c8 41 b8 05 00 00 00 48 c7 c2 d8 e8 0e 8b 48 c7 c6 1d b0 0a 8b 48 8b 38 e8 f9 b4 00 00 <0f> 0b 48 8b 45 c8 f0 80 48 48 04 e9 d8 f9 ff ff 0f 0b 48 8b 43 18 [ 846.431832] RSP: 0018:ffff961c414a7bd0 EFLAGS: 00010282 [ 846.431834] RAX: 0000000000000000 RBX: ffffc5f787b8ea80 RCX: 0000000000000006 [ 846.431835] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff89dfffd165d0 [ 846.431836] RBP: ffff961c414a7c20 R08: 0000000000000000 R09: 0000000000000273 [ 846.431837] R10: 0000000000000000 R11: ffff89dfad50ca60 R12: 0000000000000007 [ 846.431838] R13: ffff89dff5492800 R14: ffff89dfae3aa000 R15: ffff89dff4ff88d0 [ 846.431840] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.431841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.431842] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.431846] Call Trace: [ 846.431850] ? f2fs_find_entry+0x71/0x90 [ 846.431853] f2fs_lookup+0x1aa/0x390 [ 846.431856] __lookup_slow+0x97/0x150 [ 846.431858] lookup_slow+0x35/0x50 [ 846.431874] walk_component+0x1c6/0x470 [ 846.431878] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.431880] ? page_add_file_rmap+0x13/0x200 [ 846.431882] path_lookupat+0x76/0x230 [ 846.431884] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.431886] filename_lookup+0xb8/0x1a0 [ 846.431890] ? _cond_resched+0x16/0x40 [ 846.431891] ? kmem_cache_alloc+0x160/0x1d0 [ 846.431894] ? path_listxattr+0x41/0xa0 [ 846.431896] path_listxattr+0x41/0xa0 [ 846.431898] do_syscall_64+0x55/0x100 [ 846.431901] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.431902] RIP: 0033:0x7f882de1c0d7 [ 846.431903] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.431934] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.431936] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.431937] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.431939] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.431940] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.431941] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.431943] ---[ end trace abca54df39d14f5d ]--- [ 846.432033] F2FS-fs (loop0): access invalid blkaddr:1024 [ 846.432051] WARNING: CPU: 1 PID: 1249 at fs/f2fs/checkpoint.c:154 f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.432051] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.432085] CPU: 1 PID: 1249 Comm: a.out Tainted: G W 4.18.0-rc3+ #1 [ 846.432086] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.432089] RIP: 0010:f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.432089] Code: 00 eb ed 31 c0 83 fa 05 75 ae 48 83 ec 08 48 8b 3f 89 f1 48 c7 c2 fc 0b 0f 8b 48 c7 c6 8b d7 09 8b 88 44 24 07 e8 61 8b ff ff <0f> 0b 0f b6 44 24 07 48 83 c4 08 eb 81 4c 8b 47 10 8b 8f 38 04 00 [ 846.432120] RSP: 0018:ffff961c414a7900 EFLAGS: 00010286 [ 846.432122] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000006 [ 846.432123] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff89dfffd165d0 [ 846.432124] RBP: ffff89dff5492800 R08: 0000000000000001 R09: 000000000000029d [ 846.432125] R10: ffff961c414a7820 R11: 000000000000029d R12: 0000000000000400 [ 846.432126] R13: 0000000000000000 R14: ffff89dff4ff88d0 R15: 0000000000000000 [ 846.432128] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.432130] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.432131] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.432135] Call Trace: [ 846.432151] f2fs_wait_on_block_writeback+0x20/0x110 [ 846.432158] f2fs_grab_read_bio+0xbc/0xe0 [ 846.432161] f2fs_submit_page_read+0x21/0x280 [ 846.432163] f2fs_get_read_data_page+0xb7/0x3c0 [ 846.432165] f2fs_get_lock_data_page+0x29/0x1e0 [ 846.432167] f2fs_get_new_data_page+0x148/0x550 [ 846.432170] f2fs_add_regular_entry+0x1d2/0x550 [ 846.432178] ? __switch_to+0x12f/0x460 [ 846.432181] f2fs_add_dentry+0x6a/0xd0 [ 846.432184] f2fs_do_add_link+0xe9/0x140 [ 846.432186] __recover_dot_dentries+0x260/0x280 [ 846.432189] f2fs_lookup+0x343/0x390 [ 846.432193] __lookup_slow+0x97/0x150 [ 846.432195] lookup_slow+0x35/0x50 [ 846.432208] walk_component+0x1c6/0x470 [ 846.432212] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.432215] ? page_add_file_rmap+0x13/0x200 [ 846.432217] path_lookupat+0x76/0x230 [ 846.432219] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.432221] filename_lookup+0xb8/0x1a0 [ 846.432224] ? _cond_resched+0x16/0x40 [ 846.432226] ? kmem_cache_alloc+0x160/0x1d0 [ 846.432228] ? path_listxattr+0x41/0xa0 [ 846.432230] path_listxattr+0x41/0xa0 [ 846.432233] do_syscall_64+0x55/0x100 [ 846.432235] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.432237] RIP: 0033:0x7f882de1c0d7 [ 846.432237] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.432269] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.432271] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.432272] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.432273] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.432274] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.432275] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.432277] ---[ end trace abca54df39d14f5e ]--- [ 846.432279] F2FS-fs (loop0): invalid blkaddr: 1024, type: 5, run fsck to fix. [ 846.432376] WARNING: CPU: 1 PID: 1249 at fs/f2fs/f2fs.h:2697 f2fs_wait_on_block_writeback+0xb1/0x110 [ 846.432376] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.432410] CPU: 1 PID: 1249 Comm: a.out Tainted: G W 4.18.0-rc3+ #1 [ 846.432411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.432413] RIP: 0010:f2fs_wait_on_block_writeback+0xb1/0x110 [ 846.432414] Code: 66 90 f0 ff 4b 34 74 59 5b 5d c3 48 8b 7d 00 41 b8 05 00 00 00 89 d9 48 c7 c2 d8 e8 0e 8b 48 c7 c6 1d b0 0a 8b e8 df bc fd ff <0f> 0b f0 80 4d 48 04 e9 67 ff ff ff 48 8b 03 48 c1 e8 37 83 e0 07 [ 846.432445] RSP: 0018:ffff961c414a7910 EFLAGS: 00010286 [ 846.432447] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000006 [ 846.432448] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff89dfffd165d0 [ 846.432449] RBP: ffff89dff5492800 R08: 0000000000000000 R09: 00000000000002d1 [ 846.432450] R10: ffff961c414a7820 R11: ffff89dfad50cf80 R12: 0000000000000400 [ 846.432451] R13: 0000000000000000 R14: ffff89dff4ff88d0 R15: 0000000000000000 [ 846.432453] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.432454] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.432455] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.432459] Call Trace: [ 846.432463] f2fs_grab_read_bio+0xbc/0xe0 [ 846.432464] f2fs_submit_page_read+0x21/0x280 [ 846.432466] f2fs_get_read_data_page+0xb7/0x3c0 [ 846.432468] f2fs_get_lock_data_page+0x29/0x1e0 [ 846.432470] f2fs_get_new_data_page+0x148/0x550 [ 846.432473] f2fs_add_regular_entry+0x1d2/0x550 [ 846.432475] ? __switch_to+0x12f/0x460 [ 846.432477] f2fs_add_dentry+0x6a/0xd0 [ 846.432480] f2fs_do_add_link+0xe9/0x140 [ 846.432483] __recover_dot_dentries+0x260/0x280 [ 846.432485] f2fs_lookup+0x343/0x390 [ 846.432488] __lookup_slow+0x97/0x150 [ 846.432490] lookup_slow+0x35/0x50 [ 846.432505] walk_component+0x1c6/0x470 [ 846.432509] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.432511] ? page_add_file_rmap+0x13/0x200 [ 846.432513] path_lookupat+0x76/0x230 [ 846.432515] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.432517] filename_lookup+0xb8/0x1a0 [ 846.432520] ? _cond_resched+0x16/0x40 [ 846.432522] ? kmem_cache_alloc+0x160/0x1d0 [ 846.432525] ? path_listxattr+0x41/0xa0 [ 846.432526] path_listxattr+0x41/0xa0 [ 846.432529] do_syscall_64+0x55/0x100 [ 846.432531] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.432533] RIP: 0033:0x7f882de1c0d7 [ 846.432533] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.432565] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.432567] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.432568] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.432569] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.432570] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.432571] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.432573] ---[ end trace abca54df39d14f5f ]--- [ 846.434280] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 846.434424] PGD 80000001ebd3a067 P4D 80000001ebd3a067 PUD 1eb1ae067 PMD 0 [ 846.434551] Oops: 0000 [#1] SMP PTI [ 846.434697] CPU: 0 PID: 44 Comm: kworker/u5:0 Tainted: G W 4.18.0-rc3+ #1 [ 846.434805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.435000] Workqueue: fscrypt_read_queue decrypt_work [ 846.435174] RIP: 0010:fscrypt_do_page_crypto+0x6e/0x2d0 [ 846.435351] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 88 00 00 00 31 c0 e8 43 c2 e0 ff 49 8b 86 48 02 00 00 85 ed c7 44 24 70 00 00 00 00 <48> 8b 58 08 0f 84 14 02 00 00 48 8b 78 10 48 8b 0c 24 48 c7 84 24 [ 846.435696] RSP: 0018:ffff961c40f9bd60 EFLAGS: 00010206 [ 846.435870] RAX: 0000000000000000 RBX: ffffc5f787719b80 RCX: ffffc5f787719b80 [ 846.436051] RDX: ffffffff8b9f4b88 RSI: ffffffff8b0ae622 RDI: ffff961c40f9bdb8 [ 846.436261] RBP: 0000000000001000 R08: ffffc5f787719b80 R09: 0000000000001000 [ 846.436433] R10: 0000000000000018 R11: fefefefefefefeff R12: ffffc5f787719b80 [ 846.436562] R13: ffffc5f787719b80 R14: ffff89dff4ff88d0 R15: 0ffff89dfaddee60 [ 846.436658] FS: 0000000000000000(0000) GS:ffff89dfffc00000(0000) knlGS:0000000000000000 [ 846.436758] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.436898] CR2: 0000000000000008 CR3: 00000001eddd0000 CR4: 00000000000006f0 [ 846.437001] Call Trace: [ 846.437181] ? check_preempt_wakeup+0xf2/0x230 [ 846.437276] ? check_preempt_curr+0x7c/0x90 [ 846.437370] fscrypt_decrypt_page+0x48/0x4d [ 846.437466] __fscrypt_decrypt_bio+0x5b/0x90 [ 846.437542] decrypt_work+0x12/0x20 [ 846.437651] process_one_work+0x15e/0x3d0 [ 846.437740] worker_thread+0x4c/0x440 [ 846.437848] kthread+0xf8/0x130 [ 846.437938] ? rescuer_thread+0x350/0x350 [ 846.438022] ? kthread_associate_blkcg+0x90/0x90 [ 846.438117] ret_from_fork+0x35/0x40 [ 846.438201] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.438653] CR2: 0000000000000008 [ 846.438713] ---[ end trace abca54df39d14f60 ]--- [ 846.438796] RIP: 0010:fscrypt_do_page_crypto+0x6e/0x2d0 [ 846.438844] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 88 00 00 00 31 c0 e8 43 c2 e0 ff 49 8b 86 48 02 00 00 85 ed c7 44 24 70 00 00 00 00 <48> 8b 58 08 0f 84 14 02 00 00 48 8b 78 10 48 8b 0c 24 48 c7 84 24 [ 846.439084] RSP: 0018:ffff961c40f9bd60 EFLAGS: 00010206 [ 846.439176] RAX: 0000000000000000 RBX: ffffc5f787719b80 RCX: ffffc5f787719b80 [ 846.440927] RDX: ffffffff8b9f4b88 RSI: ffffffff8b0ae622 RDI: ffff961c40f9bdb8 [ 846.442083] RBP: 0000000000001000 R08: ffffc5f787719b80 R09: 0000000000001000 [ 846.443284] R10: 0000000000000018 R11: fefefefefefefeff R12: ffffc5f787719b80 [ 846.444448] R13: ffffc5f787719b80 R14: ffff89dff4ff88d0 R15: 0ffff89dfaddee60 [ 846.445558] FS: 0000000000000000(0000) GS:ffff89dfffc00000(0000) knlGS:0000000000000000 [ 846.446687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.447796] CR2: 0000000000000008 CR3: 00000001eddd0000 CR4: 00000000000006f0 - Location https://elixir.bootlin.com/linux/v4.18-rc4/source/fs/crypto/crypto.c#L149 struct crypto_skcipher tfm = ci->ci_ctfm; Here ci can be NULL Note that this issue maybe require CONFIG_F2FS_FS_ENCRYPTION=y to reproduce. Reported-by Wen Xu <wen.xu@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 50fa53ec	02-Aug-2018	Chao Yu <chao@kernel.org>	f2fs: fix to avoid broken of dnode block list f2fs recovery flow is relying on dnode block link list, it means fsynced file recovery depends on previous dnode's persistence in the list, so during fsync() we should wait on all regular inode's dnode writebacked before issuing flush. By this way, we can avoid dnode block list being broken by out-of-order IO submission due to IO scheduler or driver. Sheng Yong helps to do the test with this patch: Target:/data (f2fs, -) 64MB / 32768KB / 4KB / 8 1 / PERSIST / Index Base: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08 2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7 3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48 Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333 After: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 798.81 202.5 41143 40613.87 602.71 838.08 913.83 2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27 3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91 Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333 Patched/Original: 0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189 It looks like atomic write will suffer performance regression. I suspect that the criminal is that we forcing to wait all dnode being in storage cache before we issue PREFLUSH+FUA. BTW, will commit ("f2fs: don't need to wait for node writes for atomic write") cause the problem: we will lose data of last transaction after SPO, even if atomic write return no error: - atomic_open(); - write() P1, P2, P3; - atomic_commit(); - writeback data: P1, P2, P3; - writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing last transaction. - preflush + fua; - power-cut If we don't wait dnode writeback for atomic_write: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85 2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77 3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92 Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18 Patched/Original: 0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294 SQLite's performance recovers. Jaegeuk: "Practically, I don't see db corruption becase of this. We can excuse to lose the last transaction." Finally, we decide to keep original implementation of atomic write interface sematics that we don't wait all dnode writeback before preflush+fua submission. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 66110abc	28-Jul-2018	Chao Yu <chao@kernel.org>	f2fs: fix to clear PG_checked flag in set_page_dirty() PG_checked flag will be set on data page during GC, later, we can recognize such page by the flag and migrate page to cold segment. But previously, we don't clear this flag when invalidating data page, after page redirtying, we will write it into wrong log. Let's clear PG_checked flag in set_page_dirty() to avoid this. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 455e3a58	27-Jul-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: don't allow any writes on aborted atomic writes In order to prevent abusing atomic writes by abnormal users, we've added a threshold, 20% over memory footprint, which disallows further atomic writes. Previously, however, SQLite doesn't know the files became normal, so that it could write stale data and commit on revoked normal database file. Once f2fs detects such the abnormal behavior, this patch tries to avoid further writes in write_begin(). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7735730d	16-Jul-2018	Chao Yu <chao@kernel.org>	f2fs: fix to propagate error from __get_meta_page() If caller of __get_meta_page() can handle error, let's propagate error from __get_meta_page(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 66415cee	12-Jul-2018	Yunlong Song <yunlong.song@huawei.com>	f2fs: blk_finish_plug of submit_bio in lfs mode Expand the blk_finish_plug action from blkzoned to normal lfs mode, since plug will cause the out-of-order IO submission, which is not friendly to flash in lfs mode. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c9b60788	01-Aug-2018	Chao Yu <chao@kernel.org>	f2fs: fix to do sanity check with block address in main area This patch add to do sanity check with below field: - cp_pack_total_block_count - blkaddr of data/node - extent info - Overview BUG() in verify_block_addr() when writing to a corrupted f2fs image - Reproduce (4.18 upstream kernel) - POC (poc.c) static void activity(char mpoint) { char foo_bar_baz; int err; static int buf[8192]; memset(buf, 0, sizeof(buf)); err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint); int fd = open(foo_bar_baz, O_RDWR \| O_TRUNC, 0777); if (fd >= 0) { write(fd, (char )buf, sizeof(buf)); fdatasync(fd); close(fd); } } int main(int argc, char argv[]) { activity(argv[1]); return 0; } - Kernel message [ 689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3 [ 699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240 [ 699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4 [ 699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240 [ 699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d [ 699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202 [ 699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113 [ 699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540 [ 699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8 [ 699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540 [ 699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450 [ 699.729154] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 699.729156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0 [ 699.729171] Call Trace: [ 699.729192] f2fs_do_write_data_page+0x2e2/0xe00 [ 699.729203] ? f2fs_should_update_outplace+0xd0/0xd0 [ 699.729238] ? memcg_drain_all_list_lrus+0x280/0x280 [ 699.729269] ? __radix_tree_replace+0xa3/0x120 [ 699.729276] __write_data_page+0x5c7/0xe30 [ 699.729291] ? kasan_check_read+0x11/0x20 [ 699.729310] ? page_mapped+0x8a/0x110 [ 699.729321] ? page_mkclean+0xe9/0x160 [ 699.729327] ? f2fs_do_write_data_page+0xe00/0xe00 [ 699.729331] ? invalid_page_referenced_vma+0x130/0x130 [ 699.729345] ? clear_page_dirty_for_io+0x332/0x450 [ 699.729351] f2fs_write_cache_pages+0x4ca/0x860 [ 699.729358] ? __write_data_page+0xe30/0xe30 [ 699.729374] ? percpu_counter_add_batch+0x22/0xa0 [ 699.729380] ? kasan_check_write+0x14/0x20 [ 699.729391] ? _raw_spin_lock+0x17/0x40 [ 699.729403] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30 [ 699.729413] ? iov_iter_advance+0x113/0x640 [ 699.729418] ? f2fs_write_end+0x133/0x2e0 [ 699.729423] ? balance_dirty_pages_ratelimited+0x239/0x640 [ 699.729428] f2fs_write_data_pages+0x329/0x520 [ 699.729433] ? generic_perform_write+0x250/0x320 [ 699.729438] ? f2fs_write_cache_pages+0x860/0x860 [ 699.729454] ? current_time+0x110/0x110 [ 699.729459] ? f2fs_preallocate_blocks+0x1ef/0x370 [ 699.729464] do_writepages+0x37/0xb0 [ 699.729468] ? f2fs_write_cache_pages+0x860/0x860 [ 699.729472] ? do_writepages+0x37/0xb0 [ 699.729478] __filemap_fdatawrite_range+0x19a/0x1f0 [ 699.729483] ? delete_from_page_cache_batch+0x4e0/0x4e0 [ 699.729496] ? __vfs_write+0x2b2/0x410 [ 699.729501] file_write_and_wait_range+0x66/0xb0 [ 699.729506] f2fs_do_sync_file+0x1f9/0xd90 [ 699.729511] ? truncate_partial_data_page+0x290/0x290 [ 699.729521] ? __sb_end_write+0x30/0x50 [ 699.729526] ? vfs_write+0x20f/0x260 [ 699.729530] f2fs_sync_file+0x9a/0xb0 [ 699.729534] ? f2fs_do_sync_file+0xd90/0xd90 [ 699.729548] vfs_fsync_range+0x68/0x100 [ 699.729554] ? __fget_light+0xc9/0xe0 [ 699.729558] do_fsync+0x3d/0x70 [ 699.729562] __x64_sys_fdatasync+0x24/0x30 [ 699.729585] do_syscall_64+0x78/0x170 [ 699.729595] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 699.729613] RIP: 0033:0x7f9bf930d800 [ 699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24 [ 699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800 [ 699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003 [ 699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000 [ 699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610 [ 699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000 [ 699.729687] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 699.729782] ------------[ cut here ]------------ [ 699.729785] kernel BUG at fs/f2fs/segment.h:654! [ 699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI [ 699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G W 4.18.0-rc1+ #4 [ 699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730 [ 699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0 [ 699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283 [ 699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef [ 699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c [ 699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55 [ 699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940 [ 699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001 [ 699.748683] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 699.750293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0 [ 699.752874] Call Trace: [ 699.753386] ? f2fs_inplace_write_data+0x93/0x240 [ 699.754341] f2fs_inplace_write_data+0xd2/0x240 [ 699.755271] f2fs_do_write_data_page+0x2e2/0xe00 [ 699.756214] ? f2fs_should_update_outplace+0xd0/0xd0 [ 699.757215] ? memcg_drain_all_list_lrus+0x280/0x280 [ 699.758209] ? __radix_tree_replace+0xa3/0x120 [ 699.759164] __write_data_page+0x5c7/0xe30 [ 699.760002] ? kasan_check_read+0x11/0x20 [ 699.760823] ? page_mapped+0x8a/0x110 [ 699.761573] ? page_mkclean+0xe9/0x160 [ 699.762345] ? f2fs_do_write_data_page+0xe00/0xe00 [ 699.763332] ? invalid_page_referenced_vma+0x130/0x130 [ 699.764374] ? clear_page_dirty_for_io+0x332/0x450 [ 699.765347] f2fs_write_cache_pages+0x4ca/0x860 [ 699.766276] ? __write_data_page+0xe30/0xe30 [ 699.767161] ? percpu_counter_add_batch+0x22/0xa0 [ 699.768112] ? kasan_check_write+0x14/0x20 [ 699.768951] ? _raw_spin_lock+0x17/0x40 [ 699.769739] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30 [ 699.770885] ? iov_iter_advance+0x113/0x640 [ 699.771743] ? f2fs_write_end+0x133/0x2e0 [ 699.772569] ? balance_dirty_pages_ratelimited+0x239/0x640 [ 699.773680] f2fs_write_data_pages+0x329/0x520 [ 699.774603] ? generic_perform_write+0x250/0x320 [ 699.775544] ? f2fs_write_cache_pages+0x860/0x860 [ 699.776510] ? current_time+0x110/0x110 [ 699.777299] ? f2fs_preallocate_blocks+0x1ef/0x370 [ 699.778279] do_writepages+0x37/0xb0 [ 699.779026] ? f2fs_write_cache_pages+0x860/0x860 [ 699.779978] ? do_writepages+0x37/0xb0 [ 699.780755] __filemap_fdatawrite_range+0x19a/0x1f0 [ 699.781746] ? delete_from_page_cache_batch+0x4e0/0x4e0 [ 699.782820] ? __vfs_write+0x2b2/0x410 [ 699.783597] file_write_and_wait_range+0x66/0xb0 [ 699.784540] f2fs_do_sync_file+0x1f9/0xd90 [ 699.785381] ? truncate_partial_data_page+0x290/0x290 [ 699.786415] ? __sb_end_write+0x30/0x50 [ 699.787204] ? vfs_write+0x20f/0x260 [ 699.787941] f2fs_sync_file+0x9a/0xb0 [ 699.788694] ? f2fs_do_sync_file+0xd90/0xd90 [ 699.789572] vfs_fsync_range+0x68/0x100 [ 699.790360] ? __fget_light+0xc9/0xe0 [ 699.791128] do_fsync+0x3d/0x70 [ 699.791779] __x64_sys_fdatasync+0x24/0x30 [ 699.792614] do_syscall_64+0x78/0x170 [ 699.793371] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 699.794406] RIP: 0033:0x7f9bf930d800 [ 699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24 [ 699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800 [ 699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003 [ 699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000 [ 699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610 [ 699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000 [ 699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 699.817079] ---[ end trace 4ce02f25ff7d3df6 ]--- [ 699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730 [ 699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0 [ 699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283 [ 699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef [ 699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c [ 699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55 [ 699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940 [ 699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001 [ 699.831192] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 699.832793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0 [ 699.835556] ================================================================== [ 699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0 [ 699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309 [ 699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G D W 4.18.0-rc1+ #4 [ 699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 699.843475] Call Trace: [ 699.843982] dump_stack+0x7b/0xb5 [ 699.844661] print_address_description+0x70/0x290 [ 699.845607] kasan_report+0x291/0x390 [ 699.846351] ? update_stack_state+0x38c/0x3e0 [ 699.853831] __asan_load8+0x54/0x90 [ 699.854569] update_stack_state+0x38c/0x3e0 [ 699.855428] ? __read_once_size_nocheck.constprop.7+0x20/0x20 [ 699.856601] ? __save_stack_trace+0x5e/0x100 [ 699.857476] unwind_next_frame.part.5+0x18e/0x490 [ 699.858448] ? unwind_dump+0x290/0x290 [ 699.859217] ? clear_page_dirty_for_io+0x332/0x450 [ 699.860185] __unwind_start+0x106/0x190 [ 699.860974] __save_stack_trace+0x5e/0x100 [ 699.861808] ? __save_stack_trace+0x5e/0x100 [ 699.862691] ? unlink_anon_vmas+0xba/0x2c0 [ 699.863525] save_stack_trace+0x1f/0x30 [ 699.864312] save_stack+0x46/0xd0 [ 699.864993] ? __alloc_pages_slowpath+0x1420/0x1420 [ 699.865990] ? flush_tlb_mm_range+0x15e/0x220 [ 699.866889] ? kasan_check_write+0x14/0x20 [ 699.867724] ? __dec_node_state+0x92/0xb0 [ 699.868543] ? lock_page_memcg+0x85/0xf0 [ 699.869350] ? unlock_page_memcg+0x16/0x80 [ 699.870185] ? page_remove_rmap+0x198/0x520 [ 699.871048] ? mark_page_accessed+0x133/0x200 [ 699.871930] ? _cond_resched+0x1a/0x50 [ 699.872700] ? unmap_page_range+0xcd4/0xe50 [ 699.873551] ? rb_next+0x58/0x80 [ 699.874217] ? rb_next+0x58/0x80 [ 699.874895] __kasan_slab_free+0x13c/0x1a0 [ 699.875734] ? unlink_anon_vmas+0xba/0x2c0 [ 699.876563] kasan_slab_free+0xe/0x10 [ 699.877315] kmem_cache_free+0x89/0x1e0 [ 699.878095] unlink_anon_vmas+0xba/0x2c0 [ 699.878913] free_pgtables+0x101/0x1b0 [ 699.879677] exit_mmap+0x146/0x2a0 [ 699.880378] ? __ia32_sys_munmap+0x50/0x50 [ 699.881214] ? kasan_check_read+0x11/0x20 [ 699.882052] ? mm_update_next_owner+0x322/0x380 [ 699.882985] mmput+0x8b/0x1d0 [ 699.883602] do_exit+0x43a/0x1390 [ 699.884288] ? mm_update_next_owner+0x380/0x380 [ 699.885212] ? f2fs_sync_file+0x9a/0xb0 [ 699.885995] ? f2fs_do_sync_file+0xd90/0xd90 [ 699.886877] ? vfs_fsync_range+0x68/0x100 [ 699.887694] ? __fget_light+0xc9/0xe0 [ 699.888442] ? do_fsync+0x3d/0x70 [ 699.889118] ? __x64_sys_fdatasync+0x24/0x30 [ 699.889996] rewind_stack_do_exit+0x17/0x20 [ 699.890860] RIP: 0033:0x7f9bf930d800 [ 699.891585] Code: Bad RIP value. [ 699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800 [ 699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003 [ 699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000 [ 699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610 [ 699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000 [ 699.901241] The buggy address belongs to the page: [ 699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 699.903811] flags: 0x2ffff0000000000() [ 699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000 [ 699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000 [ 699.907673] page dumped because: kasan: bad access detected [ 699.909108] Memory state around the buggy address: [ 699.910077] ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 [ 699.911528] ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2 [ 699.914392] ^ [ 699.915758] ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00 [ 699.917193] ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00 [ 699.918634] ================================================================== - Location https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644 Reported-by Wen Xu <wen.xu@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e1da7872	05-Jun-2018	Chao Yu <chao@kernel.org>	f2fs: introduce and spread verify_blkaddr This patch introduces verify_blkaddr to check meta/data block address with valid range to detect bug earlier. In addition, once we encounter an invalid blkaddr, notice user to run fsck to fix, and let the kernel panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e2e59414	21-Jun-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: assign REQ_RAHEAD to bio for ->readpages As Jens reported, we'd better assign REQ_RAHEAD to bio by the fact that ->readpages is called only from read-ahead. In Documentation/filesystems/vfs.txt, readpages: called by the VM to read pages associated with the address_space object. This is essentially just a vector version of readpage. Instead of just one page, several pages are requested. readpages is only used for read-ahead, so read errors are ignored. If anything goes wrong, feel free to give up. Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8a56dd96	29-Jun-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: allow wrong configured dio to buffered write This fixes to support dio having unaligned buffers as buffered writes. xfs_io -f -d -c "pwrite 0 512" $testfile -> okay xfs_io -f -d -c "pwrite 1 512" $testfile -> EINVAL Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c29fd0c0	04-Jun-2018	Chao Yu <chao@kernel.org>	f2fs: let sync node IO interrupt async one Although mixed sync/async IOs can have continuous LBA, as they have different IO priority, block IO scheduler will add them into different queues and commit them separately, result in splited IOs which causes wrose performance. This patch gives high priority to synchronous IO of nodes, means that once synchronous flow starts, it can interrupt asynchronous writeback flow of system flusher, so more big IOs can be expected. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4d57b86d	29-May-2018	Chao Yu <chao@kernel.org>	f2fs: clean up symbol namespace As Ted reported: "Hi, I was looking at f2fs's sources recently, and I noticed that there is a very large number of non-static symbols which don't have a f2fs prefix. There's well over a hundred (see attached below). As one example, in fs/f2fs/dir.c there is: unsigned char get_de_type(struct f2fs_dir_entry *de) This function is clearly only useful for f2fs, but it has a generic name. This means that if any other file system tries to have the same symbol name, there will be a symbol conflict and the kernel would not successfully build. It also means that when someone is looking f2fs sources, it's not at all obvious whether a function such as read_data_page(), invalidate_blocks(), is a generic kernel function found in the fs, mm, or block layers, or a f2fs specific function. You might want to fix this at some point. Hopefully Kent's bcachefs isn't similarly using genericly named functions, since that might cause conflicts with f2fs's functions --- but just as this would be a problem that we would rightly insist that Kent fix, this is something that we should have rightly insisted that f2fs should have fixed before it was integrated into the mainline kernel. acquire_orphan_inode add_ino_entry add_orphan_inode allocate_data_block allocate_new_segments alloc_nid alloc_nid_done alloc_nid_failed available_free_memory ...." This patch adds "f2fs_" prefix for all non-static symbols in order to: a) avoid conflict with other kernel generic symbols; b) to indicate the function is f2fs specific one instead of generic one; Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fc99fe27	29-May-2018	Chao Yu <chao@kernel.org>	f2fs: make __f2fs_write_data_pages() static Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fe16efe6	28-May-2018	Chao Yu <chao@kernel.org>	f2fs: fix to let caller retry allocating block address Configure io_bits with 2 and enable LFS mode, generic/013 reports below dmesg: BUG: unable to handle kernel NULL pointer dereference at 00000104 pdpt = 0000000029b7b001 pde = 0000000000000000 Oops: 0002 [#1] PREEMPT SMP Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 CPU: 0 PID: 11161 Comm: fsstress Tainted: G O 4.17.0-rc2 #38 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] EFLAGS: 00010206 CPU: 0 EAX: e863dcd8 EBX: 00000000 ECX: 00000100 EDX: 00000200 ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 80050033 CR2: 00000104 CR3: 29a62020 CR4: 000406f0 Call Trace: do_write_page+0x6f/0xc0 [f2fs] write_data_page+0x4a/0xd0 [f2fs] do_write_data_page+0x327/0x630 [f2fs] __write_data_page+0x34b/0x820 [f2fs] __f2fs_write_data_pages+0x42d/0x8c0 [f2fs] f2fs_write_data_pages+0x27/0x30 [f2fs] do_writepages+0x1a/0x70 __filemap_fdatawrite_range+0x94/0xd0 filemap_write_and_wait_range+0x3d/0xa0 __generic_file_write_iter+0x11a/0x1f0 f2fs_file_write_iter+0xdd/0x3b0 [f2fs] __vfs_write+0xd2/0x150 vfs_write+0x9b/0x190 ksys_write+0x45/0x90 sys_write+0x16/0x20 do_fast_syscall_32+0xaa/0x22c entry_SYSENTER_32+0x4c/0x7b EIP: 0xb7fc8c51 EFLAGS: 00000246 CPU: 0 EAX: ffffffda EBX: 00000003 ECX: 09cde000 EDX: 00001000 ESI: 00000003 EDI: 00001000 EBP: 00000000 ESP: bfbded38 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74 CR2: 0000000000000104 ---[ end trace 4cac79c0d1305ee6 ]--- allocate_data_block will submit all sequential pending IOs sorted by a FIFO list, If we failed to submit other user's IO due to unaligned write, we will retry to allocate new block address for current IO, then it will initialize fio.list again, if fio was in the list before, it can break FIFO list, result in above panic. Thread A Thread B - do_write_page - allocate_data_block - list_add_tail : fioA cached in FIFO list. - do_write_page - allocate_data_block - list_add_tail : fioB cached in FIFO list. - f2fs_submit_page_write : fail to submit IO - allocate_data_block - INIT_LIST_HEAD - f2fs_submit_page_write - list_del <-- NULL pointer dereference This patch adds fio.retry parameter to indicate failure status for each IO, and avoid bailing out if there is still pending IO in FIFO list for fixing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1174abfd	28-May-2018	Chao Yu <chao@kernel.org>	f2fs: don't drop dentry pages after fs shutdown As description in commit "f2fs: don't drop any page on f2fs_cp_error() case": "We still provide readdir() after shtudown, so we should keep pages to avoid additional IOs." In order to provider lastest directory structure, let's keep dentry pages in cache after fs shutdown. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# aec2f729	26-May-2018	Chao Yu <chao@kernel.org>	f2fs: clean up with clear_radix_tree_dirty_tag Introduce clear_radix_tree_dirty_tag to include common codes for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2ef79ecb	07-May-2018	Chao Yu <chao@kernel.org>	f2fs: avoid stucking GC due to atomic write f2fs doesn't allow abuse on atomic write class interface, so except limiting in-mem pages' total memory usage capacity, we need to limit atomic-write usage as well when filesystem is seriously fragmented, otherwise we may run into infinite loop during foreground GC because target blocks in victim segment are belong to atomic opened file for long time. Now, we will detect failure due to atomic write in foreground GC, if the count exceeds threshold, we will drop all atomic written data in cache, by this, I expect it can keep our system running safely to prevent Dos attack. In addition, his patch adds to show GC skip information in debugfs, now it just shows count of skipped caused by atomic write. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f8de4331	23-May-2018	Chao Yu <chao@kernel.org>	f2fs: detect synchronous writeback more earlier This patch changes to detect synchronous writeback more earlier before, in order to avoid unnecessary page writeback before exiting asynchronous writeback. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7b525dd0	23-May-2018	Chao Yu <chao@kernel.org>	f2fs: clean up with is_valid_blkaddr() - rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability. - introduce is_valid_blkaddr() for cleanup. No logic change in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e7a4feb0	08-May-2018	Chao Yu <chao@kernel.org>	f2fs: fix to let checkpoint guarantee atomic page persistence 1. thread A: commit_inmem_pages submit data into block layer, but haven't waited it writeback. 2. thread A: commit_inmem_pages update related node. 3. thread B: do checkpoint, flush all nodes to disk. 4. SPOR Then, atomic file becomes corrupted since nodes is flushed before data. This patch fixes to treat atomic page as checkpoint guaranteed one, then in checkpoint, we can make sure all atomic page can be writebacked with metadata of atomic file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b2532c69	23-Apr-2018	Chao Yu <chao@kernel.org>	f2fs: rename dio_rwsem to i_gc_rwsem RW semphore dio_rwsem in struct f2fs_inode_info is introduced to avoid race between dio and data gc, but now, it is more wildly used to avoid foreground operation vs data gc. So rename it to i_gc_rwsem to improve its readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5b19d284	04-May-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid fsync() failure caused by EAGAIN in writepage() pageout() in MM traslates EAGAIN, so calls handle_write_error() -> mapping_set_error() -> set_bit(AS_EIO, ...). file_write_and_wait_range() will see EIO error, which is critical to return value of fsync() followed by atomic_write failure to user. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 17c50035	12-Apr-2018	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clear PageError on writepage This patch clears PageError in some pages tagged by read path, but when we write the pages with valid contents, writepage should clear the bit likewise ext4. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b87078ad	20-Apr-2018	Jaegeuk Kim <jaegeuk@kernel.org>	Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer" This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function for stability. This reverts commit fe76b796fc5194cc3d57265002e3a748566d073f. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6dbb1796	18-Apr-2018	Eric Biggers <ebiggers@google.com>	f2fs: refactor read path to allow multiple postprocessing steps Currently f2fs's ->readpage() and ->readpages() assume that either the data undergoes no postprocessing, or decryption only. But with fs-verity, there will be an additional authenticity verification step, and it may be needed either by itself, or combined with decryption. To support this, store a 'struct bio_post_read_ctx' in ->bi_private which contains a work struct, a bitmask of postprocessing steps that are enabled, and an indicator of the current step. The bio completion routine, if there was no I/O error, enqueues the first postprocessing step. When that completes, it continues to the next step. Pages that fail any postprocessing step have PageError set. Once all steps have completed, pages without PageError set are set Uptodate, and all pages are unlocked. Also replace f2fs_encrypted_file() with a new function f2fs_post_read_required() in places like direct I/O and garbage collection that really should be testing whether the file needs special I/O processing, not whether it is encrypted specifically. This may also be useful for other future f2fs features such as compression. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0cb8dae4	18-Apr-2018	Eric Biggers <ebiggers@google.com>	fscrypt: allow synchronous bio decryption Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a bio's pages asynchronously, then unlocks them afterwards. But, this assumes that decryption is the last "postprocessing step" for the bio, so it's incompatible with additional postprocessing steps such as authenticity verification after decryption. Therefore, rename the existing fscrypt_decrypt_bio_pages() to fscrypt_enqueue_decrypt_bio(). Then, add fscrypt_decrypt_bio() which decrypts the pages in the bio synchronously without unlocking the pages, nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which enqueues work on the fscrypt_read_workqueue. The new functions will be used by filesystems that support both fscrypt and fs-verity. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b93b0163	10-Apr-2018	Matthew Wilcox <willy@infradead.org>	page cache: use xa_lock Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 0833721e	08-Mar-2018	Yunlei He <heyunlei@huawei.com>	f2fs: check blkaddr more accuratly before issue a bio This patch check blkaddr more accuratly before issue a write or read bio. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b91050a8	08-Mar-2018	Hyunchul Lee <cheol.lee@lge.com>	f2fs: add nowait aio support This patch adds nowait aio support[1]. Return EAGAIN if any of the following checks fail for direct I/O: - i_rwsem is not lockable - Blocks are not allocated at the write location And xfstests generic/471 is passed. [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT" Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 63189b78	07-Mar-2018	Chao Yu <chao@kernel.org>	f2fs: wrap all options with f2fs_sb_info.mount_opt This patch merges miscellaneous mount options into struct f2fs_mount_info, After this patch, once we add new mount option, we don't need to worry about recovery of it in remount_fs(), since we will recover the f2fs_sb_info.mount_opt including all options. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ccd31cb2	05-Feb-2018	Sheng Yong <shengyong1@huawei.com>	f2fs: clean up f2fs_sb_has_xxx functions This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of different f2fs_sb_has_xxx functions. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3bb09a0e	05-Feb-2018	Tiezhu Yang <yangtiezhu@loongson.cn>	f2fs: remove redundant check of page type when submit bio This patch removes redundant check of page type when submit bio to make the logic more clear. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0cdd3195	30-Jan-2018	Hyunchul Lee <cheol.lee@lge.com>	f2fs: support passing down write hints given by users to block layer Add the 'whint_mode' mount option that controls which write hints are passed down to block layer. There are "off" and "user-based" mode. The default mode is "off". 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET. 2) whint_mode=user-based. F2FS tries to pass down hints given by users. User F2FS Block ---- ---- ----- META WRITE_LIFE_NOT_SET HOT_NODE " WARM_NODE " COLD_NODE " ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME extension list " " -- buffered io WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET WRITE_LIFE_NONE " " WRITE_LIFE_MEDIUM " " WRITE_LIFE_LONG " " -- direct io WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET WRITE_LIFE_NONE " WRITE_LIFE_NONE WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM WRITE_LIFE_LONG " WRITE_LIFE_LONG Many thanks to Chao Yu and Jaegeuk Kim for comments to implement this patch. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: avoid build warning] [Chao Yu: fix to restore whint_mode in ->remount_fs] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# db198ae0	18-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: drop page cache after fs shutdown Don't remain dirtied page cache in f2fs after shutdown, it can mitigate memory pressure of whole system, in order to keep other modules working properly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bb9e3bb8	17-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: split need_inplace_update This patch splits need_inplace_update to two functions: a. should_update_inplace() includes all conditions that we must use IPU. b. should_update_outplace() includes all conditions that we must use OPU. So that, in f2fs_ioc_set_pin_file() and f2fs_defragment_range(), we can use corresponding function to check whether we can trigger OPU/IPU or not. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# eb449797	17-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: fix to update last_disk_size correctly This patch fixes to update last_disk_size only when writing out page successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b323fd28	17-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup Use get_inline_xattr_addrs directly instead of F2FS_INLINE_XATTR_ADDRS. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a2e2e76b	15-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: fix to drop all inmem pages correctly In commit 57864ae5ce3a ("f2fs: limit # of inmemory pages"), we have limited memory footprint of all inmem pages with 20% of total memory, otherwise, if we exceed the threshold, we will try to drop all inmem pages to avoid excessive memory pressure resulting in performance regression. But in some unrelated error paths, we will also drop all inmem pages, which should be wrong, fix it in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f3d98e74	10-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: speed up defragment on sparse file We have supported to get next page offset with valid mapping crossing hole in f2fs_map_blocks, utilizing it to speed up defragment on sparse file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c4020b2d	10-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: support F2FS_IOC_PRECACHE_EXTENTS This patch introduces a new ioctl F2FS_IOC_PRECACHE_EXTENTS to precache extent info like ext4, in order to gain better performance during triggering AIO by eliminating synchronous waiting of mapping info. Referred commit: 7869a4a6c5ca ("ext4: add support for extent pre-caching") In addition, with newly added extent precache abilitiy, this patch add to support FIEMAP_FLAG_CACHE in ->fiemap. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1ad71a27	07-Dec-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add an ioctl to disable GC for specific file This patch gives a flag to disable GC on given file, which would be useful, when user wants to keep its block map. It also conducts in-place-update for dontmove file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 442a9dbd	10-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: support FIEMAP_FLAG_XATTR This patch enables ->fiemap to handle FIEMAP_FLAG_XATTR flag for xattr mapping info lookup purpose. It makes f2fs passing generic/425 test in fstest. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f1b43d4c	10-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock This patch fix to cover f2fs_inline_data_fiemap with inode_lock in order to make that interface avoiding race with mapping change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7dff55d2	10-Jan-2018	Yunlei He <heyunlei@huawei.com>	f2fs: check node page again in write end io Check node page again in write end io in case of data corruption during inflght IO. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 578c6478	09-Jan-2018	Yufen Yu <yuyufen@huawei.com>	f2fs: implement cgroup writeback support Cgroup writeback requires explicit support from the filesystem. f2fs's data and node writeback IOs go through __write_data_page, which sets fio for submiting IOs. So, we add io_wbc for fio, associate bios with blkcg by invoking wbc_init_bio() and account IOs issuing by wbc_account_io(). In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB. Meta writeback IOs is left alone by this patch and will always be attributed to the root cgroup. The results show that f2fs can throttle writeback nicely for data writing and file creating. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 263663cd	18-Dec-2017	Ming Lei <ming.lei@redhat.com>	block: convert to bio_first_bvec_all & bio_first_page_all This patch converts to bio_first_bvec_all() & bio_first_page_all() for retrieving the 1st bvec/page, and prepares for supporting multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# d6d478a1	03-Jan-2018	Chao Yu <chao@kernel.org>	f2fs: continue to do direct IO if we only preallocate partial blocks While doing direct IO, if we run out-of-space when we preallocate blocks, we should not return ENOSPC error directly, instead, we should continue to do following direct IO, which will keep directIO of f2fs acting like other filesystems. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b1ca321d	31-Dec-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: skip stop_checkpoint for user data writes We can give another chance to write user data, which can resolve generic/441. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4c2ac6a8	30-Nov-2017	Chao Yu <chao@kernel.org>	f2fs: clean up f2fs_map_blocks f2fs_map_blocks(): if (blkaddr == NEW_ADDR \|\| blkaddr == NULL_ADDR) { if (create) { ... } else { ... if (flag == F2FS_GET_BLOCK_FIEMAP && blkaddr == NULL_ADDR) { ... } if (flag != F2FS_GET_BLOCK_FIEMAP \|\| blkaddr != NEW_ADDR) goto sync_out; } It means we can break the loop in cases of: a) flag != F2FS_GET_BLOCK_FIEMAP or b) flag == F2FS_GET_BLOCK_FIEMAP && blkaddr == NULL_ADDR Condition b) is the same as previous one, so merge operations of them for readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d5097be5	27-Nov-2017	Hyunchul Lee <cheol.lee@lge.com>	f2fs: apply write hints to select the type of segment for direct write When blocks are allocated for direct write, select the type of segment using the kiocb hint. But if an inode has FI_NO_ALLOC, use the inode hint. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 736c0a74	24-Nov-2017	LiFan <fanofcode.li@samsung.com>	f2fs: remove an excess variable Remove the variable page_idx which no one would miss. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 25006645	22-Nov-2017	Sheng Yong <shengyong1@huawei.com>	f2fs: still write data if preallocate only partial blocks If there is not enough space left, f2fs_preallocate_blocks may only preallocte partial blocks. As a result, the write operation fails but i_blocks is not 0. To avoid this, f2fs should write data in non-preallocation way and write as many data as the size of i_blocks. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 86679820	15-Nov-2017	Mel Gorman <mgorman@techsingularity.net>	mm, pagevec: remove cold parameter for pagevecs Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 67fd707f	15-Nov-2017	Jan Kara <jack@suse.cz>	mm: remove nr_pages argument from pagevec_lookup_{,range}_tag() All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 69c4f35d	15-Nov-2017	Jan Kara <jack@suse.cz>	f2fs: use pagevec_lookup_range_tag() We want only pages from given range in f2fs_write_cache_pages(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-6-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# d62fe971	28-Oct-2017	Chao Yu <chao@kernel.org>	f2fs: support bio allocation error injection This patch adds to support bio allocation error injection to simulate out-of-memory test scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 01eccef7	28-Oct-2017	Chao Yu <chao@kernel.org>	f2fs: support get_page error injection This patch adds to support get_page error injection to simulate out-of-memory test scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 57864ae5	18-Oct-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: limit # of inmemory pages If some abnormal users try lots of atomic write operations, f2fs is able to produce pinned pages in the main memory which affects system performance. This patch limits that as 20% over total memory size, and if f2fs reaches to the limit, it will drop all the inmemory pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a0d00fad	09-Oct-2017	Chao Yu <chao@kernel.org>	f2fs: fix to avoid race when accessing last_disk_size last_disk_size could be wrong due to concurrently updating, so using i_sem semaphore to make last_disk_size updating exclusive to fix this issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ebf7c522	07-Oct-2017	Thomas Meyer <thomas@m3y3r.de>	f2fs: Fix bool initialization/comparison Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 39d787be	28-Sep-2017	Chao Yu <chao@kernel.org>	f2fs: enhance multiple device flush When multiple device feature is enabled, during ->fsync we will issue flush in all devices to make sure node/data of the file being persisted into storage. But some flushes of device could be unneeded as file's data may be not writebacked into those devices. So this patch adds and manage bitmap per inode in global cache to indicate which device is dirty and it needs to issue flush during ->fsync, hence, we could improve performance of fsync in scenario of multiple device. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 71ad682c	29-Sep-2017	Weichao Guo <guoweichao@huawei.com>	f2fs: convert inline data for direct I/O & FI_NO_PREALLOC In FI_NO_PREALLOC cases, direct I/O path may allocate blocks for an inode but keep its inline data flag. This inconsistency may trigger vfs clear_inode nrpages bug_on when evicting the inode. We should convert inline data first in this case. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 71cb4aff	23-Sep-2017	Gao Xiang <xiang@kernel.org>	f2fs: allow readpages with NULL file pointer Keep in line with the other Linux file system implementations since page_cache_sync_readahead supports NULL file pointer, and thus we can readahead data by f2fs itself without file opening (something like the btrfs behavior). Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2916ecc0	08-Sep-2017	Jérôme Glisse <jglisse@redhat.com>	mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY Introduce a new migration mode that allow to offload the copy to a device DMA engine. This changes the workflow of migration and not all address_space migratepage callback can support this. This is intended to be use by migrate_vma() which itself is use for thing like HMM (see include/linux/hmm.h). No additional per-filesystem migratepage testing is needed. I disables MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i added comment in those to explain why (part of this patch). The commit message is unclear it should say that any callback that wish to support this new mode need to be aware of the difference in the migration flow from other mode. Some of these callbacks do extra locking while copying (aio, zsmalloc, balloon, ...) and for DMA to be effective you want to copy multiple pages in one DMA operations. But in the problematic case you can not easily hold the extra lock accross multiple call to this callback. Usual flow is: For each page { 1 - lock page 2 - call migratepage() callback 3 - (extra locking in some migratepage() callback) 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) 5 - copy page 6 - (unlock any extra lock of migratepage() callback) 7 - return from migratepage() callback 8 - unlock page } The new mode MIGRATE_SYNC_NO_COPY: 1 - lock multiple pages For each page { 2 - call migratepage() callback 3 - abort in all problematic migratepage() callback 4 - migrate page state (freeze refcount, update page cache, buffer head, ...) } // finished all calls to migratepage() callback 5 - DMA copy multiple pages 6 - unlock all the pages To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a new callback migratepages() (for instance) that deals with multiple pages in one transaction. Because the problematic cases are not important for current usage I did not wanted to complexify this patchset even more for no good reason. Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 13ba41e3	06-Sep-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: make get_lock_data_page to handle encrypted inode This patch refactors get_lock_data_page() to handle encryption case directly. In order to do that, it introduces common f2fs_submit_page_read(). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d4c759ee	05-Sep-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use generic terms used for encrypted block management This patch renames functions regarding to buffer management via META_MAPPING used for encrypted blocks especially. We can actually use them in generic way. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1958593e	05-Sep-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce f2fs_encrypted_file for clean-up This patch replaces (f2fs_encrypted_inode() && S_ISREG()) with f2fs_encrypted_file(), which gives no functional change. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 74d46992	23-Aug-2017	Christoph Hellwig <hch@lst.de>	block: replace bi_bdev with a gendisk pointer and partitions index This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# f2220c7f	09-Aug-2017	Qiuyang Sun <sunqiuyang@huawei.com>	f2fs: merge equivalent flags F2FS_GET_BLOCK_[READ\|DIO] Currently, the two flags F2FS_GET_BLOCK_[READ\|DIO] are totally equivalent and can be used interchangably in all scenarios they are involved in. Neither of the flags is referenced in f2fs_map_blocks(), making them both the default case. To remove the ambiguity, this patch merges both flags into F2FS_GET_BLOCK_DEFAULT, and introduces an enum for all distinct flags. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b0af6d49	02-Aug-2017	Chao Yu <chao@kernel.org>	f2fs: add app/fs io stat This patch enables inner app/fs io stats and introduces below virtual fs nodes for exposing stats info: /sys/fs/f2fs/<dev>/iostat_enable /proc/fs/f2fs/<dev>/iostat_info Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix wrong stat assignment] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7a2af766	18-Jul-2017	Chao Yu <chao@kernel.org>	f2fs: enhance on-disk inode structure scalability This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline to indicate that on-disk structure of current inode is extended. In order to extend, we changed the inode structure a bit: Original one: struct f2fs_inode { ... struct f2fs_extent i_ext; __le32 i_addr[DEF_ADDRS_PER_INODE]; __le32 i_nid[DEF_NIDS_PER_INODE]; } Extended one: struct f2fs_inode { ... struct f2fs_extent i_ext; union { struct { __le16 i_extra_isize; __le16 i_padding; __le32 i_extra_end[0]; }; __le32 i_addr[DEF_ADDRS_PER_INODE]; }; __le32 i_nid[DEF_NIDS_PER_INODE]; } Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of i_addr field for storing i_extra_isize and i_padding. with i_extra_isize, we can calculate actual size of reserved space in i_addr, available attribute fields included in total extra attribute fields for current inode can be described as below: +--------------------+ \| .i_mode \| \| ... \| \| .i_ext \| +--------------------+ \| .i_extra_isize \|-----+ \| .i_padding \| \| \| .i_prjid \| \| \| .i_atime_extra \| \| \| .i_ctime_extra \| \| \| .i_mtime_extra \|<----+ \| .i_inode_cs \|<----- store blkaddr/inline from here \| .i_xattr_cs \| \| ... \| +--------------------+ \| \| \| block address \| \| \| +--------------------+ \| .i_nid \| +--------------------+ \| node_footer \| \| (nid, ino, offset) \| +--------------------+ Hence, with this patch, we would enhance scalability of f2fs inode for storing more newly added attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f2470371	18-Jul-2017	Chao Yu <chao@kernel.org>	f2fs: make max inline size changeable This patch tries to make below macros calculating max inline size, inline dentry field size considerring reserving size-changeable space: - MAX_INLINE_DATA - NR_INLINE_DENTRY - INLINE_DENTRY_BITMAP_SIZE - INLINE_RESERVED_SIZE Then, when inline_{data,dentry} options is enabled, it allows us to reserve inline space with different size flexibly for adding newly introduced inode attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0abd675e	08-Jul-2017	Chao Yu <chao@kernel.org>	f2fs: support plain user/group quota This patch adds to support plain user/group quota. Change Note by Jaegeuk Kim. - Use f2fs page cache for quota files in order to consider garbage collection. so, quota files are not tolerable for sudden power-cuts, so user needs to do quotacheck. - setattr() calls dquot_transfer which will transfer inode->i_blocks. We can't reclaim that during f2fs_evict_inode(). So, we need to count node blocks as well in order to match i_blocks with dquot's space. Note that, Chao wrote a patch to count inode->i_blocks without inode block. (f2fs: don't count inode block in in-memory inode.i_blocks) - in f2fs_remount, we need to make RW in prior to dquot_resume. - handle fault_injection case during f2fs_quota_off_umount - TODO: Project quota Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d29460e5	21-Jun-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid deadlock caused by lock order of page and lock_op - punch_hole - fill_zero - f2fs_lock_op - get_new_data_page - lock_page - f2fs_write_data_pages - lock_page - do_write_data_page - f2fs_lock_op Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ff1048e7	06-Jul-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: relax migratepage for atomic written page In order to avoid lock contention for atomic written pages, we'd better give EBUSY in f2fs_migrate_page when mode is asynchronous. We expect it will be released soon as transaction commits. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0771fcc7	29-Jun-2017	Chao Yu <chao@kernel.org>	f2fs: skip ->writepages for {mete,node}_inode during recovery Skip ->writepages in prior to ->writepage for {meta,node}_inode during recovery, hence unneeded loop in ->writepages can be avoided. Moreover, check SBI_POR_DOING earlier while writebacking pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5a3a2d83	17-May-2017	Qiuyang Sun <sunqiuyang@huawei.com>	f2fs: dax: fix races between page faults and truncating pages Currently in F2FS, page faults and operations that truncate the pagecahe or data blocks, are completely unsynchronized. This can result in page fault faulting in a page into a range that we are changing after truncating, and thus we can end up with a page mapped to disk blocks that will be shortly freed. Filesystem corruption will shortly follow. This patch fixes the problem by creating new rw semaphore i_mmap_sem in f2fs_inode_info and grab it for functions removing blocks from extent tree and for read over page faults. The mechanism is similar to that in ext4. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4e4cbee9	03-Jun-2017	Christoph Hellwig <hch@lst.de>	block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# fb830fc5	19-May-2017	Chao Yu <chao@kernel.org>	f2fs: introduce io_list for serialize data/node IOs Serialize data/node IOs by using fifo list instead of mutex lock, it will help to enhance concurrency of f2fs, meanwhile keeping LFS IO semantics. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cc15620b	12-May-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid f2fs_lock_op for IPU writes Currently, if we do get_node_of_data before f2fs_lock_op, there may be dead lock as follows, where process A would be in infinite loop, and B will NOT be awaked. Process A(cp): Process B: f2fs_lock_all(sbi) get_dnode_of_data <---- lock dn.node_page flush_nodes f2fs_lock_op So, this patch adds f2fs_trylock_op to avoid f2fs_lock_op done by IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a912b54d	10-May-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: split bio cache Split DATA/NODE type bio cache according to different temperature, so write IOs with the same temperature can be merged in corresponding bio cache as much as possible, otherwise, different temperature write IOs submitting into one bio cache will always cause split of bio. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b9109b0e	10-May-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove unnecessary read cases in merged IO flow Merged IO flow doesn't need to care about read IOs. f2fs_submit_merged_bio -> f2fs_submit_merged_write f2fs_submit_merged_bios -> f2fs_submit_merged_writes f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3adc5fcb	02-May-2017	Jan Kara <jack@suse.cz>	f2fs: Make flush bios explicitely sync Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA\|PREFLUSH\|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions. Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 Cc: stable@vger.kernel.org # 4.9+ CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 279d6df2	26-Apr-2017	Hou Pengyang <houpengyang@huawei.com>	f2fs: release cp and dnode lock before IPU We don't need to rewrite the page under cp_rwsem and dnode locks. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a817737e	24-Apr-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce valid_ipu_blkaddr to clean up This patch introduces valid_ipu_blkaddr to clean up checking block address for inplace-update. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e959c8f5	24-Apr-2017	Hou Pengyang <houpengyang@huawei.com>	f2fs: lookup extent cache first under IPU scenario If a page is cold, NOT atomit written and need_ipu now, there is a high probability that IPU should be adapted. For IPU, we try to check extent tree to get the block index first, instead of reading the dnode page, where may lead to an useless dnode IO, since no need to update the dnode index for IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7eab0c0d	24-Apr-2017	Hou Pengyang <houpengyang@huawei.com>	f2fs: reconstruct code to write a data page This patch introduces encrypt_one_page which encrypts one data page before submit_bio, and change the use of need_inplace_update. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a7881893	20-Apr-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix out-of free segments This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority"). This patch fixes out of free segments caused by many small file creation by 1) mkfs -s 1 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in skipping to check has_not_enough_free_secs. Another test is done by 1) mkfs -s 12 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta In this case, this patch also fixes wrong block allocation under large section size. Reported-by: William Brana <wbrana@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 04485987	18-Apr-2017	Hou Pengyang <houpengyang@huawei.com>	f2fs: introduce async IPU policy This patch introduces an ASYNC IPU policy. Under senario of large # of async updating(e.g. log writing in Android), disk would be seriously fragmented, and higher frequent gc would be triggered. This patch uses IPU to rewrite the async update writting, since async is NOT sensitive to io latency. Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
# 001c584c	18-Apr-2017	Chao Yu <chao@kernel.org>	f2fs: unlock cp_rwsem early for IPU writes For IPU writes, there won't be any udpates in dnode page since we will reuse old block address instead of allocating new one, so we don't need to lock cp_rwsem during IPU IO submitting. Signed-off-by: Chao Yu <yuchao0@huawei.com>
# 771a9a71	05-Apr-2017	Tomohiro Kusumi <tkusumi@tuxera.com>	f2fs: fix comment on f2fs_flush_merged_bios() after 86531d6b Callers are to unlock the page on failure after 86531d6b. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d1b3e72d	30-Mar-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: submit bio of in-place-update pages This patch tries to split in-place-update bios from sequential bios. Suggested-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 687de7f1	28-Mar-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE If two threads try to flush dirty pages in different inodes respectively, f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time, resulting in a lot of 4KB seperated IOs. So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write IOs with a big WRITE_SYNC'ed bio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ef095d19	24-Mar-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: write small sized IO to hot log It would better split small and large IOs separately in order to get more consecutive big writes. The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 59c9081b	13-Mar-2017	Yunlei He <heyunlei@huawei.com>	f2fs: allow write page cache when writting cp This patch allow write data to normal file when writting new checkpoint. We relax three limitations for write_begin path: 1. data allocation 2. node allocation 3. variables in checkpoint Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a83d50bc	13-Mar-2017	Kinglong Mee <kinglongmee@gmail.com>	f2fs: fix bad prefetchw of NULL page For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL", so that, the prefetchw(&page->flags) is operated on NULL. Fixes: f1e8866016 ("f2fs: expose f2fs_mpage_readpages") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8c242db9	16-Mar-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer When I forced to enable atomic operations intentionally, I could hit the below panic, since we didn't clear page->private in f2fs_invalidate_page called by file truncation. The panic occurs due to NULL mapping having page->private. BUG: unable to handle kernel paging request at ffffffffffffffff IP: drop_buffers+0x38/0xe0 PGD 5d00c067 PUD 5d00e067 PMD 0 CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: ffff9151952863c0 task.stack: ffffaaec40db4000 RIP: 0010:drop_buffers+0x38/0xe0 RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292 Call Trace: ? page_referenced+0x8b/0x170 try_to_free_buffers+0xc5/0xe0 try_to_release_page+0x49/0x50 shrink_page_list+0x8bc/0x9f0 shrink_inactive_list+0x1dd/0x500 ? shrink_active_list+0x2c0/0x430 shrink_node_memcg+0x5eb/0x7c0 shrink_node+0xe1/0x320 do_try_to_free_pages+0xef/0x2e0 try_to_free_pages+0xe9/0x190 __alloc_pages_slowpath+0x390/0xe70 __alloc_pages_nodemask+0x291/0x2b0 alloc_pages_current+0x95/0x140 __page_cache_alloc+0xc4/0xe0 pagecache_get_page+0xab/0x2a0 grab_cache_page_write_begin+0x20/0x40 get_read_data_page+0x2e6/0x4c0 [f2fs] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] ? truncate_data_blocks_range+0x238/0x2b0 [f2fs] get_lock_data_page+0x30/0x190 [f2fs] __exchange_data_block+0xaaf/0xf40 [f2fs] f2fs_fallocate+0x418/0xd00 [f2fs] vfs_fallocate+0x157/0x220 SyS_fallocate+0x48/0x80 Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Chao Yu: use INMEM_INVALIDATE for better tracing] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 174cd4b1	02-Feb-2017	Ingo Molnar <mingo@kernel.org>	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 540faedb	27-Feb-2017	Chao Yu <chao@kernel.org>	f2fs: fix to update F2FS_{CP_}WB_DATA count correctly We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 55523519	24-Feb-2017	Chao Yu <chao@kernel.org>	f2fs: show simple call stack in fault injection message Previously kernel message can show that in which function we do the injection, but unfortunately, most of the caller are the same, for tracking more information of injection path, it needs to show upper caller's name. This patch supports that ability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# dd7b2333	23-Feb-2017	Yunlei He <heyunlei@huawei.com>	f2fs: no need lock_op in f2fs_write_inline_data Similar as f2fs_write_inode, f2fs_write_inline_data just mark inode page dirty, so it's no need to write inline data under read lock of cp_rwsem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3f2be043	23-Feb-2017	Kinglong Mee <kinglongmee@gmail.com>	f2fs: avoid m_flags overlay when allocating more data blocks When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED flags will be overlapped by F2FS_MAP_NEW at the later times. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e15882b6	23-Feb-2017	Hou Pengyang <houpengyang@huawei.com>	f2fs: init local extent_info to avoid stale stack info in tp To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end: dev = (259,30), ino = 856, pgofs = 0, ext_info(fofs: 3441207040, blk: 4294967232, len: 3481143808) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 86d54795	17-Feb-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not wait for writeback in write_begin Otherwise we can get livelock like below. [79880.428136] dbench D 0 18405 18404 0x00000000 [79880.428139] Call Trace: [79880.428142] __schedule+0x219/0x6b0 [79880.428144] schedule+0x36/0x80 [79880.428147] schedule_timeout+0x243/0x2e0 [79880.428152] ? update_sd_lb_stats+0x16b/0x5f0 [79880.428155] ? ktime_get+0x3c/0xb0 [79880.428157] io_schedule_timeout+0xa6/0x110 [79880.428161] __lock_page+0xf7/0x130 [79880.428164] ? unlock_page+0x30/0x30 [79880.428167] pagecache_get_page+0x16b/0x250 [79880.428171] grab_cache_page_write_begin+0x20/0x40 [79880.428182] f2fs_write_begin+0xa2/0xdb0 [f2fs] [79880.428192] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] [79880.428197] ? kmem_cache_free+0x79/0x200 [79880.428203] ? __mark_inode_dirty+0x17f/0x360 [79880.428206] generic_perform_write+0xbb/0x190 [79880.428213] ? file_update_time+0xa4/0xf0 [79880.428217] __generic_file_write_iter+0x19b/0x1e0 [79880.428226] f2fs_file_write_iter+0x9c/0x180 [f2fs] [79880.428231] __vfs_write+0xc5/0x140 [79880.428235] vfs_write+0xb2/0x1b0 [79880.428238] SyS_write+0x46/0xa0 [79880.428242] entry_SYSCALL_64_fastpath+0x1e/0xad Fixes: cae96a5c8ab6 ("f2fs: check io submission more precisely") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7f54f51f	06-Feb-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove preflush for nobarrier case This patch removes REQ_PREFLUSH in the nobarrier case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 942fd319	01-Feb-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check last page index in cached bio to decide submission If the cached bio has the last page's index, then we need to submit it. Otherwise, we don't need to submit it and can wait for further IO merges. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d68f735b	03-Feb-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check io submission more precisely This patch check IO submission more precisely than previous rough check. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f566bae8	03-Feb-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call internal __write_data_page directly This patch introduces __write_data_page to call it by f2fs_write_cache_pages directly.. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b86e3307	21-Jan-2017	Wei Fang <fangwei1@huawei.com>	f2fs: fix a dead loop in f2fs_fiemap() A dead loop can be triggered in f2fs_fiemap() using the test case as below: ... fd = open(); fallocate(fd, 0, 0, 4294967296); ioctl(fd, FS_IOC_FIEMAP, fiemap_buf); ... It's caused by an overflow in __get_data_block(): ... bh->b_size = map.m_len << inode->i_blkbits; ... map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits on 64 bits archtecture, type conversion from an unsigned int to a size_t will result in an overflow. In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap() will call get_data_block() at block 0 again an again. Fix this by adding a force conversion before left shift. Signed-off-by: Wei Fang <fangwei1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# dc91de78	13-Jan-2017	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not preallocate blocks which has wrong buffer Sheng Yong reports needless preallocation if write(small_buffer, large_size) is called. In that case, f2fs preallocates large_size, but vfs returns early due to small_buffer size. Let's detect it before preallocation phase in f2fs. Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5fe45743	07-Jan-2017	Chao Yu <chao@kernel.org>	f2fs: introduce FI_ATOMIC_COMMIT This patch introduces a new flag to indicate inode status of doing atomic write committing, so that, we can keep atomic write status for inode during atomic committing, then we can skip GCing pages of atomic write inode, that avoids random GCed datas being mixed with current transaction, so isolation of transaction can be kept. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 939afa94	07-Jan-2017	Chao Yu <chao@kernel.org>	f2fs: clean up with list_{first, last}_entry Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0a595eba	14-Dec-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support IO alignment for DATA and NODE writes This patch implements IO alignment by filling dummy blocks in DATA and NODE write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs, we can eliminate underlying dummy page problem which FTL conducts in order to close MLC or TLC partial written pages. Note that, - it requires "-o mode=lfs". - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256. - read IO is still 4KB. - do checkpoint at fsync, if dummy NODE page was written. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 554b5125	21-Dec-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add submit_bio tracepoint This patch adds final submit_bio() tracepoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 746e2403	19-Dec-2016	Yunlei He <heyunlei@huawei.com>	f2fs: add a case of no need to read a page in write begin If the range we write cover the whole valid data in the last page, we do not need to read it. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bd7b8290	06-Dec-2016	David Gstir <david@sigma-star.at>	fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which, when set, indicates that the fs uses pages under its own control as opposed to writeback pages which require locking and a bounce buffer for encryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
# 0002b61b	28-Nov-2016	Chao Yu <chao@kernel.org>	f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 36951b38	15-Nov-2016	Chao Yu <chao@kernel.org>	f2fs: don't wait writeback for datas during checkpoint Normally, while committing checkpoint, we will wait on all pages to be writebacked no matter the page is data or metadata, so in scenario where there are lots of data IO being submitted with metadata, we may suffer long latency for waiting writeback during checkpoint. Indeed, we only care about persistence for pages with metadata, but not pages with data, as file system consistent are only related to metadate, so in order to avoid encountering long latency in above scenario, let's recognize and reference metadata in submitted IOs, wait writeback only for metadatas. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c040ff9d	11-Nov-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix redundant block allocation In direct_IO path of f2fs_file_write_iter(), 1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO) -> allocate LBA X 2. f2fs_direct_IO() -> return 0; Then, f2fs_write_data_page() will allocate another LBA X+1. This makes EIO triggered by HM-SMR. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a7de6086	11-Nov-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use err for f2fs_preallocate_blocks This patch has no functional change. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3c62be17	06-Oct-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support multiple devices This patch implements multiple devices support for f2fs. Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big volume under one f2fs instance. Internal block management is very simple, but we will modify block allocation and background GC policy to boost IO speed by exploiting them accoording to each device speed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e57e9ae5	11-Nov-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: allow dio read for LFS mode We can allow dio reads for LFS mode, while doing buffered writes for dio writes. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6ae1be13	11-Nov-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: revert segment allocation for direct IO Now we don't need to be too much careful about storage alignment for dio, since its speed becomes quite fast and we'd better avoid any misalignment first. Revert: 38aa0889b250 (f2fs: align direct_io'ed data to section) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0bfd7a09	28-Oct-2016	Damien Le Moal <damien.lemoal@wdc.com>	f2fs: Use generic zoned block device terminology SMR stands for "Shingled Magnetic Recording" which makes sense only for hard disk drives (spinning rust). The ZBC/ZAC standards enable management of SMR disks, but solid state drives may also support those standards. So rename the HMSMR feature to BLKZONED to avoid a HDD centric terminology. For the same reason, rename f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 230436b3	02-Nov-2016	Arnd Bergmann <arnd@arndb.de>	f2fs: hide a maybe-uninitialized warning gcc is unsure about the use of last_ofs_in_node, which might happen without a prior initialization: fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’: fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) { As pointed out by Chao Yu, the code is actually correct as 'prealloc' is only set if the last_ofs_in_node has been set, the two always get updated together. This initializes last_ofs_in_node to dn.ofs_in_node for each new dnode at the start of the 'next_block' loop, which at that point is a correct initialization as well. I assume that compilers that correctly track the contents of the variables and do not warn about the condition also figure out that they can eliminate the extra assignment here. Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 664ba972	18-Oct-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use BIO_MAX_PAGES for bio allocation We don't need to allocate bio partially in order to maximize sequential writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 58736fa6	11-Oct-2016	Chao Yu <chao@kernel.org>	f2fs: be aware of extent beyond EOF in fiemap f2fs can support fallocating blocks beyond file size without changing the size, but ->fiemap of f2fs was restricted and can't detect these extents fallocated past EOF, now relieve the restriction. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6f2d8ed6	11-Oct-2016	Chao Yu <chao@kernel.org>	f2fs: don't miss any f2fs_balance_fs cases In f2fs_map_blocks, let f2fs_balance_fs detects node page modification with dn.node_changed to avoid miss some corner cases. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 933439c8	11-Oct-2016	Chao Yu <chao@kernel.org>	f2fs: give a chance to detach from dirty list If there is no dirty pages in inode, we should give a chance to detach the inode from global dirty list, otherwise it needs to call another unnecessary .writepages for detaching. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9c4bb8a3	13-Nov-2016	David Gstir <david@sigma-star.at>	fscrypt: Let fs select encryption index/tweak Avoid re-use of page index as tweak for AES-XTS when multiple parts of same page are encrypted. This will happen on multiple (partial) calls of fscrypt_encrypt_page on same page. page->index is only valid for writeback pages. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
# 7821d4dd	13-Nov-2016	David Gstir <david@sigma-star.at>	fscrypt: Enable partial page encryption Not all filesystems work on full pages, thus we should allow them to hand partial pages to fscrypt for en/decryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
# 7637241e	01-Nov-2016	Jens Axboe <axboe@fb.com>	writeback: add wbc_to_write_flags() Add wbc_to_write_flags(), which returns the write modifier flags to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 70fd7614	01-Nov-2016	Christoph Hellwig <hch@lst.de>	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 5114a97a	11-Oct-2016	Michal Hocko <mhocko@suse.com>	fs: use mapping_set_error instead of opencoded set_bit The mapping_set_error() helper sets the correct AS_ flag for the mapping so there is no reason to open code it. Use the helper directly. [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO] Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 6ca56ca4	29-Sep-2016	Chao Yu <chao@kernel.org>	f2fs: don't submit irrelevant page While we call ->writepages, there are two cases: a. we didn't writeout any dirty pages, since they are writebacked by other thread concurrently. b. we writeout dirty pages, and have already submitted bio to block layer. In these cases, we don't need to do additional bio flushing unnecessarily, it may split bio in cache into smaller one. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1ecc0c5c	23-Sep-2016	Chao Yu <chao@kernel.org>	f2fs: support configuring fault injection per superblock Previously, we only support global fault injection configuration, so that when we configure type/rate of fault injection through sysfs, mount option, it will influence all f2fs partition which is being used. It is not make sence, since it will be not convenient if developer want to test separated partitions with different fault injection rate/type simultaneously, also it's not possible to enable fault injection in one partition and disable fault injection in other one. >From now on, we move global configuration of fault injection in module into per-superblock, hence injection testing can be more flexible. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5b7a487c	19-Sep-2016	Weichao Guo <guoweichao@huawei.com>	f2fs: add customized migrate_page callback This patch improves the migration of dirty pages and allows migrating atomic written pages that F2FS uses in Page Cache. Instead of the fallback releasing page path, it provides better performance for memory compaction, CMA and other users of memory page migrating. For dirty pages, there is no need to write back first when migrating. For an atomic written page before committing, we can migrate the page and update the related 'inmem_pages' list at the same time. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix some coding style] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5d4c0af4	17-Sep-2016	Yunlei He <heyunlei@huawei.com>	f2fs: preallocate blocks for encrypted file This patch allow preallocates data blocks for buffered aio writes in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix to avoid BUG_ON] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8b038c70	18-Sep-2016	Chao Yu <chao@kernel.org>	f2fs: support IO error injection This patch adds to support IO error injection for testing IO error tolerance of f2fs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 649d7df2	06-Sep-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix to set PageUptodate in f2fs_write_end correctly Previously, f2fs_write_begin sets PageUptodate all the time. But, when user tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider that the page is able to be copied partially afterwards. In such the case, we will lose the remaing region in the page. This patch fixes this by setting PageUptodate in f2fs_write_end as given copied result. In the short copy case, it returns zero to let generic_perform_write retry copying user data again. As a result, f2fs_write_end() works: PageUptodate len copied return retry 1. no 4096 4096 4096 false -> return 4096 2. no 4096 1024 0 true -> goto #1 case 3. yes 2048 2048 2048 false -> return 2048 4. yes 2048 1024 1024 false -> return 1024 Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7f3037a5	01-Sep-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check free_sections for defragmentation Fix wrong condition check for defragmentation of a file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 68f31393	06-Sep-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: no need to make zeros beyond i_size We don't need to make zeros beyond i_size, since we already wrote that through NEW_ADDR case. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# dfd02e4d	20-Aug-2016	Chao Yu <chao@kernel.org>	f2fs: fix to preallocate block only aligned to 4K In write_begin(), we skip checking dnode block for preallocating block when whole block needs to be updated since we preallocated its block in f2fs_preallocate_blocks, for partial updated block, we will still try to lock its node and do preallocation in write_begin(), so in f2fs_preallocate_blocks we should not preallocate its block. But previously, the calculation of preallocating block number is incorrect, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix a bug] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6a7a3aed	23-Aug-2016	Wei Yongjun <weiyongjun1@huawei.com>	f2fs: fix non static symbol warning Fixes the following sparse warning: fs/f2fs/data.c:969:12: warning: symbol 'f2fs_grab_bio' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 58383bef	20-Aug-2016	Chao Yu <chao@kernel.org>	f2fs: fix to do f2fs_balance_fs in f2fs_map_blocks correctly If we preallocate blocks with f2fs_reserve_blocks in f2fs_map_blocks, we should call f2fs_balance_fs for checking and reclaiming space, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3024c9a1	06-Aug-2016	Chao Yu <chao@kernel.org>	Revert "f2fs: move i_size_write in f2fs_write_end" This reverts commit a2ee0a300344a6da76186129b078113354fe13d2. When testing with generic/032 of xfstest suit, failure message will be reported as below: generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad) --- tests/generic/032.out 2015-01-11 16:52:27.643681072 +0800 +++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800 @@ -1,5 +1,5 @@ QA output created by 032 -100 iterations -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd -* -0100000 +1: [768..775]: unwritten +Unwritten extents found! ... (Run 'diff -u tests/generic/032.out results/generic/032.out.bad' to see the entire diff) Ran: generic/032 Failures: generic/032 Failed 1 of 1 tests In write_end(), we should update i_size of inode before unlock page, otherwise, we will lose newly updated data in following race condition. Thread A Thread B - write_end - unlock page - writepages - lock_page - writepage if page is out-of-range of file size, we will skip writting the page. - update i_size Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1aee6b9a	27-Jul-2016	Jens Axboe <axboe@fb.com>	f2fs: drop bio->bi_rw manual assignment Merge 4fc29c1aa375 included this extra line, but it's not needed (or useful) since we'll bio_set_op_attrs() right after to properly set the op and flags for the bio. Signed-off-by: Jens Axboe <axboe@fb.com>
# 8a5c743e	26-Jul-2016	Michal Hocko <mhocko@suse.com>	mm, memcg: use consistent gfp flags during readahead Vladimir has noticed that we might declare memcg oom even during readahead because read_pages only uses GFP_KERNEL (with mapping_gfp restriction) while __do_page_cache_readahead uses page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from OOMs. This gfp mask discrepancy is really unfortunate and easily fixable. Drop page_cache_alloc_readahead() which only has one user and outsource the gfp_mask logic into readahead_gfp_mask and propagate this mask from __do_page_cache_readahead down to read_pages. This alone would have only very limited impact as most filesystems are implementing ->readpages and the common implementation mpage_readpages does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to use readahead_gfp_mask instead as this function is called only during readahead as well. The same applies to read_cache_pages. ext4 has its own ext4_mpage_readpages but the path which has pages != NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are doing a very similar pattern to mpage_readpages so the same can be applied to them as well. [akpm@linux-foundation.org: coding-style fixes] [mhocko@suse.com: restrict gfp mask in mpage_alloc] Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Chris Mason <clm@fb.com> Cc: Steve French <sfrench@samba.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Changman Lee <cm224.lee@samsung.com> Cc: Chao Yu <yuchao0@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 5302fb00	22-Jul-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clean up coding style and redundancy This patch includes minor clean-ups. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9dfa1baf	13-Jul-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use blk_plug in all the possible paths This patch reverts 19a5f5e2ef37 (f2fs: drop any block plugging), and adds blk_plug in write paths additionally. The main reason is that blk_start_plug can be used to wake up from low-power mode before submitting further bios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 82e0a5aa	12-Jul-2016	Chao Yu <chao@kernel.org>	f2fs: fix to avoid data update racing between GC and DIO Datas in file can be operated by GC and DIO simultaneously, so we will face race case as below: For write case: Thread A Thread B - generic_file_direct_write - invalidate_inode_pages2_range - f2fs_direct_IO - do_blockdev_direct_IO - do_direct_IO - get_more_blocks - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - do_write_data_page migrate data block to new block address - dio_bio_submit update user data to old block address For read case: Thread A Thread B - generic_file_direct_write - invalidate_inode_pages2_range - f2fs_direct_IO - do_blockdev_direct_IO - do_direct_IO - get_more_blocks - f2fs_balance_fs - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - do_write_data_page migrate data block to new block address - write_checkpoint - do_checkpoint - clear_prefree_segments - f2fs_issue_discard discard old block adress - dio_bio_submit update user buffer from obsolete block address In order to fix this, for one file, we should let DIO and GC getting exclusion against with each other. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1d353eb7	12-Jul-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix ERR_PTR returned by bio This is to fix wrong error pointer handling flow reported by Dan. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a2ee0a30	07-Jul-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: move i_size_write in f2fs_write_end We don't need to do i_size_write under page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 237c0790	30-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call SetPageUptodate if needed SetPageUptodate() issues memory barrier, resulting in performance degrdation. Let's avoid that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fe76b796	30-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce f2fs_set_page_dirty_nobuffer This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer. When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the overall performance. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1563ac75	03-Jul-2016	Chao Yu <chao@kernel.org>	f2fs: fix to detect truncation prior rather than EIO during read In procedure of synchonized read, after sending out the read request, reader will try to lock the page for waiting device to finish the read jobs and unlock the page, but meanwhile, truncater will race with reader, so after reader get lock of the page, it should check page's mapping to detect whether someone has truncated the page in advance, then reader has the chance to do the retry if truncation was done, otherwise read can be failed due to previous condition check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 78682f79	03-Jul-2016	Chao Yu <chao@kernel.org>	f2fs: fix to avoid reading out encrypted data in page cache For encrypted inode, if user overwrites data of the inode, f2fs will read encrypted data into page cache, and then do the decryption. However reader can race with overwriter, and it will see encrypted data which has not been decrypted by overwriter yet. Fix it by moving decrypting work to background and keep page non-uptodated until data is decrypted. Thread A Thread B - f2fs_file_write_iter - __generic_file_write_iter - generic_perform_write - f2fs_write_begin - f2fs_submit_page_bio - generic_file_read_iter - do_generic_file_read - lock_page_killable - unlock_page - copy_page_to_iter hit the encrypted data in updated page - lock_page - fscrypt_decrypt_page Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ac6f1999	16-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid latency-critical readahead of node pages The f2fs_map_blocks is very related to the performance, so let's avoid any latency to read ahead node pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 52763a4b	13-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: detect host-managed SMR by feature flag If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs by default. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 36abef4e	03-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce mode=lfs mount option This mount option is to enable original log-structured filesystem forcefully. So, there should be no random writes for main area. Especially, this supports host-managed SMR device. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 19a5f5e2	04-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: drop any block plugging In f2fs, we don't need to keep block plugging for NODE and DATA writes, since we already merged bios as much as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7f319975	03-Jun-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: set mapping error for EIO If EIO occurred, we need to set all the mapping to avoid any further IOs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 04d328de	05-Jun-2016	Mike Christie <mchristi@redhat.com>	f2fs: use bio op accessors Separate the op from the rq_flag_bits and have f2fs set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 4e49ea4a	05-Jun-2016	Mike Christie <mchristi@redhat.com>	block/fs/drivers: remove rw argument from submit_bio This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>
# b230e6ca	29-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: handle writepage correctly Previously, f2fs_write_data_pages() calls __f2fs_writepage() which calls f2fs_write_data_page(). If f2fs_write_data_page() returns AOP_WRITEPAGE_ACTIVATE, __f2fs_writepage() calls mapping_set_error(). But, this should not happen at every time, since sometimes f2fs_write_data_page() tries to skip writing pages without error. For example, volatile_write() gives EIO all the time, as Shuoran Liu pointed out. Reported-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 46ae957f	25-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove two steps to flush dirty data pages If there is no cold page, we don't need to do a loop to flush dirty data pages. On /dev/pmem0, 1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync Before : 1.1 GB/s After : 1.2 GB/s 2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 Before : 2.2 GB/s After : 2.3 GB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 28ea6162	25-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not skip writing data pages For data pages, let's try to flush as much as possible in background. On /dev/pmem0, 1. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 conv=fsync Before : 800 MB/s After : 1.1 GB/s 2. dd if=/dev/zero of=/mnt/test/testfile bs=1M count=2048 Before : 1.3 GB/s After : 2.2 GB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b93f7712	20-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove writepages lock This patch removes writepages lock. We can improve multi-threading performance. tiobench, 32 threads, 4KB write per fsync on SSD Before: 25.88 MB/s After: 28.03 MB/s Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 26de9b11	20-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid unnecessary updating inode during fsync If roll-forward recovery can recover i_size, we don't need to update inode's metadata during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ee6d182f	20-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove syncing inode page in all the cases This patch reduces to call them across the whole tree. - sync_inode_page() - update_inode_page() - update_inode() - f2fs_write_inode() Instead, checkpoint will flush all the dirty inode metadata before syncing node pages. Note that, this is doable, since we call mark_inode_dirty_sync() for all inode's field change which needs to update on-disk inode as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8edd03c8	20-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce f2fs_i_blocks_write with mark_inode_dirty_sync This patch introduces f2fs_i_blocks_write() to call mark_inode_dirty_sync() when changing inode->i_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fc9581c8	20-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with i_size_write(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 91942321	20-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use inode pointer for {set, clear}_inode_flag This patch refactors to use inode pointer for set_inode_flag and clear_inode_flag. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 38f91ca8	18-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: flush pending bios right away when error occurs Given errors, this patch flushes pending bios as soon as possible. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f5730184	17-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use bio count instead of F2FS_WRITEBACK page count This can reduce page counting overhead. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ab47036d	11-May-2016	Chao Yu <chao@kernel.org>	f2fs: fix deadlock when flush inline data Below backtrace info was reported by Yunlei He: Call Trace: [<ffffffff817a9395>] schedule+0x35/0x80 [<ffffffff817abb7d>] rwsem_down_read_failed+0xed/0x130 [<ffffffff813c12a8>] call_rwsem_down_read_failed+0x18/0x [<ffffffff817ab1d0>] down_read+0x20/0x30 [<ffffffffa02a1a12>] f2fs_evict_inode+0x242/0x3a0 [f2fs] [<ffffffff81217057>] evict+0xc7/0x1a0 [<ffffffff81217cd6>] iput+0x196/0x200 [<ffffffff812134f9>] __dentry_kill+0x179/0x1e0 [<ffffffff812136f9>] dput+0x199/0x1f0 [<ffffffff811fe77b>] __fput+0x18b/0x220 [<ffffffff811fe84e>] ____fput+0xe/0x10 [<ffffffff81097427>] task_work_run+0x77/0x90 [<ffffffff81074d62>] exit_to_usermode_loop+0x73/0xa2 [<ffffffff81003b7a>] do_syscall_64+0xfa/0x110 [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25 Call Trace: [<ffffffff817a9395>] schedule+0x35/0x80 [<ffffffff81216dc3>] __wait_on_freeing_inode+0xa3/0xd0 [<ffffffff810bc300>] ? autoremove_wake_function+0x40/0x4 [<ffffffff8121771d>] find_inode_fast+0x7d/0xb0 [<ffffffff8121794a>] ilookup+0x6a/0xd0 [<ffffffffa02bc740>] sync_node_pages+0x210/0x650 [f2fs] [<ffffffff8122e690>] ? do_fsync+0x70/0x70 [<ffffffffa02b085e>] block_operations+0x9e/0xf0 [f2fs] [<ffffffff8137b795>] ? bio_endio+0x55/0x60 [<ffffffffa02b0942>] write_checkpoint+0x92/0xba0 [f2fs] [<ffffffff8117da57>] ? mempool_free_slab+0x17/0x20 [<ffffffff8117de8b>] ? mempool_free+0x2b/0x80 [<ffffffff8122e690>] ? do_fsync+0x70/0x70 [<ffffffffa02a53e3>] f2fs_sync_fs+0x63/0xd0 [f2fs] [<ffffffff8129630f>] ? ext4_sync_fs+0xbf/0x190 [<ffffffff8122e6b0>] sync_fs_one_sb+0x20/0x30 [<ffffffff812002e9>] iterate_supers+0xb9/0x110 [<ffffffff8122e7b5>] sys_sync+0x55/0x90 [<ffffffff81003ae9>] do_syscall_64+0x69/0x110 [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25 With following excuting serials, we will set inline_node in inode page after inode was unlinked, result in a deadloop described as below: 1. open file 2. write file 3. unlink file 4. write file 5. close file Thread A Thread B - dput - iput_final - inode->i_state \|= I_FREEING - evict - f2fs_evict_inode - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all (down_write(cp_rwsem)) - f2fs_lock_op (down_read(cp_rwsem)) - sync_node_pages - ilookup - find_inode_fast - __wait_on_freeing_inode (wait on I_FREEING clear) Here, we change to set inline_node flag only for linked inode for fixing. Reported-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: stable@vger.kernel.org # v4.6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 46008c6d	09-May-2016	Chao Yu <chao@kernel.org>	f2fs: support in batch multi blocks preallocation This patch introduces reserve_new_blocks to make preallocation of multi blocks as in batch operation, so it can avoid lots of redundant operation, result in better performance. In virtual machine, with rotational device: time fallocate -l 32G /mnt/f2fs/file Before: real 0m4.584s user 0m0.000s sys 0m4.580s After: real 0m0.292s user 0m0.000s sys 0m0.272s In x86, with SSD: time fallocate -l 500G $MNT/testfile Before : 24.758 s After : 1.604 s Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix bugs and add performance numbers measured in x86.] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0080c507	07-May-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not preallocate block unaligned to 4KB Previously f2fs_preallocate_blocks() tries to allocate unaligned blocks. In f2fs_write_begin(), however, prepare_write_begin() does not skip its allocation due to (len != 4KB). So, it needs locking node page twice unexpectedly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 43473f96	05-May-2016	Chao Yu <chao@kernel.org>	f2fs: fix incorrect mapping in ->bmap Currently, generic_block_bmap is used in f2fs_bmap, its semantics is when the mapping is been found, return position of target physical block, otherwise return zero. But, previously, when there is no mapping info for specified logical block, f2fs_bmap will map target physical block to a uninitialized variable, which should be wrong. Fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 23dc974e	29-Apr-2016	Chao Yu <chao@kernel.org>	f2fs: fix to clear private data in page Private data in page should be removed during ->releasepage or ->invalidatepage, otherwise garbage data would be remained in that page. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c8b8e32d	07-Apr-2016	Christoph Hellwig <hch@lst.de>	direct-io: eliminate the offset argument to ->direct_IO Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually work, so eliminate the superflous argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 6bfc4919	18-Apr-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: issue cache flush on direct IO Under direct IO path with O_(D)SYNC, it needs to set proper APPEND or UPDATE flags, so taht f2fs_sync_file can make its data safe. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e6e5f561	14-Apr-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid writing 0'th page in volatile writes The first page of volatile writes usually contains a sort of header information which will be used for recovery. (e.g., journal header of sqlite) If this is written without other journal data, user needs to handle the stale journal information. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4da7bf5a	06-Apr-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove redundant condition check This patch resolves the redundant condition check reported by David. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b32e4482	11-Apr-2016	Jaegeuk Kim <jaegeuk@kernel.org>	fscrypto: don't let data integrity writebacks fail with ENOMEM This patch fixes the issue introduced by the ext4 crypto fix in a same manner. For F2FS, however, we flush the pending IOs and wait for a while to acquire free memory. Fixes: c9af28fdd4492 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 09cbfeaf	01-Apr-2016	Kirill A. Shutemov <kirill.shutemov@linux.intel.com>	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 0b81d077	15-May-2015	Jaegeuk Kim <jaegeuk@kernel.org>	fs crypto: move per-file encryption from f2fs tree to fs/crypto This patch adds the renamed functions moved from the f2fs crypto files. 1. definitions for per-file encryption used by ext4 and f2fs. 2. crypto.c for encrypt/decrypt functions a. IO preparation: - fscrypt_get_ctx / fscrypt_release_ctx b. before IOs: - fscrypt_encrypt_page - fscrypt_decrypt_page - fscrypt_zeroout_range c. after IOs: - fscrypt_decrypt_bio_pages - fscrypt_pullback_bio_page - fscrypt_restore_control_page 3. policy.c supporting context management. a. For ioctls: - fscrypt_process_policy - fscrypt_get_policy b. For context permission - fscrypt_has_permitted_context - fscrypt_inherit_context 4. keyinfo.c to handle permissions - fscrypt_get_encryption_info - fscrypt_free_encryption_info 5. fname.c to support filename encryption a. general wrapper functions - fscrypt_fname_disk_to_usr - fscrypt_fname_usr_to_disk - fscrypt_setup_filename - fscrypt_free_filename b. specific filename handling functions - fscrypt_fname_alloc_buffer - fscrypt_fname_free_buffer 6. Makefile and Kconfig Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 406657dd	24-Feb-2016	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_flush_merged_bios for cleanup Add a new helper f2fs_flush_merged_bios to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f28b3434	24-Feb-2016	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_update_data_blkaddr for cleanup Add a new help f2fs_update_data_blkaddr to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7a9d7548	22-Feb-2016	Chao Yu <chao@kernel.org>	f2fs: trace old block address for CoWed page This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 28bc106b	05-Feb-2016	Chao Yu <chao@kernel.org>	f2fs: support revoking atomic written pages f2fs support atomic write with following semantics: 1. open db file 2. ioctl start atomic write 3. (write db file) * n 4. ioctl commit atomic write 5. close db file With this flow we can avoid file becoming corrupted when abnormal power cut, because we hold data of transaction in referenced pages linked in inmem_pages list of inode, but without setting them dirty, so these data won't be persisted unless we commit them in step 4. But we should still hold journal db file in memory by using volatile write, because our semantics of 'atomic write support' is incomplete, in step 4, we could fail to submit all dirty data of transaction, once partial dirty data was committed in storage, then after a checkpoint & abnormal power-cut, db file will be corrupted forever. So this patch tries to improve atomic write flow by adding a revoking flow, once inner error occurs in committing, this gives another chance to try to revoke these partial submitted data of current transaction, it makes committing operation more like aotmical one. If we're not lucky, once revoking operation was failed, EAGAIN will be reported to user for suggesting doing the recovery with held journal file, or retrying current transaction again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ce855a3b	05-Feb-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs crypto: f2fs_page_crypto() doesn't need a encryption context This patch adopts: ext4 crypto: ext4_page_crypto() doesn't need a encryption context Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number function signature and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 24b84912	03-Feb-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: preallocate blocks for buffered aio writes This patch preallocates data blocks for buffered aio writes. With this patch, we can avoid redundant locking and unlocking of node pages given consecutive aio request. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b439b103	03-Feb-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: move dio preallocation into f2fs_file_write_iter This patch moves preallocation code for direct IOs into f2fs_file_write_iter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d31c7c3f	04-Feb-2016	Yunlei He <heyunlei@huawei.com>	f2fs: fix missing skip pages info fix missing skip pages info in f2fs_writepages trace event. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0c3a5797	18-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_submit_merged_bio_cond f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# da85985c	26-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: speed up handling holes in fiemap This patch makes f2fs_map_blocks supporting returning next potential page offset which skips hole region in indirect tree of inode, and use it to speed up fiemap in handling big hole case. Test method: xfs_io -f /mnt/f2fs/file -c "pwrite 1099511627776 4096" time xfs_io -f /mnt/f2fs/file -c "fiemap -v" Before: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 3m3.518s user 0m0.000s sys 3m3.456s After: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 0m0.008s user 0m0.000s sys 0m0.008s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 81ca7350	26-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: remove unneeded pointer conversion There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5b8db7fa	26-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: simplify __allocate_data_blocks This patch uses existing function f2fs_map_block to simplify implementation of __allocate_data_blocks. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4fe71e88	26-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: simplify f2fs_map_blocks In f2fs_map_blocks, we use duplicated codes to handle first block mapping and the following blocks mapping, it's unnecessary. This patch simplifies f2fs_map_blocks to avoid using copied codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7c506896	26-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use wq_has_sleeper for cp_wait wait_queue We need to use wq_has_sleeper including smp_mb to consider cp_wait concurrency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fec1d657	20-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e3ef1876	25-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: don't need to call set_page_dirty for io error If end_io gets an error, we don't need to set the page as dirty, since we already set f2fs_stop_checkpoint which will not flush any data. This will resolve the following warning. ====================================================== [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] 4.4.0+ #9 Tainted: G O ------------------------------------------------------ xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs] and this task is already holding: (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490 which would create a new lock dependency: (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...} Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ae96e7bd	25-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid needless sync_inode_page when reading inline_data In write_begin, if there is an inline_data, f2fs loads it into 0'th data page. Since it's the read path, we don't need to sync its inode page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 52f80337	25-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: don't need to sync node page at every time In write_end, we don't need to sync inode page at every time. Instead, we can expect f2fs_write_inode will update later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2049d4fc	25-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid multiple node page writes due to inline_data The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly So, this patch tries to flush inline_data when flushing node blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3c082b7b	23-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do f2fs_balance_fs when block is allocated We should consider data block allocation to trigger f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 25c13551	20-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use writepages->lock for WB_SYNC_ALL If there are many writepages calls by multiple threads in background, we don't need to serialize to merge all the bios, since it's background. In such the case, it'd better to run writepages concurrently. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b483fadf	19-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove needless condition check This patch removes needless condition variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0fd785eb	18-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: relocate is_merged_page Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5955102c	22-Jan-2016	Al Viro <viro@zeniv.linux.org.uk>	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# d0239e1b	08-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: detect idle time depending on user behavior This patch adds last time that user requested filesystem operations. This information is used to detect whether system is idle or not later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# da5af127	08-Jan-2016	Chao Yu <chao@kernel.org>	f2fs: recognize encrypted data in f2fs_fiemap This patch fixes to teach f2fs_fiemap to recognize encrypted data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2c4db1a6	07-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clean up f2fs_balance_fs This patch adds one parameter to clean up all the callers of f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 12719ae1	07-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid unnecessary f2fs_balance_fs calls Only when node page is newly dirtied, it needs to check whether we need to do f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7612118a	01-Jan-2016	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check the page status filled from disk After reading a page, we need to check whether there is any error. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# de1475cc	04-Jan-2016	Fan Li <fanofcode.li@samsung.com>	f2fs: read isize while holding i_mutex in fiemap make sure the isize we read doesn't change during the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e0afc4d6	30-Dec-2015	Chao Yu <chao@kernel.org>	f2fs: introduce max_file_blocks in sbi Introduce max_file_blocks in sbi to store max block index of file in f2fs, it could be used to avoid unneeded calculation of max block index in runtime. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix overflow of sbi->max_file_blocks] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8d4ea29b	31-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: write pending bios when cp_error is set When testing ioc_shutdown, put_super is able to be hanged by waiting for writebacking pages as follows. INFO: task umount:2723 blocked for more than 120 seconds. Tainted: G O 4.4.0-rc3+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D ffff88000859f9d8 0 2723 2110 0x00000000 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c Call Trace: [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8182310c>] schedule+0x3c/0x90 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30 [<ffffffff8111617c>] ? ktime_get+0xac/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110 [<ffffffff81823a25>] bit_wait_io+0x35/0x50 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs] [<ffffffff812639bc>] evict+0xbc/0x190 [<ffffffff81263d19>] iput+0x229/0x2c0 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs] [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0 [<ffffffff812478f7>] kill_block_super+0x27/0x70 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20 [<ffffffff810ac463>] task_work_run+0x73/0xa0 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 819d9153	28-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use i_size_read to get i_size We need to use i_size_read() to get inode->i_size. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 179448bf	28-Dec-2015	Yunlei He <heyunlei@huawei.com>	f2fs: add a max block check for get_data_block_bmap This patch adds a max block check for get_data_block_bmap. Trinity test program will send a block number as parameter into ioctl_fibmap, which will be used in get_node_path(), when the block number large than f2fs max blocks, it will trigger kernel bug. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> [Jaegeuk Kim: fix missing condition, pointed by Chao Yu] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9a950d52	26-Dec-2015	Fan Li <fanofcode.li@samsung.com>	f2fs: fix bugs and simplify codes of f2fs_fiemap fix bugs: 1. len could be updated incorrectly when start+len is beyond isize. 2. If there is a hole consisting of more than two blocks, it could fail to add FIEMAP_EXTENT_LAST flag for the last extent. 3. If there is an extent beyond isize, when we search extents in a range that ends at isize, it will also return the extent beyond isize, which is outside the range. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6d5a1495	24-Dec-2015	Chao Yu <chao@kernel.org>	f2fs: let user being aware of IO error Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b4d07a3e	23-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid f2fs_lock_op in f2fs_write_begin If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in order to avoid the checkpoint latency in the write syscall. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2aadac08	23-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce prepare_write_begin to clean up This patch adds prepare_write_begin to clean f2fs_write_begin. The major role of this function is to convert any inline_data and allocate or find block address. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2a340760	22-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call f2fs_balance_fs only when node was changed If user tries to update or read data, we don't need to call f2fs_balance_fs which triggers f2fs_gc, which increases unnecessary long latency. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3104af35	23-Dec-2015	Chao Yu <chao@kernel.org>	f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks Only cover sbi->cp_rwsem on one dnode page's allocation and modification instead of multiple's in f2fs_map_blocks, it can reduce the covered region of cp_rwsem, then we can avoid potential long time delay for concurrent checkpointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 93bae099	22-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: record node block allocation in dnode_of_data This patch introduces recording node block allocation in dnode_of_data. This information helps to figure out whether any node block is allocated during specific file operations. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b9d777b8	22-Dec-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check inline_data flag at converting time We can check inode's inline_data flag when calling to convert it. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7df3a431	16-Dec-2015	Fan Li <fanofcode.li@samsung.com>	f2fs: optimize the flow of f2fs_map_blocks check map->m_len right after it changes to avoid excess call to update dnode_of_data. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c227f912	15-Dec-2015	Chao Yu <chao@kernel.org>	f2fs: record dirty status of regular/symlink inode Maintain regular/symlink inode which has dirty pages in global dirty list and record their total dirty pages count like the way of handling directory inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9006f2c9	30-Nov-2015	Chao Yu <chao@kernel.org>	f2fs: kill f2fs_drop_largest_extent For direct IO, f2fs only allocate new address for the block which is not exist in the disk before, its mapping info should not exist in extent cache previously, so here we do not need to call f2fs_drop_largest_extent to drop related cache. Due to no more callers for f2fs_drop_largest_extent now, kill it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# eb7e813c	10-Nov-2015	Chao Yu <chao@kernel.org>	f2fs: fix to remove directory inode from dirty list If last dirty dentry page was writebacked in reclaim path, we should remove its directory inode from global dirty list to avoid unnecessary flush for this inode when doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d323d005	26-Oct-2015	Chao Yu <chao@kernel.org>	f2fs: support file defragment This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file defragment in a specified range of regular file. This ioctl can be used in very limited workload: if user expects high sequential read performance in randomly written file, this interface can be used for defragmentation, after that file can be written as continuous as possible in the device. Meanwhile, it has side-effect, it will make holes in segments where blocks located originally, so it's better to trigger GC to eliminate fragment in segments. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2da3e027	28-Oct-2015	Chao Yu <chao@kernel.org>	f2fs: commit atomic written page in LFS mode We should always commit atomic written pages in LFS mode, otherwise data will become corrupted if we encounter suddent power cut after partial pages committed in IPU mode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 67f8cf3c	15-Oct-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support fiemap for inline_data There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1d373a0e	19-Oct-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: flush dirty data for bmap Users expect bmap will give allocated block addresses. Let's play likewise ext4. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 08b39fbd	07-Oct-2015	Chao Yu <chao@kernel.org>	f2fs crypto: fix racing of accessing encrypted page among different competitors Since we use different page cache (normally inode's page cache for R/W and meta inode's page cache for GC) to cache the same physical block which is belong to an encrypted inode. Writeback of these two page cache should be exclusive, but now we didn't handle writeback state well, so there may be potential racing problem: a) kworker: f2fs_gc: - f2fs_write_data_pages - f2fs_write_data_page - do_write_data_page - write_data_page - f2fs_submit_page_mbio (page#1 in inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) - gc_data_segment - move_encrypted_block - pagecache_get_page (page#2 in meta inode's page cache was cached with the invalid datas of physical block located in new blkaddr) - f2fs_submit_page_mbio (page#1 was submitted, later, page#2 with invalid data will be submitted) b) f2fs_gc: - gc_data_segment - move_encrypted_block - f2fs_submit_page_mbio (page#1 in meta inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) user thread: - f2fs_write_begin - f2fs_submit_page_bio (we submit the request to block layer to update page#2 in inode's page cache with physical block located in new blkaddr, so here we may read gabbage data from new blkaddr since GC hasn't writebacked the page#1 yet) This patch fixes above potential racing problem for encrypted inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b8c29400	12-Oct-2015	Chao Yu <chao@kernel.org>	f2fs: add a tracepoint for f2fs_read_data_pages This patch adds a tracepoint for f2fs_read_data_pages to trace when pages are readahead by VFS. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a56c7c6f	09-Oct-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: set GFP_NOFS for grab_cache_page For normal inodes, their pages are allocated with __GFP_FS, which can cause filesystem calls when reclaiming memory. This can incur a dead lock condition accordingly. So, this patch addresses this problem by introducing f2fs_grab_cache_page(.., bool for_write), which calls grab_cache_page_write_begin() with AOP_FLAG_NOFS. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a1257023	08-Oct-2015	Jaegeuk Kim <jaegeuk@kernel.org>	Revert "f2fs: do not skip dentry block writes" The periodic checkpoint can resolve the previous issue. So, now we can use this again to improve the reported performance regression: https://lkml.org/lkml/2015/10/8/20 This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.
# 90b803e6	25-Sep-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not skip dentry block writes Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory pressure and the number of dirty pages is pretty small. But, we didn't skip for normal data writes, which gives us not much big impact on overall performance. Moreover, by skipping some data writes, kworker falls into infinite loop to try to write blocks, when many dir inodes have only one dentry block. So, this patch removes skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 46c9e141	18-Sep-2015	Chao Yu <chao@kernel.org>	f2fs: use correct flag in f2fs_map_blocks() We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc8865 ("f2fs: fix incorrect mapping for bmap"), but forget to use this flag in the right place, fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f9811703	21-Sep-2015	Chao Yu <chao@kernel.org>	f2fs: fix to handle io error in ->direct_IO Here is a oops reported as following message when testing generic/019 of xfstest: ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882! invalid opcode: 0000 [#1] SMP Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_def CPU: 2 PID: 25441 Comm: fio Tainted: G O 4.3.0-rc1+ #6 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000 RIP: 0010:[<ffffffffa0784981>] [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP: 0018:ffff8803fd61f918 EFLAGS: 00010246 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0 Stack: 000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b 0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000 ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358 Call Trace: [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs] [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs] [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs] [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs] [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160 [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0 [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200 [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20 [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs] [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs] [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290 [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50 [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0 [<ffffffff8120b913>] do_io_submit+0x373/0x630 [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0 [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20 [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb RIP [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP <ffff8803fd61f918> ---[ end trace 2e577d7f711ddb86 ]--- The reason is that: in the test of generic/019, we will trigger a manmade IO error in block layer through debugfs, after that, prefree segment will no longer be freed, because we always skip doing gc or checkpoint when there occurs an IO error. Meanwhile fio with aio engine generated a large number of direct IOs, which continue allocating spaces in free segment until we run out of them, eventually, results in panic in new_curseg as no more free segment was found. So, this patch changes to return EIO in direct_IO for this condition. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 973163fc	18-Sep-2015	Chao Yu <chao@kernel.org>	f2fs: reorganize f2fs_map_blocks In this patch, we try to reorganize f2fs_map_blocks to make block mapping flow more clear by using following structure: /* check status of mapping / if (unmapped) { / blkaddr == NULL_ADDR \|\| blkaddr == NEW_ADDR / if (create) { / write path, handle dio write case here / alloc_and_map; } else { / * handle read cases from all call paths: * 1. generic read; * 2. dio read; * 3. fiemap; * 4. bmap / } } / map buffer_header */ Besides, this patch handles the missing case correctly for dio write: When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will not allocate blocks correctly for preallocated blocks, but returning with an unmapped buffer head, which will result in failure of dio write. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9edcdabf	11-Sep-2015	Chao Yu <chao@kernel.org>	f2fs: fix overflow of size calculation We have potential overflow issue when calculating size of object, when we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only 32-bits space in 32-bit architecture, left shifting will incur overflow, i.e: pgoff_t index = 0xFFFFFFFF; loff_t size = index << PAGE_CACHE_SHIFT; size: 0xFFFFF000 So we should cast index with 64-bits type to avoid this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e2b4e2bc	19-Aug-2015	Chao Yu <chao@kernel.org>	f2fs: fix incorrect mapping for bmap The test step is like below: 1. touch file 2. truncate -s $((10241024)) file 3. fallocate -o 0 -l $((10241024)) file 4. fibmap.f2fs file Our result of fibmap.f2fs showed below is not correct: file_pos start_blk end_blk blks 0 -937166132 -937166132 1 4096 -937166132 -937166132 1 8192 -937166132 -937166132 1 12288 -937166132 -937166132 1 16384 -937166132 -937166132 1 20480 -937166132 -937166132 1 ... 1040384 -937166132 -937166132 1 1044480 -937166132 -937166132 1 This is because f2fs_map_blocks will return with no error when meeting a hole or preallocated block, the caller __get_data_block will map the uninitialized variable value to bh->b_blocknr. Unfortunately generic_block_bmap will neither check the return value of get_data() nor check mapping info of buffer_head, result in returning the random block address. After fixing the issue, our result shows correctly: file_pos start_blk end_blk blks 0 0 0 256 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 740432f8	14-Aug-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: handle failed bio allocation As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b54ffb73	19-May-2015	Kent Overstreet <kent.overstreet@gmail.com>	block: remove bio_get_nr_vecs() We can always fill up the bio now, no need to estimate the possible size based on queue parameters. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [hch: rebased and wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# decd36b6	07-Aug-2015	Chao Yu <chao@kernel.org>	f2fs: remove inmem radix tree Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit 042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c15e8599	07-Aug-2015	Chao Yu <chao@kernel.org>	f2fs: report EINVAL for unalignment direct IO We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in detail is as fallow: dio04 <<<test_start>>> tag=dio04 stime=1432278894 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write diotest4 5 TPASS : Read beyond the file size ...... the result of ext4 with same environment: dio04 <<<test_start>>> tag=dio04 stime=1432259643 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TPASS : Odd count of read and write diotest4 4 TPASS : Read beyond the file size ...... The reason is that when triggering DIO in f2fs, we will return zero value in ->direct_IO if writer's buffer offset, file offset and transfer size is not alignment to block size of filesystem, resulting in falling back into buffered write instead of returning -EINVAL. This patch fixes that problem by returning correct error number for above case, and removing the judgement condition in check_direct_IO to make sure the verification will be enabled for direct reader too. Besides, Jaegeuk Kim pointed out that there is expectional cases we should always make direct-io falling back into buffered write, such as dio in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Chao Yu make small change and add detail description in commit message] Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 759af1c9	05-Aug-2015	Fan Li <fanofcode.li@samsung.com>	f2fs: use extent cache to optimize f2fs_reserve_block In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 470f00e9	14-Jul-2015	Chao Yu <chao@kernel.org>	f2fs: fix to release inode page correctly In following call path, we will pass a locked and referenced ipage pointer to get_new_data_page: - init_inode_metadata - make_empty_dir - get_new_data_page There are two exit paths in get_new_data_page when error occurs: 1) grab_cache_page fails, ipage will not be released; 2) f2fs_reserve_block fails, ipage will be released in callee. So, it's not consistent for error handling in get_new_data_page. For f2fs_reserve_block, it's not very easy to change the rule of error handling, since it's already complicated. Here we deside to choose an easy way to fix this issue: If any error occur in get_new_data_page, we will ensure releasing ipage in this function. The same issue is in f2fs_convert_inline_dir, fix that too. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5768dcdd	03-Aug-2015	Fan Li <fanofcode.li@samsung.com>	f2fs: change the timing of f2fs_wait_on_page_writeback some backing devices need pages to be stable during writeback. It doesn't matter if the page is completely overwritten or already uptodate, it needs to wait before write. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6a290544	17-Jul-2015	Chao Yu <chao@kernel.org>	f2fs: skip writing in ->writepages when no dirty pages exist When flushing comes from background, if there is no dirty page in the mapping of inode, we'd better to skip seeking dirty page from mapping for writebacking. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 737f1899	16-Jul-2015	Tiezhu Yang <yangtiezhu@loongson.cn>	f2fs: optimize f2fs_write_cache_pages The if statement "goto continue_unlock" is exactly the same when each if condition is true that is depended on the value of both "step" and "is_cold_data(page)" are 0 or 1. That means when the value of "step" equals to "is_cold_data(page)", the if condition is true and the if statement "goto continue_unlock" appears only once, so it can be optimized to reduce the duplicated code. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 86531d6b	15-Jul-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: callers take care of the page from bio error This patch changes for a caller to handle the page after its bio gets an error. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8f46dcae	14-Jul-2015	Chao Yu <chao@kernel.org>	f2fs: expose f2fs_write_cache_pages If there are gced dirty pages and normal dirty pages in the mapping of one inode, we might writeback them alternately with discontinuous block address, resulting in low performance. This patch introduces f2fs_write_cache_pages with codes copied from write_cache_pages in mm/page-writeback.c. In this function, we refactor flow with two steps: 1) writeback all cold type pages. 2) writeback all non-cold type pages. By using this method, f2fs will writeback dirty pages with the same temperature in bunch mode, it makes writeouted block being with more continuous address, so they can be merged as much as possible in f2fs bio cache, and also it will reduce the chance of submiting small IO from block layer. Test environment: 8g nokia sd card (very old sd card, but it shows better effect when testing with this patch, and with a 32g kingston sd card, I didn't see much more improvement). Test step: 1. touch testfile; 2. truncate -s 512K testfile; 3. write all pages with odd index; 4. trigger gc by ioctl; 5. write all pages with even index; 6. time fsync testfile. before: real 0m0.402s user 0m0.000s sys 0m0.000s after: real 0m0.143s user 0m0.004s sys 0m0.004s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a28ef1f5	08-Jul-2015	Chao Yu <chao@kernel.org>	f2fs: maintain extent cache in separated file This patch moves extent cache related code from data.c into extent_cache.c since extent cache is independent feature, and its codes are not relate to others in data.c, it's better for us to maintain them in separated place. There is no functionality change, but several small coding style fixes including: * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting; * rename misspelled word 'untill' to 'until'; * remove unneeded 'return' in the end of f2fs_destroy_extent_tree(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3c7df87d	08-Jul-2015	Fan Li <fanofcode.li@samsung.com>	f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will be kept in extent cache after split, extents already shorter than F2FS_MIN_EXTENT_LEN don't need to try split at all. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 90d4388a	08-Jul-2015	Chao Yu <chao@kernel.org>	f2fs: fix to update page flag This patch fixes to update page flag (e.g. Uptodate/cold flag) in ->write_begin. Otherwise, page will be non-uptodate when we try to write entire page, and cold data flag in page will not be clean when gced page is being rewritten. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7023a1ad	29-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: shrink unreferenced extent_caches first If an extent_tree entry has a zero reference count, we can drop it from the cache in higher priority rather than currently referencing entries. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# bb96a8d5	06-Jul-2015	Chao Yu <chao@kernel.org>	f2fs: enhance multithread performance In ->writepages, we use writepages mutex lock to serialize all block address allocation and page submitting pairs from different inodes. This method makes our delayed dirty pages of one inode being written continously as many as possible. But there is one problem that we did not submit current cached bio in protection region of writepages mutex lock, so there is a small chance that we submit the one of other thread's as below, resulting in splitting more bios. thread 1 thread 2 ->writepages lock(writepages) ->write_cache_pages unlock(writepages) lock(writepages) ->write_cache_pages ->f2fs_submit_merged_bio ->writepage unlock(writepages) fs_mark-6535 [002] .... 2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288 fs_mark-6536 [000] .... 2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096 fs_mark-6536 [000] .... 2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096 fs_mark-6535 [002] .... 2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096 This may really increase time of block layer works, and may cause larger IO lantency. This patch moves the submitting operation into region of writepages mutex lock to avoid bio splits when concurrently writebacking is intensive. my test environment: virtual machine, intel cpu i5 2500, 8GB size memory, 4GB size ramdisk time fs_mark -t 16 -L 1 -s 524288 -S 1 -d /mnt/f2fs/ before: real 0m4.244s user 0m0.088s sys 0m12.336s after: real 0m3.822s user 0m0.072s sys 0m10.760s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 84bc926c	29-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check the largest extent at look-up time Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee that the largest extent would be cached in the tree all the time. Instead of relying on extent_tree, we can simply check the cached one in extent tree accordingly. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3e72f721	19-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use extent_cache by default We don't need to handle the duplicate extent information. The integrated rule is: - update on-disk extent with largest one tracked by in-memory extent_cache - destroy extent_tree for the truncation case - drop per-inode extent_cache by shrinker Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 554df79e	19-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: shrink extent_cache entries This patch registers shrinking extent_caches. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 244f4fc1	22-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: set cached_en after checking finally This patch relocates cached_en not only to be covered by spin_lock, but also to set once after checking out completely. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cbe91923	16-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: update on-disk extents even under extent_cache Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the time, and then finally preserves its up-to-date extent into on-disk one during f2fs_evict_inode. But, in the following scenario: 1. mount 2. open & write an extent X 3. f2fs_evict_inode; on-disk extent is X 4. open & update the extent X with Y 5. sync; trigger checkpoint 6. power-cut after power-on, f2fs should serve extent Y, but we have an on-disk extent X. This causes a failure on xfstests/311. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7a2cb678	18-Jun-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix wrong block address calculation for a split extent This patch fixes wrong calculation on block address field when an extent is split. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4246a0b6	20-Jul-2015	Christoph Hellwig <hch@lst.de>	block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 6282adbf	25-Jul-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call set_page_dirty to attach i_wb for cgroup The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback is called, __inc_wb_stat() updates i_wb's stat. So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to any writebacking pages. This patch should resolve the following kernel panic reported by Andreas Reis. https://bugzilla.kernel.org/show_bug.cgi?id=101801 --- Comment #2 from Andreas Reis <andreas.reis@gmail.com> --- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 PGD 2951ff067 PUD 2df43f067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 7 PID: 10356 Comm: gcc Tainted: G W 4.2.0-1-cu #1 Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS T01 02/03/2015 task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000 RIP: 0010:[<ffffffff8149deea>] [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP: 0018:ffff880295143ac8 EFLAGS: 00010082 RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001 RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088 RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30 R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088 R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0 FS: 00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40 Call Trace: [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a RIP [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90 RSP <ffff880295143ac8> CR2: 00000000000000a8 ---[ end trace 5132449a58ed93a3 ]--- note: gcc[10356] exited with preempt_count 2 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 12377024	25-May-2015	Chao Yu <chao@kernel.org>	f2fs: avoid duplicated code by reusing f2fs_read_end_io This patch tries to clean up code because part code of f2fs_read_end_io and mpage_end_io are the same, so it's better to merge and reuse them. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4375a336	23-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs crypto: add encryption support in read/write paths This patch adds encryption support in read and write paths. Note that, in f2fs, we need to consider cleaning operation. In cleaning procedure, we must avoid encrypting and decrypting written blocks. So, this patch implements move_encrypted_block(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fcc85a4d	21-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs crypto: activate encryption support for fs APIs This patch activates the following APIs for encryption support. The rules quoted by ext4 are: - An unencrypted directory may contain encrypted or unencrypted files or directories. - All files or directories in a directory must be protected using the same key as their containing directory. - Encrypted inode for regular file should not have inline_data. - Encrypted symlink and directory may have inline_data and inline_dentry. This patch activates the following APIs. 1. f2fs_link : validate context 2. f2fs_lookup : '' 3. f2fs_rename : '' 4. f2fs_create/f2fs_mkdir : inherit its dir's context 5. f2fs_direct_IO : do buffered io for regular files 6. f2fs_open : check encryption info 7. f2fs_file_mmap : '' 8. f2fs_setattr : '' 9. f2fs_file_write_iter : '' (Called by sys_io_submit) 10. f2fs_fallocate : do not support fcollapse 11. f2fs_evict_inode : free_encryption_info Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7f63eb77	08-May-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: report unwritten area in f2fs_fiemap This patch slightly changes f2fs_fiemap function to report unwritten area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 43f3eae1	30-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: split find_data_page according to specific purposes This patch splits find_data_page as follows. 1. f2fs_gc - use get_read_data_page() with read only 2. find_in_level - use find_data_page without locked page 3. truncate_partial_page - In the case cache_only mode, just drop cached page. - Ohterwise, use get_lock_data_page() and guarantee to truncate Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 01f28610	29-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix race on allocating and deallocating a dentry block There are two threads: f2fs_delete_entry() get_new_data_page() f2fs_reserve_block() dn.blkaddr = XXX lock_page(dentry_block) truncate_hole() dn.blkaddr = NULL unlock_page(dentry_block) lock_page(dentry_block) fill the block from XXX address add new dentries unlock_page(dentry_block) Later, f2fs_write_data_page() will truncate the dentry_block, since its block address is NULL. The reason for this was due to the wrong lock order. In this case, we should do f2fs_reserve_block() after locking its dentry block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 05ca3632	23-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add sbi and page pointer in f2fs_io_info This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f1e88660	09-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: expose f2fs_mpage_readpages This patch implements f2fs_mpage_readpages for further optimization on encryption support. The basic code was taken from fs/mpage.c, and changed to be simple by adjusting that block_size is equal to page_size in f2fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 003a3e1d	06-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add f2fs_map_blocks This patch introduces f2fs_map_blocks structure likewise ext4_map_blocks. Now, f2fs uses f2fs_map_blocks when handling get_block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5463e7c1	21-Apr-2015	Jaegeuk Kim <jaegeuk@kernel.org>	Revert "f2fs: enhance multi-threads performance" This reports performance regression by Yuanhan Liu. The basic idea was to reduce one-point mutex, but it turns out this causes another contention like context swithes. https://lkml.org/lkml/2015/4/21/11 Until finishing the analysis on this issue, I'd like to revert this for a while. This reverts commit 78373b7319abdf15050af5b1632c4c8b8b398f33.
# 22c6186e	16-Mar-2015	Omar Sandoval <osandov@osandov.com>	direct_IO: remove rw from a_ops->direct_IO() Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 6f673763	16-Mar-2015	Omar Sandoval <osandov@osandov.com>	direct_IO: use iov_iter_rw() instead of rw everywhere The rw parameter to direct_IO is redundant with iov_iter->type, and treated slightly differently just about everywhere it's used: some users do rw & WRITE, and others do rw == WRITE where they should be doing a bitwise check. Simplify this with the new iov_iter_rw() helper, which always returns either READ or WRITE. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 17f8c842	16-Mar-2015	Omar Sandoval <osandov@osandov.com>	Remove rw from {,__,do_}blockdev_direct_IO() Most filesystems call through to these at some point, so we'll start here. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 1b3e27a9	23-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: limit b_size of mapped bh in f2fs_map_bh Map bh over max size which caller defined is not needed, limit it in f2fs_map_bh. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# df6136ef	22-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: preallocate fallocated blocks for direct IO Normally, due to DIO_SKIP_HOLES flag is set by default, blockdev_direct_IO in f2fs_direct_IO tries to skip DIO in holes when writing inside i_size, this makes us falling back to buffered IO which shows lower performance. So in commit 59b802e5a453 ("f2fs: allocate data blocks in advance for f2fs_direct_IO"), we improve perfromance by allocating data blocks in advance if we meet holes no matter in i_size or not, since with it we can avoid falling back to buffered IO. But we forget to consider for unwritten fallocated block in this commit. This patch tries to fix it for fallocate case, this helps to improve performance. Test result: Storage info: sandisk ultra 64G micro sd card. touch /mnt/f2fs/file truncate -s 67108864 /mnt/f2fs/file fallocate -o 0 -l 67108864 /mnt/f2fs/file time dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=64 conv=notrunc oflag=direct Time before applying the patch: 67108864 bytes (67 MB) copied, 36.16 s, 1.9 MB/s real 0m36.162s user 0m0.000s sys 0m0.180s Time after applying the patch: 67108864 bytes (67 MB) copied, 27.7776 s, 2.4 MB/s real 0m27.780s user 0m0.000s sys 0m0.036s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0bdee482	19-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: preserve extent info for extent cache This patch tries to preserve last extent info in extent tree cache into on-disk inode, so this can help us to reuse the last extent info next time for performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 028a41e8	19-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: initialize extent tree with on-disk extent info of inode With normal extent info cache, we records largest extent mapping between logical block and physical block into extent info, and we persist extent info in on-disk inode. When we enable extent tree cache, if extent info of on-disk inode is exist, and the extent is not a small fragmented mapping extent. We'd better to load the extent info into extent tree cache when inode is loaded. By this way we can have more chance to hit extent tree cache rather than taking more time to read dnode page for block address. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 93dfc526	19-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: introduce __{find,grab}_extent_tree This patch introduces __{find,grab}_extent_tree for reusing by following patches. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 216a620a	19-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: split set_data_blkaddr from f2fs_update_extent_cache Split __set_data_blkaddr from f2fs_update_extent_cache for readability. Additionally rename __set_data_blkaddr to set_data_blkaddr for exporting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8ce67cb0	17-Mar-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add some tracepoints to debug volatile and atomic writes Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3c6c2beb	17-Mar-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid punch_hole overhead when releasing volatile data This patch is to avoid some punch_hole overhead when releasing volatile data. If volatile data was not written yet, we just can make the first page as zero. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 78373b73	13-Mar-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: enhance multi-threads performance Previously, f2fs_write_data_pages has a mutex, sbi->writepages, to serialize data writes to maximize write bandwidth, while sacrificing multi-threads performance. Practically, however, multi-threads environment is much more important for users. So this patch tries to remove the mutex. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3402e87c	11-Mar-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: set buffer_new when new blocks are allocated This patch modifies to call set_buffer_new, if new blocks are allocated. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d6d4f1cb	12-Mar-2015	Chao Yu <chao@kernel.org>	f2fs: fix to check current blkaddr in __allocate_data_blocks In __allocate_data_blocks, we should check current blkaddr which is located at ofs_in_node of dnode page instead of checking first blkaddr all the time. Otherwise we can only allocate one blkaddr in each dnode page. Fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# d5669f7b	27-Feb-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid to trigger writepage during POR This patch doesn't make any effect on previous behavior, since f2fs_write_data_page bypasses writing the page during POR. But, the difference is that this patch avoids holding writepages mutex. This is to avoid the following false warning, since this can happen only when mount and shutdown are triggered at the same time. ====================================================== [ INFO: possible circular locking dependency detected ] 4.0.0-rc1+ #3 Tainted: G O ------------------------------------------------------- kworker/u8:0/2270 is trying to acquire lock: (&sbi->gc_mutex){+.+.+.}, at: [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs] but task is already holding lock: (&sbi->writepages){+.+...}, at: [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sbi->writepages){+.+...}: [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0 [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530 [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs] [<ffffffff811c38c1>] do_writepages+0x21/0x50 [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0 [<ffffffff8126e23a>] writeback_single_inode+0xea/0x1c0 [<ffffffff8126e425>] write_inode_now+0x95/0xa0 [<ffffffff81259dab>] iput+0x20b/0x3f0 [<ffffffffa02c1c8b>] recover_data.constprop.14+0x26b/0xa80 [f2fs] [<ffffffffa02c2776>] recover_fsync_data+0x2b6/0x5e0 [f2fs] [<ffffffffa02a9744>] f2fs_fill_super+0xb24/0xb90 [f2fs] [<ffffffff8123d7f4>] mount_bdev+0x1a4/0x1e0 [<ffffffffa02a3c85>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff8123e159>] mount_fs+0x39/0x180 [<ffffffff8125e51b>] vfs_kern_mount+0x6b/0x160 [<ffffffff81261554>] do_mount+0x204/0xbe0 [<ffffffff8126223b>] SyS_mount+0x8b/0xe0 [<ffffffff81863e6d>] system_call_fastpath+0x16/0x1b -> #1 (&sbi->cp_mutex){+.+...}: [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0 [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530 [<ffffffffa02acbf2>] write_checkpoint+0x42/0x1230 [f2fs] [<ffffffffa02a847d>] f2fs_sync_fs+0x9d/0x2a0 [f2fs] [<ffffffff81272f82>] sync_filesystem+0x82/0xb0 [<ffffffff8123c214>] generic_shutdown_super+0x34/0x100 [<ffffffff8123c5f7>] kill_block_super+0x27/0x70 [<ffffffffa02a3c60>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff8123ca49>] deactivate_locked_super+0x49/0x80 [<ffffffff8123d05e>] deactivate_super+0x4e/0x70 [<ffffffff8125df63>] cleanup_mnt+0x43/0x90 [<ffffffff8125e002>] __cleanup_mnt+0x12/0x20 [<ffffffff810a82e4>] task_work_run+0xc4/0xf0 [<ffffffff8101f0bd>] do_notify_resume+0x8d/0xa0 [<ffffffff81864141>] int_signal+0x12/0x17 -> #0 (&sbi->gc_mutex){+.+.+.}: [<ffffffff810e2866>] __lock_acquire+0x1ac6/0x1c90 [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0 [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530 [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs] [<ffffffffa02b5938>] f2fs_write_data_page+0x348/0x5b0 [f2fs] [<ffffffffa02af9da>] __f2fs_writepage+0x1a/0x50 [f2fs] [<ffffffff811c1b54>] write_cache_pages+0x274/0x6f0 [<ffffffffa02b2630>] f2fs_write_data_pages+0xe0/0x3a0 [f2fs] [<ffffffff811c38c1>] do_writepages+0x21/0x50 [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0 [<ffffffff8126d44a>] writeback_sb_inodes+0x32a/0x710 [<ffffffff8126d8cf>] __writeback_inodes_wb+0x9f/0xd0 [<ffffffff8126dcdb>] wb_writeback+0x3db/0x850 [<ffffffff8126e848>] bdi_writeback_workfn+0x148/0x980 [<ffffffff810a3782>] process_one_work+0x1e2/0x840 [<ffffffff810a3f01>] worker_thread+0x121/0x460 [<ffffffff810a9dc8>] kthread+0xf8/0x110 [<ffffffff81863dbc>] ret_from_fork+0x7c/0xb0 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b7f204cc	25-Feb-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check its block allocation to avoid producing wrong dirty pages If a page is cached but its block was deallocated, we don't need to make the page dirty again by gc and truncate_partial_data_page. In that case, it needs to check its block allocation all the time instead of giving up-to-date page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2bca1e23	25-Feb-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clear page's up-to-date if block was deallocated If page's on-disk block was deallocated, let's remove up-to-date flag to avoid further access with wrong contents. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e2e40f2c	22-Feb-2015	Christoph Hellwig <hch@lst.de>	fs: move struct kiocb to fs.h struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# cb3bc9ee	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: use extent cache for dir We update extent cache for all user inode of f2fs including dir inode, so this patch gives another chance to try to get physical address of page from extent cache for dir inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 91c5d9bc	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache This patch switch to check FI_NO_EXTENT in f2fs_{lookup,update}_extent_cache instead of f2fs_{lookup,update}_extent_tree or {lookup,update}_extent_info. No functionality modification in this patch. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 62c8af65	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: support fast lookup in extent cache This patch adds a fast lookup path for rb-tree extent cache. In this patch we add a recently accessed extent node pointer 'cached_en' in extent tree. In lookup path of extent cache, we will firstly lookup the last accessed extent node which cached_en points, if we do not hit in this node, we will try to lookup extent node in rb-tree. By this way we can avoid unnecessary slow lookup in rb-tree sometimes. Note that, side-effect of this patch is that we will increase memory cost, because we will store a pointer variable in each struct extent tree additionally. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1ec4610c	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: add trace for rb-tree extent cache ops This patch adds trace for lookup/update/shrink/destroy ops in rb-tree extent cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1dcc336b	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: enable rb-tree extent cache This patch enables rb-tree based extent cache in f2fs. When we mount with "-o extent_cache", f2fs will try to add recently accessed page-block mappings into rb-tree based extent cache as much as possible, instead of original one extent info cache. By this way, f2fs can support more effective cache between dnode page cache and disk. It will supply high hit ratio in the cache with fewer memory when dnode page cache are reclaimed in environment of low memory. Storage: Sandisk sd card 64g 1.append write file (offset: 0, size: 128M); 2.override write file (offset: 2M, size: 1M); 3.override write file (offset: 4M, size: 1M); ... 4.override write file (offset: 48M, size: 1M); ... 5.override write file (offset: 112M, size: 1M); 6.sync 7.echo 3 > /proc/sys/vm/drop_caches 8.read file (size:128M, unit: 4k, count: 32768) (time dd if=/mnt/f2fs/128m bs=4k count=32768) Extent Hit Ratio: before patched Hit Ratio 121 / 1071 1071 / 1071 Performance: before patched real 0m37.051s 0m35.556s user 0m0.040s 0m0.026s sys 0m2.990s 0m2.251s Memory Cost: before patched Tree Count: 0 1 (size: 24 bytes) Node Count: 0 45 (size: 1440 bytes) v3: o retest and given more details of test result. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 429511cd	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: add core functions for rb-tree extent cache This patch adds core functions including slab cache init function and init/lookup/update/shrink/destroy function for rb-tree based extent cache. Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail design and implementation of extent cache. Todo: * register rb-based extent cache shrink with mm shrink interface. v2: o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h. o introduce __{attach,detach}_extent_node for code readability. o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert. o fix some coding style and typo issues. v3: o fix oops due to using an unassigned pointer. o use list_del to remove extent node in shrink list. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: add static for some funcitons and declare in f2fs.h] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 7e4dde79	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: introduce universal lookup/update interface for extent cache In this patch, we do these jobs: 1. rename {check,update}_extent_cache to {lookup,update}_extent_info; 2. introduce universal lookup/update interface of extent cache: f2fs_{lookup,update}_extent_cache including above two real functions, then export them to function callers. So after above cleanup, we can add new rb-tree based extent cache into exported interfaces. v2: o remove "f2fs_" for inner function {lookup,update}_extent_info suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a2e7d1bf	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_map_bh to clean codes of check_extent_cache This patch introduces f2fs_map_bh to clean codes of check_extent_cache. v2: o cleanup f2fs_map_bh pointed out by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4d0b0bd4	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: simplfy a field name in struct f2fs_extent,extent_info Rename a filed name from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info} as annotation of this field descripts its meaning well to us. By this way, we can avoid long statement in code of following patches. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0c872e2d	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: move ext_lock out of struct extent_info Move ext_lock out of struct extent_info, then in the following patches we can use variables with struct extent_info type as a parameter to pass pure data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 59b802e5	09-Feb-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: allocate data blocks in advance for f2fs_direct_IO This patch adds preallocation for data blocks to prepare f2fs_direct_IO. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# da17eece	09-Feb-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call set_buffer_new for get_block This patch fixes wrong handling of buffer_new flag in get_block. If f2fs allocates new blocks and mapped buffer_head, it needs to set buffer_new for the bh_result. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 487261f3	05-Feb-2015	Chao Yu <chao@kernel.org>	f2fs: merge {invalidate,release}page for meta/node/data pages This patch merges ->{invalidate,release}page function for meta/node/data pages. After this, duplication of codes could be removed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# f68daeeb	30-Jan-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: keep PagePrivate during releasepage If PagePrivate is removed by releasepage, f2fs loses counting dirty pages. e.g., try_to_release_page will not release page when the page is dirty, but our releasepage removes PagePrivate. [<ffffffff81188d75>] try_to_release_page+0x35/0x50 [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0 [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs] [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs] [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs] [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170 [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350 [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0 [<ffffffff81203081>] new_sync_write+0x81/0xb0 [<ffffffff81203837>] vfs_write+0xb7/0x1f0 [<ffffffff81204459>] SyS_write+0x49/0xb0 [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# caf0047e	28-Jan-2015	Chao Yu <chao@kernel.org>	f2fs: merge flags in struct f2fs_sb_info Currently, there are several variables with Boolean type as below: struct f2fs_sb_info { ... int s_dirty; bool need_fsck; bool s_closing; ... bool por_doing; ... } For this there are some issues: 1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean type variables by compiler. 2. if we continuously add new flag into f2fs_sb_info, structure will be messed up. So in this patch, we try to: 1. switch s_dirty to Boolean type variable since it has two status 0/1. 2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag. 3. introduce an enum type which can indicate different states of sbi. 4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate flags for sbi. After that, above issues will be fixed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# df199139	06-Jan-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix wrong unlock_page call This patch removes wrongly called unlock_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 38aa0889	05-Jan-2015	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: align direct_io'ed data to section This patch aligns the start block address of a file for direct io to the f2fs's section size. Some flash devices manage an over 4KB-sized page as a write unit, and if the direct_io'ed data are written but not aligned to that unit, the performance can be degraded due to the partial page copies. Thus, since f2fs has a section that is well aligned to FTL units, we can align the block address to the section size so that f2fs avoids this misalignment. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 41ef94b3	31-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove uncovered code path This patch removes unnecessary function calls. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3547ea96	31-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid potential unnecessary codes This patch relocates some operations to avoid unnecessary execution. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e1509cf2	30-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clean up to remove parameter This patch uses dn->data_blkaddr as a parameter for the destination block address. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 2ace38e0	24-Dec-2014	Chao Yu <chao@kernel.org>	f2fs: cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio Cleanup parameters for trace_f2fs_submit_{read_,write_,page_,page_m}bio with fio as one parameter. Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3e1c8f12	23-Dec-2014	Chao Yu <chao@kernel.org>	f2fs: cleanup trace event of f2fs_submit_page_{m,}bio with DECLARE_EVENT_CLASS This patch adds missing parameter _type_ for trace_f2fs_submit_page_bio, then use DECLARE_EVENT_CLASS/DEFINE_EVENT_CONDITION pair to cleanup some trace event code related to f2fs_submit_page_{m,}bio. Additionally, after we remove redundant code, size of code can be reduced: text data bss dec hex filename 176787 8712 56 185555 2d4d3 f2fs.ko.org 174408 8648 56 183112 2cb48 f2fs.ko Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# db9f7c1a	17-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: activate f2fs_trace_ios This patch activates f2fs_trace_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cf04e8eb	17-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use f2fs_io_info to clean up messy parameters during IO path This patch cleans up parameters on IO paths. The key idea is to use f2fs_io_info adding a parameter, block address, and then use this structure as parameters. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 042b7816	12-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove unnecessary call to invalidate inmemory pages Now we use inmemory pages for atomic write only and provide abort procedure, we don't need to truncate them explicitly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 1e84371f	09-Dec-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: change atomic and volatile write policies This patch adds two new ioctls to release inmemory pages grabbed by atomic writes. o f2fs_ioc_abort_volatile_write - If transaction was failed, all the grabbed pages and data should be written. o f2fs_ioc_release_volatile_write - This is to enhance the performance of PERSIST mode in sqlite. In order to avoid huge memory consumption which causes OOM, this patch changes volatile writes to use normal dirty pages, instead blocked flushing to the disk as long as system does not suffer from memory pressure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cd34e296	30-Nov-2014	Chao Yu <chao@kernel.org>	f2fs: fix to return correct error number in f2fs_write_begin Fix the wrong error number in error path of f2fs_write_begin. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5f727395	25-Nov-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix deadlock during inline_data conversion A deadlock can be occurred: Thread 1] Thread 2] - f2fs_write_data_pages - f2fs_write_begin - lock_page(page #0) - grab_cache_page(page #X) - get_node_page(inode_page) - grab_cache_page(page #0) : to convert inline_data - f2fs_write_data_page - f2fs_write_inline_data - get_node_page(inode_page) In this case, trying to lock inode_page and page #0 causes deadlock. In order to avoid this, this patch adds a rule for this locking policy, which is that page #0 should be locked followed by inode_page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 8cdcb713	17-Nov-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: put the inode page when error was occurred We should put the inode page when error was occurred. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 6a8f8ca5	29-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid race condition in handling wait_io __submit_merged_bio f2fs_write_end_io f2fs_write_end_io wait_io = X wait_io = x complete(X) complete(X) wait_io = NULL wait_for_completion() free(X) spin_lock(X) kernel panic In order to avoid this, this patch removes the wait_io facility. Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b3d208f9	23-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: revisit inline_data to avoid data races and potential bugs This patch simplifies the inline_data usage with the following rule. 1. inline_data is set during the file creation. 2. If new data is requested to be written ranges out of inline_data, f2fs converts that inode permanently. 3. There is no cases which converts non-inline_data inode to inline_data. 4. The inline_data flag should be changed under inode page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9234f319	22-Oct-2014	Jan Kara <jack@suse.cz>	f2fs: fix possible data corruption in f2fs_write_begin() f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has inline data. However it uses its contents to decide whether it should just zero out the page or load data to it. Thus if we are unlucky we can zero out page contents instead of loading inline data into a page. CC: stable@vger.kernel.org CC: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9ba69cf9	17-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid to allocate when inline_data was written The sceanrio is like this. inline_data i_size page write_begin/vm_page_mkwrite X 30 dirty_page X 30 write to #4096 position X 30 get_dnode_of_data wait for get_dnode_of_data O 30 write inline_data O 30 get_dnode_of_data O 30 reserve data block .. In this case, we have #0 = NEW_ADDR and inline_data as well. We should not allow this condition for further access. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cbcb2872	09-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: invalidate inmemory page If user truncates file's data, we should truncate inmemory pages too. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 34ba94ba	09-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not make dirty any inmemory pages This patch let inmemory pages be clean all the time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 02a1335f	06-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support volatile operations for transient data This patch adds support for volatile writes which keep data pages in memory until f2fs_evict_inode is called by iput. For instance, we can use this feature for the sqlite database as follows. While supporting atomic writes for main database file, we can keep its journal data temporarily in the page cache by the following sequence. 1. open -> ioctl(F2FS_IOC_START_VOLATILE_WRITE); 2. writes : keep all the data in the page cache. 3. flush to the database file with atomic writes a. ioctl(F2FS_IOC_START_ATOMIC_WRITE); b. writes c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE); 4. close -> drop the cached data Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 88b88a66	06-Oct-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support atomic writes This patch introduces a very limited functionality for atomic write support. In order to support atomic write, this patch adds two ioctls: o F2FS_IOC_START_ATOMIC_WRITE o F2FS_IOC_COMMIT_ATOMIC_WRITE The database engine should be aware of the following sequence. 1. open -> ioctl(F2FS_IOC_START_ATOMIC_WRITE); 2. writes : all the written data will be treated as atomic pages. 3. commit -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE); : this flushes all the data blocks to the disk, which will be shown all or nothing by f2fs recovery procedure. 4. repeat to #2. The IO pattens should be: ,- START_ATOMIC_WRITE ,- COMMIT_ATOMIC_WRITE CP \| D D D D D D \| FSYNC \| D D D D \| FSYNC ... `- COMMIT_ATOMIC_WRITE Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 55cf9cb6	15-Sep-2014	Chao Yu <chao@kernel.org>	f2fs: support large sector size Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes sector device at maximum. But now f2fs only support 512 bytes size sector, so block device such as zRAM which uses page cache as its block storage space will not be mounted successfully as mismatch between sector size of zRAM and sector size of f2fs supported. In this patch we support large sector size in f2fs, so block device with sector size of 512/1024/2048/4096 bytes can be supported in f2fs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 976e4c50	15-Sep-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: update i_size when __allocate_data_block The f2fs_direct_IO uses __allocate_data_block, but inside the allocation path, we should update i_size at the changed time to update its inode page. Otherwise, we can get wrong i_size after roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 90a893c7	22-Sep-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use MAX_BIO_BLOCKS(sbi) This patch cleans up a simple macro. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 88bd02c9	15-Sep-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix conditions to remain recovery information in f2fs_sync_file This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? / b) HAS_FSYNCED_INODE, / is the inode fsynced before? / c) HAS_LAST_FSYNC, / has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) \| CP \| inode(x) \| dnode(F) 2. inode(x) \| CP \| inode(F) \| dnode(F) 3. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) 4. inode(x) \| CP \| dnode(F) \| inode(F) 5. CP \| inode(x) \| dnode(F) \| inode(DF) 6. CP \| inode(DF) \| dnode(F) 7. CP \| dnode(F) \| inode(DF) 8. CP \| dnode(F) \| inode(x) \| inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP \| alloc \| dnode(F) \| inode(x) \| inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# a7ffdbe2	12-Sep-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: expand counting dirty pages in the inode page cache Previously f2fs only counts dirty dentry pages, but there is no reason not to expand the scope. This patch changes the names on the management of dirty pages and to count dirty pages in each inode info as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9850cf4a	02-Sep-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: need fsck.f2fs when f2fs_bug_on is triggered If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 4081363f	02-Sep-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 764aa3e9	14-Aug-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid double lock in truncate_blocks The init_inode_metadata calls truncate_blocks when error is occurred. The callers holds f2fs_lock_op, so we should not call it again in truncate_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# cf779cab	11-Aug-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: handle EIO not to break fs consistency There are two rules when EIO is occurred. 1. don't write any checkpoint data to preserve the previous checkpoint 2. don't lose the cached dentry/node/meta pages So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure. Then, writing checkpoint/dentry/node blocks is not allowed. Note that, for the data pages, we can't just throw away by redirtying them. Otherwise, kworker can fall into infinite loop to flush them. (Ref. xfstests/019) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b067ba1f	07-Aug-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: should convert inline_data during the mkwrite If mkwrite is called to an inode having inline_data, it can overwrite the data index space as NEW_ADDR. (e.g., the first 4 bytes are coincidently zero) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# e1c42045	06-Aug-2014	arter97 <qkrwngud825@gmail.com>	f2fs: fix typo Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 70407fad	31-Jul-2014	Chao Yu <chao@kernel.org>	f2fs: add tracepoint for f2fs_direct_IO This patch adds a tracepoint for f2fs_direct_IO. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# fff04f90	25-Jul-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add info of appended or updated data writes This patch introduces a inode number list in which represents inodes having appended data writes or updated data writes after last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 0f7b2abd	23-Jul-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add nobarrier mount option This patch adds a mount option, nobarrier, in f2fs. The assumption in here is that file system keeps the IO ordering, but doesn't care about cache flushes inside the storages. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 79e35dc3	12-Jul-2014	Huang Ying <ying.huang@intel.com>	f2fs: add f2fs_balance_fs for direct IO Otherwise, if a large amount of direct IO writes were done, the segment allocation may be failed because no enough segments are gced. Changes: v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 3aab8f82	01-Jul-2014	Chao Yu <chao@kernel.org>	f2fs: introduce f2fs_write_failed to handle error case when write When we fail in ->write_begin()/->direct_IO(), our allocated node block in disk and page cache are still kept, despite these may not be used again. This patch introduce f2fs_write_failed() to handle the error case of these two interfaces, it will truncate page cache and blocks of this file according to i_size. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 5576cd6c	11-Jun-2014	Chao Yu <chao@kernel.org>	f2fs: avoid unneeded SetPageUptodate in f2fs_write_end We have already set page update in ->write_begin, so we should remove redundant SetPageUptodate in ->write_end. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# ccfb3000	12-Jun-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix to report newly allocate region as extent Previous get_block in f2fs didn't report the newly allocated region which has NEW_ADDR. For reader, it should not report, but fiemap needs this. So, this patch introduces two get_block sharing core function. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# 9ab70134	07-Jun-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support f2fs_fiemap This patch links f2fs_fiemap with generic function with get_block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# b6fe5873	03-Jun-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix to recover data written by dio If data are overwritten through dio, previous f2fs doesn't remain the fsync mark due to no additional node writes. Note that this patch should resolve the xfstests:311. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
# c20e89cd	06-May-2014	Chao Yu <chao@kernel.org>	f2fs: add a tracepoint for f2fs_read_data_page This patch adds a tracepoint for f2fs_read_data_page to trace when page is readed by user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# e5748434	06-May-2014	Chao Yu <chao@kernel.org>	f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when pages are fsyncing/flushing. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# ecda0de3	06-May-2014	Chao Yu <chao@kernel.org>	f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when page is writting out. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# dfb2bf38	06-May-2014	Chao Yu <chao@kernel.org>	f2fs: add a tracepoint for f2fs_write_end This patch adds a tracepoint for f2fs_write_end to trace write op of user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 62aed044	06-May-2014	Chao Yu <chao@kernel.org>	f2fs: add a tracepoint for f2fs_write_begin This patch adds a tracepoint for f2fs_write_begin to trace write op of user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# d5f66990	29-Apr-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: decrease the lock granularity during write_begin This patch reduces the lock granularity during write_begin. When the system is under memory pressure, it would be better to reduce the locking time for the data pages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 9ac1349a	29-Apr-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid grab_cache_page_write_begin for data pages We don't need to wait on page writeback for these cases. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 6403eb1f	26-Apr-2014	Chao Yu <chao@kernel.org>	f2fs: introduce help macro ADDRS_PER_PAGE() Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in direct node or inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 2aea39ec	23-Apr-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: submit bio at the reclaim path If f2fs_write_data_page is called through the reclaim path, we should submit the bio right away. This patch resolves the following issue that Marc Dietrich reported. "It took me a while to bisect a problem which causes my ARM (tegra2) netbook to frequently stall for 5-10 seconds when I enable EXA acceleration (opentegra experimental ddx)." And this patch fixes that. Reported-by: Marc Dietrich <marvin24@gmx.de> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 454ae7e5	21-Apr-2014	Chao Yu <chao@kernel.org>	f2fs: handle inline data independently in f2fs_bmap We'd better handle inline data case independently in f2fs_bmap(). It can reduce our handling time in f2fs_bmap(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 6fb03f3a	15-Apr-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: adjust free mem size to flush dentry blocks If so many dirty dentry blocks are cached, not reached to the flush condition, we should fall into livelock in balance_dirty_pages. So, let's consider the mem size for the condition. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 76f60268	15-Apr-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call redirty_page_for_writepage This patch replace some general codes with redirty_page_for_writepage, which can be enabled after consideration on additional procedure like counting dirty pages appropriately. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 5b46f25d	16-Mar-2014	Al Viro <viro@zeniv.linux.org.uk>	f2fs: switch to iov_iter_alignment() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 31b14039	04-Mar-2014	Al Viro <viro@zeniv.linux.org.uk>	switch {__,}blockdev_direct_IO() to iov_iter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# d8d3d94b	04-Mar-2014	Al Viro <viro@zeniv.linux.org.uk>	pass iov_iter to ->direct_IO() unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# d54c795b	29-Mar-2014	Chao Yu <chao@kernel.org>	f2fs: fix error path when fail to read inline data We should unlock page in ->readpage() path and also should unlock & release page in error path of ->write_begin() to avoid deadlock or memory leak. So let's add release code to fix the problem when we fail to read inline data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# df0f8dc0	22-Mar-2014	Chao Yu <chao@kernel.org>	f2fs: avoid unnecessary bio submit when wait page writeback This patch introduce is_merged_page() to check whether current page is merged in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache, resulting in having more chance to merge pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 50c8cdb3	17-Mar-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce nr_pages_to_write for segment alignment This patch introduces nr_pages_to_write to align page writes to the segment or other operational unit size, which can be tuned according to the system environment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# d3baf95d	17-Mar-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: increase pages_skipped when skipping writepages This patch increases pages_skipped when skipping writepages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 87d6f890	17-Mar-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid small data writes by skipping writepages This patch introduces nr_pages_to_skip(sbi, type) to determine writepages can be skipped. The dentry, node, and meta pages can be conrolled by F2FS without breaking the FS consistency. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 9cf3c389	27-Feb-2014	Chao Yu <chao@kernel.org>	f2fs: fix dirty page accounting when redirty We should de-account dirty counters for page when redirty in ->writepage(). Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75': "writeback: fix dirtied pages accounting on redirty De-account the accumulative dirty counters on page redirty. Page redirties (very common in ext4) will introduce mismatch between counters (a) and (b) a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied b) NR_WRITTEN, BDI_WRITTEN This will introduce systematic errors in balanced_rate and result in dirty page position errors (ie. the dirty pages are no longer balanced around the global/bdi setpoints)." Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 8618b881	17-Feb-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix not to write data pages on the page reclaiming path Even if f2fs_write_data_page is called by the page reclaiming path, we should not write the page to provide enough free segments for the worst case scenario. Otherwise, f2fs can face with no free segment while gc is conducted, resulting in: ------------[ cut here ]------------ kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/segment.c:565! RIP: 0010:[<ffffffffa02c3b11>] [<ffffffffa02c3b11>] new_curseg+0x331/0x340 [f2fs] Call Trace: allocate_segment_by_default+0x204/0x280 [f2fs] allocate_data_block+0x108/0x210 [f2fs] write_data_page+0x8a/0xc0 [f2fs] do_write_data_page+0xe1/0x2a0 [f2fs] move_data_page+0x8a/0xf0 [f2fs] f2fs_gc+0x446/0x970 [f2fs] f2fs_balance_fs+0xb6/0xd0 [f2fs] f2fs_write_begin+0x50/0x350 [f2fs] ? unlock_page+0x27/0x30 ? unlock_page+0x27/0x30 generic_file_buffered_write+0x10a/0x280 ? file_update_time+0xa3/0xf0 __generic_file_aio_write+0x1c8/0x3d0 ? generic_file_aio_write+0x52/0xb0 ? generic_file_aio_write+0x52/0xb0 generic_file_aio_write+0x65/0xb0 do_sync_write+0x5a/0x90 vfs_write+0xc5/0x1f0 SyS_write+0x55/0xa0 system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 1fe54f9d	06-Feb-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clean up redundant function call This patch integrates inode_[inc\|dec]_dirty_dents with inc_page_count to remove redundant calls. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 1b1f559f	02-Feb-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: remove the ugly pointer conversion This patch modifies the use of bi_private to remove pointer chasing for sbi. Previously, we had a bi_private structure, but it needs memory allocation. So this patch uses bi_private by the sbi pointer and adds a completion pointer into the sbi. This can achieve no memory allocation and nice use of the bi_private. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 744602cf	23-Jan-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: update_inode_page should be done all the time In order to make fs consistency, update_inode_page should not be failed all the time. Otherwise, it is possible to lose some metadata in the inode like a link count. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# a18ff063	20-Jan-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call mark_inode_dirty to flush dirty pages If a dentry page is updated, we should call mark_inode_dirty to add the inode into the dirty list, so that its dentry pages are flushed to the disk. Otherwise, the inode can be evicted without flush. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 6c311ec6	17-Jan-2014	Chris Fries <cfries@motorola.com>	f2fs: clean checkpatch warnings Fixed a variety of trivial checkpatch warnings. The only delta should be some minor formatting on log strings that were split / too long. Signed-off-by: Chris Fries <cfries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# c434cbc0	15-Jan-2014	Changman Lee <cm224.lee@samsung.com>	f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH) Doing sync_meta_pages with META_FLUSH when checkpoint, we overide rw using WRITE_FLUSH_FUA. At this time, we also should set REQ_META\|REQ_PRIO. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# c33ec326	16-Jan-2014	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid f2fs_balance_fs call during pageout This patch should resolve the following bug. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 3.13.0-rc5.f2fs+ #6 Not tainted --------------------------------------------------------- kswapd0/41 just changed the state of lock: (&sbi->gc_mutex){+.+.-.}, at: [<ffffffffa030503e>] f2fs_balance_fs+0xae/0xd0 [f2fs] but this lock took another, RECLAIM_FS-READ-unsafe lock in the past: (&sbi->cp_rwsem){++++.?} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &sbi->gc_mutex --> &sbi->cp_mutex --> &sbi->cp_rwsem Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sbi->cp_rwsem); local_irq_disable(); lock(&sbi->gc_mutex); lock(&sbi->cp_mutex); <Interrupt> lock(&sbi->gc_mutex); * DEADLOCK * This bug is due to the f2fs_balance_fs call in f2fs_write_data_page. If f2fs_write_data_page is triggered by wbc->for_reclaim via kswapd, it should not call f2fs_balance_fs which tries to get a mutex grabbed by original syscall flow. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 5514f0aa	10-Jan-2014	Yuan Zhong <yuan.mark.zhong@samsung.com>	f2fs: remove the needless parameter of f2fs_wait_on_page_writeback "boo sync" parameter is never referenced in f2fs_wait_on_page_writeback. We should remove this parameter. Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# a8865372	27-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: handle errors correctly during f2fs_reserve_block The get_dnode_of_data nullifies inode and node page when error is occurred. There are two cases that passes inode page into get_dnode_of_data(). 1. make_empty_dir() -> get_new_data_page() -> f2fs_reserve_block(ipage) -> get_dnode_of_data() 2. f2fs_convert_inline_data() -> __f2fs_convert_inline_data() -> f2fs_reserve_block(ipage) -> get_dnode_of_data() This patch adds correct error handling codes when get_dnode_of_data() returns an error. At first, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block returns an error. So, the rule of f2fs_reserve_block() is to nullify inode page when there is any error internally. Finally, two callers of f2fs_reserve_block() should call f2fs_put_dnode() appropriately if they got an error since successful f2fs_reserve_block(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 9e09fc85	26-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: refactor f2fs_convert_inline_data Change log from v1: o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu. This patch refactors f2fs_convert_inline_data to check a couple of conditions internally for deciding whether it needs to convert inline_data or not. So, the new f2fs_convert_inline_data initially checks: 1) f2fs_has_inline_data(), and 2) the data size to be changed. If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA, then we don't need to convert the inline_data with data allocation. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 26f466f4	26-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: call f2fs_put_page at the error case In f2fs_write_begin(), if f2fs_conver_inline_data() returns an error like -ENOSPC, f2fs should call f2fs_put_page(). Otherwise, it is remained as a locked page, resulting in the following bug. [<ffffffff8114657e>] sleep_on_page+0xe/0x20 [<ffffffff81146567>] __lock_page+0x67/0x70 [<ffffffff81157d08>] truncate_inode_pages_range+0x368/0x5d0 [<ffffffff81157ff5>] truncate_inode_pages+0x15/0x20 [<ffffffff8115804b>] truncate_pagecache+0x4b/0x70 [<ffffffff81158082>] truncate_setsize+0x12/0x20 [<ffffffffa02a1842>] f2fs_setattr+0x72/0x270 [f2fs] [<ffffffff811cdae3>] notify_change+0x213/0x400 [<ffffffff811ab376>] do_truncate+0x66/0xa0 [<ffffffff811ab541>] vfs_truncate+0x191/0x1b0 [<ffffffff811ab5bc>] do_sys_truncate+0x5c/0xa0 [<ffffffff811ab78e>] SyS_truncate+0xe/0x10 [<ffffffff81756052>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 9ffe0fb5	10-Nov-2013	Huajun Li <huajun.li@intel.com>	f2fs: handle inline data operations Hook inline data read/write, truncate, fallocate, setattr, etc. Files need meet following 2 requirement to inline: 1) file size is not greater than MAX_INLINE_DATA; 2) file doesn't pre-allocate data blocks by fallocate(). FI_INLINE_DATA will not be set while creating a new regular inode because most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when data is submitted to block layer, ranther than set it while creating a new inode, this also avoids converting data from inline to normal data block and vice versa. While writting inline data to inode block, the first data block should be released if the file has a block indexed by i_addr[0]. On the other hand, when a file operation is appied to a file with inline data, we need to test if this file can remain inline by doing this operation, otherwise it should be convert into normal file by reserving a new data block, copying inline data to this new block and clear FI_INLINE_DATA flag. Because reserve a new data block here will make use of i_addr[0], if we save inline data in i_addr[0..872], then the first 4 bytes would be overwriten. This problem can be avoided simply by not using i_addr[0] for inline data. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 944fcfc1	26-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check the blocksize before calling generic_direct_IO path The f2fs supports 4KB block size. If user requests dwrite with under 4KB data, it allocates a new 4KB data block. However, f2fs doesn't add zero data into the untouched data area inside the newly allocated data block. This incurs an error during the xfstest #263 test as follow. 263 12s ... [failed, exit status 1] - output mismatch (see 263.out.bad) --- 263.out 2013-03-09 03:37:15.043967603 +0900 +++ 263.out.bad 2013-12-27 04:20:39.230203114 +0900 @@ -1,3 +1,976 @@ QA output created by 263 fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z +fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z +truncating to largest ever: 0x12a00 +truncating to largest ever: 0x75400 +fallocating to largest ever: 0x79cbf ... (Run 'diff -u 263.out 263.out.bad' to see the entire diff) Ran: 263 Failures: 263 Failed 1 of 1 tests It turns out that, when the test tries to write 2KB data with dio, the new dio path allocates 4KB data block without filling zero data inside the remained 2KB area. Finally, the output file contains a garbage data for that region. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 1ec79083	26-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: should put the dnode when NEW_ADDR is detected When get_dnode_of_data() in get_data_block() returns a successful dnode, we should put the dnode. But, previously, if its data block address is equal to NEW_ADDR, we didn't do that, resulting in a deadlock condition. So, this patch splits original error conditions with this case, and then calls f2fs_put_dnode before finishing the function. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 4f4124d0	21-Dec-2013	Chao Yu <chao@kernel.org>	f2fs: update several comments Update several comments: 1. use f2fs_{un}lock_op install of mutex_{un}lock_op. 2. update comment of get_data_block(). 3. update description of node offset. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 7e8f2308	20-Dec-2013	Gu Zheng <guz.fnst@cn.fujitsu.com>	f2fs: remove the rw_flag domain from f2fs_io_info When using the f2fs_io_info in the low level, we still need to merge the rw and rw_flag, so use the rw to hold all the io flags directly, and remove the rw_flag field. ps.It is based on the previous patch: f2fs: move all the bio initialization into __bio_alloc Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 940a6d34	20-Dec-2013	Gu Zheng <guz.fnst@cn.fujitsu.com>	f2fs: move all the bio initialization into __bio_alloc Move all the bio initialization into __bio_alloc, and some minor cleanups are also added. v3: Use 'bool' rather than 'int' as Kim suggested. v2: Use 'is_read' rather than 'rw' as Yu Chao suggested. Remove the needless initialization of bio->bi_private. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# bfad7c2d	16-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce a new direct_IO write path Previously, f2fs doesn't support direct IOs with high performance, which throws every write requests via the buffered write path, resulting in highly performance degradation due to memory opeations like copy_from_user. This patch introduces a new direct IO path in which every write requests are processed by generic blockdev_direct_IO() with enhanced get_block function. The get_data_block() in f2fs handles: 1. if original data blocks are allocates, then give them to blockdev. 2. otherwise, a. preallocate requested block addresses b. do not use extent cache for better performance c. give the block addresses to blockdev This policy induces that: - new allocated data are sequentially written to the disk - updated data are randomly written to the disk. - f2fs gives consistency on its file meta, not file data. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 76130cca	10-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix the location of tracepoint We need to get a trace before submit_bio, since its bi_sector is remapped during the submit_bio. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 458e6197	10-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: refactor bio->rw handling This patch introduces f2fs_io_info to mitigate the complex parameter list. struct f2fs_io_info { enum page_type type; /* contains DATA/NODE/META/META_FLUSH / int rw; / contains R/RS/W/WS / int rw_flag; / contains REQ_META/REQ_PRIO / } 1. f2fs_write_data_pages - DATA - WRITE_SYNC is set when wbc->WB_SYNC_ALL. 2. sync_node_pages - NODE - WRITE_SYNC all the time 3. sync_meta_pages - META - WRITE_SYNC all the time - REQ_META \| REQ_PRIO all the time * f2fs_submit_merged_bio() handles META_FLUSH. 4. ra_nat_pages, ra_sit_pages, ra_sum_pages - META - READ_SYNC Cc: Fan Li <fanofcode.li@samsung.com> Cc: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 63a0b7cb	09-Dec-2013	Fan Li <fanofcode.li@samsung.com>	f2fs: merge pages with the same sync_mode flag Previously f2fs submits most of write requests using WRITE_SYNC, but f2fs_write_data_pages submits last write requests by sync_mode flags callers pass. This causes a performance problem since continuous pages with different sync flags can't be merged in cfq IO scheduler(thanks yu chao for pointing it out), and synchronous requests often take more time. This patch makes the following modifies to DATA writebacks: 1. every page will be written back using the sync mode caller pass. 2. only pages with the same sync mode can be merged in one bio request. These changes are restricted to DATA pages.Other types of writebacks are modified To remain synchronous. In my test with tiotest, f2fs sequence write performance is improved by about 7%-10% , and this patch has no obvious impact on other performance tests. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 6bacf52f	05-Dec-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add unlikely() macro for compiler more aggressively This patch adds unlikely() macro into the most of codes. The basic rule is to add that when: - checking unusual errors, - checking page mappings, - and the other unlikely conditions. Change log from v1: - Don't add unlikely for the NULL test and error test: advised by Andi Kleen. Cc: Chao Yu <chao2.yu@samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# cfb271d4	05-Dec-2013	Chao Yu <chao@kernel.org>	f2fs: add unlikely() macro for compiler optimization As we know, some of our branch condition will rarely be true. So we could add 'unlikely' to let compiler optimize these code, by this way we could drop unneeded 'jump' assemble code to improve performance. change log: o add unlikely as many as possible across the whole source files at once suggested by Jaegeuk Kim. Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 93dfe2ac	29-Nov-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: refactor bio-related operations This patch integrates redundant bio operations on read and write IOs. 1. Move bio-related codes to the top of data.c. 2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read bios additionally. 3. Introduce __submit_merged_bio to submit the merged bio. 4. Change f2fs_readpage to f2fs_submit_page_bio. 5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and submit_write_page. Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com > Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 1069bbf7	28-Nov-2013	Chao Yu <chao@kernel.org>	f2fs: check return value of f2fs_readpage in find_data_page We should return error if we do not get an updated page in find_date_page when f2fs_readpage failed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# f9a4e6df	27-Nov-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: bug fix on bit overflow from 32bits to 64bits This patch fixes some bit overflows by the shift operations. Dan Carpenter reported potential bugs on bit overflows as follows. fs/f2fs/segment.c:910 submit_write_page() warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type? fs/f2fs/checkpoint.c:429 get_valid_checkpoint() warn: should '1 << ()' be a 64 bit type? fs/f2fs/data.c:408 f2fs_readpage() warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type? fs/f2fs/data.c:457 submit_read_page() warn: should 'blk_addr << ((sbi)->log_blocksize - 9)' be a 64 bit type? fs/f2fs/data.c:525 get_data_block_ro() warn: should 'i << blkbits' be a 64 bit type? Bug-Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# b600965c	10-Nov-2013	Huajun Li <huajun.li@intel.com>	f2fs: add a new function: f2fs_reserve_block() Add the function f2fs_reserve_block() to easily reserve new blocks, and use it to clean up more codes. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# d4d288bc	23-Nov-2013	Chao Yu <chao@kernel.org>	f2fs: adds a tracepoint for f2fs_submit_read_bio This patch adds a tracepoint for f2fs_submit_read_bio. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_bio] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 87b8872d	20-Nov-2013	Chao Yu <chao@kernel.org>	f2fs: adds a tracepoint for submit_read_page This patch adds a tracepoint for submit_read_page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: integrate tracepoints of f2fs_submit_read(_write)_page] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 924b720b	19-Nov-2013	Chao Yu <chao@kernel.org>	f2fs: add a new function to support for merging contiguous read For better read performance, we add a new function to support for merging contiguous read as the one for write. v1-->v2: o add declarations here as Gu Zheng suggested. o use new structure f2fs_bio_info introduced by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Acked-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
# c11abd1a	18-Nov-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: disable the extent cache ops on high fragmented files The f2fs manages an extent cache to search a number of consecutive data blocks very quickly. However it conducts unnecessary cache operations if the file is highly fragmented with no valid extent cache. In such the case, we don't need to handle the extent cache, but just can disable the cache facility. Nevertheless, this patch gives one more chance to enable the extent cache. For example, 1. create a file 2. write data sequentially which produces a large valid extent cache 3. update some data, resulting in a fragmented extent 4. if the fragmented extent is too small, then drop extent cache 5. close the file 6. open the file again 7. give another chance to make a new extent cache 8. write data sequentially again which creates another big extent cache. ... Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 971767ca	18-Nov-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: use sbi->write_mutex for write bios This patch removes an unnecessary semaphore (i.e., sbi->bio_sem). There is no reason to use the semaphore when f2fs submits read and write IOs. Instead, let's use a write mutex and cover the sbi->bio[] by the lock. Change log from v1: o split write_mutex suggested by Chao Yu Chao described, "All DATA/NODE/META bio buffers in superblock is protected by 'sbi->write_mutex', but each bio buffer area is independent, So we should split write_mutex to three for DATA/NODE/META." Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 75c3c8bc	15-Nov-2013	Chao Yu <chao@kernel.org>	f2fs: use f2fs_put_page to release page for uniform style We should use f2fs_put_page to release page for uniform style of f2fs code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 4f024f37	11-Oct-2013	Kent Overstreet <kmo@daterainc.com>	block: Abstract out bvec iterator Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
# 2c30c71b	07-Nov-2013	Kent Overstreet <kmo@daterainc.com>	block: Convert various code to bio_for_each_segment() With immutable biovecs we don't want code accessing bi_io_vec directly - the uses this patch changes weren't incorrect since they all own the bio, but it makes the code harder to audit for no good reason - also, this will help with multipage bvecs later. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
# 5d56b671	29-Oct-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add an option to avoid unnecessary BUG_ONs If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS in your kernel config. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 26c6b887	24-Oct-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add tracepoint for set_page_dirty This patch adds a tracepoint for set_page_dirty. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# dcdfff65	22-Oct-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: clean up several status-related operations This patch cleans up improper definitions that update some status information. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# e479556b	27-Sep-2013	Gu Zheng <guz.fnst@cn.fujitsu.com>	f2fs: use rw_sem instead of fs_lock(locks mutex) The fs_locks is used to block other ops(ex, recovery) when doing checkpoint. And each other operate routine(besides checkpoint) needs to acquire a fs_lock, there is a terrible problem here, if these are too many concurrency threads acquiring fs_lock, so that they will block each other and may lead to some performance problem, but this is not the phenomenon we want to see. Though there are some optimization patches introduced to enhance the usage of fs_lock, but the thorough solution is using a rw_sem to replace the fs_lock. Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other, this can avoid the problem described above completely. Because of the weakness of rw_sem, the above change may introduce a potential problem that the checkpoint thread might get starved if other threads are intensively locking the read semaphore for I/O.(Pointed out by Xu Jin) In order to avoid this, a wait_list is introduced, the appending read semaphore ops will be dropped into the wait_list if checkpoint thread is waiting for write semaphore, and will be waked up when checkpoint thread gives up write semaphore. Thanks to Kim's previous review and test, and will be very glad to see other guys' performance tests about this patch. V2: -fix the potential starvation problem. -use more suitable func name suggested by Xu Jin. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: adjust minor coding standard] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# de93653f	12-Aug-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: reserve the xattr space dynamically This patch enables the number of direct pointers inside on-disk inode block to be changed dynamically according to the size of inline xattr space. The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file has inline xattr flag. The number of direct pointers that will be used by inline xattrs is defined as F2FS_INLINE_XATTR_ADDRS. Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# d59ff4df	20-Aug-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix wrong BUG_ON condition This patch removes a false-alaramed BUG_ON. The previous BUG_ON condition didn't cover the following true scenario. In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully, and then, 2) init_inode_metadata returns -ENOSPC. At this moment, a new clean data page is remained in the page cache, but its block address still indicates NEW_ADDR. After then, even if sync is called, this clean data page cannot be written to the disk due to the clean state. So this means that get_lock_data_page should make a new empty page when its block address is NEW_ADDR and its page is not uptodated. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 41dfde13	09-Aug-2013	Gu Zheng <guz.fnst@cn.fujitsu.com>	f2fs: clean up the needless end 'return' of void function Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# a569469e	05-Aug-2013	Jin Xu <jinuxstyle@gmail.com>	f2fs: fix a deadlock in fsync This patch fixes a deadlock bug that occurs quite often when there are concurrent write and fsync on a same file. Following is the simplified call trace when tasks get hung. fsync thread: - f2fs_sync_file ... - f2fs_write_data_pages ... - update_extent_cache ... - update_inode - wait_on_page_writeback bdi writeback thread - __writeback_single_inode - f2fs_write_data_pages - mutex_lock(sbi->writepages) The deadlock happens when the fsync thread waits on a inode page that has been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately, no one else could be able to submit the cached bio to block layer for writeback. This is because the fsync thread already hold a sbi->fs_lock and the sbi->writepages lock, causing the bdi thread being blocked when attempt to write data pages for the same inode. At the same time, f2fs_gc thread does not notice the situation and could not help. Even the sync syscall gets blocked. To fix it, we could submit the cached bio first before waiting on a inode page that is being written back. Signed-off-by: Jin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# df273efc	04-Aug-2013	Namjae Jeon <namjae.jeon@samsung.com>	f2fs: remove redundant code from f2fs_write_begin This code is being used for nobh_write_end() function. But since now f2fs_write_end function is added so there is no need for this code. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# f0c5e565	31-Jul-2013	Dan Carpenter <dan.carpenter@oracle.com>	f2fs: remove an unneeded kfree(NULL) This kfree() is no longer needed after a79dc083d7 "f2fs: move bio_private allocation out of f2fs_bio_alloc()". The "bio->bi_private" is NULL here so it's a no-op. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# d8207f69	24-Jul-2013	Gu Zheng <guz.fnst@cn.fujitsu.com>	f2fs: move bio_private allocation out of f2fs_bio_alloc() bio->bi_private is not always needed. As in the reading data path, end_read_io does not need bio_private for further using, so moving bio_private allocation out of f2fs_bio_alloc(). Alloc it in the submit_write_page(), and ignore it in the f2fs_readpage(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 45590710	15-Jul-2013	Gu Zheng <guz.fnst@cn.fujitsu.com>	f2fs: introduce help function F2FS_NODE() Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# a1dd3c13	26-Jun-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix to recover i_size from roll-forward If user requests many data writes and fsync together, the last updated i_size should be stored to the inode block consistently. But, previous write_end just marks the inode as dirty and doesn't update its metadata into its inode block. After that, fsync just writes the inode block with newly updated data index excluding inode metadata updates. So, this patch introduces write_end in which updates inode block too when the i_size is changed. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# b25958b6	13-Jun-2013	Haicheng Li <haicheng.li@linux.intel.com>	f2fs: optimize do_write_data_page() Since "need_inplace_update() == true" is a very rare case, using unlikely() to give compiler a chance to optimize the code. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 699489bb	07-Jun-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: sync dir->i_size with its block allocation If new dentry block is allocated and its i_size is updated, we should update its inode block together in order to sync i_size and its block allocation. Otherwise, we can loose additional dentry block due to the unconsistent i_size. Errorneous Scenario ------------------- In the recovery routine, - recovery_dentry \| - __f2fs_add_link \| \| - get_new_data_page \| \| \| - i_size_write(new_i_size) \| \| \| - mark_inode_dirty_sync(dir) \| \| - update_parent_metadata \| \| \| - mark_inode_dirty(dir) \| - write_checkpoint - sync_dirty_dir_inodes - filemap_flush(dentry_blocks) - f2fs_write_data_page - skip to write the last dentry block due to index < i_size In the above flow, new_i_size is not updated to its inode block so that the last dentry block will be lost accordingly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 35b09d82	23-May-2013	Namjae Jeon <namjae.jeon@samsung.com>	f2fs: push some variables to debug part Some, counters are needed only for the statistical information while debugging. So, those can be controlled using CONFIG_F2FS_STAT_FS, pushing the usage for few variables under this flag. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 6f85b352	20-May-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid RECLAIM_FS-ON-W: deadlock This patch tries to avoid the following deadlock condition of which the reclaim path can trigger f2fs_balance_fs again. ================================= [ INFO: inconsistent lock state ] --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes: (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs] {RECLAIM_FS-ON-W} state was registered at: [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140 [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0 [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0 [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180 [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0 [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0 [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs] [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs] [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs] [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs] [<ffffffff811ae51b>] notify_change+0x1db/0x3a0 [<ffffffff8118fbd0>] do_truncate+0x60/0xa0 [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0 [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0 [<ffffffff8118ffee>] SyS_truncate+0xe/0x10 [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 44a83ff6	19-May-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: update inode page after creation I found a bug when testing power-off-recovery as follows. [Bug Scenario] 1. create a file 2. fsync the file 3. reboot w/o any sync 4. try to recover the file - found its fsync mark - found its dentry mark : try to recover its dentry - get its file name - get its parent inode number : here we got zero value The reason why we get the wrong parent inode number is that we didn't synchronize the inode page with its newly created inode information perfectly. Especially, previous f2fs stores fi->i_pino and writes it to the cached node page in a wrong order, which incurs the zero-valued i_pino during the recovery. So, this patch modifies the creation flow to fix the synchronization order of inode page with its inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 64aa7ed9	19-May-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: change get_new_data_page to pass a locked node page This patch is for passing a locked node page to get_dnode_of_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 650495de	12-May-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix the inconsistent state of data pages In get_lock_data_page, if there is a data race between get_dnode_of_data for node and grab_cache_page for data, f2fs is able to face with the following BUG_ON(dn.data_blkaddr == NEW_ADDR). kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251! [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs] Call Trace: [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs] [<ffffffff811a0920>] ? fillonedir+0x100/0x100 [<ffffffff811a0920>] ? fillonedir+0x100/0x100 [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0 [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110 [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b This bug is able to be occurred when the block address of the data block is changed after f2fs_put_dnode(). In order to avoid that, this patch fixes the lock order of node and data blocks in which the node block lock is covered by the data block lock. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# d47992f8	21-May-2013	Lukas Czerner <lczerner@redhat.com>	mm: change invalidatepage prototype to accept length Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>
# 531ad7d5	29-Apr-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: avoid deadlock during evict after f2fs_gc o Deadlock case #1 Thread 1: - writeback_sb_inodes - do_writepages - f2fs_write_data_pages - write_cache_pages - f2fs_write_data_page - f2fs_balance_fs - wait mutex_lock(gc_mutex) Thread 2: - f2fs_balance_fs - mutex_lock(gc_mutex) - f2fs_gc - f2fs_iget - wait iget_locked(inode->i_lock) Thread 3: - do_unlinkat - iput - lock(inode->i_lock) - evict - inode_wait_for_writeback o Deadlock case #2 Thread 1: - __writeback_single_inode : set I_SYNC - do_writepages - f2fs_write_data_page - f2fs_balance_fs - f2fs_gc - iput - evict - inode_wait_for_writeback(I_SYNC) In order to avoid this, even though iput is called with the zero-reference count, we need to stop the eviction procedure if the inode is on writeback. So this patch links f2fs_drop_inode which checks the I_SYNC flag. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# a27bb332	07-May-2013	Kent Overstreet <koverstreet@google.com>	aio: don't include aio.h in sched.h Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# afcb7ca0	25-Apr-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: check truncation of mapping after lock_page We call lock_page when we need to update a page after readpage. Between grab and lock page, the page can be truncated by other thread. So, we should check the page after lock_page whether it was truncated or not. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# c718379b	23-Apr-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: give a chance to merge IOs by IO scheduler Previously, background GC submits many 4KB read requests to load victim blocks and/or its (i)node blocks. ... f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0] ... However, by the fact that many IOs are sequential, we can give a chance to merge the IOs by IO scheduler. In order to do that, let's use blk_plug. ... f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0] <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0] <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0] <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0] <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0] <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0] <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0] <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0] <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0] ... Note that this issue should be addressed in checkpoint, and some readahead flows too. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# c01e2853	23-Apr-2013	Namjae Jeon <namjae.jeon@samsung.com>	f2fs: add tracepoints to debug the block allocation Add tracepoints to debug the block allocation & fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: enhance information] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 848753aa	23-Apr-2013	Namjae Jeon <namjae.jeon@samsung.com>	f2fs: add tracepoint for tracing the page i/o Add tracepoints for page i/o operations and block allocation tracing during page read operation. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: combine and modify the tracepoint structures] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 6224da87	05-Apr-2013	Namjae Jeon <namjae.jeon@samsung.com>	f2fs: fix typo mistakes Fix typo mistakes. 1. I think that it should be 'L' instead of 'V'. 2. and try to fix 'Front' instead of 'Frone' Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 39936837	22-Nov-2012	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce a new global lock scheme In the previous version, f2fs uses global locks according to the usage types, such as directory operations, block allocation, block write, and so on. Reference the following lock types in f2fs.h. enum lock_type { RENAME, /* for renaming operations / DENTRY_OPS, / for directory operations / DATA_WRITE, / for data write / DATA_NEW, / for data allocation / DATA_TRUNC, / for data truncate / NODE_NEW, / for node allocation / NODE_TRUNC, / for node truncate / NODE_WRITE, / for node write */ NR_LOCK_TYPE, }; In that case, we lose the performance under the multi-threading environment, since every types of operations must be conducted one at a time. In order to address the problem, let's share the locks globally with a mutex array regardless of any types. So, let users grab a mutex and perform their jobs in parallel as much as possbile. For this, I propose a new global lock scheme as follows. 0. Data structure - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS] - f2fs_sb_info -> node_write 1. mutex_lock_op(sbi) - try to get an avaiable lock from the array. - returns the index of the gottern lock variable. 2. mutex_unlock_op(sbi, index of the lock) - unlock the given index of the lock. 3. mutex_lock_all(sbi) - grab all the locks in the array before the checkpoint. 4. mutex_unlock_all(sbi) - release all the locks in the array after checkpoint. 5. block_operations() - call mutex_lock_all() - sync_dirty_dir_inodes() - grab node_write - sync_node_pages() Note that, the pairs of mutex_lock_op()/mutex_unlock_op() and mutex_lock_all()/mutex_unlock_all() should be used together. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# cfb185a1	02-Apr-2013	P J P <ppandit@redhat.com>	f2fs: add NULL pointer check Commit - fa9150a84c - replaces a call to generic_writepages() in f2fs_write_data_pages() with write_cache_pages(), with a function pointer argument pointing to routine: __f2fs_writepage. -> https://git.kernel.org/linus/fa9150a84ca333f68127097c4fa1eda4b3913a22 This patch adds a NULL pointer check in f2fs_write_data_pages() to avoid a possible NULL pointer dereference, in case if - mapping->a_ops->writepage - is NULL. Signed-off-by: P J P <ppandit@redhat.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 0ff153a2	19-Mar-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: do not skip writing file meta during fsync This patch removes data_version check flow during the fsync call. The original purpose for the use of data_version was to avoid writng inode pages redundantly by the fsync calls repeatedly. However, when user can modify file meta and then call fsync, we should not skip fsync procedure. So, let's remove this condition check and hope that user triggers in right manner. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# c3850aa1	13-Mar-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix return value of releasepage for node and data If the return value of releasepage is equal to zero, the page cannot be reclaimed. Instead, we should return 1 in order to reclaim clean pages. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 393ff91f	08-Mar-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: reduce unncessary locking pages during read This patch reduces redundant locking and unlocking pages during read operations. In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page. And then, when we need to modify any data finally, let's lock the page so that we can avoid lock contention. [readpage rule] - The f2fs_readpage returns unlocked page, or released page too in error cases. - Its caller should handle read error, -EIO, after locking the page, which indicates read completion. - Its caller should check PageUptodate after grab_cache_page. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 266e97a8	25-Feb-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: introduce readahead mode of node pages Previously, f2fs reads several node pages ahead when get_dnode_of_data is called with RDONLY_NODE flag. And, this flag is set by the following functions. - get_data_block_ro - get_lock_data_page - do_write_data_page - truncate_blocks - truncate_hole However, this readahead mechanism is initially introduced for the use of get_data_block_ro to enhance the sequential read performance. So, let's clarify all the cases with the additional modes as follows. enum { ALLOC_NODE, /* allocate a new node page if needed / LOOKUP_NODE, / look up a node without readahead / LOOKUP_NODE_RA, / * look up a node with readahead called * by get_datablock_ro. */ } Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
# c01e54b7	17-Jan-2013	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: support swapfile This patch adds f2fs_bmap operation to the data address space. This enables f2fs to support swapfile. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# fa9150a8	15-Jan-2013	Namjae Jeon <namjae.jeon@samsung.com>	f2fs: remove the blk_plug usage in f2fs_write_data_pages Let's consider the usage of blk_plug in f2fs_write_data_pages(). We can come up with the two issues: lock contention and task awareness. 1. Merging bios prior to grabing "queue lock" The f2fs merges consecutive IOs in the file system level before submitting any bios, which is similar with the back merge by the plugging mechanism in attempt_plug_merge(). Both of them need to acquire no queue lock. 2. Merging policy with respect to tasks The f2fs merges IOs as much as possible regardless of tasks, while blk-plugging is conducted on a basis of tasks. As we can understand there are trade-offs, f2fs tries to maximize the write performance with well-merged bios. As a result, if f2fs produces many consecutive but separated bios in writepages(), it would be good to use blk-plugging since f2fs would be able to avoid queue lock contention in the block layer by merging them. But, f2fs merges IOs and submit one bio, which means that there are not much chances to merge bios by attempt_plug_merge(). However, f2fs has already been used blk_plug by triggering generic_writepages() in f2fs_write_data_pages(). So to make the overall code consistency, I'd like to remove blk_plug there. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 690e4a3e	19-Dec-2012	Geert Uytterhoeven <geert@linux-m68k.org>	f2fs: add missing #include <linux/prefetch.h> m68k allmodconfig: fs/f2fs/data.c: In function ‘read_end_io’: fs/f2fs/data.c:311: error: implicit declaration of function ‘prefetchw’ fs/f2fs/segment.c: In function ‘f2fs_end_io_write’: fs/f2fs/segment.c:628: error: implicit declaration of function ‘prefetchw’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 3cd8a239	09-Dec-2012	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: cleanup the f2fs_bio_alloc routine Do cleanup more for better code readability. - Change the parameter set of f2fs_bio_alloc() This function should allocate a bio only since it is not something like f2fs_bio_init(). Instead, the caller should initialize the allocated bio. - Introduce SECTOR_FROM_BLOCK This macro translates a block address to its sector address. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
# 0a8165d7	28-Nov-2012	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: adjust kernel coding style As pointed out by Randy Dunlap, this patch removes all usage of "/*" for comment blocks. Instead, just use "/". Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# 25ca923b	28-Nov-2012	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: fix endian conversion bugs reported by sparse This patch should resolve the bugs reported by the sparse tool. Initial reports were written by "kbuild test robot" managed by fengguang.wu. In my local machines, I've tested also by running: > make C=2 CF="-D__CHECK_ENDIAN__" Accordingly, I've found lots of warnings and bugs related to the endian conversion. And I've fixed all at this moment. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
# eb47b800	02-Nov-2012	Jaegeuk Kim <jaegeuk@kernel.org>	f2fs: add address space operations for data This adds address space operations for data. - F2FS supports readpages(), writepages(), and direct_IO(). - Because of out-of-place writes, f2fs_direct_IO() does not write data in place. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>