History log of /linux-master/fs/f2fs/super.c
Revision Date Author Comments
# 22650a99 26-Mar-2024 Christian Brauner <brauner@kernel.org>

fs,block: yield devices early

Currently a device is only really released once the umount returns to
userspace due to how file closing works. That ultimately could cause
an old umount assumption to be violated that concurrent umount and mount
don't fail. So an exclusively held device with a temporary holder should
be yielded before the filesystem is gone. Add a helper that allows
callers to do that. This also allows us to remove the two holder ops
that Linus wasn't excited about.

Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org
Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 24593061 11-Mar-2024 Zhiguo Niu <zhiguo.niu@unisoc.com>

f2fs: fix to handle error paths of {new,change}_curseg()

{new,change}_curseg() may return error in some special cases,
error handling should be did in their callers, and this will also
facilitate subsequent error path expansion in {new,change}_curseg().

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 11bec96a 07-Mar-2024 Chao Yu <chao@kernel.org>

f2fs: zone: fix to remove pow2 check condition for zoned block device

Commit 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned
device") missed to remove pow2 check condition in init_blkz_info(),
fix it.

Fixes: 2e2c6e9b72ce ("f2fs: remove power-of-two limitation of zoned device")
Signed-off-by: Feng Song <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 45809cd3 03-Mar-2024 Chao Yu <chao@kernel.org>

f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup

Just cleanup, no functional change.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# afbb8ff6 16-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: print zone status in string and some log

No functional change, but add some more logs.

Note, it includes the spelling mistakes pointed by Colin Ian King.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d4c5938 16-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix write pointers all the time

Even if the roll forward recovery stopped due to any error, we have to fix
the write pointers in order to mount the disk from the previous checkpoint.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# de252407 20-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: prevent an f2fs_gc loop during disable_checkpoint

Don't get stuck in the f2fs_gc loop while disabling checkpoint. Instead, we have
a time-based management.

Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8b10d365 22-Feb-2024 Chao Yu <chao@kernel.org>

f2fs: introduce FAULT_NO_SEGMENT

Use it to simulate no free segment case during block allocation.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4e0197f9 20-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: kill heap-based allocation

No one uses this feature. Let's kill it.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e39602da 12-Feb-2024 Chao Yu <chao@kernel.org>

f2fs: compress: fix to check zstd compress level correctly in mount option

f2fs only support to config zstd compress level w/ a positive number due
to layout design, but since commit e0c1b49f5b67 ("lib: zstd: Upgrade to
latest upstream zstd version 1.4.10"), zstd supports negative compress
level, so that zstd_min_clevel() may return a negative number, then w/
below mount option, .compress_level can be configed w/ a negative number,
which is not allowed to f2fs, let's add check condition to avoid it.

mount -o compress_algorithm=zstd:4294967295 /dev/sdx /mnt/f2fs

Fixes: 00e120b5e4b5 ("f2fs: assign default compression level")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a60108f7 06-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use BLKS_PER_SEG, BLKS_PER_SEC, and SEGS_PER_SEC

No functional change.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 87161a2b 06-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: deprecate io_bits

Let's deprecate an unused io_bits feature to save CPU cycles and memory.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0b8eb814 26-Jan-2024 Chao Yu <chao@kernel.org>

f2fs: use f2fs_err_ratelimited() to avoid redundant logs

Use f2fs_err_ratelimited() to instead f2fs_err() in
f2fs_record_stop_reason() and f2fs_record_errors() to
avoid redundant logs.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b1c9d3f8 26-Jan-2024 Chao Yu <chao@kernel.org>

f2fs: support printk_ratelimited() in f2fs_printk()

This patch supports using printk_ratelimited() in f2fs_printk(), and
wrap ratelimited f2fs_printk() into f2fs_{err,warn,info}_ratelimited(),
then, use these new helps to clean up codes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c7115e09 12-Jan-2024 Chao Yu <chao@kernel.org>

f2fs: introduce FAULT_BLKADDR_CONSISTENCE

We will encounter below inconsistent status when FAULT_BLKADDR type
fault injection is on.

Info: checkpoint state = d6 : nat_bits crc fsck compacted_summary orphan_inodes sudden-power-off
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c100 has i_blocks: 000000c0, but has 191 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1c100] i_blocks=0x000000c0 -> 0xbf
[FIX] (fsck_chk_inode_blk:1269) --> [0x1c100] i_compr_blocks=0x00000026 -> 0x27
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1cadb has i_blocks: 0000002f, but has 46 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1cadb] i_blocks=0x0000002f -> 0x2e
[FIX] (fsck_chk_inode_blk:1269) --> [0x1cadb] i_compr_blocks=0x00000011 -> 0x12
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c62c has i_blocks: 00000002, but has 1 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1c62c] i_blocks=0x00000002 -> 0x1

After we inject fault into f2fs_is_valid_blkaddr() during truncation,
a) it missed to increase @nr_free or @valid_blocks
b) it can cause in blkaddr leak in truncated dnode
Which may cause inconsistent status.

This patch separates FAULT_BLKADDR_CONSISTENCE from FAULT_BLKADDR,
and rename FAULT_BLKADDR to FAULT_BLKADDR_VALIDITY
so that we can:
a) use FAULT_BLKADDR_CONSISTENCE in f2fs_truncate_data_blocks_range()
to simulate inconsistent issue independently, then it can verify fsck
repair flow.
b) FAULT_BLKADDR_VALIDITY fault will not cause any inconsistent status,
we can just use it to check error path handling in kernel side.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ccb49011 06-Feb-2024 Jan Kara <jack@suse.cz>

quota: Properly annotate i_dquot arrays with __rcu

Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.

Fixes: b9ba6f94b238 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara <jack@suse.cz>


# be2760a7 20-Feb-2024 Gabriel Krisman Bertazi <krisman@suse.de>

f2fs: Configure dentry operations at dentry-creation time

This was already the case for case-insensitive before commit
bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), but it
was changed to set at lookup-time to facilitate the integration with
fscrypt. But it's a problem because dentries that don't get created
through ->lookup() won't have any visibility of the operations.

Since fscrypt now also supports configuring dentry operations at
creation-time, do it for any encrypted and/or casefold volume,
simplifying the implementation across these features.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20240221171412.10710-9-krisman@suse.de
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>


# 512383ae 23-Jan-2024 Christian Brauner <brauner@kernel.org>

f2fs: port block device access to files

Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-22-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# f3a60882 08-Feb-2024 Christian Brauner <brauner@kernel.org>

bdev: open block device as files

Add two new helpers to allow opening block devices as files.
This is not the final infrastructure. This still opens the block device
before opening a struct a file. Until we have removed all references to
struct bdev_handle we can't switch the order:

* Introduce blk_to_file_flags() to translate from block specific to
flags usable to pen a new file.
* Introduce bdev_file_open_by_{dev,path}().
* Introduce temporary sb_bdev_handle() helper to retrieve a struct
bdev_handle from a block device file and update places that directly
reference struct bdev_handle to rely on it.
* Don't count block device openes against the number of open files. A
bdev_file_open_by_{dev,path}() file is never installed into any
file descriptor table.

One idea that came to mind was to use kernel_tmpfile_open() which
would require us to pass a path and it would then call do_dentry_open()
going through the regular fops->open::blkdev_open() path. But then we're
back to the problem of routing block specific flags such as
BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste
FMODE_* flags every time we add a new one. With this we can avoid using
a flag bit and we have more leeway in how we open block devices from
bdev_open_by_{dev,path}().

Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# a4af51ce 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

fs: super_set_uuid()

Some weird old filesytems have UUID-like things that we wish to expose
as UUIDs, but are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.

And add a helper super_set_uuid(), for setting nonstandard length uuids.

Helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>


# c919330d 12-Jan-2024 Eric Biggers <ebiggers@google.com>

f2fs: fix double free of f2fs_sb_info

kill_f2fs_super() is called even if f2fs_fill_super() fails.
f2fs_fill_super() frees the struct f2fs_sb_info, so it must set
sb->s_fs_info to NULL to prevent it from being freed again.

Fixes: 275dca4630c1 ("f2fs: move release of block devices to after kill_block_super()")
Reported-by: <syzbot+8f477ac014ff5b32d81f@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/lkml/0000000000006cb174060ec34502@google.com
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/linux-f2fs-devel/20240113005747.38887-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 7437bb73 17-Dec-2023 Christoph Hellwig <hch@lst.de>

block: remove support for the host aware zone model

When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):

- host managed where non-conventional zones there is strict requirement
to write at the write pointer, or else an error is returned
- host aware where a write point is maintained if writes always happen
at it, otherwise it is left in an under-defined state and the
sequential write preferred zones behave like conventional zones
(probably very badly performing ones, though)

Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.

Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a6a010f5 04-Dec-2023 Daniel Rosenberg <drosen@google.com>

f2fs: Restrict max filesize for 16K f2fs

Blocks are tracked by u32, so the max permitted filesize is
(U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto
data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must
further restrict max filesize to (U32_MAX + 1) * 4096. This does not
affect 4K blocksize f2fs as the natural limit for files are well below
that.

Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# aca90eea 02-Dec-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check write pointers when checkpoint=disable

Even if f2fs was rebooted as staying checkpoint=disable, let's match the write
pointers all the time.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 29215a7d 01-Dec-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: allow checkpoint=disable for zoned block device

Let's allow checkpoint=disable back for zoned block device. It's very risky
as the feature relies on fsck or runtime recovery which matches the write
pointers again if the device rebooted while disabling the checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 275dca46 27-Dec-2023 Eric Biggers <ebiggers@google.com>

f2fs: move release of block devices to after kill_block_super()

Call destroy_device_list() and free the f2fs_sb_info from
kill_f2fs_super(), after the call to kill_block_super(). This is
necessary to order it after the call to fscrypt_destroy_keyring() once
generic_shutdown_super() starts calling fscrypt_destroy_keyring() just
after calling ->put_super. This is because fscrypt_destroy_keyring()
may call into f2fs_get_devices() via the fscrypt_operations.

Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20231227171429.9223-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# e21fc203 23-Oct-2023 Amir Goldstein <amir73il@gmail.com>

exportfs: make ->encode_fh() a mandatory method for NFS export

Rename the default helper for encoding FILEID_INO32_GEN* file handles to
generic_encode_ino32_fh() and convert the filesystems that used the
default implementation to use the generic helper explicitly.

After this change, exportfs_encode_inode_fh() no longer has a default
implementation to encode FILEID_INO32_GEN* file handles.

This is a step towards allowing filesystems to encode non-decodeable
file handles for fanotify without having to implement any
export_operations.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 1e7bef5f 20-Oct-2023 Daeho Jeong <daehojeong@google.com>

f2fs: finish previous checkpoints before returning from remount

Flush remaining checkpoint requests at the end of remount, since a new
checkpoint would be triggered while remount and we need to take care of
it before returning from remount, in order to avoid the below race
condition.

- Thread - checkpoint thread
do_remount()
down_write(&sb->s_umount);
f2fs_remount()
f2fs_disable_checkpoint(sbi) -> add checkpoints to the list
block_operations()
down_read_trylock(&sb->s_umount) = 0
up_write(&sb->s_umount);
f2fs_quota_sync()
dquot_writeback_dquots()
WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount));

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d7e9a903 02-Oct-2023 Daniel Rosenberg <drosen@google.com>

f2fs: Support Block Size == Page Size

This allows f2fs to support cases where the block size = page size for
both 4K and 16K block sizes. Other sizes should work as well, should the
need arise. This does not currently support 4K Block size filesystems if
the page size is 16K.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a4639380 04-Sep-2023 Chao Yu <chao@kernel.org>

f2fs: fix to drop meta_inode's page cache in f2fs_put_super()

syzbot reports a kernel bug as below:

F2FS-fs (loop1): detect filesystem reference count leak during umount, type: 10, count: 1
kernel BUG at fs/f2fs/super.c:1639!
CPU: 0 PID: 15451 Comm: syz-executor.1 Not tainted 6.5.0-syzkaller-09338-ge0152e7481c6 #0
RIP: 0010:f2fs_put_super+0xce1/0xed0 fs/f2fs/super.c:1639
Call Trace:
generic_shutdown_super+0x161/0x3c0 fs/super.c:693
kill_block_super+0x3b/0x70 fs/super.c:1646
kill_f2fs_super+0x2b7/0x3d0 fs/f2fs/super.c:4879
deactivate_locked_super+0x9a/0x170 fs/super.c:481
deactivate_super+0xde/0x100 fs/super.c:514
cleanup_mnt+0x222/0x3d0 fs/namespace.c:1254
task_work_run+0x14d/0x240 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x60 kernel/entry/common.c:296
do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd

In f2fs_put_super(), it tries to do sanity check on dirty and IO
reference count of f2fs, once there is any reference count leak,
it will trigger panic.

The root case is, during f2fs_put_super(), if there is any IO error
in f2fs_wait_on_all_pages(), we missed to truncate meta_inode's page
cache later, result in panic, fix this case.

Fixes: 20872584b8c0 ("f2fs: fix to drop all dirty meta/node pages during umount()")
Reported-by: syzbot+ebd7072191e2eddd7d6e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000a14f020604a62a98@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7e1b150f 28-Aug-2023 Chao Yu <chao@kernel.org>

f2fs: compress: fix to avoid redundant compress extension

With below script, redundant compress extension will be parsed and added
by parse_options(), because parse_options() doesn't check whether the
extension is existed or not, fix it.

1. mount -t f2fs -o compress_extension=so /dev/vdb /mnt/f2fs
2. mount -t f2fs -o remount,compress_extension=so /mnt/f2fs
3. mount|grep f2fs

/dev/vdb on /mnt/f2fs type f2fs (...,compress_extension=so,compress_extension=so,...)

Fixes: 4c8ff7095bef ("f2fs: support data compression")
Fixes: 151b1982be5d ("f2fs: compress: add nocompress extensions support")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bfcba5ba 11-Sep-2023 Qi Zheng <zhengqi.arch@bytedance.com>

f2fs: dynamically allocate the f2fs-shrinker

Use new APIs to dynamically allocate the f2fs-shrinker.

Link: https://lkml.kernel.org/r/20230911094444.68966-8-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 2b107946 27-Sep-2023 Jan Kara <jack@suse.cz>

f2fs: Convert to bdev_open_by_dev/path()

Convert f2fs to use bdev_open_by_dev/path() and pass the handle around.

CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: Chao Yu <chao@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230927093442.25915-23-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 11cc6426 04-Oct-2023 Jeff Layton <jlayton@kernel.org>

f2fs: convert to new timestamp accessors

Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-34-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 5b118884 24-Sep-2023 Eric Biggers <ebiggers@google.com>

fscrypt: support crypto data unit size less than filesystem block size

Until now, fscrypt has always used the filesystem block size as the
granularity of file contents encryption. Two scenarios have come up
where a sub-block granularity of contents encryption would be useful:

1. Inline crypto hardware that only supports a crypto data unit size
that is less than the filesystem block size.

2. Support for direct I/O at a granularity less than the filesystem
block size, for example at the block device's logical block size in
order to match the traditional direct I/O alignment requirement.

(1) first came up with older eMMC inline crypto hardware that only
supports a crypto data unit size of 512 bytes. That specific case
ultimately went away because all systems with that hardware continued
using out of tree code and never actually upgraded to the upstream
inline crypto framework. But, now it's coming back in a new way: some
current UFS controllers only support a data unit size of 4096 bytes, and
there is a proposal to increase the filesystem block size to 16K.

(2) was discussed as a "nice to have" feature, though not essential,
when support for direct I/O on encrypted files was being upstreamed.

Still, the fact that this feature has come up several times does suggest
it would be wise to have available. Therefore, this patch implements it
by using one of the reserved bytes in fscrypt_policy_v2 to allow users
to select a sub-block data unit size. Supported data unit sizes are
powers of 2 between 512 and the filesystem block size, inclusively.
Support is implemented for both the FS-layer and inline crypto cases.

This patch focuses on the basic support for sub-block data units. Some
things are out of scope for this patch but may be addressed later:

- Supporting sub-block data units in combination with
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this
combination usually causes data unit indices to exceed 32 bits, and
thus fscrypt_supported_policy() correctly disallows it. The users who
potentially need this combination are using f2fs. To support it, f2fs
would need to provide an option to slightly reduce its max file size.

- Supporting sub-block data units in combination with
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem
described above, but also it will need special code to make DUN
wraparound still happen on a FS block boundary.

- Supporting use case (2) mentioned above. The encrypted direct I/O
code will need to stop requiring and assuming FS block alignment.
This won't be hard, but it belongs in a separate patch.

- Supporting this feature on filesystems other than ext4 and f2fs.
(Filesystems declare support for it via their fscrypt_operations.)
On UBIFS, sub-block data units don't make sense because UBIFS encrypts
variable-length blocks as a result of compression. CephFS could
support it, but a bit more work would be needed to make the
fscrypt_*_block_inplace functions play nicely with sub-block data
units. I don't think there's a use case for this on CephFS anyway.

Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 7a0263dc 24-Sep-2023 Eric Biggers <ebiggers@google.com>

fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes

Now that fs/crypto/ computes the filesystem's lblk_bits from its maximum
file size, it is no longer necessary for filesystems to provide
lblk_bits via fscrypt_operations::get_ino_and_lblk_bits.

It is still necessary for fs/crypto/ to retrieve ino_bits from the
filesystem. However, this is used only to decide whether inode numbers
fit in 32 bits. Also, ino_bits is static for all relevant filesystems,
i.e. it doesn't depend on the filesystem instance.

Therefore, in the interest of keeping things as simple as possible,
replace 'get_ino_and_lblk_bits' with a flag 'has_32bit_inodes'. This
can always be changed back to a function if a filesystem needs it to be
dynamic, but for now a static flag is all that's needed.

Link: https://lore.kernel.org/r/20230925055451.59499-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 40e13e18 24-Sep-2023 Eric Biggers <ebiggers@google.com>

fscrypt: make the bounce page pool opt-in instead of opt-out

Replace FS_CFLG_OWN_PAGES with a bit flag 'needs_bounce_pages' which has
the opposite meaning. I.e., filesystems now opt into the bounce page
pool instead of opt out. Make fscrypt_alloc_bounce_page() check that
the bounce page pool has been initialized.

I believe the opt-in makes more sense, since nothing else in
fscrypt_operations is opt-out, and these days filesystems can choose to
use blk-crypto which doesn't need the fscrypt bounce page pool. Also, I
happen to be planning to add two more flags, and I wanted to fix the
"FS_CFLG_" name anyway as it wasn't prefixed with "FSCRYPT_".

Link: https://lore.kernel.org/r/20230925055451.59499-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 5970fbad 24-Sep-2023 Eric Biggers <ebiggers@google.com>

fscrypt: make it clearer that key_prefix is deprecated

fscrypt_operations::key_prefix should not be set by any filesystems that
aren't setting it already. This is already documented, but apparently
it's not sufficiently clear, as both ceph and btrfs have tried to set
it. Rename the field to legacy_key_prefix and improve the documentation
to hopefully make it clearer.

Link: https://lore.kernel.org/r/20230925055451.59499-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 091a4dfb 21-Aug-2023 Chao Yu <chao@kernel.org>

f2fs: compress: fix to assign compress_level for lz4 correctly

After remount, F2FS_OPTION().compress_level was assgin to
LZ4HC_DEFAULT_CLEVEL incorrectly, result in lz4hc:9 was enabled, fix it.

1. mount /dev/vdb
/dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...)
2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/
3. mount|grep f2fs
/dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...)

Fixes: 00e120b5e4b5 ("f2fs: assign default compression level")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# eb61c2cc 07-Aug-2023 Chao Yu <chao@kernel.org>

f2fs: fix to account cp stats correctly

cp_foreground_calls sysfs entry shows total CP call count rather than
foreground CP call count, fix it.

Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9bf1dcbd 07-Aug-2023 Chao Yu <chao@kernel.org>

f2fs: fix to account gc stats correctly

As reported, status debugfs entry shows inconsistent GC stats as below:

GC calls: 6008 (BG: 6161)
- data segments : 3053 (BG: 3053)
- node segments : 2955 (BG: 2955)

Total GC calls is larger than BGGC calls, the reason is:
- f2fs_stat_info.call_count accounts total migrated section count
by f2fs_gc()
- f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from
background gc_thread

Another issue is gc_foreground_calls sysfs entry shows total GC call
count rather than FGGC call count.

This patch changes as below for fix:
- account GC calls and migrated segment count separately
- support to account migrated section count if it enables large section
mode
- fix to show correct value in gc_foreground_calls sysfs entry

Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2bd4df8f 03-Aug-2023 Chunhai Guo <guochunhai@vivo.com>

f2fs: Only lfs mode is allowed with zoned block device feature

Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.

Fixes: 6691d940b0e0 ("f2fs: introduce fragment allocation mode mount option")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 51bf8d3c 07-Jul-2023 Christoph Hellwig <hch@lst.de>

f2fs: don't reopen the main block device in f2fs_scan_devices

f2fs_scan_devices reopens the main device since the very beginning, which
has always been useless, and also means that we don't pass the right
holder for the reopen, which now leads to a warning as the core super.c
holder ops aren't passed in for the reopen.

Fixes: 3c62be17d4f5 ("f2fs: support multiple devices")
Fixes: 0718afd47f70 ("block: introduce holder ops")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2ea6f689 02-Aug-2023 Christoph Hellwig <hch@lst.de>

fs: use the super_block as holder when mounting file systems

The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance. Pass the super_block instead
which is useful when passed back to the file system driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-7-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# c62ebd35 05-Jul-2023 Jeff Layton <jlayton@kernel.org>

f2fs: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-41-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# dde38c03 11-Jun-2023 Sheng Yong <shengyong@oppo.com>

f2fs: cleanup MIN_INLINE_XATTR_SIZE

Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c571fbb5 11-Jun-2023 Sheng Yong <shengyong@oppo.com>

f2fs: add helper to check compression level

This patch adds a helper function to check if compression level is
valid.

Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 00e120b5 12-Jun-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: assign default compression level

Let's avoid any confusion from assigning compress_level=0 for LZ4HC and ZSTD.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ccf3ff2b 06-Jun-2023 Chao Yu <chao@kernel.org>

f2fs: introduce F2FS_QUOTA_DEFAULT_FL for cleanup

This patch adds F2FS_QUOTA_DEFAULT_FL to include two default flags:
F2FS_NOATIME_FL and F2FS_IMMUTABLE_FL, and use it to clean up codes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5079e1c0 02-Jun-2023 Chao Yu <chao@kernel.org>

f2fs: avoid dead loop in f2fs_issue_checkpoint()

generic/082 reports a bug as below:

__schedule+0x332/0xf60
schedule+0x6f/0xf0
schedule_timeout+0x23b/0x2a0
wait_for_completion+0x8f/0x140
f2fs_issue_checkpoint+0xfe/0x1b0
f2fs_sync_fs+0x9d/0xb0
sync_filesystem+0x87/0xb0
dquot_load_quota_sb+0x41b/0x460
dquot_load_quota_inode+0xa5/0x130
dquot_quota_on+0x4b/0x60
f2fs_quota_on+0xe3/0x1b0
do_quotactl+0x483/0x700
__x64_sys_quotactl+0x15c/0x310
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The root casue is race case as below:

Thread A Kworker IRQ
- write()
: write data to quota.user file

- writepages
- f2fs_submit_page_write
- __is_cp_guaranteed return false
- inc_page_count(F2FS_WB_DATA)
- submit_bio
- quotactl(Q_QUOTAON)
- f2fs_quota_on
- dquot_quota_on
- dquot_load_quota_inode
- vfs_setup_quota_inode
: inode->i_flags |= S_NOQUOTA
- f2fs_write_end_io
- __is_cp_guaranteed return true
- dec_page_count(F2FS_WB_CP_DATA)
- dquot_load_quota_sb
- f2fs_sync_fs
- f2fs_issue_checkpoint
- do_checkpoint
- f2fs_wait_on_all_pages(F2FS_WB_CP_DATA)
: loop due to F2FS_WB_CP_DATA count is negative

Calling filemap_fdatawrite() and filemap_fdatawait() to keep all data
clean before quota file setup.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 20872584 28-May-2023 Chao Yu <chao@kernel.org>

f2fs: fix to drop all dirty meta/node pages during umount()

For cp error case, there will be dirty meta/node pages remained after
f2fs_write_checkpoint() in f2fs_put_super(), drop them explicitly, and
do sanity check on reference count of dirty pages and inflight IOs.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 901c12d1 17-May-2023 Chao Yu <chao@kernel.org>

f2fs: flush error flags in workqueue

In IRQ context, it wakes up workqueue to record errors into on-disk
superblock fields rather than in-memory fields.

Fixes: 1aa161e43106 ("f2fs: fix scheduling while atomic in decompression path")
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 458c15df 22-May-2023 Chao Yu <chao@kernel.org>

f2fs: don't reset unchangable mount option in f2fs_remount()

syzbot reports a bug as below:

general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942
Call Trace:
lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691
__raw_write_lock include/linux/rwlock_api_smp.h:209 [inline]
_raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300
__drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100
f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116
f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664
f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838
vfs_fallocate+0x54b/0x6b0 fs/open.c:324
ksys_fallocate fs/open.c:347 [inline]
__do_sys_fallocate fs/open.c:355 [inline]
__se_sys_fallocate fs/open.c:353 [inline]
__x64_sys_fallocate+0xbd/0x100 fs/open.c:353
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is race condition as below:
- since it tries to remount rw filesystem, so that do_remount won't
call sb_prepare_remount_readonly to block fallocate, there may be race
condition in between remount and fallocate.
- in f2fs_remount(), default_options() will reset mount option to default
one, and then update it based on result of parse_options(), so there is
a hole which race condition can happen.

Thread A Thread B
- f2fs_fill_super
- parse_options
- clear_opt(READ_EXTENT_CACHE)

- f2fs_remount
- default_options
- set_opt(READ_EXTENT_CACHE)
- f2fs_fallocate
- f2fs_insert_range
- f2fs_drop_extent_tree
- __drop_extent_tree
- __may_extent_tree
- test_opt(READ_EXTENT_CACHE) return true
- write_lock(&et->lock) access NULL pointer
- parse_options
- clear_opt(READ_EXTENT_CACHE)

Cc: <stable@vger.kernel.org>
Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 90b7c4b7 26-May-2023 Chao Yu <chao@kernel.org>

f2fs: fix to set noatime and immutable flag for quota file

We should set noatime bit for quota files, since no one cares about
atime of quota file, and we should set immutalbe bit as well, due to
nobody should write to the file through exported interfaces.

Meanwhile this patch use inode_lock to avoid race condition during
inode->i_flags, f2fs_inode->i_flags update.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b62e71be 23-Apr-2023 Chao Yu <chao@kernel.org>

f2fs: support errors=remount-ro|continue|panic mountoption

This patch supports errors=remount-ro|continue|panic mount option
for f2fs.

f2fs behaves as below in three different modes:
mode continue remount-ro panic
access ops normal noraml N/A
syscall errors -EIO -EROFS N/A
mount option rw ro N/A
pending dir write keep keep N/A
pending non-dir write drop keep N/A
pending node write drop keep N/A
pending meta write keep keep N/A

By default it uses "continue" mode.

[Yangtao helps to clean up function's name]
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 05bdb996 08-Jun-2023 Christoph Hellwig <hch@lst.de>

block: replace fmode_t with a block-specific type for block open flags

The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 81b1fb7d 08-Jun-2023 Christoph Hellwig <hch@lst.de>

fs: remove sb->s_mode

There is no real need to store the open mode in the super_block now.
It is only used by f2fs, which can easily recalculate it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2736e8ee 08-Jun-2023 Christoph Hellwig <hch@lst.de>

block: use the holder as indication for exclusive opens

The current interface for exclusive opens is rather confusing as it
requires both the FMODE_EXCL flag and a holder. Remove the need to pass
FMODE_EXCL and just key off the exclusive open off a non-NULL holder.

For blkdev_put this requires adding the holder argument, which provides
better debug checking that only the holder actually releases the hold,
but at the same time allows removing the now superfluous mode argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0718afd4 01-Jun-2023 Christoph Hellwig <hch@lst.de>

block: introduce holder ops

Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims. It will be used to
allow the block layer to call back into the user of the block device for
thing like notification of a removed device or a device resize.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2e2c6e9b 17-Apr-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove power-of-two limitation of zoned device

In f2fs, there's no reason to force po2.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e1bb7d3d 02-Apr-2023 Chao Yu <chao@kernel.org>

f2fs: fix to recover quota data correctly

With -O quota mkfs option, xfstests generic/417 fails due to fsck detects
data corruption on quota inodes.

[ASSERT] (fsck_chk_quota_files:2051) --> Quota file is missing or invalid quota file content found.

The root cause is there is a hole f2fs doesn't hold quota inodes,
so all recovered quota data will be dropped due to SBI_POR_DOING
flag was set.
- f2fs_fill_super
- f2fs_recover_orphan_inodes
- f2fs_enable_quota_files
- f2fs_quota_off_umount
<--- quota inodes were dropped --->
- f2fs_recover_fsync_data
- f2fs_enable_quota_files
- f2fs_quota_off_umount

This patch tries to eliminate the hole by holding quota inodes
during entire recovery flow as below:
- f2fs_fill_super
- f2fs_recover_quota_begin
- f2fs_recover_orphan_inodes
- f2fs_recover_fsync_data
- f2fs_recover_quota_end

Then, recovered quota data can be persisted after SBI_POR_DOING
is cleared.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d78dfefc 04-Apr-2023 Chao Yu <chao@kernel.org>

f2fs: fix to check readonly condition correctly

With below case, it can mount multi-device image w/ rw option, however
one of secondary device is set as ro, later update will cause panic, so
let's introduce f2fs_dev_is_readonly(), and check multi-devices rw status
in f2fs_remount() w/ it in order to avoid such inconsistent mount status.

mkfs.f2fs -c /dev/zram1 /dev/zram0 -f
blockdev --setro /dev/zram1
mount -t f2fs dev/zram0 /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
mount -t f2fs -o remount,rw mnt/f2fs
dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=8192

kernel BUG at fs/f2fs/inline.c:258!
RIP: 0010:f2fs_write_inline_data+0x23e/0x2d0 [f2fs]
Call Trace:
f2fs_write_single_data_page+0x26b/0x9f0 [f2fs]
f2fs_write_cache_pages+0x389/0xa60 [f2fs]
__f2fs_write_data_pages+0x26b/0x2d0 [f2fs]
f2fs_write_data_pages+0x2e/0x40 [f2fs]
do_writepages+0xd3/0x1b0
__writeback_single_inode+0x5b/0x420
writeback_sb_inodes+0x236/0x5a0
__writeback_inodes_wb+0x56/0xf0
wb_writeback+0x2a3/0x490
wb_do_writeback+0x2b2/0x330
wb_workfn+0x6a/0x260
process_one_work+0x270/0x5e0
worker_thread+0x52/0x3e0
kthread+0xf4/0x120
ret_from_fork+0x29/0x50

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 68f0453d 09-Apr-2023 Chao Yu <chao@kernel.org>

f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only()

f2fs has supported multi-device feature, to check devices' rw status,
it should use f2fs_hw_is_readonly() rather than bdev_read_only(), fix
it.

Meanwhile, it removes f2fs_hw_is_readonly() check condition in:
- f2fs_write_checkpoint()
- f2fs_convert_inline_inode()
As it has checked f2fs_readonly() condition, and if f2fs' devices
were readonly, f2fs_readonly() must be true.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c2c14ca5 30-Mar-2023 Yangtao Li <frank.li@vivo.com>

f2fs: set default compress option only when sb_has_compression

If the compress feature is not enabled, there is no need to set
compress-related parameters.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d4998b78 19-Mar-2023 Yangtao Li <frank.li@vivo.com>

f2fs: add compression feature check for all compress mount opt

Opt_compress_chksum, Opt_compress_mode and Opt_compress_cache
lack the necessary check to see if the image supports compression,
let's add it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1aa161e4 23-Mar-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix scheduling while atomic in decompression path

[ 16.945668][ C0] Call trace:
[ 16.945678][ C0] dump_backtrace+0x110/0x204
[ 16.945706][ C0] dump_stack_lvl+0x84/0xbc
[ 16.945735][ C0] __schedule_bug+0xb8/0x1ac
[ 16.945756][ C0] __schedule+0x724/0xbdc
[ 16.945778][ C0] schedule+0x154/0x258
[ 16.945793][ C0] bit_wait_io+0x48/0xa4
[ 16.945808][ C0] out_of_line_wait_on_bit+0x114/0x198
[ 16.945824][ C0] __sync_dirty_buffer+0x1f8/0x2e8
[ 16.945853][ C0] __f2fs_commit_super+0x140/0x1f4
[ 16.945881][ C0] f2fs_commit_super+0x110/0x28c
[ 16.945898][ C0] f2fs_handle_error+0x1f4/0x2f4
[ 16.945917][ C0] f2fs_decompress_cluster+0xc4/0x450
[ 16.945942][ C0] f2fs_end_read_compressed_page+0xc0/0xfc
[ 16.945959][ C0] f2fs_handle_step_decompress+0x118/0x1cc
[ 16.945978][ C0] f2fs_read_end_io+0x168/0x2b0
[ 16.945993][ C0] bio_endio+0x25c/0x2c8
[ 16.946015][ C0] dm_io_dec_pending+0x3e8/0x57c
[ 16.946052][ C0] clone_endio+0x134/0x254
[ 16.946069][ C0] bio_endio+0x25c/0x2c8
[ 16.946084][ C0] blk_update_request+0x1d4/0x478
[ 16.946103][ C0] scsi_end_request+0x38/0x4cc
[ 16.946129][ C0] scsi_io_completion+0x94/0x184
[ 16.946147][ C0] scsi_finish_command+0xe8/0x154
[ 16.946164][ C0] scsi_complete+0x90/0x1d8
[ 16.946181][ C0] blk_done_softirq+0xa4/0x11c
[ 16.946198][ C0] _stext+0x184/0x614
[ 16.946214][ C0] __irq_exit_rcu+0x78/0x144
[ 16.946234][ C0] handle_domain_irq+0xd4/0x154
[ 16.946260][ C0] gic_handle_irq.33881+0x5c/0x27c
[ 16.946281][ C0] call_on_irq_stack+0x40/0x70
[ 16.946298][ C0] do_interrupt_handler+0x48/0xa4
[ 16.946313][ C0] el1_interrupt+0x38/0x68
[ 16.946346][ C0] el1h_64_irq_handler+0x20/0x30
[ 16.946362][ C0] el1h_64_irq+0x78/0x7c
[ 16.946377][ C0] finish_task_switch+0xc8/0x3d8
[ 16.946394][ C0] __schedule+0x600/0xbdc
[ 16.946408][ C0] preempt_schedule_common+0x34/0x5c
[ 16.946423][ C0] preempt_schedule+0x44/0x48
[ 16.946438][ C0] process_one_work+0x30c/0x550
[ 16.946456][ C0] worker_thread+0x414/0x8bc
[ 16.946472][ C0] kthread+0x16c/0x1e0
[ 16.946486][ C0] ret_from_fork+0x10/0x20

Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq")
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 447286eb 16-Feb-2023 Yangtao Li <frank.li@vivo.com>

f2fs: convert to use bitmap API

Let's use BIT() and GENMASK() instead of open it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a46bebd5 09-Feb-2023 Daeho Jeong <daehojeong@google.com>

f2fs: synchronize atomic write aborts

To fix a race condition between atomic write aborts, I use the inode
lock and make COW inode to be re-usable thoroughout the whole
atomic file inode lifetime.

Reported-by: syzbot+823000d23b3400619f7c@syzkaller.appspotmail.com
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c5bf8348 06-Feb-2023 Yangtao Li <frank.li@vivo.com>

f2fs: fix to set ipu policy

For LFS mode, it should update outplace and no need inplace update.
When using LFS mode for small-volume devices, IPU will not be used,
and the OPU writing method is actually used, but F2FS_IPU_FORCE can
be read from the ipu_policy node, which is different from the actual
situation. And remount to lfs mode should be disallowed when
f2fs ipu is enabled, let's fix it.

Fixes: 84b89e5d943d ("f2fs: add auto tuning for small devices")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 04d7a7ae 02-Feb-2023 Yangtao Li <frank.li@vivo.com>

f2fs: fix f2fs_show_options to show nogc_merge mount option

Commit 5911d2d1d1a3 ("f2fs: introduce gc_merge mount option") forgot
to show nogc_merge option, let's fix it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b1c5ef26 12-Jan-2023 Yangtao Li <frank.li@vivo.com>

f2fs: return true if all cmd were issued or no cmd need to be issued for f2fs_issue_discard_timeout()

f2fs_issue_discard_timeout() returns whether discard cmds are dropped,
which does not match the meaning of the function. Let's change it to
return whether all discard cmd are issued.

After commit 4d67490498ac ("f2fs: Don't create discard thread when
device doesn't support realtime discard"), f2fs_issue_discard_timeout()
is alse called by f2fs_remount(). Since the comments of
f2fs_issue_discard_timeout() doesn't make much sense, let's update it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2381a685 16-Jan-2023 Yangtao Li <frank.li@vivo.com>

f2fs: fix to show discard_unit mount opt

Convert to show discard_unit only when has DISCARD opt.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c40e15a9 20-Dec-2022 Yangtao Li <frank.li@vivo.com>

f2fs: merge f2fs_show_injection_info() into time_to_inject()

There is no need to additionally use f2fs_show_injection_info()
to output information. Concatenate time_to_inject() and
__time_to_inject() via a macro. In the new __time_to_inject()
function, pass in the caller function name and parent function.

In this way, we no longer need the f2fs_show_injection_info() function,
and let's remove it.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b5a711ac 29-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: judge whether discard_unit is section only when have CONFIG_BLK_DEV_ZONED

The current logic, regardless of whether CONFIG_BLK_DEV_ZONED
is enabled or not, will judge whether discard_unit is SECTION,
when f2fs_sb_has_blkzoned.

In fact, when CONFIG_BLK_DEV_ZONED is not enabled, this judgment
is a path that will never be accessed. At this time, -EINVAL will
be returned in the parse_options function, accompanied by the
message "Zoned block device support is not enabled".

Let's wrap this discard_unit judgment with CONFIG_BLK_DEV_ZONED.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fdb7ccc3 18-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: introduce IS_F2FS_IPU_* macro

IS_F2FS_IPU_* macro can be used to identify whether
f2fs ipu related policies are enabled.

BTW, convert to use BIT() instead of open code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1ad2a626 07-Feb-2023 Eric Biggers <ebiggers@google.com>

f2fs: stop calling fscrypt_add_test_dummy_key()

Now that fs/crypto/ adds the test dummy encryption key on-demand when
it's needed, there's no need for individual filesystems to call
fscrypt_add_test_dummy_key(). Remove the call to it from f2fs.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230208062107.199831-4-ebiggers@kernel.org


# 25547439 01-Dec-2022 Yangtao Li <frank.li@vivo.com>

f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()

No need to call f2fs_issue_discard_timeout() in f2fs_put_super,
when no discard command requires issue. Since the caller of
f2fs_issue_discard_timeout() usually judges the number of discard
commands before using it. Let's move this logic to
f2fs_issue_discard_timeout().

By the way, use f2fs_realtime_discard_enable to simplify the code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# db8dcd25 07-Dec-2022 Colin Ian King <colin.i.king@gmail.com>

f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache

There is a spelling mistake in a label name. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 71644dff 01-Dec-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add block_age-based extent cache

This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
- It records total data blocks allocated since mount;
- When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
- Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
- Prepare: create about 30000 files
* 3% for cold files (with cold file extension like .apk, from 3M to 10M)
* 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
* 47% for hot files (with hot file extension like .db, from 1K to 256K)
- create(5%)/random update(90%)/delete(5%) the files
* total write amount is about 70G
* fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
- before: Dirty +21110
- after: Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 12607c1b 30-Nov-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: specify extent cache for read explicitly

Let's descrbie it's read extent cache.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed8ac22b 23-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: introduce f2fs_is_readonly() for readability

Introduce f2fs_is_readonly() and use it to simplify code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 870af777 25-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: do some cleanup for f2fs module init

Just for cleanup, no functional changes.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1cd2e6d5 23-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: define MIN_DISCARD_GRANULARITY macro

Do cleanup in f2fs_tuning_parameters() and __init_discard_policy(),
let's use macro instead of number.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 66aee5aa 14-Nov-2022 Yuwei Guan <ssawgyw@gmail.com>

f2fs: change type for 'sbi->readdir_ra'

Before this patch, the varibale 'readdir_ra' takes effect if it's equal
to '1' or not, so we can change type for it from 'int' to 'bool'.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 777cd95b 14-Nov-2022 Yuwei Guan <ssawgyw@gmail.com>

f2fs: cleanup for 'f2fs_tuning_parameters' function

A cleanup patch for 'f2fs_tuning_parameters' function.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b7ad23ce 14-Nov-2022 Yuwei Guan <ssawgyw@gmail.com>

f2fs: fix to alloc_mode changed after remount on a small volume device

The commit 84b89e5d943d8 ("f2fs: add auto tuning for small devices") add
tuning for small volume device, now support to tune alloce_mode to 'reuse'
if it's small size. But the alloc_mode will change to 'default' when do
remount on this small size dievce. This patch fo fix alloc_mode changed
when do remount for a small volume device.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 967eaad1 10-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: fix to set flush_merge opt and show noflush_merge

Some minor modifications to flush_merge and related parameters:

1.The FLUSH_MERGE opt is set by default only in non-ro mode.
2.When ro and merge are set at the same time, an error is reported.
3.Display noflush_merge mount opt.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 92b4cf5b 08-Nov-2022 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

f2fs: initialize locks earlier in f2fs_fill_super()

syzbot is reporting lockdep warning at f2fs_handle_error() [1], for
spin_lock(&sbi->error_lock) is called before spin_lock_init() is called.
For safe locking in error handling, move initialization of locks (and
obvious structures) in f2fs_fill_super() to immediately after memory
allocation.

Link: https://syzkaller.appspot.com/bug?extid=40642be9b7e0bb28e0df [1]
Reported-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cc249e4c 06-Nov-2022 Chao Yu <chao@kernel.org>

f2fs: fix to avoid accessing uninitialized spinlock

syzbot reports a kernel bug:

__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
assign_lock_key+0x22a/0x240 kernel/locking/lockdep.c:981
register_lock_class+0x287/0x9b0 kernel/locking/lockdep.c:1294
__lock_acquire+0xe4/0x1f60 kernel/locking/lockdep.c:4934
lock_acquire+0x1a7/0x400 kernel/locking/lockdep.c:5668
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:350 [inline]
f2fs_save_errors fs/f2fs/super.c:3868 [inline]
f2fs_handle_error+0x29/0x230 fs/f2fs/super.c:3896
f2fs_iget+0x215/0x4bb0 fs/f2fs/inode.c:516
f2fs_fill_super+0x47d3/0x7b50 fs/f2fs/super.c:4222
mount_bdev+0x26c/0x3a0 fs/super.c:1401
legacy_get_tree+0xea/0x180 fs/fs_context.c:610
vfs_get_tree+0x88/0x270 fs/super.c:1531
do_new_mount+0x289/0xad0 fs/namespace.c:3040
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount+0x2e3/0x3d0 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

F2FS-fs (loop1): Failed to read F2FS meta data inode

The root cause is if sbi->error_lock may be accessed before
its initialization, fix it.

Link: https://lore.kernel.org/linux-f2fs-devel/0000000000007edb6605ecbb6442@google.com/T/#u
Reported-by: syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e5a0db6a 25-Oct-2022 Yangtao Li <frank.li@vivo.com>

f2fs: replace gc_urgent_high_remaining with gc_remaining_trials

The user can set the trial count limit for GC urgent and
idle mode with replaced gc_remaining_trials.. If GC thread gets
to the limit, the mode will turn back to GC normal mode finally.

It was applied only to GC_URGENT, while this patch expands it for
GC_IDLE.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7b02b220 28-Oct-2022 Chao Yu <chao@kernel.org>

f2fs: fix to destroy sbi->post_read_wq in error path of f2fs_fill_super()

In error path of f2fs_fill_super(), this patch fixes to call
f2fs_destroy_post_read_wq() once if we fail in f2fs_start_ckpt_thread().

Fixes: 261eeb9c1585 ("f2fs: introduce checkpoint_merge mount option")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6047de54 24-Oct-2022 Yangtao Li <frank.li@vivo.com>

f2fs: add barrier mount option

This patch adds a mount option, barrier, in f2fs.
The barrier option is the opposite of nobarrier.
If this option is set, cache_flush commands are allowed to be issued.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 18792e64 06-Oct-2022 Chao Yu <chao@kernel.org>

f2fs: support fault injection for f2fs_is_valid_blkaddr()

This patch supports to inject fault into f2fs_is_valid_blkaddr() to
simulate accessing inconsistent data/meta block addressses from caller.

Usage:
a) echo 262144 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=262144 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 95fa90c9 28-Sep-2022 Chao Yu <chao@kernel.org>

f2fs: support recording errors into superblock

This patch supports to record detail reason of FSCORRUPTED error into
f2fs_super_block.s_errors[].

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a9cfee0e 28-Sep-2022 Chao Yu <chao@kernel.org>

f2fs: support recording stop_checkpoint reason into super_block

This patch supports to record stop_checkpoint error into
f2fs_super_block.s_stop_reason[].

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# da35fe96 23-Aug-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: increase the limit for reserve_root

This patch increases the threshold that limits the reserved root space from 0.2%
to 12.5% by using simple shift operation.

Typically Android sets 128MB, but if the storage capacity is 32GB, 0.2% which is
around 64MB becomes too small. Let's relax it.

Cc: stable@vger.kernel.org
Reported-by: Aran Dalton <arda@allwinnertech.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4f99484d 18-Aug-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: complete checkpoints during remount

Otherwise, pending checkpoints can contribute a race condition to give a
quota warning.

- Thread - checkpoint thread
add checkpoints to the list
do_remount()
down_write(&sb->s_umount);
f2fs_remount()
block_operations()
down_read_trylock(&sb->s_umount) = 0
up_write(&sb->s_umount);
f2fs_quota_sync()
dquot_writeback_dquots()
WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount));

Or,

do_remount()
down_write(&sb->s_umount);
f2fs_remount()
create a ckpt thread
f2fs_enable_checkpoint() adds checkpoints
wait for f2fs_sync_fs()
trigger another pending checkpoint
block_operations()
down_read_trylock(&sb->s_umount) = 0
up_write(&sb->s_umount);
f2fs_quota_sync()
dquot_writeback_dquots()
WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount));

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c7b58576 19-Aug-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush pending checkpoints when freezing super

This avoids -EINVAL when trying to freeze f2fs.

Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b87846bd 19-Aug-2022 Eric Biggers <ebiggers@google.com>

f2fs: use memcpy_{to,from}_page() where possible

This is simpler, and as a side effect it replaces several uses of
kmap_atomic() with its recommended replacement kmap_local_page().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 80dc113a 11-Aug-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: LFS mode does not support ATGC

ATGC is using SSR which violates LFS mode used by zoned device.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0e91fc1e 01-Sep-2022 Christoph Hellwig <hch@lst.de>

fscrypt: work on block_devices instead of request_queues

request_queues are a block layer implementation detail that should not
leak into file systems. Change the fscrypt inline crypto code to
retrieve block devices instead of request_queues from the file system.
As part of that, clean up the interaction with multi-device file systems
by returning both the number of devices and the actual device array in a
single method call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[ebiggers: bug fixes and minor tweaks]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220901193208.138056-4-ebiggers@kernel.org


# e53f8643 04-Aug-2022 Chao Yu <chao.yu@oppo.com>

f2fs: clean up f2fs_abort_atomic_write()

f2fs_abort_atomic_write() has checked whether current inode is
atomic_write one or not, it's redundant to check in its caller,
remove it for cleanup.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f8e2f32b 18-Jul-2022 Daeho Jeong <daehojeong@google.com>

f2fs: introduce sysfs atomic write statistics

introduce the below 4 new sysfs node for atomic write statistics.
- current_atomic_write: the total current atomic write block count,
which is not committed yet.
- peak_atomic_write: the peak value of total current atomic write block
count after boot.
- committed_atomic_block: the accumulated total committed atomic write
block count after boot.
- revoked_atomic_block: the accumulated total revoked atomic write block
count after boot.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b771aadc 28-Jun-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: enforce single zone capacity

In order to simplify the complicated per-zone capacity, let's support
only one capacity for entire zoned device.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7a8fc586 20-Jun-2022 Daeho Jeong <daehojeong@google.com>

f2fs: introduce memory mode

Introduce memory mode to supports "normal" and "low" memory modes.
"low" mode is to support low memory devices. Because of the nature of
low memory devices, in this mode, f2fs will try to save memory sometimes
by sacrificing performance. "normal" mode is the default mode and same
as before.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c5bca38d 06-Jun-2022 Eric Biggers <ebiggers@google.com>

f2fs: use the updated test_dummy_encryption helper functions

Switch f2fs over to the functions that are replacing
fscrypt_set_test_dummy_encryption(). Since f2fs hasn't been converted
to the new mount API yet, this doesn't really provide a benefit for
f2fs. But it allows fscrypt_set_test_dummy_encryption() to be removed.

Also take the opportunity to eliminate an #ifdef.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e33c267a 31-May-2022 Roman Gushchin <roman.gushchin@linux.dev>

mm: shrinkers: provide shrinkers with names

Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 908ea654 25-May-2022 Yufen Yu <yuyufen@huawei.com>

f2fs: add f2fs_init_write_merge_io function

Almost all other initialization of variables in f2fs_fill_super are
extraced to a single function. Also do it for write_io[], which can
make code more clean.

This patch just refactors the code, theres no functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c81d5bae 06-May-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not stop GC when requiring a free section

The f2fs_gc uses a bitmap to indicate pinned sections, but when disabling
chckpoint, we call f2fs_gc() with NULL_SEGNO which selects the same dirty
segment as a victim all the time, resulting in checkpoint=disable failure,
for example. Let's pick another one, if we fail to collect it.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d147ea4a 06-May-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters

No functional change.

- remove checkpoint=disable check for f2fs_write_checkpoint
- get sec_freed all the time

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 64e3ed0b 30-Apr-2022 Eric Biggers <ebiggers@google.com>

f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION

There is no good reason to allow this mount option when the kernel isn't
configured with encryption support. Since this option is only for
testing, we can just fix this; we don't really need to worry about
breaking anyone who might be counting on this option being ignored.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3db1de0e 28-Apr-2022 Daeho Jeong <daehojeong@google.com>

f2fs: change the current atomic write way

Current atomic write has three major issues like below.
- keeps the updates in non-reclaimable memory space and they are even
hard to be migrated, which is not good for contiguous memory
allocation.
- disk spaces used for atomic files cannot be garbage collected, so
this makes it difficult for the filesystem to be defragmented.
- If atomic write operations hit the threshold of either memory usage
or garbage collection failure count, All the atomic write operations
will fail immediately.

To resolve the issues, I will keep a COW inode internally for all the
updates to be flushed from memory, when we need to flush them out in a
situation like high memory pressure. These COW inodes will be tagged
as orphan inodes to be reclaimed in case of sudden power-cut or system
failure during atomic writes.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6213f5d4 05-May-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't need inode lock for system hidden quota

Let's avoid false-alarmed lockdep warning.

[ 58.914674] [T1501146] -> #2 (&sb->s_type->i_mutex_key#20){+.+.}-{3:3}:
[ 58.915975] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.916738] [T1501146] system_server: f2fs_quota_sync+0x60/0x1a8
[ 58.917563] [T1501146] system_server: block_operations+0x16c/0x43c
[ 58.918410] [T1501146] system_server: f2fs_write_checkpoint+0x114/0x318
[ 58.919312] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c
[ 58.920214] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.920999] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738
[ 58.921862] [T1501146] system_server: f2fs_sync_file+0x30/0x48
[ 58.922667] [T1501146] system_server: __arm64_sys_fsync+0x84/0xf8
[ 58.923506] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
[ 58.924604] [T1501146] system_server: do_el0_svc+0x28/0xa0
[ 58.925366] [T1501146] system_server: el0_svc+0x24/0x38
[ 58.926094] [T1501146] system_server: el0_sync_handler+0x88/0xec
[ 58.926920] [T1501146] system_server: el0_sync+0x1b4/0x1c0

[ 58.927681] [T1501146] -> #1 (&sbi->cp_global_sem){+.+.}-{3:3}:
[ 58.928889] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.929650] [T1501146] system_server: f2fs_write_checkpoint+0xbc/0x318
[ 58.930541] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c
[ 58.931443] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.932226] [T1501146] system_server: sync_filesystem+0xac/0x130
[ 58.933053] [T1501146] system_server: generic_shutdown_super+0x38/0x150
[ 58.933958] [T1501146] system_server: kill_block_super+0x24/0x58
[ 58.934791] [T1501146] system_server: kill_f2fs_super+0xcc/0x124
[ 58.935618] [T1501146] system_server: deactivate_locked_super+0x90/0x120
[ 58.936529] [T1501146] system_server: deactivate_super+0x74/0xac
[ 58.937356] [T1501146] system_server: cleanup_mnt+0x128/0x168
[ 58.938150] [T1501146] system_server: __cleanup_mnt+0x18/0x28
[ 58.938944] [T1501146] system_server: task_work_run+0xb8/0x14c
[ 58.939749] [T1501146] system_server: do_notify_resume+0x114/0x1e8
[ 58.940595] [T1501146] system_server: work_pending+0xc/0x5f0

[ 58.941375] [T1501146] -> #0 (&sbi->gc_lock){+.+.}-{3:3}:
[ 58.942519] [T1501146] system_server: __lock_acquire+0x1270/0x2868
[ 58.943366] [T1501146] system_server: lock_acquire+0x114/0x294
[ 58.944169] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.944930] [T1501146] system_server: f2fs_issue_checkpoint+0x13c/0x21c
[ 58.945831] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.946614] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738
[ 58.947472] [T1501146] system_server: f2fs_ioc_commit_atomic_write+0xc8/0x14c
[ 58.948439] [T1501146] system_server: __f2fs_ioctl+0x674/0x154c
[ 58.949253] [T1501146] system_server: f2fs_ioctl+0x54/0x88
[ 58.950018] [T1501146] system_server: __arm64_sys_ioctl+0xa8/0x110
[ 58.950865] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
[ 58.951965] [T1501146] system_server: do_el0_svc+0x28/0xa0
[ 58.952727] [T1501146] system_server: el0_svc+0x24/0x38
[ 58.953454] [T1501146] system_server: el0_sync_handler+0x88/0xec
[ 58.954279] [T1501146] system_server: el0_sync+0x1b4/0x1c0

Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2880f47b 06-May-2022 Weichao Guo <guoweichao@oppo.com>

f2fs: skip GC if possible when checkpoint disabling

If the number of unusable blocks is not larger than
unusable capacity, we can skip GC when checkpoint
disabling.

Signed-off-by: Weichao Guo <guoweichao@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: Fix missing gc_mode assignment]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7f262f73 27-Apr-2022 Luis Chamberlain <mcgrof@kernel.org>

f2fs: ensure only power of 2 zone sizes are allowed

F2FS zoned support has power of 2 zone size assumption in many places
such as in __f2fs_issue_discard_zone, init_blkz_info. As the power of 2
requirement has been removed from the block layer, explicitly add a
condition in f2fs to allow only power of 2 zone size devices.

This condition will be relaxed once those calculation based on power of
2 is made generic.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d46db459 27-Apr-2022 Luis Chamberlain <mcgrof@kernel.org>

f2fs: call bdev_zone_sectors() only once on init_blkz_info()

Instead of calling bdev_zone_sectors() multiple times, call
it once and cache the value locally. This will make the
subsequent change easier to read.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4de85145 22-Apr-2022 Niels Dossche <dossche.niels@gmail.com>

f2fs: extend stat_lock to avoid potential race in statfs

There are multiple calculations and reads of fields of sbi that should
be protected by stat_lock. As stat_lock is not used to read these
values in statfs, this can lead to inconsistent results.
Extend the locking to prevent this issue.
Commit c9c8ed50d94c ("f2fs: fix to avoid potential race on
sbi->unusable_block_count access/update")
already added the use of sbi->stat_lock in statfs in
order to make the calculation of multiple, different fields atomic so
that results are consistent. This is similar to that patch regarding the
change in statfs.

Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9d6b0cd7 22-Feb-2022 Matthew Wilcox (Oracle) <willy@infradead.org>

fs: Remove flags parameter from aops->write_begin

There are no more aop flags left, so remove the parameter.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 930e2607 12-Apr-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove obsolete whint_mode

This patch removes obsolete whint_mode.

Fixes: 41d36a9f3e53 ("fs: remove kiocb.ki_hint")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 65d3af64 22-Mar-2022 Muchun Song <songmuchun@bytedance.com>

f2fs: allocate inode by using alloc_inode_sb()

The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a64239d0 22-Mar-2022 NeilBrown <neilb@suse.de>

f2fs: replace congestion_wait() calls with io_schedule_timeout()

As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout().

So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE
and call that instead.

Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 98e92867 18-Mar-2022 Chao Yu <chao@kernel.org>

f2fs: use aggressive GC policy during f2fs_disable_checkpoint()

Let's enable GC_URGENT_HIGH mode during f2fs_disable_checkpoint(),
so that we can use SSR allocator for GCed data/node persistence,
it can improve the performance due to it avoiding migration of
data/node locates in selected target segment of SSR allocator.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c86868bb 17-Mar-2022 Chao Yu <chao@kernel.org>

f2fs: initialize sbi->gc_mode explicitly

It needs to initialized sbi->gc_mode to GC_NORMAL explicitly.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ba900534 04-Mar-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs

Let's purge inode cache in order to avoid the below deadlock.

[freeze test] shrinkder
freeze_super
- pwercpu_down_write(SB_FREEZE_FS)
- super_cache_scan
- down_read(&sb->s_umount)
- prune_icache_sb
- dispose_list
- evict
- f2fs_evict_inode
thaw_super
- down_write(&sb->s_umount);
- __percpu_down_read(SB_FREEZE_FS)

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 680af5b8 15-Feb-2022 Juhyung Park <qkrwngud825@gmail.com>

f2fs: quota: fix loop condition at f2fs_quota_sync()

cnt should be passed to sb_has_quota_active() instead of type to check
active quota properly.

Moreover, when the type is -1, the compiler with enough inline knowledge
can discard sb_has_quota_active() check altogether, causing a NULL pointer
dereference at the following inode_lock(dqopt->files[cnt]):

[ 2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[ 2.796024] Mem abort info:
[ 2.796025] ESR = 0x96000005
[ 2.796028] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2.796029] SET = 0, FnV = 0
[ 2.796031] EA = 0, S1PTW = 0
[ 2.796032] Data abort info:
[ 2.796034] ISV = 0, ISS = 0x00000005
[ 2.796035] CM = 0, WnR = 0
[ 2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000
[ 2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000
[ 2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S 5.4.179-arter97-r8-64666-g2f16e087f9d8 #1
[ 2.796057] Hardware name: Qualcomm Technologies, Inc. Lahaina MTP lemonadep (DT)
[ 2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO)
[ 2.796065] pc : down_write+0x28/0x70
[ 2.796070] lr : f2fs_quota_sync+0x100/0x294
[ 2.796071] sp : ffffffa3f48ffc30
[ 2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000
[ 2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80
[ 2.796077] x25: ffffffa3f7290048 x24: 0000000000000001
[ 2.796078] x23: 0000000000000000 x22: ffffffa3f7290000
[ 2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110
[ 2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038
[ 2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0
[ 2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0
[ 2.796086] x13: 00000000000004f6 x12: 00162c44ff493000
[ 2.796088] x11: 0000000000000400 x10: ffffffa40e38c948
[ 2.796090] x9 : 0000000000000000 x8 : 00000000000000a0
[ 2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a
[ 2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d
[ 2.796094] x3 : 00000000060c0000 x2 : 0000000000000001
[ 2.796096] x1 : 0000000000000000 x0 : 00000000000000a0
[ 2.796098] Call trace:
[ 2.796100] down_write+0x28/0x70
[ 2.796102] f2fs_quota_sync+0x100/0x294
[ 2.796104] block_operations+0x120/0x204
[ 2.796106] f2fs_write_checkpoint+0x11c/0x520
[ 2.796107] __checkpoint_and_complete_reqs+0x7c/0xd34
[ 2.796109] issue_checkpoint_thread+0x6c/0xb8
[ 2.796112] kthread+0x138/0x414
[ 2.796114] ret_from_fork+0x10/0x18
[ 2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02)
[ 2.796120] ---[ end trace 96e942e8eb6a0b53 ]---
[ 2.800116] Kernel panic - not syncing: Fatal exception
[ 2.800120] SMP: stopping secondary CPUs

Fixes: 9de71ede81e6 ("f2fs: quota: fix potential deadlock")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 984fc4e7 03-Feb-2022 Chao Yu <chao@kernel.org>

f2fs: support idmapped mounts

This patch enables idmapped mounts for f2fs, since all dedicated helpers
for this functionality existsm, so, in this patch we just pass down the
user_namespace argument from the VFS methods to the relevant helpers.

Simple idmap example on f2fs image:

1. truncate -s 128M f2fs.img
2. mkfs.f2fs f2fs.img
3. mount f2fs.img /mnt/f2fs/
4. touch /mnt/f2fs/file

5. ls -ln /mnt/f2fs/
total 0
-rw-r--r-- 1 0 0 0 2月 4 13:17 file

6. ./mount-idmapped --map-mount b:0:1001:1 /mnt/f2fs/ /mnt/scratch_f2fs/

7. ls -ln /mnt/scratch_f2fs/
total 0
-rw-r--r-- 1 1001 1001 0 2月 4 13:17 file

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 47c8ebcc 27-Jan-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add a way to limit roll forward recovery time

This adds a sysfs entry to call checkpoint during fsync() in order to avoid
long elapsed time to run roll-forward recovery when booting the device.
Default value doesn't enforce the limitation which is same as before.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1018a546 04-Feb-2022 Chao Yu <chao@kernel.org>

f2fs: introduce F2FS_IPU_HONOR_OPU_WRITE ipu policy

Once F2FS_IPU_FORCE policy is enabled in some cases:
a) f2fs forces to use F2FS_IPU_FORCE in a small-sized volume
b) user sets F2FS_IPU_FORCE policy via sysfs

Then we may fail to defragment file due to IPU policy check, it doesn't
make sense, let's introduce a new IPU policy to allow OPU during file
defragmentation.

In small-sized volume, let's enable F2FS_IPU_HONOR_OPU_WRITE policy
by default.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e4544b63 07-Jan-2022 Tim Murray <timmurray@google.com>

f2fs: move f2fs to use reader-unfair rwsems

f2fs rw_semaphores work better if writers can starve readers,
especially for the checkpoint thread, because writers are strictly
more important than reader threads. This prevents significant priority
inversion between low-priority readers that blocked while trying to
acquire the read lock and a second acquisition of the write lock that
might be blocking high priority work.

Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5298d4bf 17-Jan-2022 Christoph Hellwig <hch@lst.de>

unicode: clean up the Kconfig symbol confusion

Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>


# 300a8429 11-Dec-2021 Chao Yu <chao@kernel.org>

f2fs: fix to reserve space for IO align feature

https://bugzilla.kernel.org/show_bug.cgi?id=204137

With below script, we will hit panic during new segment allocation:

DISK=bingo.img
MOUNT_DIR=/mnt/f2fs

dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fe -a 1 -o 19 -t 1 -z 1 -f -q $DISK

mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"

for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done

umount $MOUNT_DIR
rm $DISK

--- Core dump ---
Call Trace:
allocate_segment_by_default+0x9d/0x100 [f2fs]
f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
do_write_page+0x62/0x110 [f2fs]
f2fs_outplace_write_data+0x43/0xc0 [f2fs]
f2fs_do_write_data_page+0x386/0x560 [f2fs]
__write_data_page+0x706/0x850 [f2fs]
f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
do_writepages+0x1c/0x70
__filemap_fdatawrite_range+0xaa/0xe0
filemap_fdatawrite+0x1f/0x30
f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
block_operations+0xdc/0x350 [f2fs]
f2fs_write_checkpoint+0x104/0x1150 [f2fs]
f2fs_sync_fs+0xa2/0x120 [f2fs]
f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
do_writepages+0x1c/0x70
__writeback_single_inode+0x45/0x320
writeback_sb_inodes+0x273/0x5c0
wb_writeback+0xff/0x2e0
wb_workfn+0xa1/0x370
process_one_work+0x138/0x350
worker_thread+0x4d/0x3d0
kthread+0x109/0x140
ret_from_fork+0x25/0x30

The root cause here is, with IO alignment feature enables, in worst
case, we need F2FS_IO_SIZE() free blocks space for single one 4k write
due to IO alignment feature will fill dummy pages to make IO being
aligned.

So we will easily run out of free segments during non-inline directory's
data writeback, even in process of foreground GC.

In order to fix this issue, I just propose to reserve additional free
space for IO alignment feature to handle worst case of free space usage
ratio during FGGC.

Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3e020389 12-Dec-2021 Chao Yu <chao@kernel.org>

f2fs: support fault injection to f2fs_trylock_op()

f2fs: support fault injection for f2fs_trylock_op()

This patch supports to inject fault into f2fs_trylock_op().

Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 325163e9 08-Dec-2021 Daeho Jeong <daehojeong@google.com>

f2fs: add gc_urgent_high_remaining sysfs node

Added a new sysfs node called gc_urgent_high_remaining. The user can
set the trial count limit for GC urgent high mode with this value. If
GC thread gets to the limit, the mode will turn back to GC normal mode.
By default, the value is zero, which means there is no limit like before.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 49bd03cc 15-Sep-2021 Christoph Hellwig <hch@lst.de>

unicode: pass a UNICODE_AGE() tripple to utf8_load

Don't bother with pointless string parsing when the caller can just pass
the version in the format that the core expects. Also remove the
fallback to the latest version that none of the callers actually uses.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>


# 86e80575 15-Sep-2021 Christoph Hellwig <hch@lst.de>

f2fs: simplify f2fs_sb_read_encoding

Return the encoding table as the return value instead of as an argument,
and don't bother with the encoding flags as the caller can handle that
trivially.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>


# 4034247a 14-Jan-2022 NeilBrown <neilb@suse.de>

mm: introduce memalloc_retry_wait()

Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:

- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.

Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.

It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.

This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.

For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.

linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.

Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5429c9db 04-Nov-2021 Dongliang Mu <mudongliangabcd@gmail.com>

f2fs: fix UAF in f2fs_available_free_memory

if2fs_fill_super
-> f2fs_build_segment_manager
-> create_discard_cmd_control
-> f2fs_start_discard_thread

It invokes kthread_run to create a thread and run issue_discard_thread.

However, if f2fs_build_node_manager fails, the control flow goes to
free_nm and calls f2fs_destroy_node_manager. This function will free
sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info
after the deallocation, but before the f2fs_stop_discard_thread, it will
cause UAF(Use-after-free).

-> f2fs_destroy_segment_manager
-> destroy_discard_cmd_control
-> f2fs_stop_discard_thread

Fix this by stopping discard thread before f2fs_destroy_node_manager.

Note that, the commit d6d2b491a82e1 introduces the call of
f2fs_available_free_memory into issue_discard_thread.

Cc: stable@vger.kernel.org
Fixes: d6d2b491a82e ("f2fs: allow to change discard policy based on cached discard cmds")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cf30f6a5 11-Sep-2020 Nick Terrell <terrelln@fb.com>

lib: zstd: Add kernel-specific API

This patch:
- Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
- Updates modified zstd headers to yearless copyright
- Adds a new API in `include/linux/zstd.h` that is functionally
equivalent to the in-use subset of the current API. Functions are
renamed to avoid symbol collisions with zstd, to make it clear it is
not the upstream zstd API, and to follow the kernel style guide.
- Updates all callers to use the new API.

There are no functional changes in this patch. Since there are no
functional change, I felt it was okay to update all the callers in a
single patch. Once the API is approved, the callers are mechanically
changed.

This patch is preparing for the 3rd patch in this series, which updates
zstd to version 1.4.10. Since the upstream zstd API is no longer exposed
to callers, the update can happen transparently.

Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>


# 10a26878 28-Oct-2021 Chao Yu <chao@kernel.org>

f2fs: support fault injection for dquot_initialize()

This patch adds a new function f2fs_dquot_initialize() to wrap
dquot_initialize(), and it supports to inject fault into
f2fs_dquot_initialize() to simulate inner failure occurs in
dquot_initialize().

Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ca98d721 28-Oct-2021 Chao Yu <chao@kernel.org>

f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()

As Pavel Machek reported in [1]

This code looks quite confused: part of function returns 1 on
corruption, part returns -errno. The problem is not stable-specific.

[1] https://lkml.org/lkml/2021/9/19/207

Let's fix to make 'insane cp_payload case' to return 1 rater than
EFSCORRUPTED, so that return value can be kept consistent for all
error cases, it can avoid confusion of code logic.

Fixes: 65ddf6564843 ("f2fs: fix to do sanity check for sb/cp fields correctly")
Reported-by: Pavel Machek <pavel@denx.de>
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 71f2c820 01-Sep-2021 Chao Yu <chao@kernel.org>

f2fs: multidevice: support direct IO

Commit 3c62be17d4f5 ("f2fs: support multiple devices") missed
to support direct IO for multiple device feature, this patch
adds to support the missing part of multidevice feature.

In addition, for multiple device image, we should be aware of
any issued direct write IO rather than just buffered write IO,
so that fsync and syncfs can issue a preflush command to the
device where direct write IO goes, to persist user data for
posix compliant.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6691d940 29-Sep-2021 Daeho Jeong <daehojeong@google.com>

f2fs: introduce fragment allocation mode mount option

Added two options into "mode=" mount option to make it possible for
developers to simulate filesystem fragmentation/after-GC situation
itself. The developers use these modes to understand filesystem
fragmentation/after-GC condition well, and eventually get some
insights to handle them better.

"fragment:segment": f2fs allocates a new segment in ramdom position.
With this, we can simulate the after-GC condition.
"fragment:block" : We can scatter block allocation with
"max_fragment_chunk" and "max_fragment_hole" sysfs
nodes. f2fs will allocate 1..<max_fragment_chunk>
blocks in a chunk and make a hole in the length of
1..<max_fragment_hole> by turns in a newly allocated
free segment. Plus, this mode implicitly enables
"fragment:segment" option for more randomness.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 011e0868 27-Sep-2021 Keoseong Park <keosung.park@samsung.com>

f2fs: fix to use WHINT_MODE

Since active_logs can be set to 2 or 4 or NR_CURSEG_PERSIST_TYPE(6),
it cannot be set to NR_CURSEG_TYPE(8).
That is, whint_mode is always off.

Therefore, the condition is changed from NR_CURSEG_TYPE to NR_CURSEG_PERSIST_TYPE.

Cc: Chao Yu <chao@kernel.org>
Fixes: d0b9e42ab615 (f2fs: introduce inmem curseg)
Reported-by: tanghuan <tanghuan@vivo.com>
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4373b3dc 09-Sep-2021 Eric Biggers <ebiggers@google.com>

fscrypt: remove fscrypt_operations::max_namelen

The max_namelen field is unnecessary, as it is set to 255 (NAME_MAX) on
all filesystems that support fscrypt (or plan to support fscrypt). For
simplicity, just use NAME_MAX directly instead.

Link: https://lore.kernel.org/r/20210909184513.139281-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# f7db8dd6 29-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: enable realtime discard iff device supports discard

Let's only enable realtime discard if and only if device supports
discard functionality.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dddd3d65 19-Aug-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: guarantee to write dirty data when enabling checkpoint back

We must flush all the dirty data when enabling checkpoint back. Let's guarantee
that first by adding a retry logic on sync_inodes_sb(). In addition to that,
this patch adds to flush data in fsync when checkpoint is disabled, which can
mitigate the sync_inodes_sb() failures in advance.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d674904 19-Aug-2021 Fengnan Chang <changfengnan@vivo.com>

f2fs: Don't create discard thread when device doesn't support realtime discard

Don't create discard thread when device doesn't support realtime discard
or user specifies nodiscard mount option.

Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a4b68176 20-Aug-2021 Daeho Jeong <daehojeong@google.com>

f2fs: introduce periodic iostat io latency traces

Whenever we notice some sluggish issues on our machines, we are always
curious about how well all types of I/O in the f2fs filesystem are
handled. But, it's hard to get this kind of real data. First of all,
we need to reproduce the issue while turning on the profiling tool like
blktrace, but the issue doesn't happen again easily. Second, with the
intervention of any tools, the overall timing of the issue will be
slightly changed and it sometimes makes us hard to figure it out.

So, I added the feature printing out IO latency statistics tracepoint
events, which are minimal things to understand filesystem's I/O related
behaviors, into F2FS_IOSTAT kernel config. With "iostat_enable" sysfs
node on, we can get this statistics info in a periodic way and it
would cause the least overhead.

[samples]
f2fs_ckpt-254:1-507 [003] .... 2842.439683: f2fs_iostat_latency:
dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4],
wr_sync_data [164/16/3331], wr_sync_node [152/3/648],
wr_sync_meta [160/2/4243], wr_async_data [24/13/15],
wr_async_node [0/0/0], wr_async_meta [0/0/0]

f2fs_ckpt-254:1-507 [002] .... 2845.450514: f2fs_iostat_latency:
dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
rd_data [60/3/456], rd_node [60/3/1258], rd_meta [0/0/1],
wr_sync_data [120/12/2285], wr_sync_node [88/5/428],
wr_sync_meta [52/6/2990], wr_async_data [4/1/3],
wr_async_node [0/0/0], wr_async_meta [0/0/0]

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 52118743 19-Aug-2021 Daeho Jeong <daehojeong@google.com>

f2fs: separate out iostat feature

Added F2FS_IOSTAT config option to support getting IO statistics through
sysfs and printing out periodic IO statistics tracepoint events and
moved I/O statistics related codes into separate files for better
maintenance.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: set default=y]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 32410577 08-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: support fault injection for f2fs_kmem_cache_alloc()

This patch supports to inject fault into f2fs_kmem_cache_alloc().

Usage:
a) echo 32768 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=32768 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 65ddf656 05-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check for sb/cp fields correctly

This patch fixes below problems of sb/cp sanity check:
- in sanity_check_raw_superi(), it missed to consider log header
blocks while cp_payload check.
- in f2fs_sanity_check_ckpt(), it missed to check nat_bits_blocks.

Cc: <stable@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0f6b56ec 02-Aug-2021 Daeho Jeong <daehojeong@google.com>

f2fs: add sysfs node to control ra_pages for fadvise seq file

fadvise() allows the user to expand the readahead window to double with
POSIX_FADV_SEQUENTIAL, now. But, in some use cases, it is not that
sufficient and we need to meet the need in a restricted way. We can
control the multiplier value of bdi device readahead between 2 (default)
and 256 for POSIX_FADV_SEQUENTIAL advise option.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4f993264 02-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: introduce discard_unit mount option

As James Z reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213877

[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response

[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X

[8.] Memory consumption

With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0

With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7

The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.

For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.

This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.

Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().

In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 277afbde 28-Jul-2021 Chao Yu <chao@kernel.org>

f2fs: fix wrong checkpoint_changed value in f2fs_remount()

In f2fs_remount(), return value of test_opt() is an unsigned int type
variable, however when we compare it to a bool type variable, it cause
wrong result, fix it.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9de71ede 19-Jul-2021 Chao Yu <chao@kernel.org>

f2fs: quota: fix potential deadlock

xfstest generic/587 reports a deadlock issue as below:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc1 #69 Not tainted
------------------------------------------------------
repquota/8606 is trying to acquire lock:
ffff888022ac9320 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}, at: f2fs_quota_sync+0x207/0x300 [f2fs]

but task is already holding lock:
ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&sbi->quota_sem){.+.+}-{3:3}:
__lock_acquire+0x648/0x10b0
lock_acquire+0x128/0x470
down_read+0x3b/0x2a0
f2fs_quota_sync+0x59/0x300 [f2fs]
f2fs_quota_on+0x48/0x100 [f2fs]
do_quotactl+0x5e3/0xb30
__x64_sys_quotactl+0x23a/0x4e0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (&sbi->cp_rwsem){++++}-{3:3}:
__lock_acquire+0x648/0x10b0
lock_acquire+0x128/0x470
down_read+0x3b/0x2a0
f2fs_unlink+0x353/0x670 [f2fs]
vfs_unlink+0x1c7/0x380
do_unlinkat+0x413/0x4b0
__x64_sys_unlinkat+0x50/0xb0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}:
check_prev_add+0xdc/0xb30
validate_chain+0xa67/0xb20
__lock_acquire+0x648/0x10b0
lock_acquire+0x128/0x470
down_write+0x39/0xc0
f2fs_quota_sync+0x207/0x300 [f2fs]
do_quotactl+0xaff/0xb30
__x64_sys_quotactl+0x23a/0x4e0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
&sb->s_type->i_mutex_key#18 --> &sbi->cp_rwsem --> &sbi->quota_sem

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&sbi->quota_sem);
lock(&sbi->cp_rwsem);
lock(&sbi->quota_sem);
lock(&sb->s_type->i_mutex_key#18);

*** DEADLOCK ***

3 locks held by repquota/8606:
#0: ffff88801efac0e0 (&type->s_umount_key#53){++++}-{3:3}, at: user_get_super+0xd9/0x190
#1: ffff8880084bc380 (&sbi->cp_rwsem){++++}-{3:3}, at: f2fs_quota_sync+0x3e/0x300 [f2fs]
#2: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs]

stack backtrace:
CPU: 6 PID: 8606 Comm: repquota Not tainted 5.14.0-rc1 #69
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack_lvl+0xce/0x134
dump_stack+0x17/0x20
print_circular_bug.isra.0.cold+0x239/0x253
check_noncircular+0x1be/0x1f0
check_prev_add+0xdc/0xb30
validate_chain+0xa67/0xb20
__lock_acquire+0x648/0x10b0
lock_acquire+0x128/0x470
down_write+0x39/0xc0
f2fs_quota_sync+0x207/0x300 [f2fs]
do_quotactl+0xaff/0xb30
__x64_sys_quotactl+0x23a/0x4e0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f883b0b4efe

The root cause is ABBA deadlock of inode lock and cp_rwsem,
reorder locks in f2fs_quota_sync() as below to fix this issue:
- lock inode
- lock cp_rwsem
- lock quota_sem

Fixes: db6ec53b7e03 ("f2fs: add a rw_sem to cover quota flag changes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# edc6d01b 13-Apr-2021 Jan Kara <jack@suse.cz>

f2fs: Convert to using invalidate_lock

Use invalidate_lock instead of f2fs' private i_mmap_sem. The intended
purpose is exactly the same. By this conversion we fix a long standing
race between hole punching and read(2) / readahead(2) paths that can
lead to stale page cache contents.

CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: Chao Yu <yuchao0@huawei.com>
CC: linux-f2fs-devel@lists.sourceforge.net
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 151b1982 08-Jun-2021 Fengnan Chang <changfengnan@vivo.com>

f2fs: compress: add nocompress extensions support

When we create a directory with enable compression, all file write into
directory will try to compress.But sometimes we may know, new file
cannot meet compression ratio requirements.
We need a nocompress extension to skip those files to avoid unnecessary
compress page test.

After add nocompress_extension, the priority should be:
dir_flag < comp_extention,nocompress_extension < comp_file_flag,
no_comp_file_flag.

Priority in between FS_COMPR_FL, FS_NOCOMP_FS, extensions:
* compress_extension=so; nocompress_extension=zip; chattr +c dir;
touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so
and baz.txt should be compresse, bar.zip should be non-compressed.
chattr +c dir/bar.zip can enable compress on bar.zip.
* compress_extension=so; nocompress_extension=zip; chattr -c dir;
touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so
should be compresse, bar.zip and baz.txt should be non-compressed.
chattr+c dir/bar.zip; chattr+c dir/baz.txt; can enable compress on
bar.zip and baz.txt.

Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d9a2bb1 10-Jun-2021 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_casefolded_name slab cache

Add a slab cache: "f2fs_casefolded_name" for memory allocation
of casefold name.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6ce19aff 20-May-2021 Chao Yu <chao@kernel.org>

f2fs: compress: add compress_inode to cache compressed blocks

Support to use address space of inner inode to cache compressed block,
in order to improve cache hit ratio of random read.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a7d9fe3c 21-May-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support RO feature

Given RO feature in superblock, we don't need to check provisioning/reserve
spaces and SSA area.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 833dcd35 26-May-2021 Joe Perches <joe@perches.com>

f2fs: logging neatening

Update the logging uses that have unnecessary newlines as the f2fs_printk
function and so its f2fs_<level> macro callers already adds one.

This allows searching single line logging entries with an easier grep and
also avoids unnecessary blank lines in the logging.

Miscellanea:

o Coalesce formats
o Align to open parenthesis

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0dd57178 17-May-2021 Chao Yu <chao@kernel.org>

f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfs

As marcosfrm reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213089

Initramfs generators rely on "pre" softdeps (and "depends") to include
additional required modules.

F2FS does not declare "pre: crc32" softdep. Then every generator (dracut,
mkinitcpio...) has to maintain a hardcoded list for this purpose.

Hence let's use MODULE_SOFTDEP("pre: crc32") in f2fs code.

Fixes: 43b6573bac95 ("f2fs: use cryptoapi crc32 functions")
Reported-by: marcosfrm <marcosfrm@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cad83c96 07-May-2021 Chao Yu <chao@kernel.org>

f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances

As syzbot reported, there is an use-after-free issue during f2fs recovery:

Use-after-free write at 0xffff88823bc16040 (in kfence-#10):
kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486
f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869
f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945
mount_bdev+0x26c/0x3a0 fs/super.c:1367
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1497
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

The root cause is multi f2fs filesystem instances can race on accessing
global fsync_entry_slab pointer, result in use-after-free issue of slab
cache, fixes to init/destroy this slab cache only once during module
init/destroy procedure to avoid this issue.

Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5f029c04 05-Apr-2021 Yi Zhuang <zhuangyi1@huawei.com>

f2fs: clean up build warnings

This patch combined the below three clean-up patches.

- modify open brace '{' following function definitions
- ERROR: spaces required around that ':'
- ERROR: spaces required before the open parenthesis '('
- ERROR: spaces prohibited before that ','
- Made suggested modifications from checkpatch in reference to WARNING:
Missing a blank line after declarations

Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com>
Signed-off-by: Jia Yang <jiayang5@huawei.com>
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b5d15199 01-Apr-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set checkpoint_merge by default

Once we introduced checkpoint_merge, we've seen some contention w/o the option.
In order to avoid it, let's set it by default.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 23738e74 30-Mar-2021 Chao Yu <chao@kernel.org>

f2fs: fix to restrict mount condition on readonly block device

When we mount an unclean f2fs image in a readonly block device, let's
make mount() succeed only when there is no recoverable data in that
image, otherwise after mount(), file fsyned won't be recovered as user
expected.

Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5911d2d1 27-Mar-2021 Chao Yu <chao@kernel.org>

f2fs: introduce gc_merge mount option

In this patch, we will add two new mount options: "gc_merge" and
"nogc_merge", when background_gc is on, "gc_merge" option can be
set to let background GC thread to handle foreground GC requests,
it can eliminate the sluggish issue caused by slow foreground GC
operation when GC is triggered from a process with limited I/O
and CPU resources.

Original idea is from Xiang.

Signed-off-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3fd97359 17-Mar-2021 Chao Yu <chao@kernel.org>

f2fs: fix error path of f2fs_remount()

In error path of f2fs_remount(), it missed to restart/stop kernel thread
or enable/disable checkpoint, then mount option status may not be
consistent with real condition of filesystem, so let's reorder remount
flow a bit as below and do recovery correctly in error path:

1) handle gc thread
2) handle ckpt thread
3) handle flush thread
4) handle checkpoint disabling

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3f7070b0 17-Mar-2021 Chao Yu <chao@kernel.org>

f2fs: don't start checkpoint thread in readonly mountpoint

In readonly mountpoint, there should be no write IOs include checkpoint
IO, so that it's not needed to create kernel checkpoint thread.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cd6ee739 20-Feb-2021 Chao Yu <chao@kernel.org>

f2fs: avoid unused f2fs_show_compress_options()

LKP reports:

fs/f2fs/super.c:1516:20: warning: unused function 'f2fs_show_compress_options' [-Wunused-function]
static inline void f2fs_show_compress_options(struct seq_file *seq,

Fix this issue by covering f2fs_show_compress_options() with
CONFIG_F2FS_FS_COMPRESSION macro.

Fixes: 4c8ff7095bef ("f2fs: support data compression")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7dede886 20-Feb-2021 Chao Yu <chao@kernel.org>

f2fs: fix to allow migrating fully valid segment

F2FS_IOC_FLUSH_DEVICE/F2FS_IOC_RESIZE_FS needs to migrate all blocks of
target segment to other place, no matter the segment has partially or fully
valid blocks.

However, after commit 803e74be04b3 ("f2fs: stop GC when the victim becomes
fully valid"), we may skip migration due to target segment is fully valid,
result in failing the ioctl interface, fix this.

Fixes: 803e74be04b3 ("f2fs: stop GC when the victim becomes fully valid")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a8affc03 10-Mar-2021 Christoph Hellwig <hch@lst.de>

block: rename BIO_MAX_PAGES to BIO_MAX_VECS

Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e2728c56 12-Jan-2021 Eric Biggers <ebiggers@google.com>

fs: don't call ->dirty_inode for lazytime timestamp updates

There is no need to call ->dirty_inode for lazytime timestamp updates
(i.e. for __mark_inode_dirty(I_DIRTY_TIME)), since by the definition of
lazytime, filesystems must ignore these updates. Filesystems only need
to care about the updated timestamps when they expire.

Therefore, only call ->dirty_inode when I_DIRTY_INODE is set.

Based on a patch from Christoph Hellwig:
https://lore.kernel.org/r/20200325122825.1086872-4-hch@lst.de

Link: https://lore.kernel.org/r/20210112190253.64307-6-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 938a1842 12-Feb-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give a warning only for readonly partition

Let's allow mounting readonly partition. We're able to recovery later once we
have it as read-write back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d50dfc0c 08-Feb-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't grab superblock freeze for flush/ckpt thread

There are controlled by f2fs_freeze().

This fixes xfstests/generic/068 which is stuck at

task:f2fs_ckpt-252:3 state:D stack: 0 pid: 5761 ppid: 2 flags:0x00004000
Call Trace:
__schedule+0x44c/0x8a0
schedule+0x4f/0xc0
percpu_rwsem_wait+0xd8/0x140
? percpu_down_write+0xf0/0xf0
__percpu_down_read+0x56/0x70
issue_checkpoint_thread+0x12c/0x160 [f2fs]
? wait_woken+0x80/0x80
kthread+0x114/0x150
? __checkpoint_and_complete_reqs+0x110/0x110 [f2fs]
? kthread_park+0x90/0x90
ret_from_fork+0x22/0x30

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 261eeb9c 18-Jan-2021 Daeho Jeong <daehojeong@google.com>

f2fs: introduce checkpoint_merge mount option

We've added a new mount options, "checkpoint_merge" and "nocheckpoint_merge",
which creates a kernel daemon and makes it to merge concurrent checkpoint
requests as much as possible to eliminate redundant checkpoint issues. Plus,
we can eliminate the sluggish issue caused by slow checkpoint operation
when the checkpoint is done in a process context in a cgroup having
low i/o budget and cpu shares. To make this do better, we set the
default i/o priority of the kernel daemon to "3", to give one higher
priority than other kernel threads. The below verification result
explains this.
The basic idea has come from https://opensource.samsung.com.

[Verification]
Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
Create two I/O cgroups (fg w/ weight 100, bg w/ wight 20)
Set "strict_guarantees" to "1" in BFQ tunables

In "fg" cgroup,
- thread A => trigger 1000 checkpoint operations
"for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
done"
- thread B => gererating async. I/O
"fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
--filename=test_img --name=test"

In "bg" cgroup,
- thread C => trigger repeated checkpoint operations
"echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
fsync test_dir2; done"

We've measured thread A's execution time.

[ w/o patch ]
Elapsed Time: Avg. 68 seconds
[ w/ patch ]
Elapsed Time: Avg. 48 seconds

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b0ff4fe7 26-Jan-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush data when enabling checkpoint back

During checkpoint=disable period, f2fs bypasses all the synchronous IOs such as
sync and fsync. So, when enabling it back, we must flush all of them in order
to keep the data persistent. Otherwise, suddern power-cut right after enabling
checkpoint will cause data loss.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d5f7bc00 14-Jan-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: deprecate f2fs_trace_io

This patch deprecates f2fs_trace_io, since f2fs uses page->private more broadly,
resulting in more buggy cases.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 12699fb7 14-Jan-2021 Matthew Wilcox (Oracle) <willy@infradead.org>

f2fs: Remove readahead collision detection

With the new ->readahead operation, locked pages are added to the page
cache, preventing two threads from racing with each other to read the
same chunk of file, so this is dead code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6d1451bf 12-Jan-2021 Chengguang Xu <cgxu519@mykernel.net>

f2fs: fix to use per-inode maxbytes

F2FS inode may have different max size, e.g. compressed file have
less blkaddr entries in all its direct-node blocks, result in being
with less max filesize. So change to use per-inode maxbytes.

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3fde13f8 22-Jan-2021 Chao Yu <chao@kernel.org>

f2fs: compress: support compress level

Expand 'compress_algorithm' mount option to accept parameter as format of
<algorithm>:<level>, by this way, it gives a way to allow user to do more
specified config on lz4 and zstd compression level, then f2fs compression
can provide higher compress ratio.

In order to set compress level for lz4 algorithm, it needs to set
CONFIG_LZ4HC_COMPRESS and CONFIG_F2FS_FS_LZ4HC config to enable lz4hc
compress algorithm.

CR and performance number on lz4/lz4hc algorithm:

dd if=enwik9 of=compressed_file conv=fsync

Original blocks: 244382

lz4 lz4hc-9
compressed blocks 170647 163270
compress ratio 69.8% 66.8%
speed 16.4207 s, 60.9 MB/s 26.7299 s, 37.4 MB/s

compress ratio = after / before

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 32be0e97 22-Jan-2021 Chao Yu <chao@kernel.org>

f2fs: compress: deny setting unsupported compress algorithm

If kernel doesn't support certain kinds of compress algorithm, deny to set
them as compress algorithm of f2fs via 'compress_algorithm=%s' mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 67883ade 26-Jan-2021 Christoph Hellwig <hch@lst.de>

f2fs: remove FAULT_ALLOC_BIO

Sleeping bio allocations do not fail, which means that injecting an error
into sleeping bio allocations is a little silly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e584bbe8 09-Dec-2020 Chao Yu <chao@kernel.org>

f2fs: fix shift-out-of-bounds in sanity_check_raw_super()

syzbot reported a bug which could cause shift-out-of-bounds issue,
fix it.

Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
sanity_check_raw_super fs/f2fs/super.c:2812 [inline]
read_raw_super_block fs/f2fs/super.c:3267 [inline]
f2fs_fill_super.cold+0x16c9/0x16f6 fs/f2fs/super.c:3519
mount_bdev+0x34d/0x410 fs/super.c:1366
legacy_get_tree+0x105/0x220 fs/fs_context.c:592
vfs_get_tree+0x89/0x2f0 fs/super.c:1496
do_new_mount fs/namespace.c:2896 [inline]
path_mount+0x12ae/0x1e70 fs/namespace.c:3227
do_mount fs/namespace.c:3240 [inline]
__do_sys_mount fs/namespace.c:3448 [inline]
__se_sys_mount fs/namespace.c:3425 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3425
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+ca9a785f8ac472085994@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d540e35d 07-Dec-2020 Yangtao Li <tiny.windzz@gmail.com>

f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()

Many flash devices read and write a single IO based on a multiple
of 4KB, and we support only 4KB page cache size now.

Since we already check page size in init_f2fs_fs(), so remove page
size check in sanity_check_raw_super().

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Shaohua Liu <liush@allwinnertech.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b9ec1094 07-Dec-2020 Yangtao Li <tiny.windzz@gmail.com>

f2fs: convert to F2FS_*_INO macro

Use F2FS_ROOT_INO, F2FS_NODE_INO and F2FS_META_INO macro
for better code readability.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Shaohua Liu <liush@allwinnertech.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 602a16d5 30-Nov-2020 Daeho Jeong <daehojeong@google.com>

f2fs: add compress_mode mount option

We will add a new "compress_mode" mount option to control file
compression mode. This supports "fs" and "user". In "fs" mode (default),
f2fs does automatic compression on the compression enabled files.
In "user" mode, f2fs disables the automaic compression and gives the
user discretion of choosing the target file and the timing. It means
the user can do manual compression/decompression on the compression
enabled files using ioctls.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3a0a9cbc 27-Nov-2020 Chao Yu <chao@kernel.org>

f2fs: fix kbytes written stat for multi-device case

For multi-device case, one f2fs image includes multi devices, so it
needs to account bytes written of all block devices belong to the image
rather than one main block device, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b28f047b 26-Nov-2020 Chao Yu <chao@kernel.org>

f2fs: compress: support chksum

This patch supports to store chksum value with compressed
data, and verify the integrality of compressed data while
reading the data.

The feature can be enabled through specifying mount option
'compress_chksum'.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8769918b 22-Nov-2020 Sahitya Tummala <stummala@codeaurora.org>

f2fs: change to use rwsem for cp_mutex

Use rwsem to ensure serialization of the callers and to avoid
starvation of high priority tasks, when the system is under
heavy IO workload.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7ad08a58 18-Nov-2020 Daniel Rosenberg <drosen@google.com>

f2fs: Handle casefolding with Encryption

Expand f2fs's casefolding support to include encrypted directories. To
index casefolded+encrypted directories, we use the SipHash of the
casefolded name, keyed by a key derived from the directory's fscrypt
master key. This ensures that the dirhash doesn't leak information
about the plaintext filenames.

Encryption keys are unavailable during roll-forward recovery, so we
can't compute the dirhash when recovering a new dentry in an encrypted +
casefolded directory. To avoid having to force a checkpoint when a new
file is fsync'ed, store the dirhash on-disk appended to i_name.

This patch incorporates work by Eric Biggers <ebiggers@google.com>
and Jaegeuk Kim <jaegeuk@kernel.org>.

Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bb9cd910 18-Nov-2020 Daniel Rosenberg <drosen@google.com>

fscrypt: Have filesystems handle their d_ops

This shifts the responsibility of setting up dentry operations from
fscrypt to the individual filesystems, allowing them to have their own
operations while still setting fscrypt's d_revalidate as appropriate.

Most filesystems can just use generic_set_encrypted_ci_d_ops, unless
they have their own specific dentry operations as well. That operation
will set the minimal d_ops required under the circumstances.

Since the fscrypt d_ops are set later on, we must set all d_ops there,
since we cannot adjust those later on. This should not result in any
change in behavior.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9f7e334a 18-Nov-2020 Liu Song <liu.song11@zte.com.cn>

f2fs: remove writeback_inodes_sb in f2fs_remount

Since sync_inodes_sb has been used, there is no need to
use writeback_inodes_sb, so remove it.

Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 89ff6005 12-Nov-2020 Hyeongseok Kim <hyeongseok@gmail.com>

f2fs: fix double free of unicode map

In case of retrying fill_super with skip_recovery,
s_encoding for casefold would not be loaded again even though it's
already been freed because it's not NULL.
Set NULL after free to prevent double freeing when unmount.

Fixes: eca4873ee1b6 ("f2fs: Use generic casefolding support")
Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8446fe92 24-Nov-2020 Christoph Hellwig <hch@lst.de>

block: switch partition lookup to use struct block_device

Use struct block_device to lookup partitions on a disk. This removes
all usage of struct hd_struct from the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a782483c 26-Nov-2020 Christoph Hellwig <hch@lst.de>

block: remove the nr_sects field in struct hd_struct

Now that the hd_struct always has a block device attached to it, there is
no need for having two size field that just get out of sync.

Additionally the field in hd_struct did not use proper serialization,
possibly allowing for torn writes. By only using the block_device field
this problem also gets fixed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c68d6c88 14-Sep-2020 Chao Yu <chao@kernel.org>

f2fs: compress: introduce cic/dic slab cache

Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory
allocation of compress_io_ctx and decompress_io_ctx structure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 31083031 14-Sep-2020 Chao Yu <chao@kernel.org>

f2fs: compress: introduce page array slab cache

Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory
allocation of page pointer array in compress context.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Fix wrong memory allocation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3a22e9ac 28-Sep-2020 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check on segment/section count

As syzbot reported:

BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878

CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633
mount_bdev+0x32e/0x3f0 fs/super.c:1417
legacy_get_tree+0x105/0x220 fs/fs_context.c:592
vfs_get_tree+0x89/0x2f0 fs/super.c:1547
do_new_mount fs/namespace.c:2875 [inline]
path_mount+0x1387/0x20a0 fs/namespace.c:3192
do_mount fs/namespace.c:3205 [inline]
__do_sys_mount fs/namespace.c:3413 [inline]
__se_sys_mount fs/namespace.c:3390 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The root cause is: if segs_per_sec is larger than one, and segment count
in last section is less than segs_per_sec, we will suffer out-of-boundary
memory access on sit_i->sentries[] in init_min_max_mtime().

Fix this by adding sanity check among segment count, section count and
segs_per_sec value in sanity_check_raw_super().

Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f99ba9ad 17-Sep-2020 Wang Xiaojun <wangxiaojun11@huawei.com>

f2fs: fix wrong total_sections check and fsmeta check

Meta area is not included in section_count computation.
So the minimum number of total_sections is 1 meanwhile it cannot be
greater than segment_count_main.

The minimum number of meta segments is 8 (SB + 2 (CP + SIT + NAT) + SSA).

Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d89f5891 17-Sep-2020 Wang Xiaojun <wangxiaojun11@huawei.com>

f2fs: remove duplicated code in sanity_check_area_boundary

Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count <<
log_blocks_per_seg)".

Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d0660122 21-Sep-2020 Chao Yu <chao@kernel.org>

f2fs: relocate blkzoned feature check

Relocate blkzoned feature check into parse_options() like
other feature check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 07eb1d69 21-Sep-2020 Chao Yu <chao@kernel.org>

f2fs: do sanity check on zoned block device path

sbi->devs would be initialized only if image enables multiple device
feature or blkzoned feature, if blkzoned feature flag was set by fuzz
in non-blkzoned device, we will suffer below panic:

get_zone_idx fs/f2fs/segment.c:4892 [inline]
f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline]
f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999
Call Trace:
check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704
build_sit_entries fs/f2fs/segment.c:4403 [inline]
f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100
f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684
mount_bdev+0x32e/0x3f0 fs/super.c:1417
legacy_get_tree+0x105/0x220 fs/fs_context.c:592
vfs_get_tree+0x89/0x2f0 fs/super.c:1547
do_new_mount fs/namespace.c:2896 [inline]
path_mount+0x12ae/0x1e70 fs/namespace.c:3216
do_mount fs/namespace.c:3229 [inline]
__do_sys_mount fs/namespace.c:3437 [inline]
__se_sys_mount fs/namespace.c:3414 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3414
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46

Add sanity check to inconsistency on factors: blkzoned flag, device
path and device character to avoid above panic.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c8c868ab 16-Sep-2020 Eric Biggers <ebiggers@google.com>

fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *'

fscrypt_set_test_dummy_encryption() requires that the optional argument
to the test_dummy_encryption mount option be specified as a substring_t.
That doesn't work well with filesystems that use the new mount API,
since the new way of parsing mount options doesn't use substring_t.

Make it take the argument as a 'const char *' instead.

Instead of moving the match_strdup() into the callers in ext4 and f2fs,
make them just use arg->from directly. Since the pattern is
"test_dummy_encryption=%s", the argument will be null-terminated.

Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# ac4acb1f 16-Sep-2020 Eric Biggers <ebiggers@google.com>

fscrypt: handle test_dummy_encryption in more logical way

The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.

Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.

One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.

Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.

Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:

- Adding a function fscrypt_policy_to_inherit() which returns the
encryption policy to inherit from a directory. This can be a real
policy, a dummy policy, or no policy.

- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.

- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
of an inode.

Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 6d1349c7 18-Sep-2020 Al Viro <viro@zeniv.linux.org.uk>

[PATCH] reduce boilerplate in fsid handling

Get rid of boilerplate in most of ->statfs()
instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c2759eba 07-Sep-2020 Daeho Jeong <daehojeong@google.com>

f2fs: change i_compr_blocks of inode to atomic value

writepages() can be concurrently invoked for the same file by different
threads such as a thread fsyncing the file and a kworker kernel thread.
So, changing i_compr_blocks without protection is racy and we need to
protect it by changing it with atomic type value. Plus, we don't need
a 64bit value for i_compr_blocks, so just we will use a atomic value,
not atomic64.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 69c0dd29 02-Sep-2020 Chao Yu <chao@kernel.org>

f2fs: ignore compress mount option on image w/o compression feature

to keep consistent with behavior when passing compress mount option
to kernel w/o compression feature, so that mount may not fail on
such condition.

Reported-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 093749e2 04-Aug-2020 Chao Yu <chao@kernel.org>

f2fs: support age threshold based garbage collection

There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.

This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:

1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
0 means youngest, 100 means oldest, if we set age threshold to 80
then select dirty segments which has age in range of [80, 100] as
candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;

2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.

3. merge valid blocks from source victim into target victim with
SSR alloctor.

Test steps:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC

Benefit: GC count and block movement count both decrease obviously:

- Before:
- Valid: 86
- Dirty: 1
- Prefree: 11
- Free: 6001 (6001)

GC calls: 162 (BG: 220)
- data segments : 160 (160)
- node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)
- node blocks : 494 (494)

IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:

- Valid: 87
- Dirty: 0
- Prefree: 4
- Free: 6008 (6008)

GC calls: 75 (BG: 76)
- data segments : 74 (74)
- node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)
- node blocks : 269 (269)

IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# eca4873e 08-Jul-2020 Daniel Rosenberg <drosen@google.com>

f2fs: Use generic casefolding support

This switches f2fs over to the generic support provided in
the previous patch.

Since casefolded dentries behave the same in ext4 and f2fs, we decrease
the maintenance burden by unifying them, and any optimizations will
immediately apply to both.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d0b9e42a 04-Aug-2020 Chao Yu <chao@kernel.org>

f2fs: introduce inmem curseg

Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;

In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;

With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature

So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# de881df9 16-Jul-2020 Aravind Ramesh <aravind.ramesh@wdc.com>

f2fs: support zone capacity less than zone size

NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
Zone-capacity indicates the maximum number of sectors that are usable in
a zone beginning from the first sector of the zone. This makes the sectors
sectors after the zone-capacity till zone-size to be unusable.
This patch set tracks zone-size and zone-capacity in zoned devices and
calculate the usable blocks per segment and usable segments per section.

If zone-capacity is less than zone-size mark only those segments which
start before zone-capacity as free segments. All segments at and beyond
zone-capacity are treated as permanently used segments. In cases where
zone-capacity does not align with segment size the last segment will start
before zone-capacity and end beyond the zone-capacity of the zone. For
such spanning segments only sectors within the zone-capacity are used.

During writes and GC manage the usable segments in a section and usable
blocks per segment. Segments which are beyond zone-capacity are never
allocated, and do not need to be garbage collected, only the segments
which are before zone-capacity needs to garbage collected.
For spanning segments based on the number of usable blocks in that
segment, write to blocks only up to zone-capacity.

Zone-capacity is device specific and cannot be configured by the user.
Since NVMe ZNS device zones are sequentially write only, a block device
with conventional zones or any normal block device is needed along with
the ZNS device for the metadata operations of F2fs.

A typical nvme-cli output of a zoned device shows zone start and capacity
and write pointer as below:

SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ

Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
are in EMPTY state. For each zone, only zone start + 49MB is usable area,
any lba/sector after 49MB cannot be read or written to, the drive will fail
any attempts to read/write. So, the second zone starts at 64MB and is
usable till 113MB (64 + 49) and the range between 113 and 128MB is
again unusable. The next zone starts at 128MB, and so on.

Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1f0b067b 29-Jul-2020 Chao Yu <chao@kernel.org>

f2fs: compress: disable compression mount option if compression is off

If CONFIG_F2FS_FS_COMPRESSION is off, don't allow to configure or
show compression related mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 99c787cf 23-Jul-2020 Li Guifu <bluce.liguifu@huawei.com>

f2fs: fix use-after-free issue

During umount, f2fs_put_super() unregisters procfs entries after
f2fs_destroy_segment_manager(), it may cause use-after-free
issue when umount races with procfs accessing, fix it by relocating
f2fs_unregister_sysfs().

[Chao Yu: change commit title/message a bit]

Signed-off-by: Li Guifu <bluce.liguifu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 27aacd28 01-Jul-2020 Satya Tangirala <satyat@google.com>

f2fs: add inline encryption support

Wire up f2fs to support inline encryption via the helper functions which
fs/crypto/ now provides. This includes:

- Adding a mount option 'inlinecrypt' which enables inline encryption
on encrypted files where it can be used.

- Setting the bio_crypt_ctx on bios that will be submitted to an
inline-encrypted file.

- Not adding logically discontiguous data to bios that will be submitted
to an inline-encrypted file.

- Not doing filesystem-layer crypto on inline-encrypted files.

This patch includes a fix for a race during IPU by
Sahitya Tummala <stummala@codeaurora.org>

Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200702015607.1215430-4-satyat@google.com
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 6b12367d 23-Jun-2020 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid readahead race condition

If two readahead threads having same offset enter in readpages, every read
IOs are split and issued to the disk which giving lower bandwidth.

This patch tries to avoid redundant readahead calls.

Fixes one build error reported by Randy.
Fix build error when F2FS_FS_COMPRESSION is not set/enabled.
This label is needed in either case.

../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’:
../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined
goto next_page;

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ba87a45c 17-Jun-2020 Wang Xiaojun <wangxiaojun11@huawei.com>

f2fs: use kfree() to free variables allocated by match_strdup()

Use kfree() instead of kvfree() to free variables allocated
by match_strdup(). Because the memory is allocated with kmalloc
inside match_strdup().

Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 742532d1 09-Jun-2020 Denis Efremov <efremov@linux.com>

f2fs: use kfree() instead of kvfree() to free superblock data

Use kfree() instead of kvfree() to free super in read_raw_super_block()
because the memory is allocated with kzalloc() in the function.
Use kfree() instead of kvfree() to free sbi, raw_super in
f2fs_fill_super() and f2fs_put_super() because the memory is allocated
with kzalloc().

Signed-off-by: Denis Efremov <efremov@linux.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0b6d4ca0 04-Jun-2020 Eric Biggers <ebiggers@google.com>

f2fs: don't return vmalloc() memory from f2fs_kmalloc()

kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
kmalloc'ed or vmalloc'ed memory. But the f2fs wrappers, f2fs_kmalloc()
and f2fs_kvmalloc(), both return both kinds of memory.

It's redundant to have two functions that do the same thing, and also
breaking the standard naming convention is causing bugs since people
assume it's safe to kfree() memory allocated by f2fs_kmalloc(). See
e.g. the various allocations in fs/f2fs/compress.c.

Fix this by making f2fs_kmalloc() just use kmalloc(). And to avoid
re-introducing the allocation failures that the vmalloc fallback was
intended to fix, convert the largest allocations to use f2fs_kvmalloc().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed318a6c 12-May-2020 Eric Biggers <ebiggers@google.com>

fscrypt: support test_dummy_encryption=v2

v1 encryption policies are deprecated in favor of v2, and some new
features (e.g. encryption+casefolding) are only being added for v2.

Therefore, the "test_dummy_encryption" mount option (which is used for
encryption I/O testing with xfstests) needs to support v2 policies.

To do this, extend its syntax to be "test_dummy_encryption=v1" or
"test_dummy_encryption=v2". The existing "test_dummy_encryption" (no
argument) also continues to be accepted, to specify the default setting
-- currently v1, but the next patch changes it to v2.

To cleanly support both v1 and v2 while also making it easy to support
specifying other encryption settings in the future (say, accepting
"$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
pointer to the dummy fscrypt_context rather than using mount flags.

To avoid concurrency issues, don't allow test_dummy_encryption to be set
or changed during a remount. (The former restriction is new, but
xfstests doesn't run into it, so no one should notice.)

Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'. On ext4,
there are two regressions, both of which are test bugs: ext4/023 and
ext4/028 fail because they set an xattr and expect it to be stored
inline, but the increase in size of the fscrypt_context from
24 to 40 bytes causes this xattr to be spilled into an external block.

Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 1ae18f71 15-May-2020 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix checkpoint=disable:%u%%

When parsing the mount option, we don't have sbi->user_block_count.
Should do it after getting it.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b4b10061 31-Mar-2020 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: refactor resize_fs to avoid meta updates in progress

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it requires to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# baaa7ebf 11-May-2020 Konstantin Khlebnikov <koct9i@gmail.com>

f2fs: report delalloc reserve as non-free in statfs for project quota

This reserved space isn't committed yet but cannot be used for
allocations. For userspace it has no difference from used space.

See the same fix in ext4 commit f06925c73942 ("ext4: report delalloc
reserve as non-free in statfs for project quota").

Fixes: ddc34e328d06 ("f2fs: introduce f2fs_statfs_project")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6d92b201 08-Apr-2020 Chao Yu <chao@kernel.org>

f2fs: compress: support lzo-rle compress algorithm

LZO-RLE extension (run length encoding) was introduced to improve
performance of LZO algorithm in scenario of data contains many zeros,
zram has changed to use this extended algorithm by default, this
patch adds to support this algorithm extension, to enable this
extension, it needs to enable F2FS_FS_LZO and F2FS_FS_LZORLE config,
and specifies "compress_algorithm=lzo-rle" mountoption.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5e6bbde9 08-Apr-2020 Chao Yu <chao@kernel.org>

f2fs: introduce mempool for {,de}compress intermediate page allocation

If compression feature is on, in scenario of no enough free memory,
page refault ratio is higher than before, the root cause is:
- {,de}compression flow needs to allocate intermediate pages to store
compressed data in cluster, so during their allocation, vm may reclaim
mmaped pages.
- if above reclaimed pages belong to compressed cluster, during its
refault, it may cause more intermediate pages allocation, result in
reclaiming more mmaped pages.

So this patch introduces a mempool for intermediate page allocation,
in order to avoid high refault ratio, by default, number of
preallocated page in pool is 512, user can change the number by
assigning 'num_compress_pages' parameter during module initialization.

Ma Feng found warnings in the original patch and fixed like below.

Fix the following sparse warning:
fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared.
Should it be static?
fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not
declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3c57f751 01-May-2020 Eric Biggers <ebiggers@google.com>

f2fs: use strcmp() in parse_options()

Remove the pointless string length checks. Just use strcmp().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2bc4bea3 29-Mar-2020 Daeho Jeong <daehojeong@google.com>

f2fs: add tracepoint for f2fs iostat

Added a tracepoint to see iostat of f2fs. Default period of that
is 3 second. This tracepoint can be used to be monitoring
I/O statistics periodically.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 50cfa66f 03-Mar-2020 Chao Yu <chao@kernel.org>

f2fs: compress: support zstd compress algorithm

Add zstd compress algorithm support, use "compress_algorithm=zstd"
mountoption to enable it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 91faa534 10-Mar-2020 Chao Yu <chao@kernel.org>

f2fs: change default compression algorithm

Use LZ4 as default compression algorithm, as compared to LZO, it shows
almost the same compression ratio and much better decompression speed.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 62f63eea 19-Mar-2020 Chao Yu <chao@kernel.org>

f2fs: fix NULL pointer dereference in f2fs_write_begin()

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:f2fs_write_begin+0x823/0xb90 [f2fs]
Call Trace:
f2fs_quota_write+0x139/0x1d0 [f2fs]
write_blk+0x36/0x80 [quota_tree]
get_free_dqblk+0x42/0xa0 [quota_tree]
do_insert_tree+0x235/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
do_insert_tree+0x26e/0x4a0 [quota_tree]
qtree_write_dquot+0x70/0x190 [quota_tree]
v2_write_dquot+0x43/0x90 [quota_v2]
dquot_acquire+0x77/0x100
f2fs_dquot_acquire+0x2f/0x60 [f2fs]
dqget+0x310/0x450
dquot_transfer+0x7e/0x120
f2fs_setattr+0x11a/0x4a0 [f2fs]
notify_change+0x349/0x480
chown_common+0x168/0x1c0
do_fchownat+0xbc/0xf0
__x64_sys_fchownat+0x20/0x30
do_syscall_64+0x5f/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Passing fsdata parameter to .write_{begin,end} in f2fs_quota_write(),
so that if quota file is compressed one, we can avoid above NULL
pointer dereference when updating quota content.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c6a564ff 25-Mar-2020 Christoph Hellwig <hch@lst.de>

block: move the part_stat* helpers from genhd.h to a new header

These macros are just used by a few files. Move them out of genhd.h,
which is included everywhere into a new standalone header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a999150f 25-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: use kmem_cache pool during inline xattr lookups

It's been observed that kzalloc() on lookup_all_xattrs() are called millions
of times on Android, quickly becoming the top abuser of slub memory allocator.

Use a dedicated kmem cache pool for xattr lookups to mitigate this.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5df7731f 17-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bbbc34fd 14-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: clean up bggc mount option

There are three status for background gc: on, off and sync, it's
a little bit confused to use test_opt(BG_GC) and test_opt(FORCE_FG_GC)
combinations to indicate status of background gc.

So let's remove F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount
options, and add F2FS_OPTION().bggc_mode with below three status
to clean up codes and enhance bggc mode's scalability.

enum {
BGGC_MODE_ON, /* background gc is on */
BGGC_MODE_OFF, /* background gc is off */
BGGC_MODE_SYNC, /*
* background gc is on, migrating blocks
* like foreground gc
*/
};

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b0332a0f 14-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: clean up lfs/adaptive mount option

This patch removes F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options,
and add F2FS_OPTION.fs_mode with below two status to indicate filesystem
mode.

enum {
FS_MODE_ADAPTIVE, /* use both lfs/ssr allocation */
FS_MODE_LFS, /* use lfs allocation only */
};

It can enhance code readability and fs mode's scalability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a9117eca 14-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: fix to show norecovery mount option

Previously, 'norecovery' mount option will be shown as
'disable_roll_forward', fix to show original option name correctly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7a88ddb5 27-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: fix inconsistent comments

Lack of maintenance on comments may mislead developers, fix them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c10c9820 27-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: cover last_disk_size update with spinlock

This change solves below hangtask issue:

INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1 D 0 58 2 0x00000000
Workqueue: writeback wb_workfn (flush-179:0)
Backtrace:
(__schedule) from [<c0913234>] (schedule+0x78/0xf4)
(schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
(rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
(down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
(f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
(f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
(f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
(do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
(__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
(writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
(__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
(wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
(wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
(process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
(worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
(kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)

Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bf22c3cc 17-Feb-2020 Sahitya Tummala <stummala@codeaurora.org>

f2fs: fix the panic in do_checkpoint()

There could be a scenario where f2fs_sync_meta_pages() will not
ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
resulting in the below panic in do_checkpoint() -

f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
!f2fs_cp_error(sbi));

This can happen in a low-memory condition, where shrinker could
also be doing the writepage operation (stack shown below)
at the same time when checkpoint is running on another core.

schedule
down_write
f2fs_submit_page_write -> by this time, this page in page cache is tagged
as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
is cleared, due to which f2fs_sync_meta_pages()
cannot sync this page in do_checkpoint() path.
f2fs_do_write_meta_page
__f2fs_write_meta_page
f2fs_write_meta_page
shrink_page_list
shrink_inactive_list
shrink_node_memcg
shrink_node
kswapd

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fb24fea7 14-Jan-2020 Chao Yu <chao@kernel.org>

f2fs: change to use rwsem for gc_mutex

Mutex lock won't serialize callers, in order to avoid starving of unlucky
caller, let's use rwsem lock instead.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bf2cbd3c 04-Jan-2020 Chengguang Xu <cgxu519@mykernel.net>

f2fs: code cleanup for f2fs_statfs_project()

Calling min_not_zero() to simplify complicated prjquota
limit comparison in f2fs_statfs_project().

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# acdf2172 04-Jan-2020 Chengguang Xu <cgxu519@mykernel.net>

f2fs: fix miscounted block limit in f2fs_statfs_project()

statfs calculates Total/Used/Avail disk space in block unit,
so we should translate soft/hard prjquota limit to block unit
as well.

Below testing result shows the block/inode numbers of
Total/Used/Avail from df command are all correct afer
applying this patch.

[root@localhost quota-tools]\# ./repquota -P /dev/sdb1
*** Report for project quotas on device /dev/sdb1
Block grace time: 7days; Inode grace time: 7days
Block limits File limits
Project used soft hard grace used soft hard grace
-----------------------------------------------------------
\#0 -- 4 0 0 1 0 0
\#101 -- 0 0 0 2 0 0
\#102 -- 0 10240 0 2 10 0
\#103 -- 0 0 20480 2 0 20
\#104 -- 0 10240 20480 2 10 20
\#105 -- 0 20480 10240 2 20 10

[root@localhost sdb1]\# lsattr -p t{1,2,3,4,5}
101 ----------------N-- t1/a1
102 ----------------N-- t2/a2
103 ----------------N-- t3/a3
104 ----------------N-- t4/a4
105 ----------------N-- t5/a5

[root@localhost sdb1]\# df -hi t{1,2,3,4,5}
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sdb1 2.4M 21 2.4M 1% /mnt/sdb1
/dev/sdb1 10 2 8 20% /mnt/sdb1
/dev/sdb1 20 2 18 10% /mnt/sdb1
/dev/sdb1 10 2 8 20% /mnt/sdb1
/dev/sdb1 10 2 8 20% /mnt/sdb1

[root@localhost sdb1]\# df -h t{1,2,3,4,5}
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 10G 489M 9.6G 5% /mnt/sdb1
/dev/sdb1 10M 0 10M 0% /mnt/sdb1
/dev/sdb1 20M 0 20M 0% /mnt/sdb1
/dev/sdb1 10M 0 10M 0% /mnt/sdb1
/dev/sdb1 10M 0 10M 0% /mnt/sdb1

Fixes: 909110c060f2 ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()")
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4c8ff709 01-Nov-2019 Chao Yu <chao@kernel.org>

f2fs: support data compression

This patch tries to support compression in f2fs.

- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.

- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.

- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.

- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext

Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+

Changelog:

20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().

20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().

20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().

20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.

20190402
- don't preallocate blocks for compressed file.

- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.

20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR

One cluster contain 4 blocks

before overwrite after overwrite

- VVVV -> CVNN
- CVNN -> VVVV

- CVNN -> CVNN
- CVNN -> CVVV

- CVVV -> CVNN
- CVVV -> CVVV

20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.

20191101
- apply fixes from Jaegeuk

20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity

20191216
- apply fixes from Jaegeuk

20200117
- fix to avoid NULL pointer dereference

[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks

Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2c4e0c52 03-Dec-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: declare nested quota_sem and remove unnecessary sems

1.
f2fs_quota_sync
-> down_read(&sbi->quota_sem)
-> dquot_writeback_dquots
-> f2fs_dquot_commit
-> down_read(&sbi->quota_sem)

2.
f2fs_quota_sync
-> down_read(&sbi->quota_sem)
-> f2fs_write_data_pages
-> f2fs_write_single_data_page
-> down_write(&F2FS_I(inode)->i_sem)

f2fs_mkdir
-> f2fs_do_add_link
-> down_write(&F2FS_I(inode)->i_sem)
-> f2fs_init_inode_metadata
-> f2fs_new_node_page
-> dquot_alloc_inode
-> f2fs_dquot_mark_dquot_dirty
-> down_read(&sbi->quota_sem)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f543805f 03-Dec-2019 Chao Yu <chao@kernel.org>

f2fs: introduce private bioset

In low memory scenario, we can allocate multiple bios without
submitting any of them.

- f2fs_write_checkpoint()
- block_operations()
- f2fs_sync_node_pages()
step 1) flush cold nodes, allocate new bio from mempool
- bio_alloc()
- mempool_alloc()
step 2) flush hot nodes, allocate a bio from mempool
- bio_alloc()
- mempool_alloc()
step 3) flush warm nodes, be stuck in below call path
- bio_alloc()
- mempool_alloc()
- loop to wait mempool element release, as we only
reserved memory for two bio allocation, however above
allocated two bios may never be submitted.

So we need avoid using default bioset, in this patch we introduce a
private bioset, in where we enlarg mempool element count to total
number of log header, so that we can make sure we have enough
backuped memory pool in scenario of allocating/holding multiple
bios.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d508c94e 09-Dec-2019 Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

f2fs: Check write pointer consistency of non-open zones

To catch f2fs bugs in write pointer handling code for zoned block
devices, check write pointers of non-open zones that current segments do
not point to. Do this check at mount time, after the fsync data recovery
and current segments' write pointer consistency fix. Or when fsync data
recovery is disabled by mount option, do the check when there is no fsync
data.

Check two items comparing write pointers with valid block maps in SIT.
The first item is check for zones with no valid blocks. When there is no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, next write operation to the zone will cause unaligned write
error. If write pointer is not at the zone start, reset the write pointer
to place at the zone start.

The second item is check between the write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report as a bug.
Fix is not required for such zone, because the zone is not selected for
next write operation until the zone get discarded.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 909110c0 24-Nov-2019 Chengguang Xu <cgxu519@mykernel.net>

f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()

Setting softlimit larger than hardlimit seems meaningless
for disk quota but currently it is allowed. In this case,
there may be a bit of comfusion for users when they run
df comamnd to directory which has project quota.

For example, we set 20M softlimit and 10M hardlimit of
block usage limit for project quota of test_dir(project id 123).

[root@hades f2fs]# repquota -P -a
*** Report for project quotas on device /dev/nvme0n1p8
Block grace time: 7days; Inode grace time: 7days
Block limits File limits
Project used soft hard grace used soft hard grace
----------------------------------------------------------------------
0 -- 4 0 0 1 0 0
123 +- 10248 20480 10240 2 0 0

The result of df command as below:

[root@hades f2fs]# df -h /mnt/f2fs/test
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p8 20M 11M 10M 51% /mnt/f2fs

Even though it looks like there is another 10M free space to use,
if we write new data to diretory test(inherit project id),
the write will fail with errno(-EDQUOT).

After this patch, the df result looks like below.

[root@hades f2fs]# df -h /mnt/f2fs/test
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p8 10M 10M 0 100% /mnt/f2fs

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d4100351 10-Nov-2019 Christoph Hellwig <hch@lst.de>

block: rework zone reporting

Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f5a53edc 18-Oct-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support aligned pinned file

This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
by allocating fully valid 2MB segment.

Check free segments by has_not_enough_free_secs() with large budget.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0eee17e3 24-Oct-2019 Eric Biggers <ebiggers@google.com>

f2fs: add support for IV_INO_LBLK_64 encryption policies

f2fs inode numbers are stable across filesystem resizing, and f2fs inode
and file logical block numbers are always 32-bit. So f2fs can always
support IV_INO_LBLK_64 encryption policies. Wire up the needed
fscrypt_operations to declare support.

Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 7212b95e 01-Nov-2019 Jan Kara <jack@suse.cz>

fs: Use dquot_load_quota_inode() from filesystems

Use dquot_load_quota_inode from filesystems instead of dquot_enable().
In all three cases we want to load quota inode and never use the
function to update quota flags.

Signed-off-by: Jan Kara <jack@suse.cz>


# 0b20fcec 30-Sep-2019 Chao Yu <chao@kernel.org>

f2fs: cache global IPU bio

In commit 8648de2c581e ("f2fs: add bio cache for IPU"), we added
f2fs_submit_ipu_bio() in __write_data_page() as below:

__write_data_page()

if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
f2fs_submit_ipu_bio(sbi, bio, page);
....
}

in order to avoid below deadlock:

Thread A Thread B
- __write_data_page (inode x, page y)
- f2fs_do_write_data_page
- set_page_writeback ---- set writeback flag in page y
- f2fs_inplace_write_data
- f2fs_balance_fs
- lock gc_mutex
- lock gc_mutex
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- f2fs_wait_on_page_writeback
- wait_on_page_writeback --- wait writeback of page y

However, the bio submission breaks the merge of IPU IOs.

So in this patch let's add a global bio cache for merged IPU pages,
then f2fs_wait_on_page_writeback() is able to submit bio if a
writebacked page is cached in global bio cache.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9f701f6c 22-Sep-2019 Qiuyang Sun <sunqiuyang@huawei.com>

f2fs: check total_segments from devices in raw_super

For multi-device F2FS, we should check if the sum of total_segments from
all devices matches segment_count.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed352042 26-Sep-2019 Chengguang Xu via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>

f2fs: mark recovery flag correctly in read_raw_super_block()

On the combination of first fail and second success,
we will miss to mark recovery flag because currently
we reuse err variable in the loop.

Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1166c1f2 23-Aug-2019 Surbhi Palande <f2fsnewbie@gmail.com>

f2fs: check all the data segments against all node ones

As a part of the sanity checking while mounting, distinct segment number
assignment to data and node segments is verified. Fixing a small bug in
this verification between node and data segments. We need to check all
the data segments with all the node segments.

Fixes: 042be0f849e5f ("f2fs: fix to do sanity check with current segment number")
Signed-off-by: Surbhi Palande <csurbhi@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 27cae0bc 05-Aug-2019 Chao Yu <chao@kernel.org>

f2fs: fix wrong available node count calculation

In mkfs, we have counted quota file's node number in cp.valid_node_count,
so we have to avoid wrong substraction of quota node number in
.available_nid/.avail_node_count calculation.

f2fs_write_check_point_pack()
{
..
set_cp(valid_node_count, 1 + c.quota_inum + c.lpf_inum);

Fixes: 292c196a3695 ("f2fs: reserve nid resource for quota sysfile")
Fixes: 7b63f72f73af ("f2fs: fix to do sanity check on valid node/block count")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2c2eb7a3 23-Jul-2019 Daniel Rosenberg <drosen@google.com>

f2fs: Support case-insensitive file name lookups

Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file
name lookups")

"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.

A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.

* dcache handling:

For a +F directory, F2Fs only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.

For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.

* on-disk data:

Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.

DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.

* Dealing with invalid sequences:

By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.

* Normalization algorithm:

The UTF-8 algorithms used to compare strings in f2fs is implemented
in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.

NFD seems to be the best normalization method for F2FS because:

- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.

Although:

- This implementation is not completely linguistic accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5aba5430 23-Jul-2019 Daniel Rosenberg <drosen@google.com>

f2fs: include charset encoding information in the superblock

Add charset encoding to f2fs to support casefolding. It is modeled after
the same feature introduced in commit c83ad55eaa91 ("ext4: include charset
encoding information in the superblock")

Currently this is not compatible with encryption, similar to the current
ext4 imlpementation. This will change in the future.

>From the ext4 patch:
"""
The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.

The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.
"""

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fe973b06 25-Jul-2019 Chao Yu <chao@kernel.org>

f2fs: fix to handle quota_{on,off} correctly

With quota_ino feature on, generic/232 reports an inconsistence issue
on the image.

The root cause is that the testcase tries to:
- use quotactl to shutdown journalled quota based on sysfile;
- and then use quotactl to enable/turn on quota based on specific file
(aquota.user or aquota.group).

Eventually, quota sysfile will be out-of-update due to following specific
file creation.

Change as below to fix this issue:
- deny enabling quota based on specific file if quota sysfile exists.
- set SBI_QUOTA_NEED_REPAIR once sysfile based quota shutdowns via
ioctl.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a8933b6b 18-Jul-2019 Chao Yu <chao@kernel.org>

f2fs: fix to drop meta/node pages during umount

As reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=204193

A null pointer dereference bug is triggered in f2fs under kernel-5.1.3.

kasan_report.cold+0x5/0x32
f2fs_write_end_io+0x215/0x650
bio_endio+0x26e/0x320
blk_update_request+0x209/0x5d0
blk_mq_end_request+0x2e/0x230
lo_complete_rq+0x12c/0x190
blk_done_softirq+0x14a/0x1a0
__do_softirq+0x119/0x3e5
irq_exit+0x94/0xe0
call_function_single_interrupt+0xf/0x20

During umount, we will access NULL sbi->node_inode pointer in
f2fs_write_end_io():

f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
page->index != nid_of_node(page));

The reason is if disable_checkpoint mount option is on, meta dirty
pages can remain during umount, and then be flushed by iput() of
meta_inode, however node_inode has been iput()ed before
meta_inode's iput().

Since checkpoint is disabled, all meta/node datas are useless and
should be dropped in next mount, so in umount, let's adjust
drop_inode() to give a hint to iput_final() to drop all those dirty
datas correctly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1f78adfa 12-Jul-2019 Chao Yu <chao@kernel.org>

f2fs: disallow switching io_bits option during remount

If IO alignment feature is turned on after remount, we didn't
initialize mempool of it, it turns out we will encounter panic
during IO submission due to access NULL mempool pointer.

This feature should be set only at mount time, so simply deny
configuring during remount.

This fixes bug reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=204135

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c72db71e 12-Jul-2019 Chao Yu <chao@kernel.org>

f2fs: fix panic of IO alignment feature

Since 07173c3ec276 ("block: enable multipage bvecs"), one bio vector
can store multi pages, so that we can not calculate max IO size of
bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of
f2fs always has that assumption, so finally, it may cause panic during
IO submission as below stack.

kernel BUG at fs/f2fs/data.c:317!
RIP: 0010:__submit_merged_bio+0x8b0/0x8c0
Call Trace:
f2fs_submit_page_write+0x3cd/0xdd0
do_write_page+0x15d/0x360
f2fs_outplace_write_data+0xd7/0x210
f2fs_do_write_data_page+0x43b/0xf30
__write_data_page+0xcf6/0x1140
f2fs_write_cache_pages+0x3ba/0xb40
f2fs_write_data_pages+0x3dd/0x8b0
do_writepages+0xbb/0x1e0
__writeback_single_inode+0xb6/0x800
writeback_sb_inodes+0x441/0x910
wb_writeback+0x261/0x650
wb_workfn+0x1f9/0x7a0
process_one_work+0x503/0x970
worker_thread+0x7d/0x820
kthread+0x1ad/0x210
ret_from_fork+0x35/0x40

This patch adds one extra condition to check left space in bio while
trying merging page to bio, to avoid panic.

This bug was reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=204043

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 95ae251f 22-Jul-2019 Eric Biggers <ebiggers@google.com>

f2fs: add fs-verity support

Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.

The f2fs support for fs-verity consists of:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.

- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.

Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 8ce589c7 04-Aug-2019 Eric Biggers <ebiggers@google.com>

f2fs: wire up new fscrypt ioctls

Wire up the new ioctls for adding and removing fscrypt keys to/from the
filesystem, and the new ioctl for retrieving v2 encryption policies.

The key removal ioctls also required making f2fs_drop_inode() call
fscrypt_drop_inode().

For more details see Documentation/filesystems/fscrypt.rst and the
fscrypt patches that added the implementation of these ioctls.

Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 38fb6d0e 24-Jul-2019 Icenowy Zheng <icenowy@aosc.io>

f2fs: use EINVAL for superblock with invalid magic

The kernel mount_block_root() function expects -EACESS or -EINVAL for a
unmountable filesystem when trying to mount the root with different
filesystem types.

However, in 5.3-rc1 the behavior when F2FS code cannot find valid block
changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes
mount_block_root() fail when trying to probe F2FS.

When the magic number of the superblock mismatches, it has a high
probability that it's just not a F2FS. In this case return -EINVAL seems
to be a better result, and this return value can make mount_block_root()
probing work again.

Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in
other cases (the magic matches but the superblock cannot be recognized).

Fixes: 10f966bbf521 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bd976e52 30-Jun-2019 Damien Le Moal <damien.lemoal@wdc.com>

block: Kill gfp_t argument of blkdev_report_zones()

Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2d008835 10-Jul-2019 Chao Yu <chao@kernel.org>

f2fs: improve print log in f2fs_sanity_check_ckpt()

As Park Ju Hyung suggested:

"I'd like to suggest to write down an actual version of f2fs-tools
here as we've seen older versions of fsck doing even more damage
and the users might not have the latest f2fs-tools installed."

This patch give a more detailed info of how we fix such corruption
to user to avoid damageable repair with low version fsck.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# db6ec53b 29-May-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add a rw_sem to cover quota flag changes

Two paths to update quota and f2fs_lock_op:

1.
- lock_op
| - quota_update
`- unlock_op

2.
- quota_update
- lock_op
`- unlock_op

But, we need to make a transaction on quota_update + lock_op in #2 case.
So, this patch introduces:
1. lock_op
2. down_write
3. check __need_flush
4. up_write
5. if there is dirty quota entries, flush them
6. otherwise, good to go

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 10f966bb 19-Jun-2019 Chao Yu <chao@kernel.org>

f2fs: use generic EFSBADCRC/EFSCORRUPTED

f2fs uses EFAULT as error number to indicate filesystem is corrupted
all the time, but generic filesystems use EUCLEAN for such condition,
we need to change to follow others.

This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.

EFSBADCRC EBADMSG /* Bad CRC detected */
EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dcbb4c10 18-Jun-2019 Joe Perches <joe@perches.com>

f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()

- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 04f0b2ea 04-Jun-2019 Qiuyang Sun <sunqiuyang@huawei.com>

f2fs: ioctl for removing a range from F2FS

This ioctl shrinks a given length (aligned to sections) from end of the
main area. Any cursegs and valid blocks will be moved out before
invalidating the range.

This feature can be used for adjusting partition sizes online.

History of the patch:

Sahitya Tummala:
- Add this ioctl for f2fs_compat_ioctl() as well.
- Fix debugfs status to reflect the online resize changes.
- Fix potential race between online resize path and allocate new data
block path or gc path.

Others:
- Rename some identifiers.
- Add some error handling branches.
- Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
- Implement this interface as ext4's, and change the parameter from shrunk
bytes to new block count of F2FS.
- During resizing, force to empty sit_journal and forbid adding new
entries to it, in order to avoid invalid segno in journal after resize.
- Reduce sbi->user_block_count before resize starts.
- Commit the updated superblock first, and then update in-memory metadata
only when the former succeeds.
- Target block count must align to sections.
- Write checkpoint before and after committing the new superblock, w/o
CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
resize fails after the new superblock is committed.
- In free_segment_range(), reduce granularity of gc_mutex.
- Add protection on curseg migration.
- Add freeze_bdev() and thaw_bdev() for resize fs.
- Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
- Recover super_block and FS metadata when resize fails.
- No need to clear CP_FSCK_FLAG in update_ckpt_flags().
- Clean up the sb and fs metadata update functions for resize_fs.

Geert Uytterhoeven:
- Use div_u64*() for 64-bit divisions

Arnd Bergmann:
- Not all architectures support get_user() with a 64-bit argument:
ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
Use copy_from_user() here, this will always work.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d3aed70 29-May-2019 Daniel Rosenberg <drosen@google.com>

f2fs: Add option to limit required GC for checkpoint=disable

This extends the checkpoint option to allow checkpoint=disable:%u[%]
This allows you to specify what how much of the disk you are willing
to lose access to while mounting with checkpoint=disable. If the amount
lost would be higher, the mount will return -EAGAIN. This can be given
as a percent of total space, or in blocks.

Currently, we need to run garbage collection until the amount of holes
is smaller than the OVP space. With the new option, f2fs can mark
space as unusable up front instead of requiring garbage collection until
the number of holes is small enough.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9a9aecaa 29-May-2019 Daniel Rosenberg <drosen@google.com>

f2fs: Fix root reserved on remount

On a remount, you can currently set root reserved if it was not
previously set. This can cause an underflow if reserved has been set to
a very high value, since then root reserved + current reserved could be
greater than user_block_count. inc_valid_block_count later subtracts out
these values from user_block_count, causing an underflow.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 81621f97 24-May-2019 Sahitya Tummala <stummala@codeaurora.org>

f2fs: fix f2fs_show_options to show nodiscard mount option

Fix f2fs_show_options to show nodiscard mount option.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9227d522 22-May-2019 Sahitya Tummala <stummala@codeaurora.org>

f2fs: add error prints for debugging mount failure

Add error prints to get more details on the mount failure.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 040d2bb3 20-May-2019 Chao Yu <chao@kernel.org>

f2fs: fix to avoid deadloop if data_flush is on

As Hagbard Celine reported:

[ 615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
[ 615.697825] Not tainted 5.0.15-gentoo-f2fslog #4
[ 615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 615.697827] kworker/u16:5 D 0 344 2 0x80000000
[ 615.697831] Workqueue: writeback wb_workfn (flush-259:0)
[ 615.697832] Call Trace:
[ 615.697836] ? __schedule+0x2c5/0x8b0
[ 615.697839] schedule+0x32/0x80
[ 615.697841] schedule_preempt_disabled+0x14/0x20
[ 615.697842] __mutex_lock.isra.8+0x2ba/0x4d0
[ 615.697845] ? log_store+0xf5/0x260
[ 615.697848] f2fs_write_data_pages+0x133/0x320
[ 615.697851] ? trace_hardirqs_on+0x2c/0xe0
[ 615.697854] do_writepages+0x41/0xd0
[ 615.697857] __filemap_fdatawrite_range+0x81/0xb0
[ 615.697859] f2fs_sync_dirty_inodes+0x1dd/0x200
[ 615.697861] f2fs_balance_fs_bg+0x2a7/0x2c0
[ 615.697863] ? up_read+0x5/0x20
[ 615.697865] ? f2fs_do_write_data_page+0x2cb/0x940
[ 615.697867] f2fs_balance_fs+0xe5/0x2c0
[ 615.697869] __write_data_page+0x1c8/0x6e0
[ 615.697873] f2fs_write_cache_pages+0x1e0/0x450
[ 615.697878] f2fs_write_data_pages+0x14b/0x320
[ 615.697880] ? trace_hardirqs_on+0x2c/0xe0
[ 615.697883] do_writepages+0x41/0xd0
[ 615.697885] __filemap_fdatawrite_range+0x81/0xb0
[ 615.697887] f2fs_sync_dirty_inodes+0x1dd/0x200
[ 615.697889] f2fs_balance_fs_bg+0x2a7/0x2c0
[ 615.697891] f2fs_write_node_pages+0x51/0x220
[ 615.697894] do_writepages+0x41/0xd0
[ 615.697897] __writeback_single_inode+0x3d/0x3d0
[ 615.697899] writeback_sb_inodes+0x1e8/0x410
[ 615.697902] __writeback_inodes_wb+0x5d/0xb0
[ 615.697904] wb_writeback+0x28f/0x340
[ 615.697906] ? cpumask_next+0x16/0x20
[ 615.697908] wb_workfn+0x33e/0x420
[ 615.697911] process_one_work+0x1a1/0x3d0
[ 615.697913] worker_thread+0x30/0x380
[ 615.697915] ? process_one_work+0x3d0/0x3d0
[ 615.697916] kthread+0x116/0x130
[ 615.697918] ? kthread_create_worker_on_cpu+0x70/0x70
[ 615.697921] ret_from_fork+0x3a/0x50

There is still deadloop in below condition:

d A
- do_writepages
- f2fs_write_node_pages
- f2fs_balance_fs_bg
- f2fs_sync_dirty_inodes
- f2fs_write_cache_pages
- mutex_lock(&sbi->writepages) -- lock once
- __write_data_page
- f2fs_balance_fs_bg
- f2fs_sync_dirty_inodes
- f2fs_write_data_pages
- mutex_lock(&sbi->writepages) -- lock again

Thread A Thread B
- do_writepages
- f2fs_write_node_pages
- f2fs_balance_fs_bg
- f2fs_sync_dirty_inodes
- .cp_task = current
- f2fs_sync_dirty_inodes
- .cp_task = current
- filemap_fdatawrite
- .cp_task = NULL
- filemap_fdatawrite
- f2fs_write_cache_pages
- enter f2fs_balance_fs_bg since .cp_task is NULL
- .cp_task = NULL

Change as below to avoid this:
- add condition to avoid holding .writepages mutex lock in path
of data flush
- introduce mutex lock sbi.flush_lock to exclude concurrent data
flush in background.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5dae2d39 19-May-2019 Chao Yu <chao@kernel.org>

f2fs: fix to check layout on last valid checkpoint park

As Ju Hyung reported:

"
I was semi-forced today to use the new kernel and test f2fs.

My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
fix some stuffs. The live CD was using 4.15 kernel, and just mounting
the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
f2fs-stable merged) refused to mount with "SIT is corrupted node"
message.

I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
repair cp_loads blocks at correct position"

It spit out 140M worth of output, but at least I didn't have to run it
twice. Everything returned "Ok" in the 2nd run.
The new log is at
http://arter97.com/f2fs/final

After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
f2fs-stable merged and it mounted.

But, I got this:
[ 1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
deprecated, run fsck to repair, chksum_offset: 4092
[ 1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00

But after doing a reboot, the message is gone:
[ 1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda

I'm not exactly sure why the kernel detected that I'm still using the
old layout on the first boot. Maybe fsck didn't fix it properly, or
the check from the kernel is improper.
"

Although we have rebuild the old deprecated checkpoint with new layout
during repair, we only repair last checkpoint park, the other old one is
remained.

Once the image was mounted, we will 1) sanity check layout and 2) decide
which checkpoint park to use according to cp_ver. So that we will print
reported message unnecessarily at step 1), to avoid it, we simply move
layout check into f2fs_sanity_check_ckpt() after step 2).

Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bc88ac96 20-May-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: link f2fs quota ops for sysfile

This patch reverts:
commit fb40d618b039 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").

We were missing error handlers used in f2fs quota ops.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c9c8ed50 04-May-2019 Chao Yu <chao@kernel.org>

f2fs: fix to avoid potential race on sbi->unusable_block_count access/update

Use sbi.stat_lock to protect sbi->unusable_block_count accesss/udpate, in
order to avoid potential race on it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 896285ad 26-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: fix to handle error in f2fs_disable_checkpoint()

In f2fs_disable_checkpoint(), it needs to detect and propagate error
number returned from f2fs_write_checkpoint().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b61af314 22-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: fix to skip recovery on readonly device

As Park Ju Hyung reported in mailing list:

https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/

generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630

generic_make_request+0x46/0x3d0
submit_bio+0x30/0x110
__submit_merged_bio+0x68/0x390
f2fs_submit_page_write+0x1bb/0x7f0
f2fs_do_write_meta_page+0x7f/0x160
__f2fs_write_meta_page+0x70/0x140
f2fs_sync_meta_pages+0x140/0x250
f2fs_write_checkpoint+0x5c5/0x17b0
f2fs_sync_fs+0x9c/0x110
sync_filesystem+0x66/0x80
f2fs_recover_fsync_data+0x790/0xa30
f2fs_fill_super+0xe4e/0x1980
mount_bdev+0x518/0x610
mount_fs+0x34/0x13f
vfs_kern_mount.part.11+0x4f/0x120
do_mount+0x2d1/0xe40
__x64_sys_mount+0xbf/0xe0
do_syscall_64+0x4a/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

print_req_error: I/O error, dev loop0, sector 4096

If block device is readonly, we should never trigger write IO from
filesystem layer, but previously, orphan and journal recovery didn't
consider such condition, result in triggering above warning, fix it.

Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f824deb5 22-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: fix to consider multiple device for readonly check

This patch introduce f2fs_hw_is_readonly() to check whether lower
device is readonly or not, it adapts multiple device scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d02a6e61 25-Mar-2019 Chao Yu <chao@kernel.org>

f2fs: allow address pointer number of dnode aligning to specified size

This patch expands scalability of dnode layout, it allows address pointer
number of dnode aligning to specified size (now, the size is one byte by
default), and later the number can align to compress cluster size
(1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making
design of compress meta layout simple.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7b63f72f 15-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check on valid node/block count

As Jungyeon reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=203229

- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.

The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y

- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync

- Kernel message
kernel BUG at fs/f2fs/recovery.c:591!
RIP: 0010:recover_data+0x12d8/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

With corrupted image wihch has out-of-range valid node/block count, during
recovery, once we failed due to no free space, it will trigger kernel
panic.

Adding sanity check on valid node/block count in f2fs_sanity_check_ckpt()
to detect such condition, so that potential panic can be avoided.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bda52397 15-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: remove new blank line of f2fs kernel message

Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d01718a0 15-Apr-2019 Al Viro <viro@zeniv.linux.org.uk>

f2fs: switch to ->free_inode()

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 2c58d548 10-Apr-2019 Eric Biggers <ebiggers@google.com>

fscrypt: cache decrypted symlink target in ->i_link

Path lookups that traverse encrypted symlink(s) are very slow because
each encrypted symlink needs to be decrypted each time it's followed.
This also involves dropping out of rcu-walk mode.

Make encrypted symlinks faster by caching the decrypted symlink target
in ->i_link. The first call to fscrypt_get_symlink() sets it. Then,
the existing VFS path lookup code uses the non-NULL ->i_link to take the
fast path where ->get_link() isn't called, and lookups in rcu-walk mode
remain in rcu-walk mode.

Also set ->i_link immediately when a new encrypted symlink is created.

To safely free the symlink target after an RCU grace period has elapsed,
introduce a new function fscrypt_free_inode(), and make the relevant
filesystems call it just before actually freeing the inode.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 95175daf 15-Mar-2019 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Reduce zoned block device memory usage

For zoned block devices, an array of zone types for each device is
allocated and initialized in order to determine if a section is stored
on a sequential zone (zone reset needed) or a conventional zone (no
zone reset needed and regular discard applies). Considering this usage,
the zone types stored in memory can be replaced with a bitmap to
indicate an equivalent information, that is, if a zone is sequential or
not. This reduces the memory usage for each zoned device by roughly 8:
on a 14TB disk with zones of 256 MB, the zone type array consumes
13x4KB pages while the bitmap uses only 2x4KB pages.

This patch changes the f2fs_dev_info structure blkz_type field to the
bitmap blkz_seq. Access to this bitmap is done using the helper
function f2fs_blkz_is_seq(), which is a rewrite of the function
get_blkz_type().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dd6c89b5 04-Mar-2019 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with inode.i_inline_xattr_size

As Paul Bandha reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=202709

When I run the poc on the mounted f2fs img I get a buffer overflow in
read_inline_xattr due to there being no sanity check on the value of
i_inline_xattr_size.

I created the img by just modifying the value of i_inline_xattr_size
in the inode:

i_name [test1.txt]
i_ext: fofs:0 blkaddr:0 len:0
i_extra_isize [0x 18 : 24]
i_inline_xattr_size [0x ffff : 65535]
i_addr[ofs] [0x 0 : 0]

mkdir /mnt/f2fs
mount ./f2fs1.img /mnt/f2fs
gcc poc.c -o poc
./poc

int main() {
int y = syscall(SYS_listxattr, "/mnt/f2fs/test1.txt", NULL, 0);
printf("ret %d", y);
printf("errno: %d\n", errno);

}

BUG: KASAN: slab-out-of-bounds in read_inline_xattr+0x18f/0x260
Read of size 262140 at addr ffff88011035efd8 by task f2fs1poc/3263

CPU: 0 PID: 3263 Comm: f2fs1poc Not tainted 4.18.0-custom #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x71/0xab
print_address_description+0x83/0x250
kasan_report+0x213/0x350
memcpy+0x1f/0x50
read_inline_xattr+0x18f/0x260
read_all_xattrs+0xba/0x190
f2fs_listxattr+0x9d/0x3f0
listxattr+0xb2/0xd0
path_listxattr+0x93/0xe0
do_syscall_64+0x9d/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Let's add sanity check for inode.i_inline_xattr_size during f2fs_iget()
to avoid this issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 70db5b04 12-Mar-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give some messages for inline_xattr_size

This patch adds some kernel messages when user sets wrong inline_xattr_size.

Fixes: 500e0b28ecd3 ("f2fs: fix to check inline_xattr_size boundary correctly")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 428e3bcf 25-Feb-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give random value to i_generation

This follows to give random number to i_generation along with commit
232530680290b ("ext4: improve smp scalability for inode generation")

This can be used for DUN for UFS HW encryption.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# aa2c8c43 19-Feb-2019 Chao Yu <chao@kernel.org>

f2fs: fix to retry fill_super only if recovery failed

With current retry mechanism in f2fs_fill_super, first fill_super
fails due to no memory, then second fill_super runs w/o recovery,
if we succeed, we may lose fsynced data, it doesn't make sense.

Let's retry fill_super only if it occurs non-ENOMEM error during
recovery.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6d52e135 14-Feb-2019 Chao Yu <chao@kernel.org>

f2fs: don't allow negative ->write_io_size_bits

As Dan reported:

"We put an upper bound on ->write_io_size_bits but we don't have a lower
bound."

So let's add lower bound check for ->write_io_size_bits in parse_options().

[We don't allow configuring ->write_io_size_bits to zero, since at least
we need to fill one dummy page for aligned IO.]

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 500e0b28 14-Feb-2019 Chao Yu <chao@kernel.org>

f2fs: fix to check inline_xattr_size boundary correctly

We use below condition to check inline_xattr_size boundary:

if (!F2FS_OPTION(sbi).inline_xattr_size ||
F2FS_OPTION(sbi).inline_xattr_size >=
DEF_ADDRS_PER_INODE -
F2FS_TOTAL_EXTRA_ATTR_SIZE -
DEF_INLINE_RESERVED_SIZE -
DEF_MIN_INLINE_SIZE)

There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.

.bitmap 1 * 1 = 1
.reserved 1 * 1 = 1
.dentry 11 * 2 = 22
.filename 8 * 2 = 16
total 40

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 025cdb16 23-Jan-2019 Chengguang Xu <cgxu519@gmx.com>

f2fs: jump to label 'free_node_inode' when failing from d_make_root()

When sb->s_root is NULL dput() will do nothing,
so jump to label 'free_node_inode' instead of lable
'free_root_inode' when failing from d_make_root().

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a0770e13 03-Mar-2019 zhengliang <zhengliang6@huawei.com>

f2fs: fix to data block override node segment by mistake

v4: Rearrange the previous three versions.

The following scenario could lead to data block override by mistake.

TASK A | TASK kworker | TASK B | TASK C
| | |
open | | |
write | | |
close | | |
| f2fs_write_data_pages | |
| f2fs_write_cache_pages | |
| f2fs_outplace_write_data | |
| f2fs_allocate_data_block (get block in seg S, | |
| S is full, and only | |
| have this valid data | |
| block) | |
| allocate_segment | |
| locate_dirty_segment (mark S as PRE) | |
| f2fs_submit_page_write (submit but is not | |
| written on dev) | |
unlink | | |
iput_final | | |
f2fs_drop_inode | | |
f2fs_truncate | | |
(not evict) | | |
| | write_checkpoint |
| | flush merged bio but not wait file data writeback |
| | set_prefree_as_free (mark S as FREE) |
| | | update NODE/DATA
| | | allocate_segment (select S)
| writeback done | |

So we need to guarantee io complete before truncate inode in f2fs_drop_inode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 812a9597 22-Jan-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: sync filesystem after roll-forward recovery

Some works after roll-forward recovery can get an error which will release
all the data structures. Let's flush them in order to make it clean.

One possible corruption came from:

[ 90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
[ 90.675349] Call trace:
[ 90.677869] __list_del_entry_valid+0x94/0xb4
[ 90.682351] remove_dirty_inode+0xac/0x114
[ 90.686563] __f2fs_write_data_pages+0x6a8/0x6c8
[ 90.691302] f2fs_write_data_pages+0x40/0x4c
[ 90.695695] do_writepages+0x80/0xf0
[ 90.699372] __writeback_single_inode+0xdc/0x4ac
[ 90.704113] writeback_sb_inodes+0x280/0x440
[ 90.708501] wb_writeback+0x1b8/0x3d0
[ 90.712267] wb_workfn+0x1a8/0x4d4
[ 90.715765] process_one_work+0x1c0/0x3d4
[ 90.719883] worker_thread+0x224/0x344
[ 90.723739] kthread+0x120/0x130
[ 90.727055] ret_from_fork+0x10/0x18

Reported-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0e0667b6 27-Jan-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush quota blocks after turnning it off

After quota_off, we'll get some dirty blocks. If put_super don't have a chance
to flush them by checkpoint, it causes NULL pointer exception in end_io after
iput(node_inode). (e.g., by checkpoint=disable)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# db610a64 24-Jan-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add quick mode of checkpoint=disable for QA

This mode returns mount() quickly with EAGAIN. We can trigger this by
shutdown(F2FS_GOING_DOWN_NEED_FSCK).

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 03f2c02d 14-Jan-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: run discard jobs when put_super

When we umount f2fs, we need to avoid long delay due to discard commands, which
is actually taking tens of seconds, if storage is very slow on UNMAP. So, this
patch introduces timeout-based work on it.

By default, let me give 5 seconds for discard.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 643fa961 12-Dec-2018 Chandan Rajendra <chandan@linux.vnet.ibm.com>

fscrypt: remove filesystem specific build config option

In order to have a common code base for fscrypt "post read" processing
for all filesystems which support encryption, this commit removes
filesystem specific build config option (e.g. CONFIG_EXT4_FS_ENCRYPTION)
and replaces it with a build option (i.e. CONFIG_FS_ENCRYPTION) whose
value affects all the filesystems making use of fscrypt.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>


# 21acc07d 04-Jan-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c20e57b3 04-Jan-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f365c6cc 01-Jan-2019 Chengguang Xu <cgxu519@gmx.com>

f2fs: change error code to -ENOMEM from -EINVAL

The error case of failing allocating memory should
return -ENOMEM.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c77bf7d 01-Jan-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't access node/meta inode mapping after iput

This fixes wrong access of address spaces of node and meta inodes after iput.

Fixes: 60aa4d5536ab ("f2fs: fix use-after-free issue when accessing sbi->stat_info")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 60aa4d55 25-Dec-2018 Sahitya Tummala <stummala@codeaurora.org>

f2fs: fix use-after-free issue when accessing sbi->stat_info

iput() on sbi->node_inode can update sbi->stat_info
in the below context, if the f2fs_write_checkpoint()
has failed with error.

f2fs_balance_fs_bg+0x1ac/0x1ec
f2fs_write_node_pages+0x4c/0x260
do_writepages+0x80/0xbc
__writeback_single_inode+0xdc/0x4ac
writeback_single_inode+0x9c/0x144
write_inode_now+0xc4/0xec
iput+0x194/0x22c
f2fs_put_super+0x11c/0x1e8
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78

Fix this by moving f2fs_destroy_stats() further below iput() in
both f2fs_put_super() and f2fs_fill_super() paths.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 88960068 22-Dec-2018 Martin Blumenstingl <martin.blumenstingl@googlemail.com>

f2fs: fix validation of the block count in sanity_check_raw_super

Treat "block_count" from struct f2fs_super_block as 64-bit little endian
value in sanity_check_raw_super() because struct f2fs_super_block
declares "block_count" as "__le64".

This fixes a bug where the superblock validation fails on big endian
devices with the following error:
F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
As result of this the partition cannot be mounted.

With this patch applied the superblock validation works fine and the
partition can be mounted again:
F2FS-fs (sda1): Mounted with checkpoint version = 7c84

My little endian x86-64 hardware was able to mount the partition without
this fix.
To confirm that mounting f2fs filesystems works on big endian machines
again I tested this on a 32-bit MIPS big endian (lantiq) device.

Fixes: 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8f31b466 17-Dec-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix missing unlock(sbi->gc_mutex)

This fixes missing unlock call.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5222595d 13-Dec-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use kvmalloc, if kmalloc is failed

One report says memalloc failure during mount.

(unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
(show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
(dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
(warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
(__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
(kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
(build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
(f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
(mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
(f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
(mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
(vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
(do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
(SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 19880e6e 23-Nov-2018 Alexey Dobriyan <adobriyan@gmail.com>

f2fs: make "f2fs_fault_name[]" const char *

Those strings are immutable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7beb01f7 24-Oct-2018 Chao Yu <chao@kernel.org>

f2fs: clean up f2fs_sb_has_##feature_name

In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.

Just do cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e3080b01 24-Oct-2018 Chao Yu <chao@kernel.org>

f2fs: support subsectional garbage collection

Section is minimal garbage collection unit of f2fs, in zoned block
device, or ancient block mapping flash device, in order to improve
GC efficiency, we can align GC unit to lower device erase unit,
normally, it consists of multiple of segments.

Once background or foreground GC triggers, it brings a large number
of IOs, which will impact user IO, and also occupy cpu/memory resource
intensively.

So, to reduce impact of GC on large size section, this patch supports
subsectional GC, in one cycle of GC, it only migrate partial segment{s}
in victim section. Currently, by default, we use sbi->segs_per_sec as
migration granularity.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 089842de 24-Oct-2018 Yunlong Song <yunlong.song@huawei.com>

f2fs: remove codes of unused wio_mutex

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# af033b2a 20-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: guarantee journalled quota data by checkpoint

For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26b5a079 12-Oct-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: cleanup dirty pages if recover failed

During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:

[ 53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[ 53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[ 53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[ 53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[ 53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[ 53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[ 53.546174] Modules linked in:
[ 53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[ 53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[ 53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[ 53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[ 53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[ 53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[ 53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[ 53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[ 53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[ 53.546226] FS: 00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[ 53.546229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[ 53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 53.546242] Call Trace:
[ 53.546248] f2fs_submit_page_bio+0x95/0x740
[ 53.546253] read_node_page+0x161/0x1e0
[ 53.546271] ? truncate_node+0x650/0x650
[ 53.546283] ? add_to_page_cache_lru+0x12c/0x170
[ 53.546288] ? pagecache_get_page+0x262/0x2d0
[ 53.546292] __get_node_page+0x200/0x660
[ 53.546302] f2fs_update_inode_page+0x4a/0x160
[ 53.546306] f2fs_write_inode+0x86/0xb0
[ 53.546317] __writeback_single_inode+0x49c/0x620
[ 53.546322] writeback_single_inode+0xe4/0x1e0
[ 53.546326] sync_inode_metadata+0x93/0xd0
[ 53.546330] ? sync_inode+0x10/0x10
[ 53.546342] ? do_raw_spin_unlock+0xed/0x100
[ 53.546347] f2fs_sync_inode_meta+0xe0/0x130
[ 53.546351] f2fs_fill_super+0x287d/0x2d10
[ 53.546367] ? vsnprintf+0x742/0x7a0
[ 53.546372] ? f2fs_commit_super+0x180/0x180
[ 53.546379] ? up_write+0x20/0x40
[ 53.546385] ? set_blocksize+0x5f/0x140
[ 53.546391] ? f2fs_commit_super+0x180/0x180
[ 53.546402] mount_bdev+0x181/0x200
[ 53.546406] mount_fs+0x94/0x180
[ 53.546411] vfs_kern_mount+0x6c/0x1e0
[ 53.546415] do_mount+0xe5e/0x1510
[ 53.546420] ? fs_reclaim_release+0x9/0x30
[ 53.546424] ? copy_mount_string+0x20/0x20
[ 53.546428] ? fs_reclaim_acquire+0xd/0x30
[ 53.546435] ? __might_sleep+0x2c/0xc0
[ 53.546440] ? ___might_sleep+0x53/0x170
[ 53.546453] ? __might_fault+0x4c/0x60
[ 53.546468] ? _copy_from_user+0x95/0xa0
[ 53.546474] ? memdup_user+0x39/0x60
[ 53.546478] ksys_mount+0x88/0xb0
[ 53.546482] __x64_sys_mount+0x5d/0x70
[ 53.546495] do_syscall_64+0x65/0x130
[ 53.546503] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.547639] ---[ end trace b804d1ea2fec893e ]---

So if recover fails, we need to drop all recovered data.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9149a5eb 07-Oct-2018 Chao Yu <chao@kernel.org>

f2fs: spread f2fs_set_inode_flags()

This patch changes codes as below:
- use f2fs_set_inode_flags() to update i_flags atomically to avoid
potential race.
- synchronize F2FS_I(inode)->i_flags to inode->i_flags in
f2fs_new_inode().
- use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 730746ce 02-Oct-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: allow to mount, if quota is failed

Since we can use the filesystem without quotas till next boot.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4354994f 20-Aug-2018 Daniel Rosenberg <drosen@google.com>

f2fs: checkpoint disabling

Note that, it requires "f2fs: return correct errno in f2fs_gc".

This adds a lightweight non-persistent snapshotting scheme to f2fs.

To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d440c52d 28-Sep-2018 Junling Zheng <zhengjunling@huawei.com>

f2fs: support superblock checksum

Now we support crc32 checksum for superblock.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 095680f2 28-Sep-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: keep lazytime on remount

This patch fixes losing lazytime when remounting f2fs.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c6b1867b 22-Sep-2018 Chengguang Xu <cgxu519@gmx.com>

f2fs: fix remount problem of option io_bits

Currently we show mount option "io_bits=%u" as "io_size=%uKB",
it will cause option parsing problem(unrecognized mount option)
in remount.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a7d10cf3 19-Sep-2018 Sahitya Tummala <stummala@codeaurora.org>

f2fs: add new idle interval timing for discard and gc paths

This helps to control the frequency of submission of discard and
GC requests independently, based on the need.

Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6f5c2ed0 11-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: split IO error injection according to RW

This patch adds to support injecting error for write IO, this can simulate
IO error like fail_make_request or dm_flakey does.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c1a000d 11-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: add SPDX license identifiers

Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4cb037ec 11-Sep-2018 Chengguang Xu <cgxu519@gmx.com>

f2fs: surround fault_injection related option parsing using CONFIG_F2FS_FAULT_INJECTION

It's a little bit strange when fault_injection related
options fail with -EINVAL which were already disabled
from config, so surround all fault_injection related option
parsing code using CONFIG_F2FS_FAULT_INJECTION. Meanwhile,
slightly change warning message to keep consistency with
option POSIX_ACL and FS_XATTR.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 042be0f8 06-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with current segment number

https://bugzilla.kernel.org/show_bug.cgi?id=200219

Reproduction way:
- mount image
- run poc code
- umount image

F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 17686 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: update_sit_entry+0x459/0x4e0 [f2fs]
Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001
ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0
Call Trace:
f2fs_allocate_data_block+0x124/0x580 [f2fs]
do_write_page+0x78/0x150 [f2fs]
f2fs_do_write_node_page+0x25/0xa0 [f2fs]
__write_node_page+0x2bf/0x550 [f2fs]
f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
? sync_inode_metadata+0x2f/0x40
? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
? up_write+0x1e/0x80
f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
? mark_held_locks+0x5d/0x80
? _raw_spin_unlock_irq+0x27/0x50
kill_f2fs_super+0x68/0x90 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: 0xb7f95c51
Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000
ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
---[ end trace d423f83982cfcdc5 ]---

The reason is, different log headers using the same segment, once
one log's next block address is used by another log, it will cause
panic as above.

Main area: 24 segs, 24 secs 24 zones
- COLD data: 0, 0, 0
- WARM data: 1, 1, 1
- HOT data: 20, 20, 20
- Dir dnode: 22, 22, 22
- File dnode: 22, 22, 22
- Indir nodes: 21, 21, 21

So this patch adds sanity check to detect such condition to avoid
this issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4a70e255 05-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix memory leak of percpu counter in fill_super()

In fill_super -> init_percpu_info, we should destroy percpu counter
in error path, otherwise memory allcoated for percpu counter will
leak.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0b2103e8 05-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix memory leak of write_io in fill_super()

It needs to release memory allocated for sbi->write_io in error path,
otherwise, it will cause memory leak.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1378752b 22-Aug-2018 Chao Yu <chao@kernel.org>

f2fs: fix to flush all dirty inodes recovered in readonly fs

generic/417 reported as blow:

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]

It can simply reproduced with scripts:

Enable quota feature during mkfs.

Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs

Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs

The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.

Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.

To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cda9cc59 25-Jun-2018 Yunlei He <heyunlei@huawei.com>

f2fs: report error if quota off error during umount

Now, we depend on fsck to ensure quota file data is ok,
so we scan whole partition if checkpoint without umount
flag. It's same for quota off error case, which may make
quota file data inconsistent.

generic/019 reports below error:

__quota_error: 1160 callbacks suppressed
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
Quota error (device zram1): write_blk: dquota write failed
Quota error (device zram1): qtree_write_dquot: Error -28 occurred while creating quota
VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds. Have a nice day...

If we failed in below path due to fail to write dquot block, we will miss
to release quota inode, fix it.

- f2fs_put_super
- f2fs_quota_off_umount
- f2fs_quota_off
- f2fs_quota_sync <-- failed
- dquot_quota_off <-- missed to call

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 22d7ea13 22-Aug-2018 Chao Yu <chao@kernel.org>

Revert "f2fs: use printk_ratelimited for f2fs_msg"

Don't limit printing log, so that we will not miss any key messages.

This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.

In addition, we use printk_ratelimited to avoid too many log prints.
- error injection
- discard submission failure

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7d20c8ab 03-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to avoid NULL pointer dereference on se->discard_map

https://bugzilla.kernel.org/show_bug.cgi?id=200951

These is a NULL pointer dereference issue reported in bugzilla:

Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.

The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS

The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard

The USB host is USB3.0 and UASP capable. It is the one on RK3399.

Given this everything works fine, except there is no TRIM support.

In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference

Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem

Regards,
Vicenç.

[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280

Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---

The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.

So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 853137ce 09-Aug-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix performance issue observed with multi-thread sequential read

This reverts the commit - "b93f771 - f2fs: remove writepages lock"
to fix the drop in sequential read throughput.

Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
device: UFS

Before -
read throughput: 185 MB/s
total read requests: 85177 (of these ~80000 are 4KB size requests).
total write requests: 2546 (of these ~2208 requests are written in 512KB).

After -
read throughput: 758 MB/s
total read requests: 2417 (of these ~2042 are 512KB reads).
total write requests: 2701 (of these ~2034 requests are written in 512KB).

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d494500a 08-Aug-2018 Chao Yu <chao@kernel.org>

f2fs: support fault_type mount option

Previously, once fault injection is on, by default, all kind of faults
will be injected to f2fs, if we want to trigger single or specified
combined type during the test, we need to configure sysfs entry, it will
be a little inconvenient to integrate sysfs configuring into testsuit,
such as xfstest.

So this patch introduces a new mount option 'fault_type' to assist old
option 'fault_injection', with these two mount options, we can specify
any fault rate/type at mount-time.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b83dcfe6 06-Aug-2018 Chao Yu <chao@kernel.org>

f2fs: support discard submission error injection

This patch adds to support discard submission error injection for testing
error handling of __submit_discard_cmd().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bcbfbd60 28-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with inline flags

https://bugzilla.kernel.org/show_bug.cgi?id=200221

- Overview
BUG() in clear_inode() when mounting and un-mounting a corrupted f2fs image

- Reproduce

- Kernel message
[ 538.601448] F2FS-fs (loop0): Invalid segment/section count (31, 24 x 1376257)
[ 538.601458] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[ 538.724091] F2FS-fs (loop0): Try to recover 2th superblock, ret: 0
[ 538.724102] F2FS-fs (loop0): Mounted with checkpoint version = 2
[ 540.970834] ------------[ cut here ]------------
[ 540.970838] kernel BUG at fs/inode.c:512!
[ 540.971750] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 540.972755] CPU: 1 PID: 1305 Comm: umount Not tainted 4.18.0-rc1+ #4
[ 540.974034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 540.982913] RIP: 0010:clear_inode+0xc0/0xd0
[ 540.983774] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[ 540.987570] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[ 540.988636] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[ 540.990063] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[ 540.991499] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[ 540.992923] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[ 540.994360] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[ 540.995786] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 540.997403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 540.998571] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[ 541.000015] Call Trace:
[ 541.000554] f2fs_evict_inode+0x253/0x630
[ 541.001381] evict+0x16f/0x290
[ 541.002015] iput+0x280/0x300
[ 541.002654] dentry_unlink_inode+0x165/0x1e0
[ 541.003528] __dentry_kill+0x16a/0x260
[ 541.004300] dentry_kill+0x70/0x250
[ 541.005018] dput+0x154/0x1d0
[ 541.005635] do_one_tree+0x34/0x40
[ 541.006354] shrink_dcache_for_umount+0x3f/0xa0
[ 541.007285] generic_shutdown_super+0x43/0x1c0
[ 541.008192] kill_block_super+0x52/0x80
[ 541.008978] kill_f2fs_super+0x62/0x70
[ 541.009750] deactivate_locked_super+0x6f/0xa0
[ 541.010664] deactivate_super+0x5e/0x80
[ 541.011450] cleanup_mnt+0x61/0xa0
[ 541.012151] __cleanup_mnt+0x12/0x20
[ 541.012893] task_work_run+0xc8/0xf0
[ 541.013635] exit_to_usermode_loop+0x125/0x130
[ 541.014555] do_syscall_64+0x138/0x170
[ 541.015340] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 541.016375] RIP: 0033:0x7f46624bf487
[ 541.017104] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
[ 541.020923] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 541.022452] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[ 541.023885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[ 541.025318] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[ 541.026755] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[ 541.028186] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30
[ 541.029626] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 541.039445] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 541.040392] RIP: 0010:clear_inode+0xc0/0xd0
[ 541.041240] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55
[ 541.045042] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002
[ 541.046099] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a
[ 541.047537] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8
[ 541.048965] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce
[ 541.050402] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668
[ 541.051832] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530
[ 541.053263] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 541.054891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 541.056039] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0
[ 541.058506] ==================================================================
[ 541.059991] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
[ 541.061513] Read of size 8 at addr ffff8801e34a7970 by task umount/1305

[ 541.063302] CPU: 1 PID: 1305 Comm: umount Tainted: G D 4.18.0-rc1+ #4
[ 541.064838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 541.066778] Call Trace:
[ 541.067294] dump_stack+0x7b/0xb5
[ 541.067986] print_address_description+0x70/0x290
[ 541.068941] kasan_report+0x291/0x390
[ 541.069692] ? update_stack_state+0x38c/0x3e0
[ 541.070598] __asan_load8+0x54/0x90
[ 541.071315] update_stack_state+0x38c/0x3e0
[ 541.072172] ? __read_once_size_nocheck.constprop.7+0x20/0x20
[ 541.073340] ? vprintk_func+0x27/0x60
[ 541.074096] ? printk+0xa3/0xd3
[ 541.074762] ? __save_stack_trace+0x5e/0x100
[ 541.075634] unwind_next_frame.part.5+0x18e/0x490
[ 541.076594] ? unwind_dump+0x290/0x290
[ 541.077368] ? __show_regs+0x2c4/0x330
[ 541.078142] __unwind_start+0x106/0x190
[ 541.085422] __save_stack_trace+0x5e/0x100
[ 541.086268] ? __save_stack_trace+0x5e/0x100
[ 541.087161] ? unlink_anon_vmas+0xba/0x2c0
[ 541.087997] save_stack_trace+0x1f/0x30
[ 541.088782] save_stack+0x46/0xd0
[ 541.089475] ? __alloc_pages_slowpath+0x1420/0x1420
[ 541.090477] ? flush_tlb_mm_range+0x15e/0x220
[ 541.091364] ? __dec_node_state+0x24/0xb0
[ 541.092180] ? lock_page_memcg+0x85/0xf0
[ 541.092979] ? unlock_page_memcg+0x16/0x80
[ 541.093812] ? page_remove_rmap+0x198/0x520
[ 541.094674] ? mark_page_accessed+0x133/0x200
[ 541.095559] ? _cond_resched+0x1a/0x50
[ 541.096326] ? unmap_page_range+0xcd4/0xe50
[ 541.097179] ? rb_next+0x58/0x80
[ 541.097845] ? rb_next+0x58/0x80
[ 541.098518] __kasan_slab_free+0x13c/0x1a0
[ 541.099352] ? unlink_anon_vmas+0xba/0x2c0
[ 541.100184] kasan_slab_free+0xe/0x10
[ 541.100934] kmem_cache_free+0x89/0x1e0
[ 541.101724] unlink_anon_vmas+0xba/0x2c0
[ 541.102534] free_pgtables+0x101/0x1b0
[ 541.103299] exit_mmap+0x146/0x2a0
[ 541.103996] ? __ia32_sys_munmap+0x50/0x50
[ 541.104829] ? kasan_check_read+0x11/0x20
[ 541.105649] ? mm_update_next_owner+0x322/0x380
[ 541.106578] mmput+0x8b/0x1d0
[ 541.107191] do_exit+0x43a/0x1390
[ 541.107876] ? mm_update_next_owner+0x380/0x380
[ 541.108791] ? deactivate_super+0x5e/0x80
[ 541.109610] ? cleanup_mnt+0x61/0xa0
[ 541.110351] ? __cleanup_mnt+0x12/0x20
[ 541.111115] ? task_work_run+0xc8/0xf0
[ 541.111879] ? exit_to_usermode_loop+0x125/0x130
[ 541.112817] rewind_stack_do_exit+0x17/0x20
[ 541.113666] RIP: 0033:0x7f46624bf487
[ 541.114404] Code: Bad RIP value.
[ 541.115094] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 541.116605] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487
[ 541.118034] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0
[ 541.119472] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014
[ 541.120890] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c
[ 541.122321] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30

[ 541.124061] The buggy address belongs to the page:
[ 541.125042] page:ffffea00078d29c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 541.126651] flags: 0x2ffff0000000000()
[ 541.127418] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[ 541.128963] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 541.130516] page dumped because: kasan: bad access detected

[ 541.131954] Memory state around the buggy address:
[ 541.132924] ffff8801e34a7800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
[ 541.134378] ffff8801e34a7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 541.135814] >ffff8801e34a7900: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[ 541.137253] ^
[ 541.138637] ffff8801e34a7980: f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 541.140075] ffff8801e34a7a00: 00 00 00 00 00 00 00 00 f3 00 00 00 00 00 00 00
[ 541.141509] ==================================================================

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/inode.c#L512
BUG_ON(inode->i_data.nrpages);

The root cause is root directory inode is corrupted, it has both
inline_data and inline_dentry flag, and its nlink is zero, so in
->evict(), after dropping all page cache, it grabs page #0 for inline
data truncation, result in panic in later clear_inode() where we will
check inode->i_data.nrpages value.

This patch adds inline flags check in sanity_check_inode, in addition,
do sanity check with root inode's nlink.

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 50fa53ec 02-Aug-2018 Chao Yu <chao@kernel.org>

f2fs: fix to avoid broken of dnode block list

f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.

By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.

Sheng Yong helps to do the test with this patch:

Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8

1 / PERSIST / Index

Base:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08
2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7
3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48
Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333

After:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 798.81 202.5 41143 40613.87 602.71 838.08 913.83
2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27
3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91
Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333

Patched/Original:
0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189

It looks like atomic write will suffer performance regression.

I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.

BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:

- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
- preflush + fua;
- power-cut

If we don't wait dnode writeback for atomic_write:

SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85
2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77
3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92
Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18

Patched/Original:
0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294

SQLite's performance recovers.

Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."

Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e494c2f9 01-Aug-2018 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with cp_pack_start_sum

After fuzzing, cp_pack_start_sum could be corrupted, so current log's
summary info should be wrong due to loading incorrect summary block.
Then, if segment's type in current log is exceeded NR_CURSEG_TYPE, it
can lead accessing invalid dirty_i->dirty_segmap bitmap finally.

Add sanity check for cp_pack_start_sum to fix this issue.

https://bugzilla.kernel.org/show_bug.cgi?id=200419

- Reproduce

- Kernel message (f2fs-dev w/ KASAN)
[ 3117.578432] F2FS-fs (loop0): Invalid log blocks per segment (8)

[ 3117.578445] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
[ 3117.581364] F2FS-fs (loop0): invalid crc_offset: 30716
[ 3117.583564] WARNING: CPU: 1 PID: 1225 at fs/f2fs/checkpoint.c:90 __get_meta_page+0x448/0x4b0
[ 3117.583570] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 3117.584014] CPU: 1 PID: 1225 Comm: mount Not tainted 4.17.0+ #1
[ 3117.584017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.584022] RIP: 0010:__get_meta_page+0x448/0x4b0
[ 3117.584023] Code: 00 49 8d bc 24 84 00 00 00 e8 74 54 da ff 41 83 8c 24 84 00 00 00 08 4c 89 f6 4c 89 ef e8 c0 d9 95 00 48 89 ef e8 18 e3 00 00 <0f> 0b f0 80 4d 48 04 e9 0f fe ff ff 0f 0b 48 89 c7 48 89 04 24 e8
[ 3117.584072] RSP: 0018:ffff88018eb678c0 EFLAGS: 00010286
[ 3117.584082] RAX: ffff88018f0a6a78 RBX: ffffea0007a46600 RCX: ffffffff9314d1b2
[ 3117.584085] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: ffff88018f0a6a98
[ 3117.584087] RBP: ffff88018ebe9980 R08: 0000000000000002 R09: 0000000000000001
[ 3117.584090] R10: 0000000000000001 R11: ffffed00326e4450 R12: ffff880193722200
[ 3117.584092] R13: ffff88018ebe9afc R14: 0000000000000206 R15: ffff88018eb67900
[ 3117.584096] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3117.584098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3117.584101] CR2: 00000000016f21b8 CR3: 0000000191c22000 CR4: 00000000000006e0
[ 3117.584112] Call Trace:
[ 3117.584121] ? f2fs_set_meta_page_dirty+0x150/0x150
[ 3117.584127] ? f2fs_build_segment_manager+0xbf9/0x3190
[ 3117.584133] ? f2fs_npages_for_summary_flush+0x75/0x120
[ 3117.584145] f2fs_build_segment_manager+0xda8/0x3190
[ 3117.584151] ? f2fs_get_valid_checkpoint+0x298/0xa00
[ 3117.584156] ? f2fs_flush_sit_entries+0x10e0/0x10e0
[ 3117.584184] ? map_id_range_down+0x17c/0x1b0
[ 3117.584188] ? __put_user_ns+0x30/0x30
[ 3117.584206] ? find_next_bit+0x53/0x90
[ 3117.584237] ? cpumask_next+0x16/0x20
[ 3117.584249] f2fs_fill_super+0x1948/0x2b40
[ 3117.584258] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.584279] ? sget_userns+0x65e/0x690
[ 3117.584296] ? set_blocksize+0x88/0x130
[ 3117.584302] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.584305] mount_bdev+0x1c0/0x200
[ 3117.584310] mount_fs+0x5c/0x190
[ 3117.584320] vfs_kern_mount+0x64/0x190
[ 3117.584330] do_mount+0x2e4/0x1450
[ 3117.584343] ? lockref_put_return+0x130/0x130
[ 3117.584347] ? copy_mount_string+0x20/0x20
[ 3117.584357] ? kasan_unpoison_shadow+0x31/0x40
[ 3117.584362] ? kasan_kmalloc+0xa6/0xd0
[ 3117.584373] ? memcg_kmem_put_cache+0x16/0x90
[ 3117.584377] ? __kmalloc_track_caller+0x196/0x210
[ 3117.584383] ? _copy_from_user+0x61/0x90
[ 3117.584396] ? memdup_user+0x3e/0x60
[ 3117.584401] ksys_mount+0x7e/0xd0
[ 3117.584405] __x64_sys_mount+0x62/0x70
[ 3117.584427] do_syscall_64+0x73/0x160
[ 3117.584440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.584455] RIP: 0033:0x7f5693f14b9a
[ 3117.584456] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.584505] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.584510] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.584512] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.584514] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.584516] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.584519] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.584523] ---[ end trace a8e0d899985faf31 ]---
[ 3117.685663] F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix.
[ 3117.685673] F2FS-fs (loop0): recover_data: ino = 2 (i_size: recover) recovered = 1, err = 0
[ 3117.685707] ==================================================================
[ 3117.685955] BUG: KASAN: slab-out-of-bounds in __remove_dirty_segment+0xdd/0x1e0
[ 3117.686175] Read of size 8 at addr ffff88018f0a63d0 by task mount/1225

[ 3117.686477] CPU: 0 PID: 1225 Comm: mount Tainted: G W 4.17.0+ #1
[ 3117.686481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.686483] Call Trace:
[ 3117.686494] dump_stack+0x71/0xab
[ 3117.686512] print_address_description+0x6b/0x290
[ 3117.686517] kasan_report+0x28e/0x390
[ 3117.686522] ? __remove_dirty_segment+0xdd/0x1e0
[ 3117.686527] __remove_dirty_segment+0xdd/0x1e0
[ 3117.686532] locate_dirty_segment+0x189/0x190
[ 3117.686538] f2fs_allocate_new_segments+0xa9/0xe0
[ 3117.686543] recover_data+0x703/0x2c20
[ 3117.686547] ? f2fs_recover_fsync_data+0x48f/0xd50
[ 3117.686553] ? ksys_mount+0x7e/0xd0
[ 3117.686564] ? policy_nodemask+0x1a/0x90
[ 3117.686567] ? policy_node+0x56/0x70
[ 3117.686571] ? add_fsync_inode+0xf0/0xf0
[ 3117.686592] ? blk_finish_plug+0x44/0x60
[ 3117.686597] ? f2fs_ra_meta_pages+0x38b/0x5e0
[ 3117.686602] ? find_inode_fast+0xac/0xc0
[ 3117.686606] ? f2fs_is_valid_blkaddr+0x320/0x320
[ 3117.686618] ? __radix_tree_lookup+0x150/0x150
[ 3117.686633] ? dqget+0x670/0x670
[ 3117.686648] ? pagecache_get_page+0x29/0x410
[ 3117.686656] ? kmem_cache_alloc+0x176/0x1e0
[ 3117.686660] ? f2fs_is_valid_blkaddr+0x11d/0x320
[ 3117.686664] f2fs_recover_fsync_data+0xc23/0xd50
[ 3117.686670] ? f2fs_space_for_roll_forward+0x60/0x60
[ 3117.686674] ? rb_insert_color+0x323/0x3d0
[ 3117.686678] ? f2fs_recover_orphan_inodes+0xa5/0x700
[ 3117.686683] ? proc_register+0x153/0x1d0
[ 3117.686686] ? f2fs_remove_orphan_inode+0x10/0x10
[ 3117.686695] ? f2fs_attr_store+0x50/0x50
[ 3117.686700] ? proc_create_single_data+0x52/0x60
[ 3117.686707] f2fs_fill_super+0x1d06/0x2b40
[ 3117.686728] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.686735] ? sget_userns+0x65e/0x690
[ 3117.686740] ? set_blocksize+0x88/0x130
[ 3117.686745] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.686748] mount_bdev+0x1c0/0x200
[ 3117.686753] mount_fs+0x5c/0x190
[ 3117.686758] vfs_kern_mount+0x64/0x190
[ 3117.686762] do_mount+0x2e4/0x1450
[ 3117.686769] ? lockref_put_return+0x130/0x130
[ 3117.686773] ? copy_mount_string+0x20/0x20
[ 3117.686777] ? kasan_unpoison_shadow+0x31/0x40
[ 3117.686780] ? kasan_kmalloc+0xa6/0xd0
[ 3117.686786] ? memcg_kmem_put_cache+0x16/0x90
[ 3117.686790] ? __kmalloc_track_caller+0x196/0x210
[ 3117.686795] ? _copy_from_user+0x61/0x90
[ 3117.686801] ? memdup_user+0x3e/0x60
[ 3117.686804] ksys_mount+0x7e/0xd0
[ 3117.686809] __x64_sys_mount+0x62/0x70
[ 3117.686816] do_syscall_64+0x73/0x160
[ 3117.686824] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.686829] RIP: 0033:0x7f5693f14b9a
[ 3117.686830] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.686887] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.686892] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.686894] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.686896] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.686899] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.686901] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003

[ 3117.687005] Allocated by task 1225:
[ 3117.687152] kasan_kmalloc+0xa6/0xd0
[ 3117.687157] kmem_cache_alloc_trace+0xfd/0x200
[ 3117.687161] f2fs_build_segment_manager+0x2d09/0x3190
[ 3117.687165] f2fs_fill_super+0x1948/0x2b40
[ 3117.687168] mount_bdev+0x1c0/0x200
[ 3117.687171] mount_fs+0x5c/0x190
[ 3117.687174] vfs_kern_mount+0x64/0x190
[ 3117.687177] do_mount+0x2e4/0x1450
[ 3117.687180] ksys_mount+0x7e/0xd0
[ 3117.687182] __x64_sys_mount+0x62/0x70
[ 3117.687186] do_syscall_64+0x73/0x160
[ 3117.687190] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 3117.687285] Freed by task 19:
[ 3117.687412] __kasan_slab_free+0x137/0x190
[ 3117.687416] kfree+0x8b/0x1b0
[ 3117.687460] ttm_bo_man_put_node+0x61/0x80 [ttm]
[ 3117.687476] ttm_bo_cleanup_refs+0x15f/0x250 [ttm]
[ 3117.687492] ttm_bo_delayed_delete+0x2f0/0x300 [ttm]
[ 3117.687507] ttm_bo_delayed_workqueue+0x17/0x50 [ttm]
[ 3117.687528] process_one_work+0x2f9/0x740
[ 3117.687531] worker_thread+0x78/0x6b0
[ 3117.687541] kthread+0x177/0x1c0
[ 3117.687545] ret_from_fork+0x35/0x40

[ 3117.687638] The buggy address belongs to the object at ffff88018f0a6300
which belongs to the cache kmalloc-192 of size 192
[ 3117.688014] The buggy address is located 16 bytes to the right of
192-byte region [ffff88018f0a6300, ffff88018f0a63c0)
[ 3117.688382] The buggy address belongs to the page:
[ 3117.688554] page:ffffea00063c2980 count:1 mapcount:0 mapping:ffff8801f3403180 index:0x0
[ 3117.688788] flags: 0x17fff8000000100(slab)
[ 3117.688944] raw: 017fff8000000100 ffffea00063c2840 0000000e0000000e ffff8801f3403180
[ 3117.689166] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
[ 3117.689386] page dumped because: kasan: bad access detected

[ 3117.689653] Memory state around the buggy address:
[ 3117.689816] ffff88018f0a6280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3117.690027] ffff88018f0a6300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3117.690239] >ffff88018f0a6380: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3117.690448] ^
[ 3117.690644] ffff88018f0a6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3117.690868] ffff88018f0a6480: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3117.691077] ==================================================================
[ 3117.691290] Disabling lock debugging due to kernel taint
[ 3117.693893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 3117.694120] PGD 80000001f01bc067 P4D 80000001f01bc067 PUD 1d9638067 PMD 0
[ 3117.694338] Oops: 0002 [#1] SMP KASAN PTI
[ 3117.694490] CPU: 1 PID: 1225 Comm: mount Tainted: G B W 4.17.0+ #1
[ 3117.694703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3117.695073] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
[ 3117.695246] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
[ 3117.695793] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
[ 3117.695969] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
[ 3117.696182] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
[ 3117.696391] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
[ 3117.696604] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
[ 3117.696813] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
[ 3117.697032] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3117.697280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3117.702357] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
[ 3117.707235] Call Trace:
[ 3117.712077] locate_dirty_segment+0x189/0x190
[ 3117.716891] f2fs_allocate_new_segments+0xa9/0xe0
[ 3117.721617] recover_data+0x703/0x2c20
[ 3117.726316] ? f2fs_recover_fsync_data+0x48f/0xd50
[ 3117.730957] ? ksys_mount+0x7e/0xd0
[ 3117.735573] ? policy_nodemask+0x1a/0x90
[ 3117.740198] ? policy_node+0x56/0x70
[ 3117.744829] ? add_fsync_inode+0xf0/0xf0
[ 3117.749487] ? blk_finish_plug+0x44/0x60
[ 3117.754152] ? f2fs_ra_meta_pages+0x38b/0x5e0
[ 3117.758831] ? find_inode_fast+0xac/0xc0
[ 3117.763448] ? f2fs_is_valid_blkaddr+0x320/0x320
[ 3117.768046] ? __radix_tree_lookup+0x150/0x150
[ 3117.772603] ? dqget+0x670/0x670
[ 3117.777159] ? pagecache_get_page+0x29/0x410
[ 3117.781648] ? kmem_cache_alloc+0x176/0x1e0
[ 3117.786067] ? f2fs_is_valid_blkaddr+0x11d/0x320
[ 3117.790476] f2fs_recover_fsync_data+0xc23/0xd50
[ 3117.794790] ? f2fs_space_for_roll_forward+0x60/0x60
[ 3117.799086] ? rb_insert_color+0x323/0x3d0
[ 3117.803304] ? f2fs_recover_orphan_inodes+0xa5/0x700
[ 3117.807563] ? proc_register+0x153/0x1d0
[ 3117.811766] ? f2fs_remove_orphan_inode+0x10/0x10
[ 3117.815947] ? f2fs_attr_store+0x50/0x50
[ 3117.820087] ? proc_create_single_data+0x52/0x60
[ 3117.824262] f2fs_fill_super+0x1d06/0x2b40
[ 3117.828367] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.832432] ? sget_userns+0x65e/0x690
[ 3117.836500] ? set_blocksize+0x88/0x130
[ 3117.840501] ? f2fs_commit_super+0x1a0/0x1a0
[ 3117.844420] mount_bdev+0x1c0/0x200
[ 3117.848275] mount_fs+0x5c/0x190
[ 3117.852053] vfs_kern_mount+0x64/0x190
[ 3117.855810] do_mount+0x2e4/0x1450
[ 3117.859441] ? lockref_put_return+0x130/0x130
[ 3117.862996] ? copy_mount_string+0x20/0x20
[ 3117.866417] ? kasan_unpoison_shadow+0x31/0x40
[ 3117.869719] ? kasan_kmalloc+0xa6/0xd0
[ 3117.872948] ? memcg_kmem_put_cache+0x16/0x90
[ 3117.876121] ? __kmalloc_track_caller+0x196/0x210
[ 3117.879333] ? _copy_from_user+0x61/0x90
[ 3117.882467] ? memdup_user+0x3e/0x60
[ 3117.885604] ksys_mount+0x7e/0xd0
[ 3117.888700] __x64_sys_mount+0x62/0x70
[ 3117.891742] do_syscall_64+0x73/0x160
[ 3117.894692] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.897669] RIP: 0033:0x7f5693f14b9a
[ 3117.900563] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 3117.906922] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 3117.910159] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
[ 3117.913469] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
[ 3117.916764] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 3117.920071] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
[ 3117.923393] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.926680] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
[ 3117.949979] CR2: 0000000000000000
[ 3117.954283] ---[ end trace a8e0d899985faf32 ]---
[ 3117.958575] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
[ 3117.962810] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
[ 3117.971789] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
[ 3117.976333] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
[ 3117.980926] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
[ 3117.985497] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
[ 3117.990098] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
[ 3117.994761] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
[ 3117.999392] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
[ 3118.004096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3118.008816] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0

- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/segment.c#L775
if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
dirty_i->nr_dirty[t]--;
Here dirty_i->dirty_segmap[t] can be NULL which leads to crash in test_and_clear_bit()

Reported-by Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4ddc1b28 25-Jul-2018 Chao Yu <chao@kernel.org>

f2fs: fix to restrict mount condition when without CONFIG_QUOTA

Like quota_ino feature, we need to reject mounting RDWR with image
which enables project_quota feature when there is no CONFIG_QUOTA
be set in kernel.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 00960c2c 24-Jul-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: quota: do not mount as RDWR without QUOTA if quota feature enabled

If quota feature is enabled, quota is on by default. However, if
CONFIG_QUOTA is not built in kernel, dquot entries will not get updated,
which leads to quota inconsistency.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 76cf05d7 26-Jul-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: quota: fix incorrect comments

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 955ac6e5 24-Jul-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: quota: decrease the lock granularity of statfs_project

According to fs/quota/dquot.c, `dq_data_lock' protects mem_dqinfo
structures and modifications of dquot pointers in the inode, and
`dquot->dq_dqb_lock' protects data from dq_dqb.

We should use dquot->dq_dqb_lock in statfs_project instead of
dq_dat_lock.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a39e5365 05-Jul-2018 Chao Yu <chao@kernel.org>

f2fs: enable real-time discard by default

f2fs is focused on flash based storage, so let's enable real-time
discard by default, if user don't want to enable it, 'nodiscard'
mount option should be used on mount.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dc132802 02-Jul-2018 Sahitya Tummala <stummala@codeaurora.org>

f2fs: show the fsync_mode=nobarrier mount option

This patch shows the fsync_mode=nobarrier mount option in
f2fs_show_options().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2d3a5856 30-Jun-2018 Gao Xiang <xiang@kernel.org>

f2fs: avoid the global name 'fault_name'

Non-prefix global name 'fault_name' will pollute global
namespace, fix it.

Refer to:
https://lists.01.org/pipermail/kbuild-all/2018-June/049660.html

To: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9dc956b2 27-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with user_block_count

This patch fixs to do sanity check with user_block_count.

- Overview
Divide zero in utilization when mount() a corrupted f2fs image

- Reproduce (4.18 upstream kernel)

- Kernel message
[ 564.099503] F2FS-fs (loop0): invalid crc value
[ 564.101991] divide error: 0000 [#1] SMP KASAN PTI
[ 564.103103] CPU: 1 PID: 1298 Comm: f2fs_discard-7: Not tainted 4.18.0-rc1+ #4
[ 564.104584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 564.106624] RIP: 0010:issue_discard_thread+0x248/0x5c0
[ 564.107692] Code: ff ff 48 8b bd e8 fe ff ff 41 8b 9d 4c 04 00 00 e8 cd b8 ad ff 41 8b 85 50 04 00 00 31 d2 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <48> f7 f3 83 f8 50 7e 16 41 c7 86 7c ff ff ff 01 00 00 00 41 c7 86
[ 564.111686] RSP: 0018:ffff8801f3117dc0 EFLAGS: 00010206
[ 564.112775] RAX: 0000000000000384 RBX: 0000000000000000 RCX: ffffffffb88c1e03
[ 564.114250] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e3aa4850
[ 564.115706] RBP: ffff8801f3117f00 R08: 1ffffffff751a1d0 R09: fffffbfff751a1d0
[ 564.117177] R10: 0000000000000001 R11: fffffbfff751a1d0 R12: 00000000fffffffc
[ 564.118634] R13: ffff8801e3aa4400 R14: ffff8801f3117ed8 R15: ffff8801e2050000
[ 564.120094] FS: 0000000000000000(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 564.121748] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 564.122923] CR2: 000000000202b078 CR3: 00000001f11ac000 CR4: 00000000000006e0
[ 564.124383] Call Trace:
[ 564.124924] ? __issue_discard_cmd+0x480/0x480
[ 564.125882] ? __sched_text_start+0x8/0x8
[ 564.126756] ? __kthread_parkme+0xcb/0x100
[ 564.127620] ? kthread_blkcg+0x70/0x70
[ 564.128412] kthread+0x180/0x1d0
[ 564.129105] ? __issue_discard_cmd+0x480/0x480
[ 564.130029] ? kthread_associate_blkcg+0x150/0x150
[ 564.131033] ret_from_fork+0x35/0x40
[ 564.131794] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 564.141798] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 564.142773] RIP: 0010:issue_discard_thread+0x248/0x5c0
[ 564.143885] Code: ff ff 48 8b bd e8 fe ff ff 41 8b 9d 4c 04 00 00 e8 cd b8 ad ff 41 8b 85 50 04 00 00 31 d2 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <48> f7 f3 83 f8 50 7e 16 41 c7 86 7c ff ff ff 01 00 00 00 41 c7 86
[ 564.147776] RSP: 0018:ffff8801f3117dc0 EFLAGS: 00010206
[ 564.148856] RAX: 0000000000000384 RBX: 0000000000000000 RCX: ffffffffb88c1e03
[ 564.150424] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e3aa4850
[ 564.151906] RBP: ffff8801f3117f00 R08: 1ffffffff751a1d0 R09: fffffbfff751a1d0
[ 564.153463] R10: 0000000000000001 R11: fffffbfff751a1d0 R12: 00000000fffffffc
[ 564.154915] R13: ffff8801e3aa4400 R14: ffff8801f3117ed8 R15: ffff8801e2050000
[ 564.156405] FS: 0000000000000000(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 564.158070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 564.159279] CR2: 000000000202b078 CR3: 00000001f11ac000 CR4: 00000000000006e0
[ 564.161043] ==================================================================
[ 564.162587] BUG: KASAN: stack-out-of-bounds in from_kuid_munged+0x1d/0x50
[ 564.163994] Read of size 4 at addr ffff8801f3117c84 by task f2fs_discard-7:/1298

[ 564.165852] CPU: 1 PID: 1298 Comm: f2fs_discard-7: Tainted: G D 4.18.0-rc1+ #4
[ 564.167593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 564.169522] Call Trace:
[ 564.170057] dump_stack+0x7b/0xb5
[ 564.170778] print_address_description+0x70/0x290
[ 564.171765] kasan_report+0x291/0x390
[ 564.172540] ? from_kuid_munged+0x1d/0x50
[ 564.173408] __asan_load4+0x78/0x80
[ 564.174148] from_kuid_munged+0x1d/0x50
[ 564.174962] do_notify_parent+0x1f5/0x4f0
[ 564.175808] ? send_sigqueue+0x390/0x390
[ 564.176639] ? css_set_move_task+0x152/0x340
[ 564.184197] do_exit+0x1290/0x1390
[ 564.184950] ? __issue_discard_cmd+0x480/0x480
[ 564.185884] ? mm_update_next_owner+0x380/0x380
[ 564.186829] ? __sched_text_start+0x8/0x8
[ 564.187672] ? __kthread_parkme+0xcb/0x100
[ 564.188528] ? kthread_blkcg+0x70/0x70
[ 564.189333] ? kthread+0x180/0x1d0
[ 564.190052] ? __issue_discard_cmd+0x480/0x480
[ 564.190983] rewind_stack_do_exit+0x17/0x20

[ 564.192190] The buggy address belongs to the page:
[ 564.193213] page:ffffea0007cc45c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 564.194856] flags: 0x2ffff0000000000()
[ 564.195644] raw: 02ffff0000000000 0000000000000000 dead000000000200 0000000000000000
[ 564.197247] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 564.198826] page dumped because: kasan: bad access detected

[ 564.200299] Memory state around the buggy address:
[ 564.201306] ffff8801f3117b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 564.202779] ffff8801f3117c00: 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3
[ 564.204252] >ffff8801f3117c80: f3 f3 f3 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 564.205742] ^
[ 564.206424] ffff8801f3117d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 564.207908] ffff8801f3117d80: f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00
[ 564.209389] ==================================================================
[ 564.231795] F2FS-fs (loop0): Mounted with checkpoint version = 2

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L586
return div_u64((u64)valid_user_blocks(sbi) * 100,
sbi->user_block_count);
Missing checks on sbi->user_block_count.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c77ec61c 22-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize

This patch adds to do sanity check with {sit,nat}_ver_bitmap_bytesize
during mount, in order to avoid accessing across cache boundary with
this abnormal bitmap size.

- Overview
buffer overrun in build_sit_info() when mounting a crafted f2fs image

- Reproduce

- Kernel message
[ 548.580867] F2FS-fs (loop0): Invalid log blocks per segment (8201)

[ 548.580877] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[ 548.584979] ==================================================================
[ 548.586568] BUG: KASAN: use-after-free in kmemdup+0x36/0x50
[ 548.587715] Read of size 64 at addr ffff8801e9c265ff by task mount/1295

[ 548.589428] CPU: 1 PID: 1295 Comm: mount Not tainted 4.18.0-rc1+ #4
[ 548.589432] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 548.589438] Call Trace:
[ 548.589474] dump_stack+0x7b/0xb5
[ 548.589487] print_address_description+0x70/0x290
[ 548.589492] kasan_report+0x291/0x390
[ 548.589496] ? kmemdup+0x36/0x50
[ 548.589509] check_memory_region+0x139/0x190
[ 548.589514] memcpy+0x23/0x50
[ 548.589518] kmemdup+0x36/0x50
[ 548.589545] f2fs_build_segment_manager+0x8fa/0x3410
[ 548.589551] ? __asan_loadN+0xf/0x20
[ 548.589560] ? f2fs_sanity_check_ckpt+0x1be/0x240
[ 548.589566] ? f2fs_flush_sit_entries+0x10c0/0x10c0
[ 548.589587] ? __put_user_ns+0x40/0x40
[ 548.589604] ? find_next_bit+0x57/0x90
[ 548.589610] f2fs_fill_super+0x194b/0x2b40
[ 548.589617] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.589637] ? set_blocksize+0x90/0x140
[ 548.589651] mount_bdev+0x1c5/0x210
[ 548.589655] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.589667] f2fs_mount+0x15/0x20
[ 548.589672] mount_fs+0x60/0x1a0
[ 548.589683] ? alloc_vfsmnt+0x309/0x360
[ 548.589688] vfs_kern_mount+0x6b/0x1a0
[ 548.589699] do_mount+0x34a/0x18c0
[ 548.589710] ? lockref_put_or_lock+0xcf/0x160
[ 548.589716] ? copy_mount_string+0x20/0x20
[ 548.589728] ? memcg_kmem_put_cache+0x1b/0xa0
[ 548.589734] ? kasan_check_write+0x14/0x20
[ 548.589740] ? _copy_from_user+0x6a/0x90
[ 548.589744] ? memdup_user+0x42/0x60
[ 548.589750] ksys_mount+0x83/0xd0
[ 548.589755] __x64_sys_mount+0x67/0x80
[ 548.589781] do_syscall_64+0x78/0x170
[ 548.589797] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.589820] RIP: 0033:0x7f76fc331b9a
[ 548.589821] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 548.589880] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 548.589890] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[ 548.589892] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[ 548.589895] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 548.589897] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[ 548.589900] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003

[ 548.590242] The buggy address belongs to the page:
[ 548.591243] page:ffffea0007a70980 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 548.592886] flags: 0x2ffff0000000000()
[ 548.593665] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
[ 548.595258] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 548.603713] page dumped because: kasan: bad access detected

[ 548.605203] Memory state around the buggy address:
[ 548.606198] ffff8801e9c26480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.607676] ffff8801e9c26500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.609157] >ffff8801e9c26580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.610629] ^
[ 548.612088] ffff8801e9c26600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.613674] ffff8801e9c26680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 548.615141] ==================================================================
[ 548.616613] Disabling lock debugging due to kernel taint
[ 548.622871] WARNING: CPU: 1 PID: 1295 at mm/page_alloc.c:4065 __alloc_pages_slowpath+0xe4a/0x1420
[ 548.622878] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
[ 548.623217] CPU: 1 PID: 1295 Comm: mount Tainted: G B 4.18.0-rc1+ #4
[ 548.623219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 548.623226] RIP: 0010:__alloc_pages_slowpath+0xe4a/0x1420
[ 548.623227] Code: ff ff 01 89 85 c8 fe ff ff e9 91 fc ff ff 41 89 c5 e9 5c fc ff ff 0f 0b 89 f8 25 ff ff f7 ff 89 85 8c fe ff ff e9 d5 f2 ff ff <0f> 0b e9 65 f2 ff ff 65 8b 05 38 81 d2 47 f6 c4 01 74 1c 65 48 8b
[ 548.623281] RSP: 0018:ffff8801f28c7678 EFLAGS: 00010246
[ 548.623284] RAX: 0000000000000000 RBX: 00000000006040c0 RCX: ffffffffb82f73b7
[ 548.623287] RDX: 1ffff1003e518eeb RSI: 000000000000000c RDI: 0000000000000000
[ 548.623290] RBP: ffff8801f28c7880 R08: 0000000000000000 R09: ffffed0047fff2c5
[ 548.623292] R10: 0000000000000001 R11: ffffed0047fff2c4 R12: ffff8801e88de040
[ 548.623295] R13: 00000000006040c0 R14: 000000000000000c R15: ffff8801f28c7938
[ 548.623299] FS: 00007f76fca51840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
[ 548.623302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 548.623304] CR2: 00007f19b9171760 CR3: 00000001ed952000 CR4: 00000000000006e0
[ 548.623317] Call Trace:
[ 548.623325] ? kasan_check_read+0x11/0x20
[ 548.623330] ? __zone_watermark_ok+0x92/0x240
[ 548.623336] ? get_page_from_freelist+0x1c3/0x1d90
[ 548.623347] ? _raw_spin_lock_irqsave+0x2a/0x60
[ 548.623353] ? warn_alloc+0x250/0x250
[ 548.623358] ? save_stack+0x46/0xd0
[ 548.623361] ? kasan_kmalloc+0xad/0xe0
[ 548.623366] ? __isolate_free_page+0x2a0/0x2a0
[ 548.623370] ? mount_fs+0x60/0x1a0
[ 548.623374] ? vfs_kern_mount+0x6b/0x1a0
[ 548.623378] ? do_mount+0x34a/0x18c0
[ 548.623383] ? ksys_mount+0x83/0xd0
[ 548.623387] ? __x64_sys_mount+0x67/0x80
[ 548.623391] ? do_syscall_64+0x78/0x170
[ 548.623396] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.623401] __alloc_pages_nodemask+0x3c5/0x400
[ 548.623407] ? __alloc_pages_slowpath+0x1420/0x1420
[ 548.623412] ? __mutex_lock_slowpath+0x20/0x20
[ 548.623417] ? kvmalloc_node+0x31/0x80
[ 548.623424] alloc_pages_current+0x75/0x110
[ 548.623436] kmalloc_order+0x24/0x60
[ 548.623442] kmalloc_order_trace+0x24/0xb0
[ 548.623448] __kmalloc_track_caller+0x207/0x220
[ 548.623455] ? f2fs_build_node_manager+0x399/0xbb0
[ 548.623460] kmemdup+0x20/0x50
[ 548.623465] f2fs_build_node_manager+0x399/0xbb0
[ 548.623470] f2fs_fill_super+0x195e/0x2b40
[ 548.623477] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.623481] ? set_blocksize+0x90/0x140
[ 548.623486] mount_bdev+0x1c5/0x210
[ 548.623489] ? f2fs_commit_super+0x1b0/0x1b0
[ 548.623495] f2fs_mount+0x15/0x20
[ 548.623498] mount_fs+0x60/0x1a0
[ 548.623503] ? alloc_vfsmnt+0x309/0x360
[ 548.623508] vfs_kern_mount+0x6b/0x1a0
[ 548.623513] do_mount+0x34a/0x18c0
[ 548.623518] ? lockref_put_or_lock+0xcf/0x160
[ 548.623523] ? copy_mount_string+0x20/0x20
[ 548.623528] ? memcg_kmem_put_cache+0x1b/0xa0
[ 548.623533] ? kasan_check_write+0x14/0x20
[ 548.623537] ? _copy_from_user+0x6a/0x90
[ 548.623542] ? memdup_user+0x42/0x60
[ 548.623547] ksys_mount+0x83/0xd0
[ 548.623552] __x64_sys_mount+0x67/0x80
[ 548.623557] do_syscall_64+0x78/0x170
[ 548.623562] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 548.623566] RIP: 0033:0x7f76fc331b9a
[ 548.623567] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[ 548.623632] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 548.623636] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
[ 548.623639] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
[ 548.623641] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 548.623643] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
[ 548.623646] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
[ 548.623650] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 548.623656] F2FS-fs (loop0): Failed to initialize F2FS node manager
[ 548.627936] F2FS-fs (loop0): Invalid log blocks per segment (8201)

[ 548.627940] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
[ 548.635835] F2FS-fs (loop0): Failed to initialize F2FS node manager

- Location
https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.c#L3578

sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);

Buffer overrun happens when doing memcpy. I suspect there is missing (inconsistent) checks on bitmap_size.

Reported by Wen Xu (wen.xu@gatech.edu) from SSLab, Gatech.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 42bf546c 22-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check with secs_per_zone

As Wen Xu reported in below link:

https://bugzilla.kernel.org/show_bug.cgi?id=200183

- Overview
Divide zero in reset_curseg() when mounting a crafted f2fs image

- Reproduce

- Kernel message
[ 588.281510] divide error: 0000 [#1] SMP KASAN PTI
[ 588.282701] CPU: 0 PID: 1293 Comm: mount Not tainted 4.18.0-rc1+ #4
[ 588.284000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 588.286178] RIP: 0010:reset_curseg+0x94/0x1a0
[ 588.298166] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246
[ 588.299360] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b
[ 588.300809] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64
[ 588.305272] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000
[ 588.306822] FS: 00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 588.308456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 588.309623] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0
[ 588.311085] Call Trace:
[ 588.311637] f2fs_build_segment_manager+0x103f/0x3410
[ 588.316136] ? f2fs_commit_super+0x1b0/0x1b0
[ 588.317031] ? set_blocksize+0x90/0x140
[ 588.319473] f2fs_mount+0x15/0x20
[ 588.320166] mount_fs+0x60/0x1a0
[ 588.320847] ? alloc_vfsmnt+0x309/0x360
[ 588.321647] vfs_kern_mount+0x6b/0x1a0
[ 588.322432] do_mount+0x34a/0x18c0
[ 588.323175] ? strndup_user+0x46/0x70
[ 588.323937] ? copy_mount_string+0x20/0x20
[ 588.324793] ? memcg_kmem_put_cache+0x1b/0xa0
[ 588.325702] ? kasan_check_write+0x14/0x20
[ 588.326562] ? _copy_from_user+0x6a/0x90
[ 588.327375] ? memdup_user+0x42/0x60
[ 588.328118] ksys_mount+0x83/0xd0
[ 588.328808] __x64_sys_mount+0x67/0x80
[ 588.329607] do_syscall_64+0x78/0x170
[ 588.330400] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 588.331461] RIP: 0033:0x7fad848e8b9a
[ 588.336022] RSP: 002b:00007ffd7c5b6be8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 588.337547] RAX: ffffffffffffffda RBX: 00000000016f8030 RCX: 00007fad848e8b9a
[ 588.338999] RDX: 00000000016f8210 RSI: 00000000016f9f30 RDI: 0000000001700ec0
[ 588.340442] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
[ 588.341887] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001700ec0
[ 588.343341] R13: 00000000016f8210 R14: 0000000000000000 R15: 0000000000000003
[ 588.354891] ---[ end trace 4ce02f25ff7d3df5 ]---
[ 588.355862] RIP: 0010:reset_curseg+0x94/0x1a0
[ 588.360742] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246
[ 588.361812] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b
[ 588.363485] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64
[ 588.365213] RBP: ffff8801e88d7968 R08: ffffed003c32266f R09: ffffed003c32266f
[ 588.366661] R10: 0000000000000001 R11: ffffed003c32266e R12: ffff8801f0337700
[ 588.368110] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000
[ 588.370057] FS: 00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
[ 588.372099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 588.373291] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0

- Location
https://elixir.bootlin.com/linux/latest/source/fs/f2fs/segment.c#L2147
curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);

If secs_per_zone is corrupted due to fuzzing test, it will cause divide
zero operation when using GET_ZONE_FROM_SEG macro, so we should do more
sanity check with secs_per_zone during mount to avoid this issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4e423832 11-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: fix error path of fill_super

In fill_super, if root inode's attribute is incorrect, we need to
call f2fs_destroy_stats to release stats memory.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4cac90d5 11-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: relocate readdir_ra configure initialization

readdir_ra is sysfs configuration instead of mount option, so it should
not be initialized in default_options(), otherwise after remount, it can
be reset to be enabled which may not as user wish, so let's move it to
f2fs_tuning_parameters().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0aa7e0f8 06-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: move s_res{u,g}id initialization to default_options()

Let default_options() initialize s_res{u,g}id with default value like
other options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 83a3bfdb 21-Jun-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: indicate shutdown f2fs to allow unmount successfully

Once we shutdown f2fs, we have to flush stale pages in order to unmount
the system. In order to make stable, we need to stop fault injection as well.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dbae2c55 18-Jul-2018 Michael Callahan <michaelcallahan@fb.com>

block: Define and use STAT_READ and STAT_WRITE

Add defines for STAT_READ and STAT_WRITE for indexing the partition
stat entries. This clarifies some fs/ code which has hardcoded 1 for
STAT_WRITE and will make it easier to extend the stats with additional
fields.

tj: Refreshed on top of v4.17.

Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1cb50f87 06-Jul-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do checkpoint in kill_sb

When unmounting f2fs in force mode, we can get it stuck by io_schedule()
by some pending IOs in meta_inode.

io_schedule+0xd/0x30
wait_on_page_bit_common+0xc6/0x130
__filemap_fdatawait_range+0xbd/0x100
filemap_fdatawait_keep_errors+0x15/0x40
sync_inodes_sb+0x1cf/0x240
sync_filesystem+0x52/0x90
generic_shutdown_super+0x1d/0x110
kill_f2fs_super+0x28/0x80 [f2fs]
deactivate_locked_super+0x35/0x60
cleanup_mnt+0x36/0x70
task_work_run+0x79/0xa0
exit_to_usermode_loop+0x62/0x70
do_syscall_64+0xdb/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
0xffffffffffffffff

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 026f0507 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: Use array_size() in f2fs_kzalloc()

The f2fs_kzalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

f2fs_kzalloc(handle, a * b, gfp)

with:
f2fs_kzalloc(handle, array_size(a, b), gfp)

as well as handling cases of:

f2fs_kzalloc(handle, a * b * c, gfp)

with:

f2fs_kzalloc(handle, array3_size(a, b, c), gfp)

This does, however, attempt to ignore constant size factors like:

f2fs_kzalloc(handle, 4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@

(
f2fs_kzalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kzalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@

(
f2fs_kzalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)

// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@

f2fs_kzalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kzalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@

(
f2fs_kzalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kzalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
f2fs_kzalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kzalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@

(
f2fs_kzalloc(HANDLE, C1 * C2, ...)
|
f2fs_kzalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# c8606593 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: Use array_size() in f2fs_kmalloc()

The f2fs_kmalloc() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

f2fs_kmalloc(handle, a * b, gfp)

with:
f2fs_kmalloc(handle, array_size(a, b), gfp)

as well as handling cases of:

f2fs_kmalloc(handle, a * b * c, gfp)

with:

f2fs_kmalloc(handle, array3_size(a, b, c), gfp)

This does, however, attempt to ignore constant size factors like:

f2fs_kmalloc(handle, 4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
expression HANDLE;
type TYPE;
expression THING, E;
@@

(
f2fs_kmalloc(HANDLE,
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
f2fs_kmalloc(HANDLE,
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression HANDLE;
expression COUNT;
typedef u8;
typedef __u8;
@@

(
f2fs_kmalloc(HANDLE,
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(char) * COUNT
+ COUNT
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
expression HANDLE;
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * COUNT_ID
+ array_size(COUNT_ID, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * (COUNT_ID)
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * COUNT_ID
+ array_size(COUNT_ID, sizeof(THING))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * (COUNT_CONST)
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * COUNT_CONST
+ array_size(COUNT_CONST, sizeof(THING))
, ...)
)

// 2-factor product, only identifiers.
@@
expression HANDLE;
identifier SIZE, COUNT;
@@

f2fs_kmalloc(HANDLE,
- SIZE * COUNT
+ array_size(COUNT, SIZE)
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression HANDLE;
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression HANDLE;
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
f2fs_kmalloc(HANDLE,
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
f2fs_kmalloc(HANDLE,
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
expression HANDLE;
identifier STRIDE, SIZE, COUNT;
@@

(
f2fs_kmalloc(HANDLE,
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
f2fs_kmalloc(HANDLE,
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression HANDLE;
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
f2fs_kmalloc(HANDLE, C1 * C2 * C3, ...)
|
f2fs_kmalloc(HANDLE,
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression HANDLE;
expression E1, E2;
constant C1, C2;
@@

(
f2fs_kmalloc(HANDLE, C1 * C2, ...)
|
f2fs_kmalloc(HANDLE,
- E1 * E2
+ array_size(E1, E2)
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# c29fd0c0 04-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: let sync node IO interrupt async one

Although mixed sync/async IOs can have continuous LBA, as they have
different IO priority, block IO scheduler will add them into different
queues and commit them separately, result in splited IOs which causes
wrose performance.

This patch gives high priority to synchronous IO of nodes, means that
once synchronous flow starts, it can interrupt asynchronous writeback
flow of system flusher, so more big IOs can be expected.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d57b86d 29-May-2018 Chao Yu <chao@kernel.org>

f2fs: clean up symbol namespace

As Ted reported:

"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).

As one example, in fs/f2fs/dir.c there is:

unsigned char get_de_type(struct f2fs_dir_entry *de)

This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.

You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.

acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."

This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;

Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4071e67c 27-May-2018 Anatoly Pugachev <matorola@gmail.com>

disable loading f2fs module on PAGE_SIZE > 4KB

The following patch disables loading of f2fs module on architectures
which have PAGE_SIZE > 4096 , since it is impossible to mount f2fs on
such architectures , log messages are:

mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/vdiskb1, missing codepage or helper program, or other error.
/dev/vdiskb1: F2FS filesystem,
UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name ""

May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 1th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS
filesystem in 2th superblock
May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid
page_cache_size (8192), supports only 4KB

which was introduced by git commit 5c9b469295fb6b10d98923eab5e79c4edb80ed20

tested on git kernel 4.17.0-rc6-00309-gec30dcf7f425

with patch applied:

modprobe: ERROR: could not insert 'f2fs': Invalid argument
May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096

Signed-off-by: Anatoly Pugachev <matorola@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 107a805d 25-May-2018 Chao Yu <chao@kernel.org>

f2fs: keep migration IO order in LFS mode

For non-migration IO, we will keep order of data/node blocks' submitting
as allocation sequence by sorting IOs in per log io_list list, but for
migration IO, it could be out-of-order.

In LFS mode, we should keep all IOs including migration IO be ordered,
so that this patch fixes to add an additional lock to keep submitting
order.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1c41e680 07-May-2018 Chao Yu <chao@kernel.org>

f2fs: fix to initialize i_current_depth according to inode type

i_current_depth is used only for directory inode, but its space is
shared with i_gc_failures field used for regular inode, in order to
avoid affecting i_gc_failures' value, this patch fixes to initialize
the union's fields according to inode type.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0cfe75c5 27-Apr-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: enhance sanity_check_raw_super() to avoid potential overflows

In order to avoid the below overflow issue, we should have checked the
boundaries in superblock before reaching out to allocation. As Linus suggested,
the right place should be sanity_check_raw_super().

Dr Silvio Cesare of InfoSect reported:

There are integer overflows with using the cp_payload superblock field in the
f2fs filesystem potentially leading to memory corruption.

include/linux/f2fs_fs.h

struct f2fs_super_block {
...
__le32 cp_payload;

fs/f2fs/f2fs.h

typedef u32 block_t; /*
* should not change u32, since it is the on-disk block
* address format, __le32.
*/
...

static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
{
return le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_payload);
}

fs/f2fs/checkpoint.c

block_t start_blk, orphan_blocks, i, j;
...
start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi);

+++ integer overflows

...
unsigned int cp_blks = 1 + __cp_payload(sbi);
...
sbi->ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL);

+++ integer overflow leading to incorrect heap allocation.

int cp_payload_blks = __cp_payload(sbi);
...
ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks +
orphan_blocks);

+++ sign bug and integer overflow

...
for (i = 1; i < 1 + cp_payload_blks; i++)

+++ integer overflow

...

sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
NR_CURSEG_TYPE - __cp_payload(sbi)) *
F2FS_ORPHANS_PER_BLOCK;

+++ integer overflow

Reported-by: Greg KH <greg@kroah.com>
Reported-by: Silvio Cesare <silvio.cesare@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b2532c69 23-Apr-2018 Chao Yu <chao@kernel.org>

f2fs: rename dio_rwsem to i_gc_rwsem

RW semphore dio_rwsem in struct f2fs_inode_info is introduced to avoid
race between dio and data gc, but now, it is more wildly used to avoid
foreground operation vs data gc. So rename it to i_gc_rwsem to improve
its readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 59c84408 03-Apr-2018 Chao Yu <chao@kernel.org>

f2fs: introduce private inode status mapping

Previously, we use generic FS_*_FL defined by vfs to indicate inode status
for each bit of i_flags, so f2fs's flag status definition is tied to vfs'
one, it will be hard for f2fs to reuse bits f2fs never used to indicate
new status..

In order to solve this issue, we introduce private inode status mapping,
Note, for these bits have already been persisted into disk, we should
never change their definition, for other ones, we can remap them for
later new coming status.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d6290814 25-May-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add fsync_mode=nobarrier for non-atomic files

For non-atomic files, this patch adds an option to give nobarrier which
doesn't issue flush commands to the device.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e12ee683 30-Apr-2018 Eric Biggers <ebiggers@google.com>

fscrypt: make fscrypt_operations.max_namelen an integer

Now ->max_namelen() is only called to limit the filename length when
adding NUL padding, and only for real filenames -- not symlink targets.
It also didn't give the correct length for symlink targets anyway since
it forgot to subtract 'sizeof(struct fscrypt_symlink_data)'.

Thus, change ->max_namelen from a function to a simple 'unsigned int'
that gives the filesystem's maximum filename length.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 6dbb1796 18-Apr-2018 Eric Biggers <ebiggers@google.com>

f2fs: refactor read path to allow multiple postprocessing steps

Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only. But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.

To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step. The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step. When that completes, it continues to the next step. Pages that
fail any postprocessing step have PageError set. Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.

Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.

This may also be useful for other future f2fs features such as
compression.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2f520529 21-Mar-2018 Yunlong Song <yunlong.song@huawei.com>

f2fs: no need to initialize zero value for GFP_F2FS_ZERO

Since f2fs_inode_info is allocated with flag GFP_F2FS_ZERO, so we do not
need to initialize zero value for its member any more.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 02117b8a 16-Mar-2018 Ritesh Harjani <riteshh@codeaurora.org>

f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read

Quota code itself is serializing the operations by taking mutex_lock.
It seems a below deadlock can happen if GF_NOFS is not used in
f2fs_quota_read

__switch_to+0x88
__schedule+0x5b0
schedule+0x78
schedule_preempt_disabled+0x20
__mutex_lock_slowpath+0xdc //mutex owner is itself
mutex_lock+0x2c
dquot_commit+0x30 //mutex_lock(&dqopt->dqio_mutex);
dqput+0xe0
__dquot_drop+0x80
dquot_drop+0x48
f2fs_evict_inode+0x218
evict+0xa8
dispose_list+0x3c
prune_icache_sb+0x58
super_cache_scan+0xf4
do_shrink_slab+0x208
shrink_slab.part.40+0xac
shrink_zone+0x1b0
do_try_to_free_pages+0x25c
try_to_free_pages+0x164
__alloc_pages_nodemask+0x534
do_read_cache_page+0x6c
read_cache_page+0x14
f2fs_quota_read+0xa4
read_blk+0x54
find_tree_dqentry+0xe4
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
qtree_read_dquot+0x68
v2_read_dquot+0x24
dquot_acquire+0x5c // mutex_lock(&dqopt->dqio_mutex);
dqget+0x238
__dquot_initialize+0xd4
dquot_initialize+0x10
dquot_file_open+0x34
f2fs_file_open+0x6c
do_dentry_open+0x1e4
vfs_open+0x6c
path_openat+0xa20
do_filp_open+0x4c
do_sys_open+0x178

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ff62af20 15-Mar-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: introduce a new mount option test_dummy_encryption

This patch introduces a new mount option `test_dummy_encryption'
to allow fscrypt to create a fake fscrypt context. This is used
by xfstests.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b7c409de 15-Mar-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: introduce F2FS_FEATURE_LOST_FOUND feature

This patch introduces a new feature, F2FS_FEATURE_LOST_FOUND, which
is set by mkfs. mkfs creates a directory named lost+found, which saves
unreachable files. If fsck finds a file which has no parent, or its
parent is removed by fsck, the file will be placed under lost+found
directory by fsck.

lost+found directory could not be encrypted. As a result, the root
directory cannot be encrypted too. So if LOST_FOUND feature is enabled,
let's avoid to encrypt root directory.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 63189b78 07-Mar-2018 Chao Yu <chao@kernel.org>

f2fs: wrap all options with f2fs_sb_info.mount_opt

This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 93cf93f1 06-Mar-2018 Junling Zheng <zhengjunling@huawei.com>

f2fs: introduce mount option for fsync mode

Commit "0a007b97aad6"(f2fs: recover directory operations by fsync)
fixed xfstest generic/342 case, but it also increased the written
data and caused the performance degradation. In most cases, there's
no need to do so heavy fsync actually.

So we introduce new mount option "fsync_mode={posix,strict}" to
control the policy of fsync. "fsync_mode=posix" is set by default,
and means that f2fs uses a light fsync, which follows POSIX semantics.
And "fsync_mode=strict" means that it's a heavy fsync, which behaves
in line with xfs, ext4 and btrfs, where generic/342 will pass, but
the performance will regress.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# eea52882 01-Mar-2018 Chao Yu <chao@kernel.org>

f2fs: fix to restore old mount option in ->remount_fs

This patch fixes to restore old mount option once we encounter failure
in ->remount_fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# deeedd71 01-Mar-2018 Chao Yu <chao@kernel.org>

f2fs: wrap sb_rdonly with f2fs_readonly

Use f2fs_readonly to wrap sb_rdonly for cleanup, and spread it in
all places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 782911f4 25-Feb-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set readdir_ra by default

It gives general readdir improvement.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 84b89e5d 22-Feb-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add auto tuning for small devices

If f2fs is running on top of very small devices, it's worth to avoid abusing
free LBAs. In order to achieve that, this patch introduces some parameter
tuning.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 07939627 18-Feb-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add mount option for segment allocation policy

This patch adds an mount option, "alloc_mode=%s" having two options, "default"
and "reuse".

In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment
all the time to reassign segments. It'd be useful for small-sized eMMC parts.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 846ae671 26-Feb-2018 Chao Yu <chao@kernel.org>

f2fs: expose extension_list sysfs entry

This patch adds a sysfs entry 'extension_list' to support
query/add/del item in extension list.

Query:
cat /sys/fs/f2fs/<device>/extension_list

Add:
echo 'extension' > /sys/fs/f2fs/<device>/extension_list

Del:
echo '!extension' > /sys/fs/f2fs/<device>/extension_list

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d0d3f1b3 11-Feb-2018 Chao Yu <chao@kernel.org>

f2fs: introduce sb_lock to make encrypt pwsalt update exclusive

f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ccd31cb2 05-Feb-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: clean up f2fs_sb_has_xxx functions

This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f2e703f9 30-Jan-2018 Hyunchul Lee <cheol.lee@lge.com>

f2fs: support passing down write hints to block layer with F2FS policy

Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints with its policy.

* whint_mode=fs-based. F2FS passes down hints with its policy.

User F2FS Block
---- ---- -----
META WRITE_LIFE_MEDIUM;
HOT_NODE WRITE_LIFE_NOT_SET
WARM_NODE "
COLD_NODE WRITE_LIFE_NONE
ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
extension list " "

-- buffered io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG
WRITE_LIFE_NONE " "
WRITE_LIFE_MEDIUM " "
WRITE_LIFE_LONG " "

-- direct io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG " WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0cdd3195 30-Jan-2018 Hyunchul Lee <cheol.lee@lge.com>

f2fs: support passing down write hints given by users to block layer

Add the 'whint_mode' mount option that controls which write
hints are passed down to block layer. There are "off" and
"user-based" mode. The default mode is "off".

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User F2FS Block
---- ---- -----
META WRITE_LIFE_NOT_SET
HOT_NODE "
WARM_NODE "
COLD_NODE "
ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
extension list " "

-- buffered io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " "
WRITE_LIFE_MEDIUM " "
WRITE_LIFE_LONG " "

-- direct io
WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE " WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG " WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid build warning]
[Chao Yu: fix to restore whint_mode in ->remount_fs]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d817ae0 27-Jan-2018 Chao Yu <chao@kernel.org>

f2fs: restrict inline_xattr_size configuration

This patch limits to enable inline_xattr_size mount option only if
both extra_attr and flexible_inline_xattr feature is on in current
image.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0964fc1a 29-Jan-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET

sb_getblk does not guarantee the buffer head is uptodate. If bh is not
uptodate, the data (may be used as boot code) in area before
F2FS_SUPER_OFFSET may get corrupted when super block is committed.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d7997e63 17-Jan-2018 Chao Yu <chao@kernel.org>

f2fs: clean up error path of fill_super

This patch cleans up error path of fille_super to avoid unneeded
release step.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c2e5963 04-Jan-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add resgid and resuid to reserve root blocks

This patch adds mount options to reserve some blocks via resgid=%u,resuid=%u.
It only activates with reserve_root=%u.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 578c6478 09-Jan-2018 Yufen Yu <yuyufen@huawei.com>

f2fs: implement cgroup writeback support

Cgroup writeback requires explicit support from the filesystem.
f2fs's data and node writeback IOs go through __write_data_page,
which sets fio for submiting IOs. So, we add io_wbc for fio,
associate bios with blkcg by invoking wbc_init_bio() and
account IOs issuing by wbc_account_io().
In addtion, f2fs_fill_super() is updated to set SB_I_CGROUPWB.

Meta writeback IOs is left alone by this patch and will always be
attributed to the root cgroup.

The results show that f2fs can throttle writeback nicely for
data writing and file creating.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 94b1e10e 05-Jan-2018 Wei Yongjun <weiyongjun1@huawei.com>

f2fs: make local functions static

Fixes the following sparse warnings:

fs/f2fs/segment.c:887:6: warning:
symbol '__check_sit_bitmap' was not declared. Should it be static?
fs/f2fs/segment.c:1327:6: warning:
symbol 'f2fs_wait_discard_bio' was not declared. Should it be static?
fs/f2fs/super.c:1661:5: warning:
symbol 'f2fs_get_projid' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7e65be49 27-Dec-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add reserved blocks for root user

This patch allows root to reserve some blocks via mount option.

"-o reserve_root=N" means N x 4KB-sized blocks for root only.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f66c027e 03-Jan-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: show precise # of blocks that user/root can use

Let's show precise # of blocks that user/root can use through bavail and bfree
respectively.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6279398d 02-Jan-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: enable quota at remount from r to w

We have to enable quota only when remounting from read to write. Otherwise,
we'll get remount failure. (e.g., write to write case)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bae01eda 30-Nov-2017 Chao Yu <chao@kernel.org>

f2fs: fix error handling in fill_super

In fill_super, if we fail to call f2fs_build_stats(), it needs to detach
from global f2fs shrink list, otherwise once system starts to shrink slab
cache, we will encounter below panic:

BUG: unable to handle kernel paging request at 00007d35
Oops: 0002 [#1] PREEMPT SMP
EIP: __lock_acquire+0x70/0x12c0
Call Trace:
lock_acquire+0xae/0x220
mutex_trylock+0xc5/0xf0
f2fs_shrink_count+0x32/0xb0 [f2fs]
shrink_slab+0xf1/0x5b0
drop_slab_node+0x35/0x60
drop_slab+0xf/0x20
drop_caches_sysctl_handler+0x79/0xc0
proc_sys_call_handler+0xa4/0xc0
proc_sys_write+0x1f/0x30
__vfs_write+0x24/0x150
SyS_write+0x44/0x90
do_fast_syscall_32+0xa1/0x1ca
entry_SYSENTER_32+0x4c/0x7b

In addition, this patch relocates f2fs_join_shrinker in fill_super to
avoid unneeded error handling of it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4e6aad29 30-Nov-2017 Chao Yu <chao@kernel.org>

f2fs: spread f2fs_k{m,z}alloc

Use f2fs_k{m,z}alloc as much as possible to increase fault injection
points.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 628b3d14 30-Nov-2017 Chao Yu <chao@kernel.org>

f2fs: inject fault to kvmalloc

This patch supports to inject fault into kvmalloc/kvzalloc.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 292c196a 16-Nov-2017 Chao Yu <chao@kernel.org>

f2fs: reserve nid resource for quota sysfile

During mkfs, quota sysfiles have already occupied nid resource,
it needs to adjust remaining available nid count in kernel side.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1751e8a6 27-Nov-2017 Linus Torvalds <torvalds@linux-foundation.org>

Rename superblock flags (MS_xyz -> SB_xyz)

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"

SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d1954ab4 30-Oct-2017 Jeff Layton <jlayton@kernel.org>

f2fs: don't bother with inode->i_version

f2fs does not set the SB_I_VERSION flag, so the i_version will never
be incremented on write. It was recently changed to increment the
i_version on a quota write, which isn't necessary here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ea676733 06-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support quota sys files

This patch supports hidden quota files in the system, which will be used for
Android. It requires up-to-date f2fs-tools later than v1.9.0.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d62fe971 28-Oct-2017 Chao Yu <chao@kernel.org>

f2fs: support bio allocation error injection

This patch adds to support bio allocation error injection to simulate
out-of-memory test scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 01eccef7 28-Oct-2017 Chao Yu <chao@kernel.org>

f2fs: support get_page error injection

This patch adds to support get_page error injection to simulate
out-of-memory test scenario.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 80d42145 27-Oct-2017 Yunlong Song <yunlong.song@huawei.com>

f2fs: support soft block reservation

It supports to extend reserved_blocks sysfs interface to be soft
threshold, which allows user configure it exceeding current available
user space. This patch also introduces a new sysfs interface called
current_reserved_blocks, which shows the current blocks that have
already been reserved.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6afc662e 06-Sep-2017 Chao Yu <chao@kernel.org>

f2fs: support flexible inline xattr size

Now, in product, more and more features based on file encryption were
introduced, their demand of xattr space is increasing, however, inline
xattr has fixed-size of 200 bytes, once inline xattr space is full, new
increased xattr data would occupy additional xattr block which may bring
us more space usage and performance regression during persisting.

In order to resolve above issue, it's better to expand inline xattr size
flexibly according to user's requirement.

So this patch introduces new filesystem feature 'flexible inline xattr',
and new mount option 'inline_xattr_size=%u', once mkfs enables the
feature, we can use the option to make f2fs supporting flexible inline
xattr size.

To support this feature, we add extra attribute i_inline_xattr_size in
inode layout, indicating that how many space inline xattr borrows from
block address mapping space in inode layout, by this, we can easily
locate and store flexible-sized inline xattr data in inode.

Inode disk layout:
+----------------------+
| .i_mode |
| ... |
| .i_ext |
+----------------------+
| .i_extra_isize |
| .i_inline_xattr_size |-----------+
| ... | |
+----------------------+ |
| .i_addr | |
| - block address or | |
| - inline data | |
+----------------------+<---+ v
| inline xattr | +---inline xattr range
+----------------------+<---+
| .i_nid |
+----------------------+
| node_footer |
| (nid, ino, offset) |
+----------------------+

Note that, we have to cnosider backward compatibility which reserved
inline_data space, 200 bytes, all the time, reported by Sheng Yong.

Previous inline data or directory always reserved 200 bytes in inode layout,
even if inline_xattr is disabled. In order to keep inline_dentry's structure
for backward compatibility, we get the space back only from inline_data.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1f227a3e 23-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: stop all the operations by cp_error flag

This patch replaces to use cp_error flag instead of RDONLY for quota off.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6e5b5d41 19-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: return wrong error number on f2fs_quota_write"

This reverts commit 4f31d26b0c17f2aae6a6afeb823a87e20671ab4b.

It turns out that we need to report error number if nothing was written.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4e46a023 19-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: retry ENOMEM for quota_read|write

This gives another chance to read or write quota data.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 57864ae5 18-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: limit # of inmemory pages

If some abnormal users try lots of atomic write operations, f2fs is able to
produce pinned pages in the main memory which affects system performance.
This patch limits that as 20% over total memory size, and if f2fs reaches
to the limit, it will drop all the inmemory pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 204b4ae0 07-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs/crypto: drop crypto key at evict_inode only

This patch avoids dropping crypto key in f2fs_drop_inode, so we can guarantee
it happens only at evict_inode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cf5c759f 03-Oct-2017 Chao Yu <chao@kernel.org>

f2fs: give up CP_TRIMMED_FLAG if it drops discards

In ->umount, once we drop remained discard entries, we should not
set CP_TRIMMED_FLAG with another checkpoint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8412663d 03-Oct-2017 Chao Yu <chao@kernel.org>

f2fs: support issuing/waiting discard in range

Fstrim intends to trim invalid blocks of filesystem only with specified
range and granularity, but actually, it will issue all previous cached
discard commands which may be out-of-range and be with unmatched
granularity, it's unneeded.

In order to fix above issues, this patch introduces new helps to support
to issue and wait discard in range and adds a new fstrim_list for tracking
in-flight discard from ->fstrim.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ffcc4182 09-Oct-2017 Eric Biggers <ebiggers@google.com>

fscrypt: remove unneeded empty fscrypt_operations structs

In the case where a filesystem has been configured without encryption
support, there is no longer any need to initialize ->s_cop at all, since
none of the methods are ever called.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# f7293e48 09-Oct-2017 Eric Biggers <ebiggers@google.com>

fscrypt: remove ->is_encrypted()

Now that all callers of fscrypt_operations.is_encrypted() have been
switched to IS_ENCRYPTED(), remove ->is_encrypted().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 1228b482 28-Sep-2017 Chao Yu <chao@kernel.org>

f2fs: fix to flush multiple device in checkpoint

If f2fs manages multiple devices, in checkpoint, we need to issue flush
in those devices which contain dirty data/node in their cache before
we write checkpoint region, otherwise, filesystem metadata could be
corrupted if hitting SPO after checkpoint.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 638164a2 01-Oct-2017 Chao Yu <chao@kernel.org>

f2fs: fix potential panic during fstrim

As Ju Hyung Park reported:

"When 'fstrim' is called for manual trim, a BUG() can be triggered
randomly with this patch.

I'm seeing this issue on both x86 Desktop and arm64 Android phone.

On x86 Desktop, this was caused during Ubuntu boot-up. I have a
cronjob installed which calls 'fstrim -v /' during boot. On arm64
Android, this was caused during GC looping with 1ms gc_min_sleep_time
& gc_max_sleep_time."

Root cause of this issue is that f2fs_wait_discard_bios can only be
used by f2fs_put_super, because during put_super there must be no
other referrers, so it can ignore discard entry's reference count
when removing the entry, otherwise in other caller we will hit bug_on
in __remove_discard_cmd as there may be other issuer added reference
count in discard entry.

Thread A Thread B
- issue_discard_thread
- f2fs_ioc_fitrim
- f2fs_trim_fs
- f2fs_wait_discard_bios
- __issue_discard_cmd
- __submit_discard_cmd
- __wait_discard_cmd
- dc->ref++
- __wait_one_discard_bio
- __wait_discard_cmd
- __remove_discard_cmd
- f2fs_bug_on(sbi, dc->ref)

Fixes: 969d1b180d987c2be02de890d0fff0f66a0e80de
Reported-by: Ju Hyung Park <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 27161f13 06-Sep-2017 Yunlei He <heyunlei@huawei.com>

f2fs: avoid race in between read xattr & write xattr

Thread A: Thread B:
-f2fs_getxattr
-lookup_all_xattrs
-xnid = F2FS_I(inode)->i_xattr_nid;
-f2fs_setxattr
-__f2fs_setxattr
-write_all_xattrs
-truncate_xattr_node
... ...
-write_checkpoint
... ...
-alloc_nid <- nid reuse
-get_node_page
-f2fs_bug_on <- nid != node_footer->nid

It's need a rw_sem to avoid the race

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f62fc9f9 31-Aug-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

f2fs: constify super_operations

super_operations are not supposed to change at runtime.
"struct super_block" working with super_operations provided
by <linux/fs.h> work with const super_operations. So mark
the non-const structs as const

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4b2414d0 07-Aug-2017 Chao Yu <chao@kernel.org>

f2fs: support journalled quota

This patch supports to enable f2fs to accept quota information through
mount option:
- {usr,grp,prj}jquota=<quota file path>
- jqfmt=<quota type>

Then, in ->mount flow, we can recover quota file during log replaying,
by this, journelled quota can be supported.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Fix wrong return values.]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9a20d391 07-Aug-2017 Chao Yu <chao@kernel.org>

f2fs: avoid unneeded sync on quota file

We only need to sync quota file with appointed quota type instead of all
types in f2fs_quota_{on,off}.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b0af6d49 02-Aug-2017 Chao Yu <chao@kernel.org>

f2fs: add app/fs io stat

This patch enables inner app/fs io stats and introduces below virtual fs
nodes for exposing stats info:
/sys/fs/f2fs/<dev>/iostat_enable
/proc/fs/f2fs/<dev>/iostat_info

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix wrong stat assignment]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a36c106d 02-Aug-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use printk_ratelimited for f2fs_msg

This patch reduces contention of printks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 704956ec 31-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: support inode checksum

This patch adds to support inode checksum in f2fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix verification flow]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4f31d26b 30-Jul-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: return wrong error number on f2fs_quota_write

This must return size, not error number.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ddc34e32 28-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_statfs_project

This patch introduces f2fs_statfs_project, it enables to show usage
status of directory tree which is limited with project quota.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dc6b2055 26-Jul-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid naming confusion of sysfs init

This patch changes the function names of sysfs init to follow ext4.

f2fs_init_sysfs <-> f2fs_register_sysfs
f2fs_exit_sysfs <-> f2fs_unregister_sysfs

Suggested-by: Chao Yu <yuchao0@huawei.com>
Reivewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5c57132e 25-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: support project quota

This patch adds to support plain project quota.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7a2af766 18-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: enhance on-disk inode structure scalability

This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline
to indicate that on-disk structure of current inode is extended.

In order to extend, we changed the inode structure a bit:

Original one:

struct f2fs_inode {
...
struct f2fs_extent i_ext;
__le32 i_addr[DEF_ADDRS_PER_INODE];
__le32 i_nid[DEF_NIDS_PER_INODE];
}

Extended one:

struct f2fs_inode {
...
struct f2fs_extent i_ext;
union {
struct {
__le16 i_extra_isize;
__le16 i_padding;
__le32 i_extra_end[0];
};
__le32 i_addr[DEF_ADDRS_PER_INODE];
};
__le32 i_nid[DEF_NIDS_PER_INODE];
}

Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of
i_addr field for storing i_extra_isize and i_padding. with i_extra_isize,
we can calculate actual size of reserved space in i_addr, available
attribute fields included in total extra attribute fields for current
inode can be described as below:

+--------------------+
| .i_mode |
| ... |
| .i_ext |
+--------------------+
| .i_extra_isize |-----+
| .i_padding | |
| .i_prjid | |
| .i_atime_extra | |
| .i_ctime_extra | |
| .i_mtime_extra |<----+
| .i_inode_cs |<----- store blkaddr/inline from here
| .i_xattr_cs |
| ... |
+--------------------+
| |
| block address |
| |
+--------------------+
| .i_nid |
+--------------------+
| node_footer |
| (nid, ino, offset) |
+--------------------+

Hence, with this patch, we would enhance scalability of f2fs inode for
storing more newly added attribute.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f2470371 18-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: make max inline size changeable

This patch tries to make below macros calculating max inline size,
inline dentry field size considerring reserving size-changeable
space:
- MAX_INLINE_DATA
- NR_INLINE_DENTRY
- INLINE_DENTRY_BITMAP_SIZE
- INLINE_RESERVED_SIZE

Then, when inline_{data,dentry} options is enabled, it allows us to
reserve inline space with different size flexibly for adding newly
introduced inode attribute.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0abd675e 08-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: support plain user/group quota

This patch adds to support plain user/group quota.

Change Note by Jaegeuk Kim.

- Use f2fs page cache for quota files in order to consider garbage collection.
so, quota files are not tolerable for sudden power-cuts, so user needs to do
quotacheck.

- setattr() calls dquot_transfer which will transfer inode->i_blocks.
We can't reclaim that during f2fs_evict_inode(). So, we need to count
node blocks as well in order to match i_blocks with dquot's space.

Note that, Chao wrote a patch to count inode->i_blocks without inode block.
(f2fs: don't count inode block in in-memory inode.i_blocks)

- in f2fs_remount, we need to make RW in prior to dquot_resume.

- handle fault_injection case during f2fs_quota_off_umount

- TODO: Project quota

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6ac851ba 04-Jul-2017 Chao Yu <chao@kernel.org>

Revert "f2fs: fix to clean previous mount option when remount_fs"

Don't clear old mount option before parse new option during ->remount_fs
like other generic filesystems.

This reverts commit 26666c8a4366debae30ae37d0688b2bec92d196a.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cce13252 29-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: stop gc/discard thread in prior during umount

This patch resolves kernel panic for xfstests/081, caused by recent f2fs_bug_on

f2fs: add f2fs_bug_on in __remove_discard_cmd

For fixing, we will stop gc/discard thread in prior in ->kill_sb in order to
avoid referring and releasing race among them.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# daeb433e 26-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: introduce reserved_blocks in sysfs

In this patch, we add a new sysfs interface, with it, we can control
number of reserved blocks in system which could not be used by user,
it enable f2fs to let user to configure for adjusting over-provision
ratio dynamically instead of changing it by mkfs.

So we can expect it will help to reserve more free space for relieving
GC in both filesystem and flash device.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0cc091d0 21-Jun-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: report # of free inodes more precisely

If the partition is small, we don't need to report total # of inodes including
hidden free nodes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 663f387b 14-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: set CP_TRIMMED_FLAG correctly

Don't set CP_TRIMMED_FLAG for non-zoned block device or discard
unsupported device, it can avoid to trigger unneeded checkpoint for
that kind of device.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8ceffcb2 14-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c

Codes related to sysfs and procfs are dispersive and mixed with sb
related codes, but actually these codes are independent from others,
so split them from super.c, and reorgnize and manger them in sysfs.c.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a398101a 14-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: clean up sysfs codes

Just cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1727f317 11-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: fix wrong error number of fill_super

This patch fixes incorrect error number in error path of fill_super.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 44529f89 11-Jun-2017 Chao Yu <chao@kernel.org>

f2fs: fix to show injection rate in ->show_options

If fault injection functionality is enabled, show additional injection
rate in ->show_options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b63def91 11-Jun-2017 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

f2fs: Fix a return value in case of error in 'f2fs_fill_super'

err must be set to -ENOMEM, otherwise we return 0.

Fixes: a912b54d3aaa0 ("f2fs: split bio cache")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5a3a2d83 17-May-2017 Qiuyang Sun <sunqiuyang@huawei.com>

f2fs: dax: fix races between page faults and truncating pages

Currently in F2FS, page faults and operations that truncate the pagecahe
or data blocks, are completely unsynchronized. This can result in page
fault faulting in a page into a range that we are changing after
truncating, and thus we can end up with a page mapped to disk blocks that
will be shortly freed. Filesystem corruption will shortly follow.

This patch fixes the problem by creating new rw semaphore i_mmap_sem in
f2fs_inode_info and grab it for functions removing blocks from extent tree
and for read over page faults. The mechanism is similar to that in ext4.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 85787090 10-May-2017 Christoph Hellwig <hch@lst.de>

fs: switch ->s_uuid to uuid_t

For some file systems we still memcpy into it, but in various places this
already allows us to use the proper uuid helpers. More to come..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# fb830fc5 19-May-2017 Chao Yu <chao@kernel.org>

f2fs: introduce io_list for serialize data/node IOs

Serialize data/node IOs by using fifo list instead of mutex lock,
it will help to enhance concurrency of f2fs, meanwhile keeping LFS
IO semantics.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e41e6d75 19-May-2017 Chao Yu <chao@kernel.org>

f2fs: split wio_mutex

Split wio_mutex to adjust different temperature bio cache.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a912b54d 10-May-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: split bio cache

Split DATA/NODE type bio cache according to different temperature,
so write IOs with the same temperature can be merged in corresponding
bio cache as much as possible, otherwise, different temperature write
IOs submitting into one bio cache will always cause split of bio.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b9109b0e 10-May-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unnecessary read cases in merged IO flow

Merged IO flow doesn't need to care about read IOs.

f2fs_submit_merged_bio -> f2fs_submit_merged_write
f2fs_submit_merged_bios -> f2fs_submit_merged_writes
f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 15d3042a 15-May-2017 Jin Qian <jinqian@google.com>

f2fs: sanity check checkpoint segno and blkoff

Make sure segno and blkoff read from raw image are valid.

Cc: stable@vger.kernel.org
Signed-off-by: Jin Qian <jinqian@google.com>
[Jaegeuk Kim: adjust minor coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3adc5fcb 02-May-2017 Jan Kara <jack@suse.cz>

f2fs: Make flush bios explicitely sync

Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions. generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
Cc: stable@vger.kernel.org # 4.9+
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1f43e2ad 27-Apr-2017 Chao Yu <chao@kernel.org>

f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard

Introduce CP_TRIMMED_FLAG to indicate all invalid block were trimmed
before umount, so once we do mount with image which contain the flag,
we don't record invalid blocks as undiscard one, when fstrim is being
triggered, we can avoid issuing redundant discard commands.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b9dd4618 25-Apr-2017 Jin Qian <jinqian@google.com>

f2fs: sanity check segment count

F2FS uses 4 bytes to represent block address. As a result, supported
size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments.

Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 302bd348 07-Apr-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up get_valid_blocks with consistent parameter

This patch cleans up get_valid_blocks, which has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d431413f 05-Apr-2017 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_wait_discard_bios

Split f2fs_wait_discard_bios from f2fs_wait_discard_bio, just for cleanup,
no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 687de7f1 28-Mar-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE

If two threads try to flush dirty pages in different inodes respectively,
f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time,
resulting in a lot of 4KB seperated IOs.

So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write
IOs with a big WRITE_SYNC'ed bio.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ef095d19 24-Mar-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: write small sized IO to hot log

It would better split small and large IOs separately in order to get more
consecutive big writes.

The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7a20b8a6 24-Mar-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: allocate node and hot data in the beginning of partition

In order to give more spatial locality, this patch changes the block allocation
policy which assigns beginning of partition for small and hot data/node blocks.
In order to do this, we set noheap allocation by default and introduce another
mount option, heap, to reset it back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 59c9081b 13-Mar-2017 Yunlei He <heyunlei@huawei.com>

f2fs: allow write page cache when writting cp

This patch allow write data to normal file when writting
new checkpoint.

We relax three limitations for write_begin path:
1. data allocation
2. node allocation
3. variables in checkpoint

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 14b44d23 09-Mar-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add fault injection on f2fs_truncate

Inject a fault during f2fs_truncate().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# aa51d08a 07-Mar-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: build stat_info before orphan inode recovery

f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info),
which needs stat_info allocation.
Otherwise, we can hit:

[254042.598623] ? count_shadow_nodes+0xa0/0xa0
[254042.598633] f2fs_sync_fs+0x65/0xd0 [f2fs]
[254042.598645] f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs]
[254042.598657] f2fs_write_node_pages+0x34/0x1a0 [f2fs]
[254042.598664] ? pagevec_lookup_entries+0x1e/0x30
[254042.598673] do_writepages+0x1e/0x30
[254042.598682] __writeback_single_inode+0x45/0x330
[254042.598688] writeback_single_inode+0xd7/0x190
[254042.598694] write_inode_now+0x86/0xa0
[254042.598699] iput+0x122/0x200
[254042.598709] f2fs_fill_super+0xd4a/0x14d0 [f2fs]
[254042.598717] mount_bdev+0x184/0x1c0
[254042.598934] ? f2fs_commit_super+0x100/0x100 [f2fs]
[254042.599142] f2fs_mount+0x15/0x20 [f2fs]
[254042.599349] mount_fs+0x39/0x160
[254042.599554] ? __alloc_percpu+0x15/0x20
[254042.599759] vfs_kern_mount+0x67/0x110
[254042.599972] do_mount+0x1bb/0xc80
[254042.600175] ? memdup_user+0x42/0x60
[254042.600380] SyS_mount+0x83/0xd0
[254042.600583] entry_SYSCALL_64_fastpath+0x1e/0xad

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b8d96a30 27-Feb-2017 Hou Pengyang <houpengyang@huawei.com>

f2fs: add f2fs_drop_inode tracepoint

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7bb3a371 27-Feb-2017 Masato Suzuki <masato.suzuki@wdc.com>

f2fs: Fix zoned block device support

The introduction of the multi-device feature partially broke the support
for zoned block devices. In the function f2fs_scan_devices, sbi->devs
allocation and initialization is skipped in the case of a single device
mount. This result in no device information structure being allocated
for the device. This is fine if the device is a regular device, but in
the case of a zoned block device, the device zone type array is not
initialized, which causes the function __f2fs_issue_discard_zone to fail
as get_blkz_type is unable to determine the zone type of a section.

Fix this by always allocating and initializing the sbi->devs device
information array even in the case of a single device if that device is
zoned. For this particular case, make sure to obtain a reference on the
single device so that the call to blkdev_put() in destroy_device_list
operates as expected.

Fixes: 3c62be17d4f562f4 ("f2fs: support multiple devices")
Cc: <stable@vger.kernel.org> # v4.10
Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a3ebfe4f 27-Feb-2017 Chao Yu <chao@kernel.org>

f2fs: fix to enlarge size of write_io_dummy mempool

It needs to double cache size of write_io_dummy mempool, otherwise we may
run out of cache in scenraio of Data/Node IOs were issued concurrently.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b6895e8f 27-Feb-2017 Chao Yu <chao@kernel.org>

f2fs: fix memory leak of write_io_dummy mempool during umount

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 39133a50 08-Feb-2017 Chao Yu <chao@kernel.org>

f2fs: enable inline_xattr by default

In android, since SElinux is enable, security policy will be appliedd for
each file, it stores in inode as an xattr entry, so it will take one 4k
size node block additionally for each file.

Let's enable inline_xattr by default in order to save storage space.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 23cf7212 14-Feb-2017 Chao Yu <chao@kernel.org>

f2fs: introduce noinline_xattr mount option

This patch introduces new mount option 'noinline_xattr', so we can disable
inline xattr functionality which is already set as a default mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2ad0ef84 11-Feb-2017 Bhumika Goyal <bhumirks@gmail.com>

f2fs: super: constify fscrypt_operations structure

Declare fscrypt_operations structure as const as it is only stored in
the s_cop field of a super_block structure. This field is of type const,
so fscrypt_operations structure having this property can be made const
too.

File size before: fs/f2fs/super.o
text data bss dec hex filename
54131 31355 184 85670 14ea6 fs/f2fs/super.o

File size after: fs/f2fs/super.o
text data bss dec hex filename
54227 31259 184 85670 14ea6 fs/f2fs/super.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1200abb2 09-Feb-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: show checkpoint version at mount time

If we mounted f2fs successfully, let's show current checkpoint version.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0cc0dec2 26-Jan-2017 Kaixu Xia <xiakaixu@huawei.com>

f2fs: show the fault injection mount option

This patch shows the fault injection mount option in
f2fs_show_options().

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0b54fb84 11-Jan-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: factor out discard command info into discard_cmd_control

This patch adds discard_cmd_control with the existing discarding controls.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6f69f0ed 07-Feb-2017 Eric Biggers <ebiggers@google.com>

fscrypt: constify struct fscrypt_operations

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Richard Weinberger <richard@nod.at>


# 4e6a8d9b 29-Dec-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: relax async discard commands more

This patch relaxes async discard commands to avoid waiting its end_io during
checkpoint.
Instead of waiting them during checkpoint, it will be done when actually reusing
them.

Test on initial partition of nvme drive.

# time fstrim /mnt/test

Before : 6.158s
After : 4.822s

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ec91538d 21-Dec-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: get io size bit from mount option

This patch adds to set io_size_bits from mount option.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0a595eba 14-Dec-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support IO alignment for DATA and NODE writes

This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs,
we can eliminate underlying dummy page problem which FTL conducts in order to
close MLC or TLC partial written pages.

Note that,
- it requires "-o mode=lfs".
- IO size should be power of 2, not exceed BIO_MAX_PAGES, 256.
- read IO is still 4KB.
- do checkpoint at fsync, if dummy NODE page was written.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f99e8648 12-Jan-2017 Damien Le Moal <damien.lemoal@wdc.com>

block: Rename blk_queue_zone_size and bdev_zone_size

All block device data fields and functions returning a number of 512B
sectors are by convention named xxx_sectors while names in the form
xxx_size are generally used for a number of bytes. The blk_queue_zone_size
and bdev_zone_size functions were not following this convention so rename
them.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

Collapsed the two patches, they were nonsensically split and broke
bisection.

Signed-off-by: Jens Axboe <axboe@fb.com>


# a5d431ef 05-Jan-2017 Eric Biggers <ebiggers@google.com>

fscrypt: make fscrypt_operations.key_prefix a string

There was an unnecessary amount of complexity around requesting the
filesystem-specific key prefix. It was unclear why; perhaps it was
envisioned that different instances of the same filesystem type could
use different key prefixes, or that key prefixes could be binary.
However, neither of those things were implemented or really make sense
at all. So simplify the code by making key_prefix a const char *.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 5eba8c5d 07-Dec-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to access nullified flush_cmd_control pointer

f2fs_sync_file() remount_ro
- f2fs_readonly
- destroy_flush_cmd_control
- f2fs_issue_flush
- no fcc pointer!

So, this patch doesn't free fcc in this case, but just stop its kernel thread
which sends flush commands.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2040fce8 05-Dec-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: detect wrong layout

Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect
that as well.

Refer this in f2fs-tools.

mkfs.f2fs: detect small partition by overprovision ratio and # of segments

Reported-and-Tested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 204706c7 02-Dec-2016 Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: use percpu_counter for # of dirty pages in inode"

This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76.

The perpcu_counter doesn't provide atomicity in single core and consume more
DRAM. That incurs fs_mark test failure due to ENOMEM.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b08b12d2 18-Nov-2016 Chao Yu <chao@kernel.org>

f2fs: fix incorrect free inode count in ->statfs

While calculating inode count that we can create at most in the left space,
we should consider space which data/node blocks occupied, since we create
data/node mixly in main area. So fix the wrong calculation in ->statfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3c62be17 06-Oct-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support multiple devices

This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them accoording to
each device speed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b4b9d34c 04-Nov-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove checkpoint in f2fs_freeze

The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068,
it triggers circular locking problem below due to gc_mutex for checkpoint.

======================================================
[ INFO: possible circular locking dependency detected ]
4.9.0-rc1+ #132 Tainted: G OE
-------------------------------------------------------

1. wait for __sb_start_write() by

[<ffffffff9845f353>] dump_stack+0x85/0xc2
[<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
[<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
[<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
[<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
[<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
[<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
[<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
[<ffffffff98289991>] iput+0x171/0x2c0
[<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
[<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
[<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
[<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
[<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffff982a4f90>] ? do_fsync+0x70/0x70
[<ffffffff982a4f90>] ? do_fsync+0x70/0x70
[<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
[<ffffffff9826ca3e>] iterate_supers+0xae/0x100
[<ffffffff982a50b5>] sys_sync+0x55/0x90
[<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

2. wait for sbi->gc_mutex by

[<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
[<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
[<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
[<ffffffff9826b6ef>] freeze_super+0xcf/0x190
[<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
[<ffffffff9827f099>] SyS_ioctl+0x79/0x90
[<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 178053e2 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Cache zoned block devices zone type

With the zoned block device feature enabled, section discard
need to do a zone reset for sections contained in sequential
zones, and a regular discard (if supported) for sections
stored in conventional zones. Avoid the need for a costly
report zones to obtain a section zone type when discarding it
by caching the types of the device zones in the super block
information. This cache is initialized at mount time for mounts
with the zoned block device feature enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3adc57e9 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Do not allow adaptive mode for host-managed zoned block devices

The LFS mode is mandatory for host-managed zoned block devices as
update in place optimizations are not possible for segments in
sequential zones.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 96ba2dec 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Always enable discard for zoned blocks devices

Zone write pointer reset acts as discard for zoned block
devices. So if the zoned block device feature is enabled,
always declare that discard is enabled, even if the device
does not actually support the command.
For the same reason, prevent the use the "nodicard" mount
option.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0ab02998 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Suppress discard warning message for zoned block devices

For zoned block devices, discard is replaced by zone reset. So
do not warn if the device does not supports discard.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d1b959c8 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Check zoned block feature for host-managed zoned block devices

The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted
with zone alignment optimization. This is optional for host-aware
devices, but mandatory for host-managed zoned block devices.
So check that the feature is set in this latter case.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0bfd7a09 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Use generic zoned block device terminology

SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid a HDD centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 487df616 28-Oct-2016 Damien Le Moal <damien.lemoal@wdc.com>

f2fs: Add missing break in switch-case

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 09922800 31-Oct-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes

This patch should fix an infinite loop case below.

F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs]
F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix.
...
[<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs]
[<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs]
[<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs]
[<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
[<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs]
[<ffffffffb41dff21>] do_writepages+0x21/0x30
[<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760
[<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190
[<ffffffffb42a0959>] write_inode_now+0x99/0xc0
[<ffffffffb4289a16>] iput+0x1f6/0x2c0
[<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs]
[<ffffffffb426c394>] ? sget_userns+0x4f4/0x530
[<ffffffffb426c692>] mount_bdev+0x182/0x1b0
[<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffb426d038>] mount_fs+0x38/0x170
[<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160
[<ffffffffb4291d9e>] do_mount+0x1be/0xd60
[<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220
[<ffffffffb4292c54>] SyS_mount+0x94/0xd0
[<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 35782b23 20-Oct-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove percpu_count due to performance regression

This patch removes percpu_count usage due to performance regression in iozone.

Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c45729a 14-Oct-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: keep dirty inodes selectively for checkpoint

This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2dd15654 11-Oct-2016 Chao Yu <chao@kernel.org>

f2fs: fix to release discard entries during checkpoint

In f2fs_fill_super, if there is any IO error occurs during recovery,
cached discard entries will be leaked, in order to avoid this, make
write_checkpoint() handle memory release by itself, besides, move
clear_prefree_segments to write_checkpoint for readability.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 70fd7614 01-Nov-2016 Christoph Hellwig <hch@lst.de>

block,fs: use REQ_* flags directly

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 0f348028 26-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: support checkpoint error injection

This patch adds to support checkpoint error injection in f2fs for testing
fatal error tolerance, it will be useful that it can simulate abnormal
power off by f2fs itself instead of calling godown ioctl by running apps.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2443b8b3 26-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: fix to recover old fault injection config in ->remount_fs

In ->remount_fs, we didn't recover original fault injection config if
we encounter error, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 36dbd328 26-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: do fault injection initialization in default_options

Do fault injection initialization in default_options to keep consistent
with other default option configurating.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1ecc0c5c 23-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: support configuring fault injection per superblock

Previously, we only support global fault injection configuration, so that
when we configure type/rate of fault injection through sysfs, mount
option, it will influence all f2fs partition which is being used.

It is not make sence, since it will be not convenient if developer want
to test separated partitions with different fault injection rate/type
simultaneously, also it's not possible to enable fault injection in one
partition and disable fault injection in other one.

>From now on, we move global configuration of fault injection in module
into per-superblock, hence injection testing can be more flexible.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d32853de 22-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: adjust display format of segment bit

Just adjust segment bit info printed in procfs.

Before:
1008 5|0 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1009 3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13
1010 3|1 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After:
1008 5|0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1009 4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
1010 4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bb5dada7 23-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove dirty inode pages in error path

When getting EIO while handling orphan inodes, we can get some dirty node
pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
to flush node pages. But in this case, we should prevent to do that, since
we will try again from the start.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d41065e2 21-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle errors during recover_orphan_inodes

This patch fixes to handle EIO during recover_orphan_inode() given the below
panic.

F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
------------[ cut here ]------------
RIP: 0010:[<ffffffffc0b244e3>] [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
RSP: 0018:ffff92f8b7fb7c30 EFLAGS: 00010246
RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
FS: 00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
Stack:
ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
Call Trace:
[<ffffffffbc27ff55>] evict+0xc5/0x1a0
[<ffffffffbc28090d>] iput+0x1ad/0x2c0
[<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
[<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
[<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
[<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffbc264e49>] mount_fs+0x39/0x170
[<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
[<ffffffffbc2881df>] do_mount+0x1cf/0xd00
[<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
[<ffffffffbc289003>] SyS_mount+0x83/0xd0
[<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# aaec2b1d 19-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: introduce cp_lock to protect updating of ckpt_flags

This patch introduces spinlock to protect updating process of ckpt_flags
field in struct f2fs_checkpoint, it avoids incorrectly updating in race
condition.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a468f0ef 19-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use crc and cp version to determine roll-forward recovery

Previously, we used cp_version only to detect recoverable dnodes.
In order to avoid same garbage cp_version, we needed to truncate the next
dnode during checkpoint, resulting in additional discard or data write.
If we can distinguish this by using crc in addition to cp_version, we can
remove this overhead.

There is backward compatibility concern where it changes node_footer layout.
So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
detect new layout. New layout will be activated only when this flag is set.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8b038c70 18-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: support IO error injection

This patch adds to support IO error injection for testing IO error
tolerance of f2fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 97c1794a 09-May-2016 Chao Yu <chao@kernel.org>

f2fs: enable inline_dentry by default and add noinline_dentry option

Make inline_dentry as default mount option to improve space usage and
IO performance in scenario of numerous small directory.
It adds noinline_dentry mount option, instead.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b873b798 04-Aug-2016 Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: use percpu_rw_semaphore"

LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] has also slight regression for DWAL.

[1] https://github.com/sslab-gatech/fxmark

This reverts commit ec795418c41850056feb956534edf059dc1155d4.


# 82e0a5aa 12-Jul-2016 Chao Yu <chao@kernel.org>

f2fs: fix to avoid data update racing between GC and DIO

Datas in file can be operated by GC and DIO simultaneously, so we will
face race case as below:

For write case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- dio_bio_submit
update user data to old block address

For read case:
Thread A Thread B
- generic_file_direct_write
- invalidate_inode_pages2_range
- f2fs_direct_IO
- do_blockdev_direct_IO
- do_direct_IO
- get_more_blocks
- f2fs_balance_fs
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- do_write_data_page
migrate data block to new block address
- write_checkpoint
- do_checkpoint
- clear_prefree_segments
- f2fs_issue_discard
discard old block adress
- dio_bio_submit
update user buffer from obsolete block address

In order to fix this, for one file, we should let DIO and GC getting exclusion
against with each other.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b56ab837 30-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid mark_inode_dirty

Let's check inode's dirtiness before calling mark_inode_dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3e6d0b4d 06-Jul-2016 Chao Yu <chao@kernel.org>

f2fs: fix incorrect f_bfree calculation in ->statfs

As manual described, f_bfree indicates total free blocks in fs, in f2fs, it
includes two parts: visible free blocks and over-provision blocks. This
patch corrrects the calculation.

fsblkcnt_t f_bfree; /* free blocks in fs */

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ec795418 30-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use percpu_rw_semaphore

This patch replaces rw_semaphore with percpu_rw_semaphore for:
sbi->cp_rwsem
nm_i->nat_tree_lock

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 64058be9 03-Jul-2016 Chao Yu <chao@kernel.org>

f2fs: add nodiscard mount option

This patch adds 'nodiscard' mount option.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 52763a4b 13-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: detect host-managed SMR by feature flag

If mkfs.f2fs gives a feature flag for host-managed SMR, we can set mode=lfs
by default.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 67c3758d 13-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: call update_inode_page for orphan inodes

Let's store orphan inode pages right away.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 36abef4e 03-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce mode=lfs mount option

This mount option is to enable original log-structured filesystem forcefully.
So, there should be no random writes for main area.

Especially, this supports host-managed SMR device.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7dfeaa32 04-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid reverse IO order for NODE and DATA

There is a data race between allocate_data_block() and f2fs_sbumit_page_mbio(),
which incur unnecessary reversed bio submission.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9a449e9c 02-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove obsolete parameter in f2fs_truncate

We don't need lock parameter, which is always true.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 338bbfa0 02-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid wrong count on dirty inodes

The number should be covered by spin_lock. Otherwise we can see wrong count
in f2fs_stat.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 53aa6bbf 25-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: inject to produce some orphan inodes

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b93f7712 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove writepages lock

This patch removes writepages lock.
We can improve multi-threading performance.

tiobench, 32 threads, 4KB write per fsync on SSD
Before: 25.88 MB/s
After: 28.03 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 69e9e427 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set flush_merge by default

This patch sets flush_merge by default.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6d94c74a 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add lazytime mount option

This patch adds lazytime support.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26de9b11 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unnecessary updating inode during fsync

If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0f18b462 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush inode metadata when checkpoint is doing

This patch registers all the inodes which have dirty metadata to sync when
checkpoint is doing.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fc9581c8 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync

This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with
i_size_write().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 91942321 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use inode pointer for {set, clear}_inode_flag

This patch refactors to use inode pointer for set_inode_flag and
clear_inode_flag.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 84c60b13 27-May-2016 Al Viro <viro@zeniv.linux.org.uk>

drop redundant ->owner initializations

it's not needed for file_operations of inodes located on fs defined
in the hosting module and for file_operations that go into procfs.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# b8bef79d 17-May-2016 Tiezhu Yang <yangtiezhu@loongson.cn>

f2fs: make exit_f2fs_fs more clear

init_f2fs_fs does:
1) f2fs_build_trace_ios
2) init_inodecache
3) create_node_manager_caches
4) create_segment_manager_caches
5) create_checkpoint_caches
6) create_extent_cache
7) kset_create_and_add
8) kobject_init_and_add
9) register_shrinker
10) register_filesystem
11) f2fs_create_root_stats
12) proc_mkdir

exit_f2fs_fs should do cleanup in the reverse order
to make the code more clear.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 513c5f37 16-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use percpu_counter for total_valid_inode_count

This patch uses percpu_counter to avoid stat_lock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 41382ec4 16-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use percpu_counter for alloc_valid_block_count

This patch uses percpu_count for sbi->alloc_valid_block_count.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1beba1b3 13-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use percpu_counter for # of dirty pages in inode

This patch adds percpu_counter for # of dirty pages in inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 523be8a6 13-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use percpu_counter for page counters

This patch substitutes percpu_counter for atomic_counter when counting
various types of pages.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f5730184 17-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use bio count instead of F2FS_WRITEBACK page count

This can reduce page counting overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 08796897 15-May-2016 Sheng Yong <shengyong1@huawei.com>

f2fs: add fault injection to sysfs

This patch introduces a new struct f2fs_fault_info and a global f2fs_fault
to save fault injection status. Fault injection entries are created in
/sys/fs/f2fs/fault_injection/ during initializing f2fs module.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 99e3e858 11-May-2016 Sheng Yong <shengyong1@huawei.com>

f2fs: correct return value type of f2fs_fill_super

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b5a7aef1 04-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

fscrypto/f2fs: allow fs-specific key prefix for fs encryption

This patch allows fscrypto to handle a second key prefix given by filesystem.
The main reason is to provide backward compatibility, since previously f2fs
used "f2fs:" as a crypto prefix instead of "fscrypt:".
Later, ext4 should also provide key_prefix() to give "ext4:".

One concern decribed by Ted would be kinda double check overhead of prefixes.
In x86, for example, validate_user_key consumes 8 ms after boot-up, which turns
out derive_key_aes() consumed most of the time to load specific crypto module.
After such the cold miss, it shows almost zero latencies, which treats as a
negligible overhead.
Note that request_key() detects wrong prefix in prior to derive_key_aes() even.

Cc: Ted Tso <tytso@mit.edu>
Cc: stable@vger.kernel.org # v4.6
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 74ef9241 02-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix leak of orphan inode objects

When unmounting filesystem, we should release all the ino entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cb78942b 29-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: inject ENOSPC failures

This patch injects ENOSPC failures.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c41f3cc3 29-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: inject page allocation failures

This patch adds page allocation failures.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2c63fead 29-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: inject kmalloc failure

This patch injects kmalloc failure given a fault injection rate.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 73faec4d 29-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add mount option to select fault injection ratio

This patch adds a mount option to select fault ratio.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f00d6fa7 27-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add proc entry to show valid block bitmap

This patch adds a new proc entry to show segment information in more detail.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b7a15f3d 27-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce macros for proc entries

This adds macros to be used multiple proc entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# faa0e55b 24-Mar-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: treat as a normal umount when remounting ro

When user remounts f2fs as read-only, we can mark the checkpoint as umount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6781eabb 23-Mar-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give -EINVAL for norecovery and rw mount

Once detecting something to recover, f2fs should stop mounting, given norecovery
and rw mount options.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# df728b0f 23-Mar-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: recover superblock at RW remounts

This patch adds a sbi flag, SBI_NEED_SB_WRITE, which indicates it needs to
recover superblock when (re)mounting as RW. This is set only when f2fs is
mounted as RO.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f2353d7b 23-Mar-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give RO message when recovering superblock

When one of superblocks is missing, f2fs recovers it with the valid one.
But, even if f2fs is mounted as RO, we'd better notify that too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 09cbfeaf 01-Apr-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b2dde6fc 29-Mar-2016 Shuoran Liu <liushuoran@huawei.com>

f2fs: retrieve IO write stat from the right place

In the following patch,

f2fs: split journal cache from curseg cache

journal cache is split from curseg cache. So IO write statistics should be
retrived from journal cache but not curseg->sum_blk. Otherwise, it will
get 0, and the stat is lost.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fd694733 20-Mar-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: cover large section in sanity check of super

This patch fixes the bug which does not cover a large section case when checking
the sanity of superblock.
If f2fs detects misalignment, it will fix the superblock during the mount time,
so it doesn't need to trigger fsck.f2fs further.

Reported-by: Matthias Prager <linux@matthiasprager.de>
Reported-by: David Gnedt <david.gnedt@davizone.at>
Cc: stable 4.5+ <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 43b6573b 02-Mar-2016 Keith Mok <ek9852@gmail.com>

f2fs: use cryptoapi crc32 functions

The crc function is done bit by bit.
Optimize this by use cryptoapi
crc32 function which is backed by h/w acceleration.

Signed-off-by: Keith Mok <ek9852@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0b81d077 15-May-2015 Jaegeuk Kim <jaegeuk@kernel.org>

fs crypto: move per-file encryption from f2fs tree to fs/crypto

This patch adds the renamed functions moved from the f2fs crypto files.

1. definitions for per-file encryption used by ext4 and f2fs.

2. crypto.c for encrypt/decrypt functions
a. IO preparation:
- fscrypt_get_ctx / fscrypt_release_ctx
b. before IOs:
- fscrypt_encrypt_page
- fscrypt_decrypt_page
- fscrypt_zeroout_range
c. after IOs:
- fscrypt_decrypt_bio_pages
- fscrypt_pullback_bio_page
- fscrypt_restore_control_page

3. policy.c supporting context management.
a. For ioctls:
- fscrypt_process_policy
- fscrypt_get_policy
b. For context permission
- fscrypt_has_permitted_context
- fscrypt_inherit_context

4. keyinfo.c to handle permissions
- fscrypt_get_encryption_info
- fscrypt_free_encryption_info

5. fname.c to support filename encryption
a. general wrapper functions
- fscrypt_fname_disk_to_usr
- fscrypt_fname_usr_to_disk
- fscrypt_setup_filename
- fscrypt_free_filename

b. specific filename handling functions
- fscrypt_fname_alloc_buffer
- fscrypt_fname_free_buffer

6. Makefile and Kconfig

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 406657dd 24-Feb-2016 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_flush_merged_bios for cleanup

Add a new helper f2fs_flush_merged_bios to clean up redundant codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 41214b3c 22-Feb-2016 Chao Yu <chao@kernel.org>

f2fs: show more info about superblock recovery

This patch changes to show more info in message log about the recovery
of the corrupted superblock during ->mount, e.g. the index of corrupted
superblock and the result of recovery.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 984ec63c 16-Feb-2016 Shawn Lin <shawn.lin@rock-chips.com>

f2fs: move sanity checking of cp into get_valid_checkpoint

>From the function name of get_valid_checkpoint, it seems to return
the valid cp or NULL for caller to check. If no valid one is found,
f2fs_fill_super will print the err log. But if get_valid_checkpoint
get one valid(the return value indicate that it's valid, however actually
it is invalid after sanity checking), then print another similar err
log. That seems strange. Let's keep sanity checking inside the procedure
of geting valid cp. Another improvement we gained from this move is
that even the large volume is supported, we check the cp in advanced
to skip the following procedure if failing the sanity checking.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2b39e907 16-Feb-2016 Shawn Lin <shawn.lin@rock-chips.com>

f2fs: slightly reorganize read_raw_super_block

read_raw_super_block was introduced to help find the
first valid superblock. Commit da554e48caab ("f2fs:
recovering broken superblock during mount") changed the
behaviour to read both of them and check whether need
the recovery flag or not. So the comment before this
function isn't consistent with what it actually does.
Also, the origin code use two tags to round the err
cases, which isn't so readable. So this patch amend
the comment and slightly reorganize it.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dfc08a12 14-Feb-2016 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_journal struct to wrap journal info

Introduce a new structure f2fs_journal to wrap journal info in struct
f2fs_summary_block for readability.

struct f2fs_journal {
union {
__le16 n_nats;
__le16 n_sits;
};
union {
struct nat_journal nat_j;
struct sit_journal sit_j;
struct f2fs_extra_info info;
};
} __packed;

struct f2fs_summary_block {
struct f2fs_summary entries[ENTRIES_IN_SUM];
struct f2fs_journal journal;
struct summary_footer footer;
} __packed;

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 29b96b54 05-Feb-2016 Chao Yu <chao@kernel.org>

f2fs: split drop_inmem_pages from commit_inmem_pages

Split drop_inmem_pages from commit_inmem_pages for code readability,
and prepare for the following modification.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 17c19120 29-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush bios to handle cp_error in put_super

Sometimes, if cp_error is set, there remains under-writeback pages, resulting in
kernel hang in put_super.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8f1dbbbb 26-Jan-2016 Shuoran Liu <liushuoran@huawei.com>

f2fs: introduce lifetime write IO statistics

This patch introduces lifetime IO write statistics exposed to the sysfs interface.
The write IO amount is obtained from block layer, accumulated in the file system and
stored in the hot node summary of checkpoint.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
[Jaegeuk Kim: add sysfs documentation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2304cb0c 18-Jan-2016 Chao Yu <chao@kernel.org>

f2fs: export dirty_nats_ratio in sysfs

This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold
of dirty nat entries, if current ratio exceeds configured threshold,
checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5d097056 14-Jan-2016 Vladimir Davydov <vdavydov.dev@gmail.com>

kmemcg: account certain kmem allocations to memcg

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:

- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d0239e1b 08-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: detect idle time depending on user behavior

This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6beceb54 08-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce time and interval facility

This patch adds time and interval arrays to store some timing variables.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 12719ae1 07-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unnecessary f2fs_balance_fs calls

Only when node page is newly dirtied, it needs to check whether we need to do
f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e0afc4d6 30-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: introduce max_file_blocks in sbi

Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 179448bf 28-Dec-2015 Yunlei He <heyunlei@huawei.com>

f2fs: add a max block check for get_data_block_bmap

This patch adds a max block check for get_data_block_bmap.

Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 06d6f226 23-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: declare static function

The __f2fs_commit_super is static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c34f42e2 23-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: report error of do_checkpoint

do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 343f40f0 15-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: introduce new option for controlling data flush

Add a new option 'data_flush' to enable data flush functionality.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c227f912 15-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: record dirty status of regular/symlink inode

Maintain regular/symlink inode which has dirty pages in global dirty list
and record their total dirty pages count like the way of handling directory
inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b3980910 15-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: introduce __f2fs_commit_super

Introduce __f2fs_commit_super to include duplicated codes in
f2fs_commit_super for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e8240f65 15-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: don't grab super block buffer header all the time

We have already got one copy of valid super block in memory, do not grab
buffer header of super block all the time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b39f0de2 15-Dec-2015 Yunlei He <heyunlei@huawei.com>

f2fs: backup raw_super in sbi

f2fs use fields of f2fs_super_block struct directly in a grabbed buffer.

Once the buffer happen to be destroyed (e.g. through dd), it may bring
in unpredictable effect on f2fs.

This patch fixes to allocate additional buffer to store datas of super
block rather than using grabbed block buffer directly.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2710fd7e 14-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: introduce dirty list node in inode info

Add a new dirt list node member in inode info for linking the inode to
global dirty list in superblock, instead of old implementation which
allocate slab cache memory as an entry to inode.

It avoids memory pressure due to slab cache allocation, and also makes
codes more clean.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a49324f1 14-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry

remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic
function in following patch for removing directory/regular/symlink inode
in global dirty list.

Here rename ino management related functions for readability, also in
order to avoid name conflict.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9a59b62f 14-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: do more integrity verification for superblock

Do more sanity check for superblock during ->mount.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5d909cdb 07-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: refactor f2fs_commit_super

Previously, f2fs_commit_super hacks the bh->blocknr to write the broken
alternate superblock.
Instead of it, we should use the correct logic to retrieve its buffer head
with locking it appropriately.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 29ba108d 16-Nov-2015 Chao Yu <chao@kernel.org>

f2fs: fix memory leak of kobject in error path of fill_super

f2fs_sb_info::s_kobj should be released in error path of fill_super,
otherwise it will lead to memory leak.

This bug was found by kmemleak:

dmesg:
kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800838dc358 (size 8):
comm "mount", pid 4154, jiffies 4297482839 (age 1911.412s)
hex dump (first 8 bytes):
7a 72 61 6d 31 00 ff ff zram1...
backtrace:
[<ffffffff817a3668>] kmemleak_alloc+0x28/0x50
[<ffffffff811dc99f>] __kmalloc_track_caller+0xef/0x1c0
[<ffffffff8119d1c5>] kstrdup+0x45/0x80
[<ffffffff8119d228>] kstrdup_const+0x28/0x30
[<ffffffff813d2333>] kvasprintf_const+0x63/0xa0
[<ffffffff813c59fc>] kobject_set_name_vargs+0x3c/0xa0
[<ffffffff813c5a85>] kobject_add_varg+0x25/0x60
[<ffffffff813c5b13>] kobject_init_and_add+0x53/0x70
[<ffffffffa07ced19>] f2fs_fill_super+0x9d9/0xc40 [f2fs]
[<ffffffff811ff0b2>] mount_bdev+0x192/0x1d0
[<ffffffffa07cd3e5>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff811fec93>] mount_fs+0x43/0x170
[<ffffffff81220256>] vfs_kern_mount+0x76/0x160
[<ffffffff812211c8>] do_mount+0x258/0xdc0
[<ffffffff81221dab>] SyS_mount+0x7b/0xc0
[<ffffffff817aecd7>] entry_SYSCALL_64_fastpath+0x12/0x6f
...

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 787c7b8c 28-Oct-2015 Chao Yu <chao@kernel.org>

f2fs: report error of f2fs_create_root_stats

f2fs_create_root_stats can fail due to no memory, report it to user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ea1a29a0 12-Oct-2015 Chao Yu <chao@kernel.org>

f2fs: export ra_nid_pages to sysfs

After finishing building free nid cache, we will try to readahead
asynchronously 4 more pages for the next reloading, the count of
readahead nid pages is fixed.

In some case, like SMR drive, read less sectors with fixed count
each time we trigger RA may be low efficient, since we will face
high seeking overhead, so we'd better let user to configure this
parameter from sysfs in specific workload.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 60b99b48 05-Oct-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce a periodic checkpoint flow

This patch introduces a periodic checkpoint feature.
Note that, this is not enforcing to conduct checkpoints very strictly in terms
of trigger timing, instead just hope to help user experiences.
The default value is 60 seconds.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6aefd93b 05-Oct-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce background_gc=sync mount option

This patch introduce background_gc=sync enabling synchronous cleaning in
background.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9cd81ce3 18-Sep-2015 Chao Yu <chao@kernel.org>

f2fs: disallow switch extent_cache option dynamically

Swith extent_cache option dynamically when remount may casue consistency
issue between extent cache and dnode page. Fix in this patch to avoid
that condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 01a5ad82 31-Aug-2015 Yunlei He <heyunlei@huawei.com>

f2fs: upset segment_info repair

upset segment_info like this:

276000|161 0|0 4|70 3|0 3|0 0|0 0|91 4|0 4|232 4|39
276104|0 4|0 4|1 4|0 4|0 4|280 4|0 4|42 4|262 4|38
276204|179 4|89 4|39 4|24 4|0 4|96 4|3 4|428 4|0 4|118
276304|112 4|97 4|0 4|0 4|0 4|68 4|0 4|0 4|86 4|138
276404|0 4|0 0|166 5|39 4|101 0|111

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 315df839 11-Aug-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not write any node pages related to orphan inodes

We should not write node pages when deleting orphan inodes.
In order to do that, we can eaisly set POR_DOING flag earlier before entering
orphan inode routine.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8c14bfad 07-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: handle error of f2fs_iget correctly

In recover_orphan_inode, whenever f2fs_iget fail, we will make kernel panic,
but it's not reasonable, because f2fs_iget can fail due to a lot of reasons
including out of memory.

So we change error handling method as below:
a) when finding no entry for the orphan inode, bug_on for catching bugs;
b) for other reasons, report it to caller.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# decd36b6 07-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: remove inmem radix tree

Previously, we use radix tree to index all registered page entries for
atomic file, but now we only use radix tree to see whether current page
is indexed or not, since the other user of radix tree is gone in commit
042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages").

So in this patch, we try to use one more efficient way:
Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
value to indicate page indexing status. By using this way, we can save
memory and lookup time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 55f57d2c 16-Jul-2015 Chao Yu <chao@kernel.org>

f2fs: fix double lock in handle_failed_inode

In handle_failed_inode, there is a potential deadlock which can happen
in below call path:

- f2fs_create
- f2fs_lock_op down_read(cp_rwsem)
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- f2fs_init_security failed
- truncate_blocks failed
- handle_failed_inode
- f2fs_truncate
- truncate_blocks(..,true)
- write_checkpoint
- block_operations
- f2fs_lock_all down_write(cp_rwsem)
- f2fs_lock_op down_read(cp_rwsem)

So in this path, we pass parameter to f2fs_truncate to make sure
cp_rwsem in truncate_blocks will not be locked again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3e72f721 19-Jun-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use extent_cache by default

We don't need to handle the duplicate extent information.

The integrated rule is:
- update on-disk extent with largest one tracked by in-memory extent_cache
- destroy extent_tree for the truncation case
- drop per-inode extent_cache by shrinker

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7daaea25 25-Jun-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add noextent_cache mount option

This patch adds noextent_cache mount option.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2658e50d 19-Jun-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce a shrinker for mounted fs

This patch introduces a shrinker targeting to reduce memory footprint consumed
by a number of in-memory f2fs data structures.

In addition, it newly adds:
- sbi->umount_mutex to avoid data races on shrinker and put_super
- sbi->shruinker_run_no to not revisit objects

Note that the basic implementation was copied from fs/ubifs/shrinker.c

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# eca616f8 15-Jun-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid freed stat information

The write_checkpoint can update stat information, so we should destroy the stat
structure after it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c5bda1c8 07-Jun-2015 Chao Yu <chao@kernel.org>

f2fs: skip committing valid superblock

In recovery procedure for superblock, we try to write data of valid
superblock into invalid one for recovery, work should be finished here,
but then still we will write the valid one with its original data.
This operation is not needed. Let's skip doing this unnecessary work.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 09d54cdd 07-Jun-2015 Chao Yu <chao@kernel.org>

f2fs: setting discard option in parse_options()

For the first mount of f2fs image with realtime discard option, we will
disable discard option if device is not supported, but for remount
operation, our discard option can still be set, this should be avoided.

This patch moves configuring of discard option to parse_options() to fix
this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 96c6dd59 30-May-2015 Chenxi Mao <chenxi.mao2013@gmail.com>

f2fs: disable the discard option when device doesn't support

Current f2fs check the whether the blk device can support discard.
However, the code will cause the discard option cannot be enabled.
Because the clear_opt(sbi, DISCARD) will be invoked forever.

This patch can fix this issue.

Jaegeuk Kim:
The original patch was intended to disable the discard option when device
does not support trim command.
Rather than remaining the buggy patch, let's replace with this patch as
an integrated one.

Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26bf3dc7 19-May-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs crypto: use per-inode tfm structure

This patch applies the following ext4 patch:

ext4 crypto: use per-inode tfm structure

As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
we read or write a page. Instead we can use a single tfm hanging off
the inode's crypt_info structure for all of our encryption needs for
that inode, since the tfm can be used by multiple crypto requests in
parallel.

Also use cmpxchg() to avoid races that could result in crypt_info
structure getting doubly allocated or doubly freed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# da554e48 21-May-2015 hujianyang <hujianyang@huawei.com>

f2fs: recovering broken superblock during mount

This patch recovers a broken superblock with the other valid one.

Signed-off-by: hujianyang <hujianyang@huawei.com>
[Jaegeuk Kim: reinitialize local variables in f2fs_fill_super for retrial]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cfc4d971 15-May-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs crypto: split f2fs_crypto_init/exit with two parts

This patch splits f2fs_crypto_init/exit with two parts: base initialization and
memory allocation.

Firstly, f2fs module declares the base encryption memory pointers.
Then, allocating internal memories is done at the first encrypted inode access.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 06e1bc05 13-May-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: truncate data blocks for orphan inode

As Hu reported, F2FS has a space leak problem, when conducting:

1) format a 4GB f2fs partition
2) dd a 3G file,
3) unlink it.

So, when doing f2fs_drop_inode(), we need to truncate data blocks
before skipping it.
We can also drop unused caches assigned to each inode.

Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 498c5e9f 07-May-2015 Yunlei He <heyunlei@huawei.com>

f2fs: add default mount options to remount

I use f2fs filesystem with /data partition on my Android phone
by the default mount options. When I remount /data in order to
adding discard option to run some benchmarks, I find the default
options such as background_gc, user_xattr and acl turned off.

So I introduce a function named default_options in super.c. It do
some default setting, and both mount and remount operations will
call this function to complete default setting.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 57e5055b 20-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs crypto: add f2fs encryption facilities

Most of parts were copied from ext4, except:

- add f2fs_restore_and_release_control_page which returns control page and
restore control page
- remove ext4_encrypted_zeroout()
- remove sbi->s_file_encryption_mode & sbi->s_dir_encryption_mode
- add f2fs_end_io_crypto_work for mpage_end_io

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 05ca3632 23-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add sbi and page pointer in f2fs_io_info

This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure.
With this change, we can reduce a lot of parameters for IO functions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26d815ad 20-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_commit_super

This patch introduces f2fs_commit_super to write updated superblock.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5463e7c1 21-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: enhance multi-threads performance"

This reports performance regression by Yuanhan Liu.
The basic idea was to reduce one-point mutex, but it turns out this causes
another contention like context swithes.

https://lkml.org/lkml/2015/4/21/11

Until finishing the analysis on this issue, I'd like to revert this for a while.

This reverts commit 78373b7319abdf15050af5b1632c4c8b8b398f33.


# 9df47ba7 13-Apr-2015 Taehee Yoo <ap420073@gmail.com>

f2fs: change 0 to false for bool type

in the f2fs_fill_super function, variable "retry" is bool type
i think that it should be set as false.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 75342797 23-Mar-2015 Wanpeng Li <wanpeng.li@linux.intel.com>

f2fs: enable inline data by default

Enable inline_data feature by default since it brings us better
performance and space utilization and now has already stable.
Add another option noinline_data to disable it during mount.

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Suggested-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 78373b73 13-Mar-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: enhance multi-threads performance

Previously, f2fs_write_data_pages has a mutex, sbi->writepages, to serialize
data writes to maximize write bandwidth, while sacrificing multi-threads
performance.
Practically, however, multi-threads environment is much more important for
users. So this patch tries to remove the mutex.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2adc3505 16-Mar-2015 Chao Yu <chao@kernel.org>

f2fs: set SBI_NEED_FSCK when encountering exception in recovery

This patch tries to set SBI_NEED_FSCK flag into sbi only when we fail to recover
in fill_super, so we could skip fscking image when we fail to fill super for
other reason.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fdf6c8be 06-Mar-2015 Wanpeng Li <wanpeng.li@linux.intel.com>

f2fs: fix extent cache memory leak

extent tree/node slab cache is created during f2fs insmod,
how, it isn't destroyed during f2fs rmmod, this patch fix
it by destroy extent tree/node slab cache once rmmod f2fs.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1dcc336b 05-Feb-2015 Chao Yu <chao@kernel.org>

f2fs: enable rb-tree extent cache

This patch enables rb-tree based extent cache in f2fs.

When we mount with "-o extent_cache", f2fs will try to add recently accessed
page-block mappings into rb-tree based extent cache as much as possible, instead
of original one extent info cache.

By this way, f2fs can support more effective cache between dnode page cache and
disk. It will supply high hit ratio in the cache with fewer memory when dnode
page cache are reclaimed in environment of low memory.

Storage: Sandisk sd card 64g
1.append write file (offset: 0, size: 128M);
2.override write file (offset: 2M, size: 1M);
3.override write file (offset: 4M, size: 1M);
...
4.override write file (offset: 48M, size: 1M);
...
5.override write file (offset: 112M, size: 1M);
6.sync
7.echo 3 > /proc/sys/vm/drop_caches
8.read file (size:128M, unit: 4k, count: 32768)
(time dd if=/mnt/f2fs/128m bs=4k count=32768)

Extent Hit Ratio:
before patched
Hit Ratio 121 / 1071 1071 / 1071

Performance:
before patched
real 0m37.051s 0m35.556s
user 0m0.040s 0m0.026s
sys 0m2.990s 0m2.251s

Memory Cost:
before patched
Tree Count: 0 1 (size: 24 bytes)
Node Count: 0 45 (size: 1440 bytes)

v3:
o retest and given more details of test result.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 89672159 05-Feb-2015 Chao Yu <chao@kernel.org>

f2fs: add a mount option for rb-tree extent cache

This patch adds a mount option 'extent_cache' in f2fs.

It is try to use a rb-tree based extent cache to cache more mapping information
with less memory if this option is set, otherwise we will use the original one
extent info cache.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0c872e2d 05-Feb-2015 Chao Yu <chao@kernel.org>

f2fs: move ext_lock out of struct extent_info

Move ext_lock out of struct extent_info, then in the following patches we can
use variables with struct extent_info type as a parameter to pass pure data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bba681cb 26-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce a batched trim

This patch introduces a batched trimming feature, which submits split discard
commands.

This is to avoid long latency due to huge trim commands.
If fstrim was triggered ranging from 0 to the end of device, we should lock
all the checkpoint-related mutexes, resulting in very long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 081d78c2 23-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: should fail mount when trying to recover data on read-only dev

If device is read-only, we should not proceed data recovery.
But, if the previous checkpoint was done by normal clean shutdown, it's safe to
proceed the recovery, since there will be no data to be recovered.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 119ee914 29-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: split UMOUNT and FASTBOOT flags

This patch adds FASTBOOT flag into checkpoint as follows.

- CP_UMOUNT_FLAG is set when system is umounted.
- CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
was done.

So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2d834bf9 23-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support norecovery mount option

This patch adds a mount option, norecovery, which is mostly same as
disable_roll_forward. The only difference is that norecovery should be activated
with read-only mount option.

This can be used when user wants to check whether f2fs is mountable or not
without any recovery process. (e.g., xfstests/200)

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dabc4a5c 23-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix not to drop mount options when retrying fill_super

If wrong mount option was requested, f2fs tries to fill_super again.
But, during the next trial, f2fs has no valid mount options, since
parse_options deleted all the separators in the original string.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# caf0047e 28-Jan-2015 Chao Yu <chao@kernel.org>

f2fs: merge flags in struct f2fs_sb_info

Currently, there are several variables with Boolean type as below:

struct f2fs_sb_info {
...
int s_dirty;
bool need_fsck;
bool s_closing;
...
bool por_doing;
...
}

For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
up.

So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
to operate flags for sbi.

After that, above issues will be fixed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 85dc2f2c 14-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do checkpoint when umount flag is not set

If the previous checkpoint was done without CP_UMOUNT flag, it needs to do
checkpoint with CP_UMOUNT for the next fast boot.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 30a5537f 14-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: trigger correct checkpoint during umount

This patch fixes to trigger checkpoint with umount flag when kill_sb was called.
In kill_sb, f2fs_sync_fs was finally called, but at this time, f2fs can't do
checkpoint with CP_UMOUNT.
After then, f2fs_put_super is not doing checkpoint, since it is not dirty.

So, this patch adds a flag to indicate f2fs_sync_fs is called during umount.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 351f4fba 07-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add f2fs_destroy_trace_ios to free radix tree

This patch removes radix tree after finishing tracing IOs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c0508650 07-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add spin_lock to cover radix operations in IO tracer

This patch adds spin_lock to cover radix tree operations in IO tracer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 06292073 29-Dec-2014 Chao Yu <chao@kernel.org>

f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively

There are two slab cache inode_entry_slab and winode_slab using the same
structure as below:

struct dir_inode_entry {
struct list_head list; /* list head */
struct inode *inode; /* vfs inode pointer */
};

struct inode_entry {
struct list_head list;
struct inode *inode;
};

It's a little waste that the two cache can not share their memory space for each
other.
So in this patch we remove one redundant winode_slab slab cache, then use more
universal name struct inode_entry as remaining data structure name of slab,
finally we reuse the inode_entry_slab to store dirty dir item and gc item for
more effective.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# db9f7c1a 17-Dec-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: activate f2fs_trace_ios

This patch activates f2fs_trace_ios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# aba291b3 17-Nov-2014 Chao Yu <chao@kernel.org>

f2fs: remove unneeded check code with option in f2fs_remount

Because we have checked the contrary condition in case of "if" judgment, we do
not need to check the condition again in case of "else" judgment. Let's remove
it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6c029932 17-Nov-2014 Chao Yu <chao@kernel.org>

f2fs: avoid unable to restart gc thread in remount

In f2fs_remount, we will stop gc thread and set need_restart_gc as true when new
option is set without BG_GC, then if any error occurred in the following
procedure, we can restore to start the gc thread.
But after that, We will fail to restore gc thread in start_gc_thread as BG_GC is
not set in new option, so we'd better move this condition judgment out of
start_gc_thread to fix this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d5053a34 30-Oct-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce -o fastboot for reducing booting time only

If a system wants to reduce the booting time as a top priority, now we can
use a mount option, -o fastboot.
With this option, f2fs conducts a little bit slow write_checkpoint, but
it can avoid the node page reads during the next mount time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1730663c 20-Oct-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: set raw_super default to NULL to avoid compile warning

Set raw_super default to NULL to avoid the possibly used
uninitialized warning, though we may never hit it in fact.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5efd3c6f 24-Sep-2014 Chao Yu <chao@kernel.org>

f2fs: add a new mount option for inline dir

Adds a new mount option 'inline_dentry' for inline dir.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 34ba94ba 09-Oct-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not make dirty any inmemory pages

This patch let inmemory pages be clean all the time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 88b88a66 06-Oct-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support atomic writes

This patch introduces a very limited functionality for atomic write support.
In order to support atomic write, this patch adds two ioctls:
o F2FS_IOC_START_ATOMIC_WRITE
o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
-> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
: all the written data will be treated as atomic pages.
3. commit
-> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
: this flushes all the data blocks to the disk, which will be shown all or
nothing by f2fs recovery procedure.
4. repeat to #2.

The IO pattens should be:

,- START_ATOMIC_WRITE ,- COMMIT_ATOMIC_WRITE
CP | D D D D D D | FSYNC | D D D D | FSYNC ...
`- COMMIT_ATOMIC_WRITE

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4b2fecc8 20-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce FITRIM in f2fs_ioctl

This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue small discards and prefree discards as many as
possible for the given area.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 75ab4cb8 20-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce cp_control structure

This patch add a new data structure to control checkpoint parameters.
Currently, it presents the reason of checkpoint such as is_umount and normal
sync.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26666c8a 15-Sep-2014 Chao Yu <chao@kernel.org>

f2fs: fix to clean previous mount option when remount_fs

In manual of mount, we descript remount as below:

"mount -o remount,rw /dev/foo /dir
After this call all old mount options are replaced and arbitrary stuff from
fstab is ignored, except the loop= option which is internally generated and
maintained by the mount command."

Previously f2fs do not clear up old mount options when remount_fs, so we have no
chance of disabling previous option (e.g. flush_merge). Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 55cf9cb6 15-Sep-2014 Chao Yu <chao@kernel.org>

f2fs: support large sector size

Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes
sector device at maximum. But now f2fs only support 512 bytes size sector, so
block device such as zRAM which uses page cache as its block storage space will
not be mounted successfully as mismatch between sector size of zRAM and sector
size of f2fs supported.

In this patch we support large sector size in f2fs, so block device with sector
size of 512/1024/2048/4096 bytes can be supported in f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c1ce1b02 10-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give an option to enable in-place-updates during fsync to users

If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a7ffdbe2 12-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: expand counting dirty pages in the inode page cache

Previously f2fs only counts dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names on the management of dirty pages and to count
dirty pages in each inode info as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b0c44f05 02-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: need fsck.f2fs if the recovery was failed

If the roll-forward recovery was failed, we'd better conduct fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2ae4c673 02-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: retain inconsistency information to initiate fsck.f2fs

This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be removed by fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 922cedbd 28-Aug-2014 Dan Carpenter <dan.carpenter@oracle.com>

f2fs: simplify by using a literal

We can make the code a bit simpler because we know that "!retry" is
zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c200b1aa 20-Aug-2014 Chao Yu <chao@kernel.org>

f2fs: fix incorrect calculation with total/free inode num

Theoretically, our total inodes number is the same as total node number, but
there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2
(meta nid), and they should never be used by user, so our total/free inode
number calculated in ->statfs is wrong.

This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating total/free inode number with the macro.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cf779cab 11-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle EIO not to break fs consistency

There are two rules when EIO is occurred.
1. don't write any checkpoint data to preserve the previous checkpoint
2. don't lose the cached dentry/node/meta pages

So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure.
Then, writing checkpoint/dentry/node blocks is not allowed.

Note that, for the data pages, we can't just throw away by redirtying them.
Otherwise, kworker can fall into infinite loop to flush them.
(Ref. xfstests/019)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8501017e 11-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check s_dirty under cp_mutex

It needs to check s_dirty under cp_mutex, since s_dirty is reset under that
mutex.
And previous condition was not correct, since we can omit doing checkpoint
when checkpoint was done followed by all the node pages were written back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1e968fdf 11-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_cp_error for readability

This patch adds f2fs_cp_error for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ed2e621a 08-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give a chance to mount again when encountering errors

This patch gives another chance to try mount process when we encounter an error.
This makes an effect on the roll-forward recovery failures as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6f12ac25 19-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: trigger release_dirty_inode in f2fs_put_super

The generic_shutdown_super calls sync_filesystem, evict_inode, and then
f2fs_put_super. In f2fs_evict_inode, we remain some dirty inode information
so we should release them at f2fs_put_super.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 97c3c5ca 19-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't skip checkpoint if there is no dirty node pages

This is the errorneous scenario.
1. write data
2. do checkpoint
3. produce some dirty node pages by the gc thread
4. write back dirty node pages
5. f2fs_put_super will skip the checkpoint, since dirty count for node pages is
zero.

This patch removes such the wrong condition check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e1c42045 06-Aug-2014 arter97 <qkrwngud825@gmail.com>

f2fs: fix typo

Fix typo and some grammatical errors.

The words "filesystem" and "readahead" are being used without the space treewide.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b3582c68 03-Jul-2014 Chao Yu <chao@kernel.org>

f2fs: reduce competition among node page writes

We do not need to block on ->node_write among different node page writers e.g.
fsync/flush, unless we have a node page writer from write_checkpoint.
So it's better use rw_semaphore instead of mutex type for ->node_write to
promote performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6451e041 25-Jul-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add infra for ino management

This patch changes the naming of orphan-related data structures to use as
inode numbers managed globally.
Later, we can use this facility for managing any inode number lists.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0f7b2abd 23-Jul-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add nobarrier mount option

This patch adds a mount option, nobarrier, in f2fs.
The assumption in here is that file system keeps the IO ordering, but
doesn't care about cache flushes inside the storages.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9d847950 24-Jul-2014 Chao Yu <chao@kernel.org>

f2fs: fix to put root inode in error path of fill_super

We should put root inode correctly in error path of fill_super, otherwise we
may encounter a leak case of inode resource.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6b2920a5 06-Jul-2014 Chao Yu <chao@kernel.org>

f2fs: use inner macro and function to clean up codes

In this patch we use below inner macro and function to clean up codes.
1. ADDRS_PER_PAGE
2. SM_I
3. f2fs_readonly

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d6b7d4b3 11-Jun-2014 Chao Yu <chao@kernel.org>

f2fs: check lower bound nid value in check_nid_range

This patch add lower bound verification for nid in check_nid_range, so nids
reserved like 0, node, meta passed by caller could be checked there.

And then check_nid_range could be used in f2fs_nfs_get_inode for simplifying
code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2163d198 27-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: introduce help function {create,destroy}_flush_cmd_control

Introduce help function {create,destroy}_flush_cmd_control to clean up
the create/destory flush merge operation.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# a688b9d9e 27-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: introduce struct flush_cmd_control to wrap the flush_merge fields

Split the flush_merge fields from sm_i, and use the new struct flush_cmd_control
to wrap it, so that we can igonre these fileds if flush_merge is disable, and
it alse can the structs more neat.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 876dc59e 11-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: add the flush_merge handle in the remount flow

Add the *remount* handle of flush_merge option, so that the users
can enable flush_merge in the runtime, such as the underlying device
handles the cache_flush command relatively slowly.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b270ad6f 11-Apr-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: enable flush_merge only in f2fs is not read-only

Enable flush_merge only in f2fs is not read-only, so does the mount
option show.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6b4afdd7 02-Apr-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_issue_flush to avoid redundant flush issue

Some storage devices show relatively high latencies to complete cache_flush
commands, even though their normal IO speed is prettry much high. In such
the case, it needs to merge cache_flush commands as much as possible to avoid
issuing them redundantly.
So, this patch introduces a mount option, "-o flush_merge", to mitigate such
the overhead.

If this option is enabled by user, F2FS merges the cache_flush commands and then
issues just one cache_flush on behalf of them. Once the single command is
finished, F2FS sends a completion signal to all the pending threads.

Note that, this option can be used under a workload consisting of very intensive
concurrent fsync calls, while the storage handles cache_flush commands slowly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# df0f8dc0 22-Mar-2014 Chao Yu <chao@kernel.org>

f2fs: avoid unnecessary bio submit when wait page writeback

This patch introduce is_merged_page() to check whether current page is merged
in f2fs bio cache. When page is not in cache, we can avoid submitting bio cache,
resulting in having more chance to merge pages.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d928bfbf 20-Mar-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce fi->i_sem to protect fi's info

This patch introduces fi->i_sem to protect fi's info that includes xattr_ver,
pino, i_nlink.
This enables to remove i_mutex during f2fs_sync_file, resulting in performance
improvement when a number of fsync calls are triggered from many concurrent
threads.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# cdfc41c1 18-Mar-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: throttle the memory footprint with a sysfs entry

This patch introduces ram_thresh, a sysfs entry, which controls the memory
footprint used by the free nid list and the nat cache.

Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat
cache was managed by NM_WOUT_THRESHOLD.
However, this approach cannot be applied dynamically according to the system.

So, this patch adds ram_thresh that users can specify the threshold, which is
in order of 1 / 1024.
For example, if the total ram size is 4GB and the value is set to 10 by default,
f2fs tries to control the number of free nids and nat caches not to consume over
10 * (4GB / 1024) = 10MB.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 04c09388 17-Mar-2014 Chao Yu <chao@kernel.org>

f2fs: fix incorrect parsing with option string

Previously 'background_gc={on***,off***}' is being parsed as correct option,
with this patch we cloud fix the trivial bug in mount process.

Change log from v1:
o need to check length of parameter suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 90aa6dc9 16-Mar-2014 Chao Yu <chao@kernel.org>

f2fs: print type for each segment in segment_info's show

The original segment_info's show looks out-of-format:
cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 512
512 512 512 512 512 512 512 0 0 512
348 0 263 0 0 512 0 0 512 512
512 512 0 512 512 512 512 512 512 512
512 512 511 328 512 512 512 512 512 512
512 512 512 512 512 512 512 0 0 175

Let's fix this and show type for each segment.
cat /proc/fs/f2fs/loop0/segment_info
format: segment_type|valid_blocks
segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)
0 2|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
10 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
20 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
30 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
40 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
50 3|0 3|0 3|0 3|0 3|0 3|0 3|0 0|0 3|0 3|0
60 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|0 3|512
70 3|512 3|512 3|512 3|512 3|512 3|512 3|512 3|0 3|0 3|512
80 3|0 3|0 3|0 3|0 3|0 3|512 3|0 3|0 3|512 3|512
90 3|512 0|512 3|274 0|512 0|512 0|512 0|512 0|512 0|512 3|512
100 3|512 0|512 3|511 0|328 3|512 0|512 0|512 3|512 0|512 0|512
110 0|512 0|512 0|512 0|512 0|512 0|512 0|512 5|0 4|0 3|512

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 02b9984d 13-Mar-2014 Theodore Ts'o <tytso@mit.edu>

fs: push sync_filesystem() down to the file system's remount_fs()

Previously, the no-op "mount -o mount /dev/xxx" operation when the
file system is already mounted read-write causes an implied,
unconditional syncfs(). This seems pretty stupid, and it's certainly
documented or guaraunteed to do this, nor is it particularly useful,
except in the case where the file system was mounted rw and is getting
remounted read-only.

However, it's possible that there might be some file systems that are
actually depending on this behavior. In most file systems, it's
probably fine to only call sync_filesystem() when transitioning from
read-write to read-only, and there are some file systems where this is
not needed at all (for example, for a pseudo-filesystem or something
like romfs).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jan Kara <jack@suse.cz>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anders Larsen <al@alarsen.net>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: fuse-devel@lists.sourceforge.net
Cc: cluster-devel@redhat.com
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: linux-nilfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org


# 910bb12d 12-Mar-2014 Chao Yu <chao@kernel.org>

f2fs: check upper bound of ino value in f2fs_nfs_get_inode

Upper bound checking of ino should be added to f2fs_nfs_get_inode, so unneeded
process before do_read_inode in f2fs_iget could be avoided when ino is invalid.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 46c04366 07-Mar-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: format segment_info's show for better legibility

The original segment_info's show is a bit out-of-format:

[root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
......
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 [root@guz Demoes]#

so we fix it here for better legibility.
[root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
......
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1
[root@guz Demoes]#

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# e8512d2e 07-Mar-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# ab9fa662 27-Feb-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add an sysfs entry to control the directory level

This patch adds an sysfs entry to control dir_level used by the large directory.

The description of this entry is:

dir_level This parameter controls the directory level to
support large directory. If a directory has a
number of files, it can reduce the file lookup
latency by increasing this dir_level value.
Otherwise, it needs to decrease this value to
reduce the space overhead. The default value is 0.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6437d1b0 19-Feb-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to do build_stat prior to the recovery procedure

At the end of the recovery procedure, write_checkpoint is called and updates
the cp count which is managed by f2fs stat.
But, previously build_stat() is called after the recovery procedure, which
results in:

BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [<ffffffffa03b1030>] write_checkpoint+0x720/0xbc0 [f2fs]
Call Trace:
[<ffffffff810a6b44>] ? mark_held_locks+0x74/0x140
[<ffffffff8109a3e0>] ? __init_waitqueue_head+0x60/0x60
[<ffffffffa03bf036>] recover_fsync_data+0x656/0xf20 [f2fs]
[<ffffffff812ee3eb>] ? security_d_instantiate+0x1b/0x30
[<ffffffffa03aeb4d>] f2fs_fill_super+0x94d/0xa00 [f2fs]
[<ffffffff811a9825>] mount_bdev+0x1a5/0x1f0
[<ffffffff8114915e>] ? __get_free_pages+0xe/0x40
[<ffffffffa03ae200>] ? f2fs_remount+0x130/0x130 [f2fs]
[<ffffffffa03aa575>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff811aa713>] mount_fs+0x43/0x1b0
[<ffffffff811c7124>] vfs_kern_mount+0x74/0x160
[<ffffffff811c5cb1>] ? __get_fs_type+0x51/0x60
[<ffffffff811c9727>] do_mount+0x237/0xb50
[<ffffffff811c936a>] ? copy_mount_options+0x3a/0x170

So, this patche changes the order of recovery_fsync_data() and
f2fs_build_stats().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6c311ec6 17-Jan-2014 Chris Fries <cfries@motorola.com>

f2fs: clean checkpatch warnings

Fixed a variety of trivial checkpatch warnings. The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b1c57c1c 07-Jan-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add a sysfs entry to control max_victim_search

Previously during SSR and GC, the maximum number of retrials to find a victim
segment was hard-coded by MAX_VICTIM_SEARCH, 4096 by default.

This number makes an effect on IO locality, when SSR mode is activated, which
results in performance fluctuation on some low-end devices.

If max_victim_search = 4, the victim will be searched like below.
("D" represents a dirty segment, and "*" indicates a selected victim segment.)

D1 D2 D3 D4 D5 D6 D7 D8 D9
[ * ]
[ * ]
[ * ]
[ ....]

This patch adds a sysfs entry to control the number dynamically through:
/sys/fs/f2fs/$dev/max_victim_search

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 216fbd64 06-Nov-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce sysfs entry to control in-place-update policy

This patch introduces new sysfs entries for users to control the policy of
in-place-updates, namely IPU, in f2fs.

Sometimes f2fs suffers from performance degradation due to its out-of-place
update policy that produces many additional node block writes.
If the storage performance is very dependant on the amount of data writes
instead of IO patterns, we'd better drop this out-of-place update policy.

This patch suggests 5 polcies and their triggering conditions as follows.

[sysfs entry name = ipu_policy]

0: F2FS_IPU_FORCE all the time,
1: F2FS_IPU_SSR if SSR mode is activated,
2: F2FS_IPU_UTIL if FS utilization is over threashold,
3: F2FS_IPU_SSR_UTIL if SSR mode is activated and FS utilization is over
threashold,
4: F2FS_IPU_DISABLE disable IPU. (=default option)

[sysfs entry name = min_ipu_util]

This parameter controls the threshold to trigger in-place-updates.
The number indicates percentage of the filesystem utilization, and used by
F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.

For more details, see need_inplace_update() in segment.h.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5dcd8a71 10-Dec-2013 Changman Lee <cm224.lee@samsung.com>

f2fs: missing kmem_cache_destroy for discard_entry

insmod f2fs.ko is failed after insmod and rmmod firstly.

$ sudo insmod fs/f2fs/f2fs.ko
insmod: error inserting 'fs/f2fs/f2fs.ko': -1 Cannot allocate memory

-- dmesg --
kmem_cache_sanity_check (free_nid): Cache name already exists.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 458e6197 10-Dec-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: refactor bio->rw handling

This patch introduces f2fs_io_info to mitigate the complex parameter list.

struct f2fs_io_info {
enum page_type type; /* contains DATA/NODE/META/META_FLUSH */
int rw; /* contains R/RS/W/WS */
int rw_flag; /* contains REQ_META/REQ_PRIO */
}

1. f2fs_write_data_pages
- DATA
- WRITE_SYNC is set when wbc->WB_SYNC_ALL.

2. sync_node_pages
- NODE
- WRITE_SYNC all the time

3. sync_meta_pages
- META
- WRITE_SYNC all the time
- REQ_META | REQ_PRIO all the time

** f2fs_submit_merged_bio() handles META_FLUSH.

4. ra_nat_pages, ra_sit_pages, ra_sum_pages
- META
- READ_SYNC

Cc: Fan Li <fanofcode.li@samsung.com>
Cc: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6bacf52f 05-Dec-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add unlikely() macro for compiler more aggressively

This patch adds unlikely() macro into the most of codes.
The basic rule is to add that when:
- checking unusual errors,
- checking page mappings,
- and the other unlikely conditions.

Change log from v1:
- Don't add unlikely for the NULL test and error test: advised by Andi Kleen.

Cc: Chao Yu <chao2.yu@samsung.com>
Cc: Andi Kleen <andi@firstfloor.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# a0acdfe0 04-Dec-2013 Chao Yu <chao@kernel.org>

f2fs: use inner macro GFP_F2FS_ZERO for simplification

Use inner macro GFP_F2FS_ZERO to instead of GFP_NOFS | __GFP_ZERO for
simplification of code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 8f99a946 28-Nov-2013 Chao Yu <chao@kernel.org>

f2fs: convert recover_orphan_inodes to void

The recover_orphan_inodes() returns no error all the time, so we don't need to
check its errors.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 8274de77 10-Nov-2013 Huajun Li <huajun.li@intel.com>

f2fs: add a new mount option: inline_data

Add a mount option: inline_data. If the mount option is set,
data of New created small files can be stored in their inode.

Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 924b720b 19-Nov-2013 Chao Yu <chao@kernel.org>

f2fs: add a new function to support for merging contiguous read

For better read performance, we add a new function to support for merging
contiguous read as the one for write.

v1-->v2:
o add declarations here as Gu Zheng suggested.
o use new structure f2fs_bio_info introduced by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Acked-by: Gu Zheng <guz.fnst@cn.fujitsu.com>


# 1ff7bd3b 18-Nov-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce a bio array for per-page write bios

The f2fs has three bio types, NODE, DATA, and META, and manages some data
structures per each bio types.

The codes are a little bit messy, thus, this patch introduces a bio array
which groups individual data structures as follows.

struct f2fs_bio_info {
struct bio *bio; /* bios to merge */
sector_t last_block_in_bio; /* last block number */
struct mutex io_mutex; /* mutex for bio */
};

struct f2fs_sb_info {
...
struct f2fs_bio_info write_io[NR_PAGE_TYPE]; /* for write bios */
...
};

The code changes from this new data structure are trivial.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 971767ca 18-Nov-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use sbi->write_mutex for write bios

This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
There is no reason to use the semaphore when f2fs submits read and write IOs.
Instead, let's use a write mutex and cover the sbi->bio[] by the lock.

Change log from v1:
o split write_mutex suggested by Chao Yu

Chao described,
"All DATA/NODE/META bio buffers in superblock is protected by
'sbi->write_mutex', but each bio buffer area is independent, So we
should split write_mutex to three for DATA/NODE/META."

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 7ac8c3b0 11-Nov-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add a sysfs entry to control max_discards

If frequent small discards are issued to the device, the performance would
be degraded significantly.
So, this patch adds a sysfs entry to control the number of discards to be
issued during a checkpoint procedure.

By default, f2fs does not issue any small discards, which means max_discards
is zero.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 7fd9e544 14-Nov-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add a slab cache entry for small discards

This patch adds a slab cache entry for small discards.

Each entry consists of:

struct discard_entry {
struct list_head list; /* list head */
block_t blkaddr; /* block address to be discarded */
int len; /* # of consecutive blocks of the discard */
};

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 1d15bd20 06-Nov-2013 Chao Yu <chao@kernel.org>

f2fs: fix memory leak after kobject init failed in fill_super

If we failed to init&add kobject when fill_super, stats info and proc object of
f2fs will not be released.
We should free them before we finish fill_super.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# fb51b5ef 06-Nov-2013 Changman Lee <cm224.lee@samsung.com>

f2fs: cleanup waiting routine for writeback pages in cp

use genernal method supported by kernel

o changes from v1
If any waiter exists at end io, wake up it.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# ea91e9b0 24-Oct-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add reclaiming control by sysfs

This patch adds a control method in sysfs to reclaim prefree segments.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# aabe5136 22-Oct-2013 Haicheng Li <haicheng.li@linux.intel.com>

f2fs: use bool for booleans

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 9076a75f 14-Oct-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: introduce function read_raw_super_block()

Introduce function read_raw_super_block() to hide reading raw super block and
the retry routine if the first sb is invalid.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b1838f89 09-Oct-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix the starvation problem on cp_rwsem

This patch removes the logic previously introduced to address the starvation
on cp_rwsem.

One potential there-in bug is that we should cover the wait.list with spin_lock,
but the previous code broke this rule.

And, actually current rwsem handles this starvation issue reasonably, so that we
didn't need to do this before neither.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5887d291 07-Oct-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unnecessary checkpoints

During the f2fs_put_super procedure, we don't need to conduct checkpoint all
the time, since we don't need to do that if superblock is clean.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 4058c511 06-Oct-2013 Kelly Anderson <kelly@xilka.com>

f2fs: handle remount options correctly

The current f2fs code errors if the xattr or acl options are passed when
remounting. This is important in a typical scenario where f2fs is mounted
as a "ro" root file-system by the boot loader and then the init process wants
to remount it "rw" with the "remount,rw" option.

Signed-off-by: Kelly Anderson <kelly@xilka.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# e479556b 27-Sep-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: use rw_sem instead of fs_lock(locks mutex)

The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
there is a terrible problem here, if these are too many concurrency threads acquiring
fs_lock, so that they will block each other and may lead to some performance problem,
but this is not the phenomenon we want to see.
Though there are some optimization patches introduced to enhance the usage of fs_lock,
but the thorough solution is using a *rw_sem* to replace the fs_lock.
Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
this can avoid the problem described above completely.
Because of the weakness of rw_sem, the above change may introduce a potential problem
that the checkpoint thread might get starved if other threads are intensively locking
the read semaphore for I/O.(Pointed out by Xu Jin)
In order to avoid this, a wait_list is introduced, the appending read semaphore ops
will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
and will be waked up when checkpoint thread gives up write semaphore.
Thanks to Kim's previous review and test, and will be very glad to see other guys'
performance tests about this patch.

V2:
-fix the potential starvation problem.
-use more suitable func name suggested by Xu Jin.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: adjust minor coding standard]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# de93653f 12-Aug-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reserve the xattr space dynamically

This patch enables the number of direct pointers inside on-disk inode block to
be changed dynamically according to the size of inline xattr space.

The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has inline xattr flag.

The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 444c580f 08-Aug-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add flags for inline xattrs

This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR,
and add a mount option, inline_xattr, which is enabled when xattr is set.

If the mount option is enabled, all the files are marked with the inline_xattrs
flag.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6e6b978c 22-Aug-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn>

f2fs: fix error return code in init_f2fs_fs()

Fix to return -ENOMEM in the kset create and add error handling
case instead of 0, as done elsewhere in this function.

Introduced by commit b59d0bae6ca30c496f298881616258f9cde0d9c6.
(f2fs: add sysfs support for controlling the gc_thread)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
[Jaegeuk Kim: merge the patch with previous modification]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 9890ff3f 20-Aug-2013 Zhao Hongjiang <zhaohongjiang@huawei.com>

f2fs: fix memory leak when init f2fs filesystem fail

When any of the caches create fails in init_f2fs_fs(), the other caches which are
create successful should be free.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 41dfde13 09-Aug-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: clean up the needless end 'return' of void function

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# c2d715d1 08-Aug-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix a build failure due to missing the kobject header

This patch should resolve the following error reported by kbuild test robot.

All error/warnings:

In file included from fs/f2fs/dir.c:13:0:
>> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type
struct kobject s_kobj;

The failure was caused by missing the kobject header file in dir.c.
So, this patch move the header file to the right location, f2fs.h.

CC: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d2dc095f 04-Aug-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: add sysfs entries to select the gc policy

Add sysfs entry gc_idle to control the gc policy. Where
gc_idle = 1 corresponds to selecting a cost benefit approach,
while gc_idle = 2 corresponds to selecting a greedy approach
to garbage collection. The selection is mutually exclusive one
approach will work at any point. If gc_idle = 0, then this
option is disabled.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: change the select_gc_type() flow slightly]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b59d0bae 04-Aug-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: add sysfs support for controlling the gc_thread

Add sysfs entries to control the timing parameters for
f2fs gc thread.

Various Sysfs options introduced are:
gc_min_sleep_time: Min Sleep time for GC in ms
gc_max_sleep_time: Max Sleep time for GC in ms
gc_no_gc_sleep_time: Default Sleep time for GC in ms

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: fix an umount bug and some minor changes]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5e176d54 27-Jun-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add proc entry to monitor current usage of segments

You can monitor valid block counts of whole segments in:
/proc/fs/f2fs/sdb1/segment_info.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 696c018c 15-Jun-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: add remount_fs callback support

Add the f2fs_remount function call which will be used
during the filesystem remounting. This function
will help us to change the mount options specific to
f2fs.

Also modify the f2fs background_gc mount option, which
will allow the user to dynamically trun on/off the
garbage collection in f2fs based on the background_gc
value. If background_gc=on, Garbage collection will
be turned off & if background_gc=off, Garbage collection
will be truned on.

By default the garbage collection is on in f2fs.

Change Log:
v2: Incorporated the review comments by Gu Zheng.
Removing the restore part for VFS flags
Updating comments with proper flag conditions
Display GC background option as ON/OFF
Revised conditions to stop GC in case of remount

v1: Initial changes for adding remount_fs callback
support.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: change /** with /* for the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b3783873 09-Jun-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid freqeunt write_inode calls

If update_inode is called, we don't need to do write_inode.
So, let's use a *dirty* flag for each inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5fb08372 07-Jun-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: set sb->s_fs_info before calling parse_options()

In f2fs_fill_super(), set sb->s_fs_info before calling parse_options(), then we can get
f2fs_sb_info via F2FS_SB(sb) in parse_options().
So that the second argument "sbi" of func parse_options() is no longer needed.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 77888c1e 20-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add f2fs_readonly()

Introduce a simple macro function for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# bde582b2 02-May-2013 Chris Fries <C.Fries@motorola.com>

f2fs: continue to mount after failing recovery

When unable to roll forward the journal, we shouldn't bail out and
not mount, we should continue to attempt the mount. Bad recovery data
is likely unrecoverable at this point, and requiring the user to try
to mount again doesn't solve any issues.

Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 531ad7d5 29-Apr-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid deadlock during evict after f2fs_gc

o Deadlock case #1

Thread 1:
- writeback_sb_inodes
- do_writepages
- f2fs_write_data_pages
- write_cache_pages
- f2fs_write_data_page
- f2fs_balance_fs
- wait mutex_lock(gc_mutex)

Thread 2:
- f2fs_balance_fs
- mutex_lock(gc_mutex)
- f2fs_gc
- f2fs_iget
- wait iget_locked(inode->i_lock)

Thread 3:
- do_unlinkat
- iput
- lock(inode->i_lock)
- evict
- inode_wait_for_writeback

o Deadlock case #2

Thread 1:
- __writeback_single_inode
: set I_SYNC
- do_writepages
- f2fs_write_data_page
- f2fs_balance_fs
- f2fs_gc
- iput
- evict
- inode_wait_for_writeback(I_SYNC)

In order to avoid this, even though iput is called with the zero-reference
count, we need to stop the eviction procedure if the inode is on writeback.
So this patch links f2fs_drop_inode which checks the I_SYNC flag.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b743ba78 28-Apr-2013 Haicheng Li <haicheng.li@linux.intel.com>

f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# a2a4a7e4 19-Apr-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: add tracepoints for sync & inode operations

Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will helf to trace the code under debugging scenarios.

Also add tracepoints for tracing the various inode operations
like building inode, eviction of inode, link/unlike of
inodes.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 66348b72 11-Apr-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn>

f2fs: fix error return code in f2fs_fill_super()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Introduce by commit c0d39e(f2fs: fix return values from validate superblock)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 39936837 22-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce a new global lock scheme

In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};

In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.

For this, I propose a new global lock scheme as follows.

0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
- try to get an avaiable lock from the array.
- returns the index of the gottern lock variable.

2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.

3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.

5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()

Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b7473754 31-Mar-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid race for summary information

In order to do GC more reliably, I'd like to lock the vicitm summary page
until its GC is completed, and also prevent any checkpoint process.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5ec4e49f 30-Mar-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: change GC bitmaps to apply the section granularity

This patch removes a bitmap for victim segments selected by foreground GC, and
modifies the other bitmap for victim segments selected by background GC.

1) foreground GC bitmap
: We don't need to manage this, since we just only one previous victim section
number instead of the whole victim history.
The f2fs uses the victim section number in order not to allocate currently
GC'ed section to current active logs.

2) background GC bitmap
: This bitmap is used to avoid selecting victims repeatedly by background GCs.
In addition, the victims are able to be selected by foreground GCs, since
there is no need to read victim blocks during foreground GCs.

By the fact that the foreground GC reclaims segments in a section unit, it'd
be better to manage this bitmap based on the section granularity.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6ead1142 20-Mar-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix the recovery flow to handle errors correctly

We should handle errors during the recovery flow correctly.
For example, if we get -ENOMEM, we should report a mount failure instead of
conducting the remained mount procedure.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 111d2495 18-Mar-2013 Masanari Iida <standby24x7@gmail.com>

f2fs: fix typo in comments

Correct spelling typo in comments

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# c0d39e65 17-Mar-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: fix return values from validate superblock

validate super block is not returning with proper values.
When failure from sb_bread it should reflect there is an EIO otherwise
it should return of EINVAL.
Returning, '1' is not conveying proper message as the return type.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d3ee456d 17-Mar-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: notify when discard is not supported

Change f2fs so that a warning is emitted when an attempt is made to
mount a filesystem with the unsupported discard option.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 434720fa 18-Mar-2013 Masanari Iida <standby24x7@gmail.com>

f2fs: Fix typo in comments

Correct spelling typo in comments

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 5a20d339 02-Mar-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: align f2fs maximum name length to linux based filesystem

The maximum filename length supported in linux is 255 characters.
So let's follow that.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 7f78e035 02-Mar-2013 Eric W. Biederman <ebiederm@xmission.com>

fs: Limit sys_mount to only request filesystem modules.

Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 43727527 03-Feb-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clarify and enhance the f2fs_gc flow

This patch makes clearer the ambiguous f2fs_gc flow as follows.

1. Remove intermediate checkpoint condition during f2fs_gc
(i.e., should_do_checkpoint() and GC_BLOCKED)

2. Remove unnecessary return values of f2fs_gc because of #1.
(i.e., GC_NODE, GC_OK, etc)

3. Simplify write_checkpoint() because of #2.

4. Clarify the main f2fs_gc flow.
o monitor how many freed sections during one iteration of do_garbage_collect().
o do GC more without checkpoints if we can't get enough free sections.
o do checkpoint once we've got enough free sections through forground GCs.

5. Adopt thread-logging (Slack-Space-Recycle) scheme more aggressively on data
log types. See. get_ssr_segement()

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 14d7e9de 01-Feb-2013 majianpeng <majianpeng@gmail.com>

f2fs: when check superblock failed, try to check another superblock

In f2fs, there are two superblocks. So when the first superblock was
invalidate, it should try to check another.

By Jaegeuk Kim:
o Remove a white space for coding style
o Clean up for code readability
o Fix a typo

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5c9b4692 01-Feb-2013 majianpeng <majianpeng@gmail.com>

f2fs: use F2FS_BLKSIZE to judge bloksize and page_cache_size

In some system PAGE_CACHE_SIZE isn't 4K. So using F2FS_BLKSIZE to judge.

By Jaegeuk Kim:
o f2fs does not support no other 4KB page cache size.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d6212a5f 29-Jan-2013 Changman Lee <cm224.lee@samsung.com>

f2fs: add un/freeze_fs into super_operations

This patch supports ioctl FIFREEZE and FITHAW to snapshot filesystem.
Before calling f2fs_freeze, all writers would be suspended and sync_fs
would be completed. So no f2fs has to do something.
Just background gc operation should be skipped due to generate dirty
nodes and data until unfreeze.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# aa43507f 25-Jan-2013 Alejandro Martinez Ruiz <alex@nowcomputing.com>

f2fs: fix disable_ext_identify option spelling

There is a typo in the ->show_options function for disable_ext_identify.
Fix it to match the spelling from the documentation.

Signed-off-by: Alejandro Martinez Ruiz <alex@nowcomputing.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 577e3495 24-Jan-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: prevent checkpoint once any IO failure is detected

This patch enhances the checkpoint routine to cope with IO errors.

Basically f2fs detects IO errors from end_io_write, and the errors are able to
be occurred during one of data, node, and meta page writes.

In the previous code, when an IO error is occurred during writes, f2fs sets a
flag, CP_ERROR_FLAG, in the raw ckeckpoint buffer which will be written to disk.
Afterwards, write_checkpoint() will check the flag and remount f2fs as a
read-only (ro) mode.

However, even once f2fs is remounted as a ro mode, dirty checkpoint pages are
freely able to be written to disk by flusher or kswapd in background.
In such a case, after cold reboot, f2fs would restore the checkpoint data having
CP_ERROR_FLAG, resulting in disabling write_checkpoint and remounting f2fs as
a ro mode again.

Therefore, let's prevent any checkpoint page (meta) writes once an IO error is
occurred, and remount f2fs as a ro mode right away at that moment.

Reported-by: Oliver Winker <oliver@oli1170.net>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>


# 6e6093a8 16-Jan-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: add __init to functions in init_f2fs_fs

Add __init to functions in init_f2fs_fs for code consistency.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 4589d25d 15-Jan-2013 Namjae Jeon <linkinjeon@gmail.com>

f2fs: fix the debugfs entry creation path

As the "status" debugfs entry will be maintained for entire F2FS filesystem
irrespective of the number of partitions.
So, we can move the initialization to the init part of the f2fs and destroy will
be done from exit part. After making changes, for individual partition mount -
entry creation code will not be executed.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# ff9234ad 11-Jan-2013 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: remove redundant call to set_blocksize in f2fs_fill_super

Since, f2fs supports only 4KB blocksize, which is set at the beginning in
f2fs_fill_super. So, we do not need to again check this blocksize setting
in such case.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 7d82db83 10-Jan-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add f2fs_balance_fs in several interfaces

The f2fs_balance_fs() is to check the number of free sections and decide whether
it needs to conduct cleaning or not. If there are not enough free sections, the
cleaning job should be started.

In order to control an amount of free sections even under high utilization, f2fs
should call f2fs_balance_fs at all the VFS interfaces that are able to produce
dirty pages.
This patch adds the function calls in the missing interfaces as follows.

1. f2fs_setxattr()
The f2fs_setxattr() produces dirty node pages so that we should call
f2fs_balance_fs() either likewise doing in other VFS interfaces such as
f2fs_lookup(), f2fs_mkdir(), and so on.

2. f2fs_sync_file()
We should guarantee serving free sections for syncing metadata during fsync.
Previously, there is no space check before triggering checkpoint and
sync_node_pages.
Therefore, if a bunch of fsync calls are triggered under 100% of FS utilization,
f2fs is able to be faced with no free sections, resulting in BUG_ON().

3. f2fs_sync_fs()
Before calling write_checkpoint(), we should guarantee that there are minimum
free sections.

4. f2fs_write_inode()
f2fs_write_inode() is also able to produce dirty node pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# a07ef784 29-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: introduce f2fs_msg to ease adding information prints

Introduced f2fs_msg function to differentiate f2fs specific messages in
the log. And, added few informative prints in the mount path, to convey
proper error in case of mount failure.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 64c576fe 21-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: remove unneeded variable from f2fs_sync_fs

We can directly return '0' from the function, instead of introducing a
'ret' variable.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 12a67146 20-Dec-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: return a default value for non-void function

This patch resolves a build warning reported by kbuild test robot.

"
fs/f2fs/segment.c: In function '__get_segment_type':
fs/f2fs/segment.c:806:1: warning: control reaches end of non-void
function [-Wreturn-type]
"

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 30f0c758 19-Dec-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: should recover orphan and fsync data

The recovery routine should do all the time regardless of normal umount action.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 1362b5e3 12-Dec-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong calculation on f_files in statfs

In f2fs_statfs(), f_files should be the total number of available inodes
instead of the currently allocated inodes.
So, this patch should resolve the reported bug below.

Note that, showing 10% usage is not a bug, since f2fs reveals whole volume size
as much as possible and shows the space overhead as *used*.
This policy is fair enough with respect to other file systems.

<Reported Bug>
(loop0 is backed by 1GiB file)

$ mkfs.f2fs /dev/loop0

F2FS-tools: Ver: 1.1.0 (2012-12-11)
Info: sector size = 512
Info: total sectors = 2097152 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
Info: format successful

$ mount /dev/loop0 mnt/

$ df mnt/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 1046528 98312 929784 10%
/home/zeta/linux-devel/mtd-bench/mnt

$ df mnt/ -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/loop0 1 -465918 465919 - /home/zeta/linux-devel/mtd-bench/mnt

Notice IUsed is negative. Also, 10% usage on a fresh f2fs seems too
much to be correct.

Reported-and-Tested-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 154a0865 30-Nov-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: remove unneeded memset from init_once

Since, __GFP_ZERO is used while f2fs inode allocation, so we do not
need memset for f2fs_inode_info, as this is already zeroed out.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>


# 72ce6094 30-Nov-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: show error in case of invalid mount arguments

print the invalid argument/value from parse_options in case of
mount failure.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>


# 0a8165d7 28-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: adjust kernel coding style

As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment
blocks. Instead, just use "/*".

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 25ca923b 28-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix endian conversion bugs reported by sparse

This patch should resolve the bugs reported by the sparse tool.
Initial reports were written by "kbuild test robot" managed by fengguang.wu.

In my local machines, I've tested also by running:
> make C=2 CF="-D__CHECK_ENDIAN__"

Accordingly, I've found lots of warnings and bugs related to the endian
conversion. And I've fixed all at this moment.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# aff063e2 02-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add super block operations

This adds the implementation of superblock operations for f2fs, which includes
- init_f2fs_fs/exit_f2fs_fs
- f2fs_mount
- super_operations of f2fs

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>