Searched +hist:7 +hist:fd9e544 (Results 1 - 3 of 3) sorted by relevance

/linux-master/fs/f2fs/
H A Dsuper.cdiff 7e1b150f Mon Aug 28 08:04:17 MDT 2023 Chao Yu <chao@kernel.org> f2fs: compress: fix to avoid redundant compress extension

With below script, redundant compress extension will be parsed and added
by parse_options(), because parse_options() doesn't check whether the
extension is existed or not, fix it.

1. mount -t f2fs -o compress_extension=so /dev/vdb /mnt/f2fs
2. mount -t f2fs -o remount,compress_extension=so /mnt/f2fs
3. mount|grep f2fs

/dev/vdb on /mnt/f2fs type f2fs (...,compress_extension=so,compress_extension=so,...)

Fixes: 4c8ff7095bef ("f2fs: support data compression")
Fixes: 151b1982be5d ("f2fs: compress: add nocompress extensions support")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7a0263dc Sun Sep 24 23:54:50 MDT 2023 Eric Biggers <ebiggers@google.com> fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes

Now that fs/crypto/ computes the filesystem's lblk_bits from its maximum
file size, it is no longer necessary for filesystems to provide
lblk_bits via fscrypt_operations::get_ino_and_lblk_bits.

It is still necessary for fs/crypto/ to retrieve ino_bits from the
filesystem. However, this is used only to decide whether inode numbers
fit in 32 bits. Also, ino_bits is static for all relevant filesystems,
i.e. it doesn't depend on the filesystem instance.

Therefore, in the interest of keeping things as simple as possible,
replace 'get_ino_and_lblk_bits' with a flag 'has_32bit_inodes'. This
can always be changed back to a function if a filesystem needs it to be
dynamic, but for now a static flag is all that's needed.

Link: https://lore.kernel.org/r/20230925055451.59499-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
diff 2ea6f689 Wed Aug 02 09:41:25 MDT 2023 Christoph Hellwig <hch@lst.de> fs: use the super_block as holder when mounting file systems

The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance. Pass the super_block instead
which is useful when passed back to the file system driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-7-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 7b02b220 Fri Oct 28 03:30:26 MDT 2022 Chao Yu <chao@kernel.org> f2fs: fix to destroy sbi->post_read_wq in error path of f2fs_fill_super()

In error path of f2fs_fill_super(), this patch fixes to call
f2fs_destroy_post_read_wq() once if we fail in f2fs_start_ckpt_thread().

Fixes: 261eeb9c1585 ("f2fs: introduce checkpoint_merge mount option")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7a8fc586 Mon Jun 20 11:38:42 MDT 2022 Daeho Jeong <daehojeong@google.com> f2fs: introduce memory mode

Introduce memory mode to supports "normal" and "low" memory modes.
"low" mode is to support low memory devices. Because of the nature of
low memory devices, in this mode, f2fs will try to save memory sometimes
by sacrificing performance. "normal" mode is the default mode and same
as before.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff e33c267a Tue May 31 21:22:24 MDT 2022 Roman Gushchin <roman.gushchin@linux.dev> mm: shrinkers: provide shrinkers with names

Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff 7f262f73 Wed Apr 27 10:02:54 MDT 2022 Luis Chamberlain <mcgrof@kernel.org> f2fs: ensure only power of 2 zone sizes are allowed

F2FS zoned support has power of 2 zone size assumption in many places
such as in __f2fs_issue_discard_zone, init_blkz_info. As the power of 2
requirement has been removed from the block layer, explicitly add a
condition in f2fs to allow only power of 2 zone size devices.

This condition will be relaxed once those calculation based on power of
2 is made generic.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 680af5b8 Tue Feb 15 01:27:21 MST 2022 Juhyung Park <qkrwngud825@gmail.com> f2fs: quota: fix loop condition at f2fs_quota_sync()

cnt should be passed to sb_has_quota_active() instead of type to check
active quota properly.

Moreover, when the type is -1, the compiler with enough inline knowledge
can discard sb_has_quota_active() check altogether, causing a NULL pointer
dereference at the following inode_lock(dqopt->files[cnt]):

[ 2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[ 2.796024] Mem abort info:
[ 2.796025] ESR = 0x96000005
[ 2.796028] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2.796029] SET = 0, FnV = 0
[ 2.796031] EA = 0, S1PTW = 0
[ 2.796032] Data abort info:
[ 2.796034] ISV = 0, ISS = 0x00000005
[ 2.796035] CM = 0, WnR = 0
[ 2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000
[ 2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000
[ 2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S 5.4.179-arter97-r8-64666-g2f16e087f9d8 #1
[ 2.796057] Hardware name: Qualcomm Technologies, Inc. Lahaina MTP lemonadep (DT)
[ 2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO)
[ 2.796065] pc : down_write+0x28/0x70
[ 2.796070] lr : f2fs_quota_sync+0x100/0x294
[ 2.796071] sp : ffffffa3f48ffc30
[ 2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000
[ 2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80
[ 2.796077] x25: ffffffa3f7290048 x24: 0000000000000001
[ 2.796078] x23: 0000000000000000 x22: ffffffa3f7290000
[ 2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110
[ 2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038
[ 2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0
[ 2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0
[ 2.796086] x13: 00000000000004f6 x12: 00162c44ff493000
[ 2.796088] x11: 0000000000000400 x10: ffffffa40e38c948
[ 2.796090] x9 : 0000000000000000 x8 : 00000000000000a0
[ 2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a
[ 2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d
[ 2.796094] x3 : 00000000060c0000 x2 : 0000000000000001
[ 2.796096] x1 : 0000000000000000 x0 : 00000000000000a0
[ 2.796098] Call trace:
[ 2.796100] down_write+0x28/0x70
[ 2.796102] f2fs_quota_sync+0x100/0x294
[ 2.796104] block_operations+0x120/0x204
[ 2.796106] f2fs_write_checkpoint+0x11c/0x520
[ 2.796107] __checkpoint_and_complete_reqs+0x7c/0xd34
[ 2.796109] issue_checkpoint_thread+0x6c/0xb8
[ 2.796112] kthread+0x138/0x414
[ 2.796114] ret_from_fork+0x10/0x18
[ 2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02)
[ 2.796120] ---[ end trace 96e942e8eb6a0b53 ]---
[ 2.800116] Kernel panic - not syncing: Fatal exception
[ 2.800120] SMP: stopping secondary CPUs

Fixes: 9de71ede81e6 ("f2fs: quota: fix potential deadlock")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 680af5b8 Tue Feb 15 01:27:21 MST 2022 Juhyung Park <qkrwngud825@gmail.com> f2fs: quota: fix loop condition at f2fs_quota_sync()

cnt should be passed to sb_has_quota_active() instead of type to check
active quota properly.

Moreover, when the type is -1, the compiler with enough inline knowledge
can discard sb_has_quota_active() check altogether, causing a NULL pointer
dereference at the following inode_lock(dqopt->files[cnt]):

[ 2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[ 2.796024] Mem abort info:
[ 2.796025] ESR = 0x96000005
[ 2.796028] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2.796029] SET = 0, FnV = 0
[ 2.796031] EA = 0, S1PTW = 0
[ 2.796032] Data abort info:
[ 2.796034] ISV = 0, ISS = 0x00000005
[ 2.796035] CM = 0, WnR = 0
[ 2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000
[ 2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000
[ 2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S 5.4.179-arter97-r8-64666-g2f16e087f9d8 #1
[ 2.796057] Hardware name: Qualcomm Technologies, Inc. Lahaina MTP lemonadep (DT)
[ 2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO)
[ 2.796065] pc : down_write+0x28/0x70
[ 2.796070] lr : f2fs_quota_sync+0x100/0x294
[ 2.796071] sp : ffffffa3f48ffc30
[ 2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000
[ 2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80
[ 2.796077] x25: ffffffa3f7290048 x24: 0000000000000001
[ 2.796078] x23: 0000000000000000 x22: ffffffa3f7290000
[ 2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110
[ 2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038
[ 2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0
[ 2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0
[ 2.796086] x13: 00000000000004f6 x12: 00162c44ff493000
[ 2.796088] x11: 0000000000000400 x10: ffffffa40e38c948
[ 2.796090] x9 : 0000000000000000 x8 : 00000000000000a0
[ 2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a
[ 2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d
[ 2.796094] x3 : 00000000060c0000 x2 : 0000000000000001
[ 2.796096] x1 : 0000000000000000 x0 : 00000000000000a0
[ 2.796098] Call trace:
[ 2.796100] down_write+0x28/0x70
[ 2.796102] f2fs_quota_sync+0x100/0x294
[ 2.796104] block_operations+0x120/0x204
[ 2.796106] f2fs_write_checkpoint+0x11c/0x520
[ 2.796107] __checkpoint_and_complete_reqs+0x7c/0xd34
[ 2.796109] issue_checkpoint_thread+0x6c/0xb8
[ 2.796112] kthread+0x138/0x414
[ 2.796114] ret_from_fork+0x10/0x18
[ 2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02)
[ 2.796120] ---[ end trace 96e942e8eb6a0b53 ]---
[ 2.800116] Kernel panic - not syncing: Fatal exception
[ 2.800120] SMP: stopping secondary CPUs

Fixes: 9de71ede81e6 ("f2fs: quota: fix potential deadlock")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 984fc4e7 Thu Feb 03 22:24:56 MST 2022 Chao Yu <chao@kernel.org> f2fs: support idmapped mounts

This patch enables idmapped mounts for f2fs, since all dedicated helpers
for this functionality existsm, so, in this patch we just pass down the
user_namespace argument from the VFS methods to the relevant helpers.

Simple idmap example on f2fs image:

1. truncate -s 128M f2fs.img
2. mkfs.f2fs f2fs.img
3. mount f2fs.img /mnt/f2fs/
4. touch /mnt/f2fs/file

5. ls -ln /mnt/f2fs/
total 0
-rw-r--r-- 1 0 0 0 2月 4 13:17 file

6. ./mount-idmapped --map-mount b:0:1001:1 /mnt/f2fs/ /mnt/scratch_f2fs/

7. ls -ln /mnt/scratch_f2fs/
total 0
-rw-r--r-- 1 1001 1001 0 2月 4 13:17 file

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
H A Dsegment.cdiff 7d009e04 Thu Feb 22 05:18:50 MST 2024 Chao Yu <chao@kernel.org> f2fs: fix to handle segment allocation failure correctly

If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.

Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 0e8d040b Sun Jan 08 20:44:50 MST 2023 Chao Yu <chao@kernel.org> f2fs: clear atomic_write_task in f2fs_abort_atomic_write()

Otherwise, last .atomic_write_task will be remained in structure
f2fs_inode_info, resulting in aborting atomic_write accidentally
in race case. Meanwhile, clear original_i_size as well.

Fixes: 7a10f0177e11 ("f2fs: don't give partially written atomic data from process crash")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7a2b15cf Wed Dec 21 12:19:32 MST 2022 Yangtao Li <frank.li@vivo.com> f2fs: support accounting iostat count and avg_bytes

Previously, we supported to account iostat io_bytes,
in this patch, it adds to account iostat count and avg_bytes:

time: 1671648667
io_bytes count avg_bytes
[WRITE]
app buffered data: 31 2 15

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7bc155fe Thu May 05 18:49:18 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: kill volatile write support

There's no user, since all can use atomic writes simply.
Let's kill it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 4f993264 Mon Aug 02 18:15:43 MDT 2021 Chao Yu <chao@kernel.org> f2fs: introduce discard_unit mount option

As James Z reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213877

[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response

[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X

[8.] Memory consumption

With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0

With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7

The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.

For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.

This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.

Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().

In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 4f993264 Mon Aug 02 18:15:43 MDT 2021 Chao Yu <chao@kernel.org> f2fs: introduce discard_unit mount option

As James Z reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213877

[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response

[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X

[8.] Memory consumption

With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0

With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7

The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.

For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.

This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.

Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().

In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 4f993264 Mon Aug 02 18:15:43 MDT 2021 Chao Yu <chao@kernel.org> f2fs: introduce discard_unit mount option

As James Z reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213877

[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response

[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X

[8.] Memory consumption

With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0

With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7

The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.

For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.

This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.

Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().

In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 4f993264 Mon Aug 02 18:15:43 MDT 2021 Chao Yu <chao@kernel.org> f2fs: introduce discard_unit mount option

As James Z reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213877

[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response

[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.

[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory

[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64

[5.] Most recent kernel version which did not have the bug:
None

[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None

[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X

[8.] Memory consumption

With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0

With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7

The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.

For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.

This patch introduces a new mountoption "discard_unit=block|segment|
section" to support issuing discard with different basic unit which is
aligned to block, segment or section, so that user can specify
"discard_unit=segment" or "discard_unit=section" to disable small
discard functionality.

Note that this mount option can not be changed by remount() due to
related metadata need to be initialized during mount().

In order to save memory, let's use "discard_unit=section" for blkzoned
device by default.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7dede886 Sat Feb 20 02:35:40 MST 2021 Chao Yu <chao@kernel.org> f2fs: fix to allow migrating fully valid segment

F2FS_IOC_FLUSH_DEVICE/F2FS_IOC_RESIZE_FS needs to migrate all blocks of
target segment to other place, no matter the segment has partially or fully
valid blocks.

However, after commit 803e74be04b3 ("f2fs: stop GC when the victim becomes
fully valid"), we may skip migration due to target segment is fully valid,
result in failing the ioctl interface, fix this.

Fixes: 803e74be04b3 ("f2fs: stop GC when the victim becomes fully valid")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7bcd0cfa Thu Mar 19 05:57:58 MDT 2020 Chao Yu <chao@kernel.org> f2fs: don't trigger data flush in foreground operation

Data flush can generate heavy IO and cause long latency during
flush, so it's not appropriate to trigger it in foreground
operation.

And also, we may face below potential deadlock during data flush:
- f2fs_write_multi_pages
- f2fs_write_raw_pages
- f2fs_write_single_data_page
- f2fs_balance_fs
- f2fs_balance_fs_bg
- f2fs_sync_dirty_inodes
- filemap_fdatawrite -- stuck on flush same cluster

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
H A Df2fs.hdiff 7d009e04 Thu Feb 22 05:18:50 MST 2024 Chao Yu <chao@kernel.org> f2fs: fix to handle segment allocation failure correctly

If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.

Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 40b2d55e Wed Feb 07 00:05:48 MST 2024 Chao Yu <chao@kernel.org> f2fs: fix to create selinux label during whiteout initialization

generic/700 - output mismatch (see /media/fstests/results//generic/700.out.bad)
--- tests/generic/700.out 2023-03-28 10:40:42.735529223 +0000
+++ /media/fstests/results//generic/700.out.bad 2024-02-06 04:37:56.000000000 +0000
@@ -1,2 +1,4 @@
QA output created by 700
+/mnt/scratch_f2fs/f1: security.selinux: No such attribute
+/mnt/scratch_f2fs/f2: security.selinux: No such attribute
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/700.out /media/fstests/results//generic/700.out.bad' to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
70b589a37e1a xfs: add selinux labels to whiteout inodes

Previously, it missed to create selinux labels during whiteout inode
initialization, fix this issue.

Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff bb6e1c8f Sun Dec 10 02:20:36 MST 2023 Chao Yu <chao@kernel.org> f2fs: delete obsolete FI_DROP_CACHE

FI_DROP_CACHE was introduced in commit 1e84371ffeef ("f2fs: change
atomic and volatile write policies") for volatile write feature,
after commit 7bc155fec5b3 ("f2fs: kill volatile write support"),
we won't support volatile write, let's delete related codes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff a5393636 Sun Dec 10 02:20:35 MST 2023 Chao Yu <chao@kernel.org> f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN

Commit 3c6c2bebef79 ("f2fs: avoid punch_hole overhead when releasing
volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason:

This patch is to avoid some punch_hole overhead when releasing volatile
data. If volatile data was not written yet, we just can make the first
page as zero.

After commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we
won't support volatile write, but it missed to remove obsolete
FI_FIRST_BLOCK_WRITTEN, delete it in this patch.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7a2b15cf Wed Dec 21 12:19:32 MST 2022 Yangtao Li <frank.li@vivo.com> f2fs: support accounting iostat count and avg_bytes

Previously, we supported to account iostat io_bytes,
in this patch, it adds to account iostat count and avg_bytes:

time: 1671648667
io_bytes count avg_bytes
[WRITE]
app buffered data: 31 2 15

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7b01ad7f Sun Jul 24 10:04:17 MDT 2022 Chao Yu <chao.yu@oppo.com> f2fs: obsolete unused MAX_DISCARD_BLOCKS

After commit a7eeb823854c ("f2fs: use bitmap in discard_entry"),
MAX_DISCARD_BLOCKS became obsolete, remove it.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7a8fc586 Mon Jun 20 11:38:42 MDT 2022 Daeho Jeong <daehojeong@google.com> f2fs: introduce memory mode

Introduce memory mode to supports "normal" and "low" memory modes.
"low" mode is to support low memory devices. Because of the nature of
low memory devices, in this mode, f2fs will try to save memory sometimes
by sacrificing performance. "normal" mode is the default mode and same
as before.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff b5639bb4 Sat May 14 11:59:29 MDT 2022 Eric Biggers <ebiggers@google.com> f2fs: don't use casefolded comparison for "." and ".."

Tryng to rename a directory that has all following properties fails with
EINVAL and triggers the 'WARN_ON_ONCE(!fscrypt_has_encryption_key(dir))'
in f2fs_match_ci_name():

- The directory is casefolded
- The directory is encrypted
- The directory's encryption key is not yet set up
- The parent directory is *not* encrypted

The problem is incorrect handling of the lookup of ".." to get the
parent reference to update. fscrypt_setup_filename() treats ".." (and
".") specially, as it's never encrypted. It's passed through as-is, and
setting up the directory's key is not attempted. As the name isn't a
no-key name, f2fs treats it as a "normal" name and attempts a casefolded
comparison. That breaks the assumption of the WARN_ON_ONCE() in
f2fs_match_ci_name() which assumes that for encrypted directories,
casefolded comparisons only happen when the directory's key is set up.

We could just remove this WARN_ON_ONCE(). However, since casefolding is
always a no-op on "." and ".." anyway, let's instead just not casefold
these names. This results in the standard bytewise comparison.

Fixes: 7ad08a58bf67 ("f2fs: Handle casefolding with Encryption")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7bc155fe Thu May 05 18:49:18 MDT 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: kill volatile write support

There's no user, since all can use atomic writes simply.
Let's kill it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
diff 7f8e249d Thu Mar 03 14:21:27 MST 2022 Jaegeuk Kim <jaegeuk@kernel.org> f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem

Unfair rwsem should be used when blk-cg is on. Otherwise, there is regression.

FYI, we noticed a -26.7% regression of aim7.jobs-per-min due to commit:

commit: e4544b63a7ee49e7fbebf35ece0a6acd3b9617ae ("f2fs: move f2fs to use reader-unfair rwsems")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: aim7
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:

disk: 4BRD_12G
md: RAID0
fs: f2fs
test: sync_disk_rw
load: 100
cpufreq_governor: performance
ucode: 0x500320a

test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/

Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Completed in 724 milliseconds