Searched +hist:2 +hist:a721dbb (Results 1 - 6 of 6) sorted by relevance
/linux-master/fs/xfs/scrub/ | ||
H A D | symlink.c | 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
H A D | scrub.h | diff 86a1746e Thu Feb 22 01:30:59 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: track directory entry updates during live nlinks fsck Create the necessary hooks in the directory operations (create/link/unlink/rename) code so that our live nlink scrub code can stay up to date with link count updates in the rest of the filesystem. This will be the means to keep our shadow link count information up to date while the scan runs in real time. In online fsck part 2, we'll use these same hooks to handle repairs to directories and parent pointer information. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 2d5f38a3 Mon May 01 17:16:14 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: disable reaping in fscounters scrub The fscounters scrub code doesn't work properly because it cannot quiesce updates to the percpu counters in the filesystem, hence it returns false corruption reports. This has been fixed properly in one of the online repair patchsets that are under review by replacing the xchk_disable_reaping calls with an exclusive filesystem freeze. Disabling background gc isn't sufficient to fix the problem. In other words, scrub doesn't need to call xfs_inodegc_stop, which is just as well since it wasn't correct to allow scrub to call xfs_inodegc_start when something else could be calling xfs_inodegc_stop (e.g. trying to freeze the filesystem). Neuter the scrubber for now, and remove the xchk_*_reaping functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> diff 2e6f2756 Tue Jan 16 19:53:07 MST 2018 Darrick J. Wong <darrick.wong@oracle.com> xfs: cross-reference inode btrees during scrub Cross-reference the inode btrees with the other metadata when we scrub the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
H A D | common.h | diff 2d5f38a3 Mon May 01 17:16:14 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: disable reaping in fscounters scrub The fscounters scrub code doesn't work properly because it cannot quiesce updates to the percpu counters in the filesystem, hence it returns false corruption reports. This has been fixed properly in one of the online repair patchsets that are under review by replacing the xchk_disable_reaping calls with an exclusive filesystem freeze. Disabling background gc isn't sufficient to fix the problem. In other words, scrub doesn't need to call xfs_inodegc_stop, which is just as well since it wasn't correct to allow scrub to call xfs_inodegc_start when something else could be calling xfs_inodegc_stop (e.g. trying to freeze the filesystem). Neuter the scrubber for now, and remove the xchk_*_reaping functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
H A D | scrub.c | diff 86a1746e Thu Feb 22 01:30:59 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: track directory entry updates during live nlinks fsck Create the necessary hooks in the directory operations (create/link/unlink/rename) code so that our live nlink scrub code can stay up to date with link count updates in the rest of the filesystem. This will be the means to keep our shadow link count information up to date while the scan runs in real time. In online fsck part 2, we'll use these same hooks to handle repairs to directories and parent pointer information. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 2d295fe6 Fri Dec 15 11:03:36 MST 2023 Darrick J. Wong <djwong@kernel.org> xfs: repair inode records If an inode is so badly damaged that it cannot be loaded into the cache, fix the ondisk metadata and try again. If there /is/ a cached inode, fix any problems and apply any optimizations that can be solved incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff e0319282 Mon Sep 11 09:39:06 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: only call xchk_stats_merge after validating scrub inputs Harshit Mogalapalli slogged through several reports from our internal syzbot instance and observed that they all had a common stack trace: BUG: KASAN: user-memory-access in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: user-memory-access in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] BUG: KASAN: user-memory-access in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] BUG: KASAN: user-memory-access in do_raw_spin_lock include/linux/spinlock.h:187 [inline] BUG: KASAN: user-memory-access in __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] BUG: KASAN: user-memory-access in _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 Write of size 4 at addr 0000001dd87ee280 by task syz-executor365/1543 CPU: 2 PID: 1543 Comm: syz-executor365 Not tainted 6.5.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x83/0xb0 lib/dump_stack.c:106 print_report+0x3f8/0x620 mm/kasan/report.c:478 kasan_report+0xb0/0xe0 mm/kasan/report.c:588 check_region_inline mm/kasan/generic.c:181 [inline] kasan_check_range+0x139/0x1e0 mm/kasan/generic.c:187 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] do_raw_spin_lock include/linux/spinlock.h:187 [inline] __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] xchk_stats_merge_one.isra.1+0x39/0x650 fs/xfs/scrub/stats.c:191 xchk_stats_merge+0x5f/0xe0 fs/xfs/scrub/stats.c:225 xfs_scrub_metadata+0x252/0x14e0 fs/xfs/scrub/scrub.c:599 xfs_ioc_scrub_metadata+0xc8/0x160 fs/xfs/xfs_ioctl.c:1646 xfs_file_ioctl+0x3fd/0x1870 fs/xfs/xfs_ioctl.c:1955 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x199/0x220 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff155af753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc006e2568 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff155af753d RDX: 00000000200000c0 RSI: 00000000c040583c RDI: 0000000000000003 RBP: 00000000ffffffff R08: 00000000004010c0 R09: 00000000004010c0 R10: 00000000004010c0 R11: 0000000000000246 R12: 0000000000400cb0 R13: 00007ffc006e2670 R14: 0000000000000000 R15: 0000000000000000 </TASK> The root cause here is that xchk_stats_merge_one walks off the end of the xchk_scrub_stats.cs_stats array because it has been fed a garbage value in sm->sm_type. That occurs because I put the xchk_stats_merge in the wrong place -- it should have been after the last xchk_teardown call on our way out of xfs_scrub_metadata because we only call the teardown function if we called the setup function, and we don't call the setup functions if the inputs are obviously garbage. Thanks to Harshit for triaging the bug reports and bringing this to my attention. Fixes: d7a74cad8f45 ("xfs: track usage statistics of online fsck") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff e0319282 Mon Sep 11 09:39:06 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: only call xchk_stats_merge after validating scrub inputs Harshit Mogalapalli slogged through several reports from our internal syzbot instance and observed that they all had a common stack trace: BUG: KASAN: user-memory-access in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: user-memory-access in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] BUG: KASAN: user-memory-access in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] BUG: KASAN: user-memory-access in do_raw_spin_lock include/linux/spinlock.h:187 [inline] BUG: KASAN: user-memory-access in __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] BUG: KASAN: user-memory-access in _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 Write of size 4 at addr 0000001dd87ee280 by task syz-executor365/1543 CPU: 2 PID: 1543 Comm: syz-executor365 Not tainted 6.5.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x83/0xb0 lib/dump_stack.c:106 print_report+0x3f8/0x620 mm/kasan/report.c:478 kasan_report+0xb0/0xe0 mm/kasan/report.c:588 check_region_inline mm/kasan/generic.c:181 [inline] kasan_check_range+0x139/0x1e0 mm/kasan/generic.c:187 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] do_raw_spin_lock include/linux/spinlock.h:187 [inline] __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] xchk_stats_merge_one.isra.1+0x39/0x650 fs/xfs/scrub/stats.c:191 xchk_stats_merge+0x5f/0xe0 fs/xfs/scrub/stats.c:225 xfs_scrub_metadata+0x252/0x14e0 fs/xfs/scrub/scrub.c:599 xfs_ioc_scrub_metadata+0xc8/0x160 fs/xfs/xfs_ioctl.c:1646 xfs_file_ioctl+0x3fd/0x1870 fs/xfs/xfs_ioctl.c:1955 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x199/0x220 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff155af753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc006e2568 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff155af753d RDX: 00000000200000c0 RSI: 00000000c040583c RDI: 0000000000000003 RBP: 00000000ffffffff R08: 00000000004010c0 R09: 00000000004010c0 R10: 00000000004010c0 R11: 0000000000000246 R12: 0000000000400cb0 R13: 00007ffc006e2670 R14: 0000000000000000 R15: 0000000000000000 </TASK> The root cause here is that xchk_stats_merge_one walks off the end of the xchk_scrub_stats.cs_stats array because it has been fed a garbage value in sm->sm_type. That occurs because I put the xchk_stats_merge in the wrong place -- it should have been after the last xchk_teardown call on our way out of xfs_scrub_metadata because we only call the teardown function if we called the setup function, and we don't call the setup functions if the inputs are obviously garbage. Thanks to Harshit for triaging the bug reports and bringing this to my attention. Fixes: d7a74cad8f45 ("xfs: track usage statistics of online fsck") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff e0319282 Mon Sep 11 09:39:06 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: only call xchk_stats_merge after validating scrub inputs Harshit Mogalapalli slogged through several reports from our internal syzbot instance and observed that they all had a common stack trace: BUG: KASAN: user-memory-access in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: user-memory-access in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] BUG: KASAN: user-memory-access in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] BUG: KASAN: user-memory-access in do_raw_spin_lock include/linux/spinlock.h:187 [inline] BUG: KASAN: user-memory-access in __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] BUG: KASAN: user-memory-access in _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 Write of size 4 at addr 0000001dd87ee280 by task syz-executor365/1543 CPU: 2 PID: 1543 Comm: syz-executor365 Not tainted 6.5.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x83/0xb0 lib/dump_stack.c:106 print_report+0x3f8/0x620 mm/kasan/report.c:478 kasan_report+0xb0/0xe0 mm/kasan/report.c:588 check_region_inline mm/kasan/generic.c:181 [inline] kasan_check_range+0x139/0x1e0 mm/kasan/generic.c:187 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] do_raw_spin_lock include/linux/spinlock.h:187 [inline] __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] xchk_stats_merge_one.isra.1+0x39/0x650 fs/xfs/scrub/stats.c:191 xchk_stats_merge+0x5f/0xe0 fs/xfs/scrub/stats.c:225 xfs_scrub_metadata+0x252/0x14e0 fs/xfs/scrub/scrub.c:599 xfs_ioc_scrub_metadata+0xc8/0x160 fs/xfs/xfs_ioctl.c:1646 xfs_file_ioctl+0x3fd/0x1870 fs/xfs/xfs_ioctl.c:1955 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x199/0x220 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff155af753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc006e2568 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff155af753d RDX: 00000000200000c0 RSI: 00000000c040583c RDI: 0000000000000003 RBP: 00000000ffffffff R08: 00000000004010c0 R09: 00000000004010c0 R10: 00000000004010c0 R11: 0000000000000246 R12: 0000000000400cb0 R13: 00007ffc006e2670 R14: 0000000000000000 R15: 0000000000000000 </TASK> The root cause here is that xchk_stats_merge_one walks off the end of the xchk_scrub_stats.cs_stats array because it has been fed a garbage value in sm->sm_type. That occurs because I put the xchk_stats_merge in the wrong place -- it should have been after the last xchk_teardown call on our way out of xfs_scrub_metadata because we only call the teardown function if we called the setup function, and we don't call the setup functions if the inputs are obviously garbage. Thanks to Harshit for triaging the bug reports and bringing this to my attention. Fixes: d7a74cad8f45 ("xfs: track usage statistics of online fsck") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff e0319282 Mon Sep 11 09:39:06 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: only call xchk_stats_merge after validating scrub inputs Harshit Mogalapalli slogged through several reports from our internal syzbot instance and observed that they all had a common stack trace: BUG: KASAN: user-memory-access in instrument_atomic_read_write include/linux/instrumented.h:96 [inline] BUG: KASAN: user-memory-access in atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] BUG: KASAN: user-memory-access in queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] BUG: KASAN: user-memory-access in do_raw_spin_lock include/linux/spinlock.h:187 [inline] BUG: KASAN: user-memory-access in __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] BUG: KASAN: user-memory-access in _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 Write of size 4 at addr 0000001dd87ee280 by task syz-executor365/1543 CPU: 2 PID: 1543 Comm: syz-executor365 Not tainted 6.5.0-syzk #1 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x83/0xb0 lib/dump_stack.c:106 print_report+0x3f8/0x620 mm/kasan/report.c:478 kasan_report+0xb0/0xe0 mm/kasan/report.c:588 check_region_inline mm/kasan/generic.c:181 [inline] kasan_check_range+0x139/0x1e0 mm/kasan/generic.c:187 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_try_cmpxchg_acquire include/linux/atomic/atomic-instrumented.h:1294 [inline] queued_spin_lock include/asm-generic/qspinlock.h:111 [inline] do_raw_spin_lock include/linux/spinlock.h:187 [inline] __raw_spin_lock include/linux/spinlock_api_smp.h:134 [inline] _raw_spin_lock+0x76/0xe0 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] xchk_stats_merge_one.isra.1+0x39/0x650 fs/xfs/scrub/stats.c:191 xchk_stats_merge+0x5f/0xe0 fs/xfs/scrub/stats.c:225 xfs_scrub_metadata+0x252/0x14e0 fs/xfs/scrub/scrub.c:599 xfs_ioc_scrub_metadata+0xc8/0x160 fs/xfs/xfs_ioctl.c:1646 xfs_file_ioctl+0x3fd/0x1870 fs/xfs/xfs_ioctl.c:1955 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x199/0x220 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3e/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7ff155af753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc006e2568 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff155af753d RDX: 00000000200000c0 RSI: 00000000c040583c RDI: 0000000000000003 RBP: 00000000ffffffff R08: 00000000004010c0 R09: 00000000004010c0 R10: 00000000004010c0 R11: 0000000000000246 R12: 0000000000400cb0 R13: 00007ffc006e2670 R14: 0000000000000000 R15: 0000000000000000 </TASK> The root cause here is that xchk_stats_merge_one walks off the end of the xchk_scrub_stats.cs_stats array because it has been fed a garbage value in sm->sm_type. That occurs because I put the xchk_stats_merge in the wrong place -- it should have been after the last xchk_teardown call on our way out of xfs_scrub_metadata because we only call the teardown function if we called the setup function, and we don't call the setup functions if the inputs are obviously garbage. Thanks to Harshit for triaging the bug reports and bringing this to my attention. Fixes: d7a74cad8f45 ("xfs: track usage statistics of online fsck") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 2d5f38a3 Mon May 01 17:16:14 MDT 2023 Darrick J. Wong <djwong@kernel.org> xfs: disable reaping in fscounters scrub The fscounters scrub code doesn't work properly because it cannot quiesce updates to the percpu counters in the filesystem, hence it returns false corruption reports. This has been fixed properly in one of the online repair patchsets that are under review by replacing the xchk_disable_reaping calls with an exclusive filesystem freeze. Disabling background gc isn't sufficient to fix the problem. In other words, scrub doesn't need to call xfs_inodegc_stop, which is just as well since it wasn't correct to allow scrub to call xfs_inodegc_start when something else could be calling xfs_inodegc_stop (e.g. trying to freeze the filesystem). Neuter the scrubber for now, and remove the xchk_*_reaping functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> diff 2e973b2c Wed Aug 18 19:46:52 MDT 2021 Dave Chinner <dchinner@redhat.com> xfs: convert remaining mount flags to state flags The remaining mount flags kept in m_flags are actually runtime state flags. These change dynamically, so they really should be updated atomically so we don't potentially lose an update due to racing modifications. Convert these remaining flags to be stored in m_opstate and use atomic bitops to set and clear the flags. This also adds a couple of simple wrappers for common state checks - read only and shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> diff 38c26bfd Wed Aug 18 19:46:37 MDT 2021 Dave Chinner <dchinner@redhat.com> xfs: replace xfs_sb_version checks with feature flag checks Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> diff 27fb5a72 Wed Mar 25 00:03:24 MDT 2020 Darrick J. Wong <darrick.wong@oracle.com> xfs: prohibit fs freezing when using empty transactions I noticed that fsfreeze can take a very long time to freeze an XFS if there happens to be a GETFSMAP caller running in the background. I also happened to notice the following in dmesg: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs] CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs] Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54 RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000 R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8 FS: 00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0 Call Trace: xfs_fs_freeze+0x25/0x40 [xfs] freeze_super+0xc8/0x180 do_vfs_ioctl+0x70b/0x750 ? __fget_files+0x135/0x210 ksys_ioctl+0x3a/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe These two things appear to be related. The assertion trips when another thread initiates a fsmap request (which uses an empty transaction) after the freezer waited for m_active_trans to hit zero but before the the freezer executes the WARN_ON just prior to calling xfs_log_quiesce. The lengthy delays in freezing happen because the freezer calls xfs_wait_buftarg to clean out the buffer lru list. Meanwhile, the GETFSMAP caller is continuing to grab and release buffers, which means that it can take a very long time for the buffer lru list to empty out. We fix both of these races by calling sb_start_write to obtain freeze protection while using empty transactions for GETFSMAP and for metadata scrubbing. The other two users occur during mount, during which time we cannot fs freeze. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 27fb5a72 Wed Mar 25 00:03:24 MDT 2020 Darrick J. Wong <darrick.wong@oracle.com> xfs: prohibit fs freezing when using empty transactions I noticed that fsfreeze can take a very long time to freeze an XFS if there happens to be a GETFSMAP caller running in the background. I also happened to notice the following in dmesg: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs] CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs] Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54 RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000 R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8 FS: 00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0 Call Trace: xfs_fs_freeze+0x25/0x40 [xfs] freeze_super+0xc8/0x180 do_vfs_ioctl+0x70b/0x750 ? __fget_files+0x135/0x210 ksys_ioctl+0x3a/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe These two things appear to be related. The assertion trips when another thread initiates a fsmap request (which uses an empty transaction) after the freezer waited for m_active_trans to hit zero but before the the freezer executes the WARN_ON just prior to calling xfs_log_quiesce. The lengthy delays in freezing happen because the freezer calls xfs_wait_buftarg to clean out the buffer lru list. Meanwhile, the GETFSMAP caller is continuing to grab and release buffers, which means that it can take a very long time for the buffer lru list to empty out. We fix both of these races by calling sb_start_write to obtain freeze protection while using empty transactions for GETFSMAP and for metadata scrubbing. The other two users occur during mount, during which time we cannot fs freeze. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 27fb5a72 Wed Mar 25 00:03:24 MDT 2020 Darrick J. Wong <darrick.wong@oracle.com> xfs: prohibit fs freezing when using empty transactions I noticed that fsfreeze can take a very long time to freeze an XFS if there happens to be a GETFSMAP caller running in the background. I also happened to notice the following in dmesg: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs] CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs] Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54 RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000 R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8 FS: 00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0 Call Trace: xfs_fs_freeze+0x25/0x40 [xfs] freeze_super+0xc8/0x180 do_vfs_ioctl+0x70b/0x750 ? __fget_files+0x135/0x210 ksys_ioctl+0x3a/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe These two things appear to be related. The assertion trips when another thread initiates a fsmap request (which uses an empty transaction) after the freezer waited for m_active_trans to hit zero but before the the freezer executes the WARN_ON just prior to calling xfs_log_quiesce. The lengthy delays in freezing happen because the freezer calls xfs_wait_buftarg to clean out the buffer lru list. Meanwhile, the GETFSMAP caller is continuing to grab and release buffers, which means that it can take a very long time for the buffer lru list to empty out. We fix both of these races by calling sb_start_write to obtain freeze protection while using empty transactions for GETFSMAP and for metadata scrubbing. The other two users occur during mount, during which time we cannot fs freeze. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 27fb5a72 Wed Mar 25 00:03:24 MDT 2020 Darrick J. Wong <darrick.wong@oracle.com> xfs: prohibit fs freezing when using empty transactions I noticed that fsfreeze can take a very long time to freeze an XFS if there happens to be a GETFSMAP caller running in the background. I also happened to notice the following in dmesg: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs] CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs] Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54 RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000 R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8 FS: 00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0 Call Trace: xfs_fs_freeze+0x25/0x40 [xfs] freeze_super+0xc8/0x180 do_vfs_ioctl+0x70b/0x750 ? __fget_files+0x135/0x210 ksys_ioctl+0x3a/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe These two things appear to be related. The assertion trips when another thread initiates a fsmap request (which uses an empty transaction) after the freezer waited for m_active_trans to hit zero but before the the freezer executes the WARN_ON just prior to calling xfs_log_quiesce. The lengthy delays in freezing happen because the freezer calls xfs_wait_buftarg to clean out the buffer lru list. Meanwhile, the GETFSMAP caller is continuing to grab and release buffers, which means that it can take a very long time for the buffer lru list to empty out. We fix both of these races by calling sb_start_write to obtain freeze protection while using empty transactions for GETFSMAP and for metadata scrubbing. The other two users occur during mount, during which time we cannot fs freeze. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
/linux-master/fs/xfs/libxfs/ | ||
H A D | xfs_fs.h | diff c3c4ecb5 Tue Mar 08 17:58:37 MST 2022 Chandan Babu R <chandan.babu@oracle.com> xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters The following changes are made to enable userspace to obtain 64-bit extent counters, 1. Carve out a new 64-bit field xfs_bulkstat->bs_extents64 from xfs_bulkstat->bs_pad[] to hold 64-bit extent counter. 2. Define the new flag XFS_BULK_IREQ_BULKSTAT for userspace to indicate that it is capable of receiving 64-bit extent counters. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> diff 508578f2 Tue May 12 17:54:17 MDT 2020 Nishad Kamdar <nishadkamdar@gmail.com> xfs: Use the correct style for SPDX License Identifier This patch corrects the SPDX License Identifier style in header files related to XFS File System support. For C header files Documentation/process/license-rules.rst mandates C-like comments. (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> diff e8777b27 Tue Nov 12 09:39:43 MST 2019 Arnd Bergmann <arnd@arndb.de> xfs: avoid time_t in user api The ioctl definitions for XFS_IOC_SWAPEXT, XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE are part of libxfs and based on time_t. The definition for time_t differs between current kernels and coming 32-bit libc variants that define it as 64-bit. For most ioctls, that means the kernel has to be able to handle two different command codes based on the different structure sizes. The same solution could be applied for XFS_IOC_SWAPEXT, but it would not work for XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE because the structure with the time_t is passed through an indirect pointer, and the command number itself is based on struct xfs_fsop_bulkreq, which does not differ based on time_t. This means any solution that can be applied requires a change of the ABI definition in the xfs_fs.h header file, as well as doing the same change in any user application that contains a copy of this header. The usual solution would be to define a replacement structure and use conditional compilation for the ioctl command codes to use one or the other, such as #define XFS_IOC_FSBULKSTAT_OLD _IOWR('X', 101, struct xfs_fsop_bulkreq) #define XFS_IOC_FSBULKSTAT_NEW _IOWR('X', 129, struct xfs_fsop_bulkreq) #define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \ XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW) After this, the kernel would be able to implement both XFS_IOC_FSBULKSTAT_OLD and XFS_IOC_FSBULKSTAT_NEW handlers on 32-bit architectures with the correct ABI for either definition of time_t. However, as long as two observations are true, a much simpler solution can be used: 1. xfsprogs is the only user space project that has a copy of this header 2. xfsprogs already has a replacement for all three affected ioctl commands, based on the xfs_bulkstat structure to pass 64-bit timestamps regardless of the architecture Based on those assumptions, changing xfs_bstime to use __kernel_long_t instead of time_t in both the kernel and in xfsprogs preserves the current ABI for any libc definition of time_t and solves the problem of passing 64-bit timestamps to 32-bit user space. If either of the two assumptions is invalid, more discussion is needed for coming up with a way to fix as much of the affected user space code as possible. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
/linux-master/fs/xfs/ | ||
H A D | Makefile | diff 18a1e644 Thu Feb 22 01:43:40 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. List suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 18a1e644 Thu Feb 22 01:43:40 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. List suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 18a1e644 Thu Feb 22 01:43:40 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. List suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 18a1e644 Thu Feb 22 01:43:40 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. List suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 18a1e644 Thu Feb 22 01:43:40 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. List suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 18a1e644 Thu Feb 22 01:43:40 MST 2024 Darrick J. Wong <djwong@kernel.org> xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. List suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff 2d295fe6 Fri Dec 15 11:03:36 MST 2023 Darrick J. Wong <djwong@kernel.org> xfs: repair inode records If an inode is so badly damaged that it cannot be loaded into the cache, fix the ondisk metadata and try again. If there /is/ a cached inode, fix any problems and apply any optimizations that can be solved incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> diff be408417 Wed Dec 06 19:40:59 MST 2023 Darrick J. Wong <djwong@kernel.org> xfs: implement block reservation accounting for btrees we're staging Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. As for the particular choice of lowspace thresholds and btree block slack factors -- at this point one could say that the thresholds in online repair come from bulkload_estimate_ag_slack in xfs_repair[1]. But that's not the entire story, since the offline btree rebuilding code in xfs_repair was merged as a retroport of the online btree code in this patchset! Before xfs_btree_staging.[ch] came along, xfs_repair determined the slack factor (aka the number of slots to leave unfilled in each new btree block) via open-coded logic in repair/phase5.c[2]. At that point the slack factors were arbitrary quantities per btree. The rmapbt automatically left 10 slots free; everything else left zero. That had a noticeable effect on performance straight after mounting because adding records to /any/ btree would result in splits. A few years ago when this patch was first written, Dave and I decided that repair should generate btree blocks that were 75% full unless space was tight, in which case it should try to fill the blocks to nearly full. We defined tight as ~10% free to avoid repair failures but settled on 3/32 (~9%) to avoid div64. IOWs, we mostly pulled the thresholds out of thin air. We've been QAing with those geometry numbers ever since. ;) Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/bulkload.c?h=v6.5.0#n114 Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/phase5.c?h=v4.19.0#n1349 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 6bdcf26a Fri Nov 03 11:34:46 MDT 2017 Christoph Hellwig <hch@lst.de> xfs: use a b+tree for the in-core extent list Replace the current linear list and the indirection array for the in-core extent list with a b+tree to avoid the need for larger memory allocations for the indirection array when lots of extents are present. The current extent list implementations leads to heavy pressure on the memory allocator when modifying files with a high extent count, and can lead to high latencies because of that. The replacement is a b+tree with a few quirks. The leaf nodes directly store the extent record in two u64 values. The encoding is a little bit different from the existing in-core extent records so that the start offset and length which are required for lookups can be retreived with simple mask operations. The inner nodes store a 64-bit key containing the start offset in the first half of the node, and the pointers to the next lower level in the second half. In either case we walk the node from the beginninig to the end and do a linear search, as that is more efficient for the low number of cache lines touched during a search (2 for the inner nodes, 4 for the leaf nodes) than a binary search. We store termination markers (zero length for the leaf nodes, an otherwise impossible high bit for the inner nodes) to terminate the key list / records instead of storing a count to use the available cache lines as efficiently as possible. One quirk of the algorithm is that while we normally split a node half and half like usual btree implementations we just spill over entries added at the very end of the list to a new node on its own. This means we get a 100% fill grade for the common cases of bulk insertion when reading an inode into memory, and when only sequentially appending to a file. The downside is a slightly higher chance of splits on the first random insertions. Both insert and removal manually recurse into the lower levels, but the bulk deletion of the whole tree is still implemented as a recursive function call, although one limited by the overall depth and with very little stack usage in every iteration. For the first few extents we dynamically grow the list from a single extent to the next powers of two until we have a first full leaf block and that building the actual tree. The code started out based on the generic lib/btree.c code from Joern Engel based on earlier work from Peter Zijlstra, but has since been rewritten beyond recognition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> diff 2a721dbb Tue Oct 17 22:37:45 MDT 2017 Darrick J. Wong <darrick.wong@oracle.com> xfs: scrub symbolic links Create the infrastructure to scrub symbolic link data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
Completed in 244 milliseconds