Cross Reference: /linux-master/fs/xfs/scrub/reap.c

Revision	Date	Author	Comments
# 32080a9b	22-Feb-2024	Darrick J. Wong <djwong@kernel.org>	xfs: repair the rmapbt Rebuild the reverse mapping btree from all primary metadata. This first patch establishes the bare mechanics of finding records and putting together a new ondisk tree; more complex pieces are needed to make it work properly. Link: Documentation/filesystems/xfs-online-fsck-design.rst Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# dbbdbd00	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: repair problems in CoW forks Try to repair errors that we see in file CoW forks so that we don't do stupid things like remap garbage into a file. There's not a lot we can do with the COW fork -- the ondisk metadata records only that the COW staging extents are owned by the refcount btree, which effectively means that we can't reconstruct this incore structure from scratch. Actually, this is even worse -- we can't touch written extents, because those map space that is actively under writeback, and there's not much to do with delalloc reservations. Hence we can only detect crosslinked unwritten extents and fix them by punching out the problematic parts and replacing them with delalloc extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 66da1128	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: reintroduce reaping of file metadata blocks to xrep_reap_extents Back in commit a55e07308831b ("xfs: only allow reaping of per-AG blocks in xrep_reap_extents"), we removed from the reaping code the ability to handle bmbt blocks. At the time, the reaping code only walked single blocks, didn't correctly detect crosslinked blocks, and the special casing made the function hard to understand. It was easier to remove unneeded functionality prior to fixing all the bugs. Now that we've fixed the problems, we want again the ability to reap file metadata blocks. Reintroduce the per-file reaping functionality atop the current implementation. We require that sc->sa is uninitialized, so that we can use it to hold all the per-AG context for a given extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 0f08af0f	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: move the per-AG datatype bitmaps to separate files Move struct xagb_bitmap to its own pair of C and header files per request of Christoph. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 6ece924b	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: create separate structures and code for u32 bitmaps Create a version of the xbitmap that handles 32-bit integer intervals and adapt the xfs_agblock_t bitmap to use it. This reduces the size of the interval tree nodes from 48 to 36 bytes and enables us to use a more efficient slab (:0000040 instead of :0000048) which allows us to pack more nodes into a single slab page (102 vs 85). As a side effect, the users of these bitmaps no longer have to convert between u32 and u64 quantities just to use the bitmap; and the hairy overflow checking code in xagb_bitmap_test goes away. Later in this patchset we're going to add bitmaps for xfs_agino_t, xfs_rgblock_t, and xfs_dablk_t, so the increase in code size (5622 vs. 9959 bytes) seems worth it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# c0e37f07	14-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: fix an off-by-one error in xreap_agextent_binval Overall, this function tries to find and invalidate all buffers for a given extent of space on the data device. The inner for loop in this function tries to find all xfs_bufs for a given daddr. The lengths of all possible cached buffers range from 1 fsblock to the largest needed to contain a 64k xattr value (~17fsb). The scan is capped to avoid looking at any buffer going past the given extent. Unfortunately, the loop continuation test is wrong -- max_fsbs is the largest size we want to scan, not one past that. Put another way, this loop is actually 1-indexed, not 0-indexed. Therefore, the continuation test should use <=, not <. As a result, online repairs of btree blocks fail to stale any buffers for btrees that are being torn down, which causes later assertions in the buffer cache when another thread creates a different-sized buffer. This happens in xfs/709 when allocating an inode cluster buffer: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3346128 at fs/xfs/xfs_message.c:104 assfail+0x3a/0x40 [xfs] CPU: 0 PID: 3346128 Comm: fsstress Not tainted 6.7.0-rc4-djwx #rc4 RIP: 0010:assfail+0x3a/0x40 [xfs] Call Trace: <TASK> _xfs_buf_obj_cmp+0x4a/0x50 xfs_buf_get_map+0x191/0xba0 xfs_trans_get_buf_map+0x136/0x280 xfs_ialloc_inode_init+0x186/0x340 xfs_ialloc_ag_alloc+0x254/0x720 xfs_dialloc+0x21f/0x870 xfs_create_tmpfile+0x1a9/0x2f0 xfs_rename+0x369/0xfd0 xfs_vn_rename+0xfa/0x170 vfs_rename+0x5fb/0xc30 do_renameat2+0x52d/0x6e0 __x64_sys_renameat2+0x4b/0x60 do_syscall_64+0x3b/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e A later refactoring patch in the online repair series fixed this by accident, which is why I didn't notice this until I started testing only the patches that are likely to end up in 6.8. Fixes: 1c7ce115e521 ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
# 3f3cec03	06-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: force small EFIs for reaping btree extents Introduce the concept of a defer ops barrier to separate consecutively queued pending work items of the same type. With a barrier in place, the two work items will be tracked separately, and receive separate log intent items. The goal here is to prevent reaping of old metadata blocks from creating unnecessarily huge EFIs that could then run the risk of overflowing the scrub transaction. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 4c88fef3	06-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: remove __xfs_free_extent_later xfs_free_extent_later is a trivial helper, so remove it to reduce the amount of thinking required to understand the deferred freeing interface. This will make it easier to introduce automatic reaping of speculative allocations in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 014ad537	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: use per-AG bitmaps to reap unused AG metadata blocks during repair The AGFL repair code uses a series of bitmaps to figure out where there are OWN_AG blocks that are not claimed by the free space and rmap btrees. These blocks become the new AGFL, and any overflow is reaped. The bitmaps currently track xfs_fsblock_t even though we already know the AG number. In the last patch, we introduced a new bitmap "type" for tracking xfs_agblock_t extents. Port the reaping code and the AGFL repair to use this new type, which makes it very obvious what we're tracking. This also eliminates a bunch of unnecessary agblock <-> fsblock conversions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 1c7ce115	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: reap large AG metadata extents when possible When we're freeing extents that have been set in a bitmap, break the bitmap extent into multiple sub-extents organized by fate, and reap the extents. This enables us to dispose of old resources more efficiently than doing them block by block. While we're at it, rename the reaping functions to make it clear that they're reaping per-AG extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 9ed851f6	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: allow scanning ranges of the buffer cache for live buffers After an online repair, we need to invalidate buffers representing the blocks from the old metadata that we're replacing. It's possible that parts of a tree that were previously cached in memory are no longer accessible due to media failure or other corruption on interior nodes, so repair figures out the old blocks from the reverse mapping data and scans the buffer cache directly. In other words, online fsck needs to find all the live (i.e. non-stale) buffers for a range of fsblocks so that it can invalidate them. Unfortunately, the current buffer cache code triggers asserts if the rhashtable lookup finds a non-stale buffer of a different length than the key we searched for. For regular operation this is desirable, but for this repair procedure, we don't care since we're going to forcibly stale the buffer anyway. Add an internal lookup flag to avoid the assert. Skip buffers that are already XBF_STALE. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 77a1396f	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: rearrange xrep_reap_block to make future code flow easier Rearrange the logic inside xrep_reap_block to make it more obvious that crosslinked metadata blocks are handled differently. Add a couple of tracepoints so that we can tell what's going on at the end of a btree rebuild operation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 5fee784e	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: use deferred frees to reap old btree blocks Use deferred frees (EFIs) to reap the blocks of a btree that we just replaced. This helps us to shrink the window in which those old blocks could be lost due to a system crash, though we try to flush the EFIs every few hundred blocks so that we don't also overflow the transaction reservations during and after we commit the new btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# a55e0730	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: only allow reaping of per-AG blocks in xrep_reap_extents Now that we've refactored btree cursors to require the caller to pass in a perag structure, there are numerous problems in xrep_reap_extents if it's being called to reap extents for an inode metadata repair. We don't have any repair functions that can do that, so drop the support for now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 8e54e06b	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: only invalidate blocks if we're going to free them When we're discarding old btree blocks after a repair, only invalidate the buffers for the ones that we're freeing -- if the metadata was crosslinked with another data structure, we don't want to touch it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# e06ef14b	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: move the post-repair block reaping code to a separate file Reaping blocks after a repair is a complicated affair involving a lot of rmap btree lookups and figuring out if we're going to unmap or free old metadata blocks that might be crosslinked. Eventually, we will need to be able to reap per-AG metadata blocks, bmbt blocks from inode forks, garbage CoW staging extents, and (even later) blocks from btrees rooted in inodes. This results in a lot of reaping code, so we might as well split that off while it's easy. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
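
Note on c0e37f07: the off-by-one described in that commit message can be illustrated with a small user-space sketch. This is not the kernel code. The helper name count_probes and its inclusive flag are hypothetical, invented for illustration; only the name max_fsbs and the < versus <= comparison come from the commit message. The helper counts how many candidate buffer lengths a 1-indexed scan would probe at one starting block.

```c
/*
 * Hypothetical helper, not taken from fs/xfs/scrub/reap.c.
 *
 * Candidate buffer lengths are 1-indexed: the scan should try every
 * length from 1 fsblock up to and including max_fsbs.
 *
 * inclusive == 0 models the buggy continuation test (fsbcount < max_fsbs).
 * inclusive != 0 models the fixed test (fsbcount <= max_fsbs).
 */
static unsigned int count_probes(unsigned int max_fsbs, int inclusive)
{
	unsigned int fsbcount;
	unsigned int probes = 0;

	for (fsbcount = 1;
	     inclusive ? fsbcount <= max_fsbs : fsbcount < max_fsbs;
	     fsbcount++)
		probes++;	/* stand-in for one buffer cache lookup */

	return probes;
}
```

Under these assumptions, count_probes(1, 0) returns 0 while count_probes(1, 1) returns 1: when the cap is a single block, the buggy test never enters the loop, so a one-block buffer at that address would never be looked up. With a cap of 17 blocks, the buggy form probes 16 lengths and skips the largest one.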