#
622d88e2 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h Move declarations for libxfs symlink functions into a separate header file like we do for most everything else. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
6c8127e9 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: xfs_bmap_finish_one should map unwritten extents properly The deferred bmap work state and the log item can transmit unwritten state, so the XFS_BMAP_MAP handler must map in extents with that unwritten state. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
52f80706 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: support deferred bmap updates on the attr fork The deferred bmap update log item has always supported the attr fork, so plumb this in so that higher layers can access this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
2b6a5ec2 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix xfs_bunmapi to allow unmapping of partial rt extents When XFS_BMAPI_REMAP is passed to bunmapi, that means that we want to remove part of a block mapping without touching the allocator. For realtime files with rtextsize > 1, that also means that we should skip all the code that changes a partial remove request into an unwritten extent conversion. IOWs, bunmapi in this mode should handle removing the mapping from the rt file and nothing else. Note that XFS_BMAPI_REMAP callers are required to decrement the reference count and/or free the space manually. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
80284115 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move xfs_bmap_defer_add to xfs_bmap_item.c Move the code that adds the incore xfs_bmap_item deferred work data to a transaction live with the BUI log item code. This means that the file mapping code no longer has to know about the inner workings of the BUI log items. As a consequence, we can hide the _get_group helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
2a15e768 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: clean up bmap log intent item tracepoint callsites Pass the incore bmap structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
32080a9b |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: repair the rmapbt Rebuild the reverse mapping btree from all primary metadata. This first patch establishes the bare mechanics of finding records and putting together a new ondisk tree; more complex pieces are needed to make it work properly. Link: Documentation/filesystems/xfs-online-fsck-design.rst Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
5049ff4d |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create a helper to decide if a file mapping targets the rt volume Create a helper so that we can stop open-coding this decision everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
6a701eb8 |
|
22-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
xfs: move and rename xfs_btree_read_bufl Despite its name, xfs_btree_read_bufl doesn't contain any btree-related functionaliy and isn't used by the btree code. Move it to xfs_bmap.c, hard code the refval and ops arguments and rename it to xfs_bmap_read_buf. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
fb0793f2 |
|
22-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
xfs: open code xfs_btree_check_lptr in xfs_bmap_btree_to_extents xfs_bmap_btree_to_extents always passes a level of 1 to xfs_btree_check_lptr, thus making the level check redundant. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
88ee2f48 |
|
22-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
xfs: split the per-btree union in struct xfs_btree_cur Split up the union that encodes btree-specific fields in struct xfs_btree_cur. Most fields in there are specific to the btree type encoded in xfs_btree_ops.type, and we can use the obviously named union for that. But one field is specific to the bmapbt and two are shared by the refcount and rtrefcountbt. Move those to a separate union to make the usage clear and not need a separate struct for the refcount-related fields. This will also make unnecessary some very awkward btree cursor refc/rtrefc switching logic in the rtrefcount patchset. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ad065ef0 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: set btree block buffer ops in _init_buf Set the btree block buffer ops in xfs_btree_init_buf since we already have access to that information through the btree ops. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
c87e3bf7 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: initialize btree blocks using btree_ops structure Notice now that the btree ops structure encodes btree geometry flags and the magic number through the buffer ops. Refactor the btree block initialization functions to use the btree ops so that we no longer have to open code all that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e9e66df8 |
|
22-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
xfs: remove bc_ino.flags Just move the two flags into bc_flags where there is plenty of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
fd9c7f77 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: encode the btree geometry flags in the btree ops structure Certain btree flags never change for the life of a btree cursor because they describe the geometry of the btree itself. Encode these in the btree ops structure and reduce the amount of code required in each btree type's init_cursor functions. This also frees up most of the bits in bc_flags. A previous version of this patch also converted the open-coded flags logic to helpers. This was removed due to the pending refactoring (that follows this patch) to eliminate most of the state flags. Conversion script: sed \ -e 's/XFS_BTREE_LONG_PTRS/XFS_BTGEO_LONG_PTRS/g' \ -e 's/XFS_BTREE_ROOT_IN_INODE/XFS_BTGEO_ROOT_IN_INODE/g' \ -e 's/XFS_BTREE_LASTREC_UPDATE/XFS_BTGEO_LASTREC_UPDATE/g' \ -e 's/XFS_BTREE_OVERLAPPING/XFS_BTGEO_OVERLAPPING/g' \ -e 's/cur->bc_flags & XFS_BTGEO_/cur->bc_ops->geom_flags \& XFS_BTGEO_/g' \ -i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch]) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
989d5ec3 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: report XFS_IS_CORRUPT errors to the health system Whenever we encounter XFS_IS_CORRUPT failures, we should report that to the health monitoring system for later reporting. I started with this semantic patch and massaged everything until it built: @@ expression mp, test; @@ - if (XFS_IS_CORRUPT(mp, test)) return -EFSCORRUPTED; + if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; } @@ expression mp, test; identifier label, error; @@ - if (XFS_IS_CORRUPT(mp, test)) { error = -EFSCORRUPTED; goto label; } + if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; } Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a78d10f4 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: report btree block corruption errors to the health system Whenever we encounter corrupt btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
1196f3f5 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: report block map corruption errors to the health tracking system Whenever we encounter a corrupt block mapping, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3fed24ff |
|
19-Feb-2024 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
xfs: Replace xfs_isilocked with xfs_assert_ilocked To use the new rwsem_assert_held()/rwsem_assert_held_write(), we can't use the existing ASSERT macro. Add a new xfs_assert_ilocked() and convert all the callers. Fix an apparent bug in xfs_isilocked(): If the caller specifies XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL, xfs_assert_ilocked() will check both the IOLOCK and the ILOCK are held for write. xfs_isilocked() only checked that the ILOCK was held for write. xfs_assert_ilocked() is always on, even if DEBUG or XFS_WARN aren't defined. It's a cheap check, so I don't think it's worth defining it away. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
0b3a76e9 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: use GFP_KERNEL in pure transaction contexts When running in a transaction context, memory allocations are scoped to GFP_NOFS. Hence we don't need to use GFP_NOFS contexts in pure transaction context allocations - GFP_KERNEL will automatically get converted to GFP_NOFS as appropriate. Go through the code and convert all the obvious GFP_NOFS allocations in transaction context to use GFP_KERNEL. This further reduces the explicit use of GFP_NOFS in XFS. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
d61b40bf |
|
08-Jan-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix backwards logic in xfs_bmap_alloc_account We're only allocating from the realtime device if the inode is marked for realtime and we're /not/ allocating into the attr fork. Fixes: 58643460546d ("xfs: also use xfs_bmap_btalloc_accounting for RT allocations") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
6e145f94 |
|
19-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: make if_data a void pointer The xfs_ifork structure currently has a union of the if_root void pointer and the if_data char pointer. In either case it is an opaque pointer that depends on the fork format. Replace the union with a single if_data void pointer as that is what almost all callers want. Only the symlink NULL termination code in xfs_init_local_fork actually needs a new local variable now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
676544c2 |
|
17-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: indicate if xfs_bmap_adjacent changed ap->blkno Add a return value to xfs_bmap_adjacent to indicate if it did change ap->blkno or not. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
58643460 |
|
17-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: also use xfs_bmap_btalloc_accounting for RT allocations Make xfs_bmap_btalloc_accounting more generic by handling the RT quota reservations and then also use it from xfs_bmap_rtalloc instead of open coding the accounting logic there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
eef519d7 |
|
17-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the xfs_alloc_arg argument to xfs_bmap_btalloc_accounting xfs_bmap_btalloc_accounting only uses the len field from args, but that has just been propagated to ap->length field by the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
a59eb5fc |
|
15-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create a new inode fork block unmap helper Create a new helper to unmap blocks from an inode's fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e744cef2 |
|
15-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: zap broken inode forks Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
603ce8ab |
|
13-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: pass the defer ops directly to xfs_defer_add Pass a pointer to the xfs_defer_op_type structure to xfs_defer_add and remove the indirection through the xfs_defer_ops_type enum and a global table of all possible operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
e6af9c98 |
|
04-Dec-2023 |
Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> |
xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real In the case of returning -ENOSPC, ensure logflagsp is initialized by 0. Otherwise the caller __xfs_bunmapi will set uninitialized illegal tmp_logflags value into xfs log, which might cause unpredictable error in the log recovery procedure. Also, remove the flags variable and set the *logflagsp directly, so that the code should be more robust in the long run. Fixes: 1b24b633aafe ("xfs: move some more code into xfs_bmap_del_extent_real") Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
4c88fef3 |
|
06-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remove __xfs_free_extent_later xfs_free_extent_later is a trivial helper, so remove it to reduce the amount of thinking required to understand the deferred freeing interface. This will make it easier to introduce automatic reaping of speculative allocations in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
05564124 |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Convert these calls to use the helpers, and clean up all these places where the same variable can have different units depending on where it is in the function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
5dc3a80d |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create helpers to convert rt block numbers to rt extent numbers Create helpers to do unit conversions of rt block numbers to rt extent numbers. There are three variations -- one to compute the rt extent number from an rt block number; one to compute the offset of an rt block within an rt extent; and one to extract both. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3d2b6d03 |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename xfs_verify_rtext to xfs_verify_rtbext This helper function validates that a range of *blocks* in the realtime section is completely contained within the realtime section. It does /not/ validate ranges of *rtextents*. Rename the function to avoid suggesting that it does, and change the type of the @len parameter since xfs_rtblock_t is a position unit, not a length unit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
68db60bf |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create a helper to compute leftovers of realtime extents Create a helper to compute the misalignment between a file extent (xfs_extlen_t) and a realtime extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
13928113 |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Move all the declarations for functionality in xfs_rtbitmap.c into a separate xfs_rtbitmap.h header file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
ddd98076 |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix units conversion error in xfs_bmap_del_extent_delay The unit conversions in this function do not make sense. First we convert a block count to bytes, then divide that bytes value by rextsize, which is in blocks, to get an rt extent count. You can't divide bytes by blocks to get a (possibly multiblock) extent value. Fortunately nobody uses delalloc on the rt volume so this hasn't mattered. Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
6c664484 |
|
16-Oct-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: hoist freeing of rt data fork extent mappings Currently, xfs_bmap_del_extent_real contains a bunch of code to convert the physical extent of a data fork mapping for a realtime file into rt extents and pass that to the rt extent freeing function. Since the details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to xfs_rtbitmap.c to reduce code size when realtime isn't enabled. This will (one day) enable realtime EFIs to reuse the same unit-converting call with less code duplication. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
b742d7b4 |
|
28-Jun-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: use deferred frees for btree block freeing Btrees that aren't freespace management trees use the normal extent allocation and freeing routines for their blocks. Hence when a btree block is freed, a direct call to xfs_free_extent() is made and the extent is immediately freed. This puts the entire free space management btrees under this path, so we are stacking btrees on btrees in the call stack. The inobt, finobt and refcount btrees all do this. However, the bmap btree does not do this - it calls xfs_free_extent_later() to defer the extent free operation via an XEFI and hence it gets processed in deferred operation processing during the commit of the primary transaction (i.e. via intent chaining). We need to change xfs_free_extent() to behave in a non-blocking manner so that we can avoid deadlocks with busy extents near ENOSPC in transactions that free multiple extents. Inserting or removing a record from a btree can cause a multi-level tree merge operation and that will free multiple blocks from the btree in a single transaction. i.e. we can call xfs_free_extent() multiple times, and hence the btree manipulation transaction is vulnerable to this busy extent deadlock vector. To fix this, convert all the remaining callers of xfs_free_extent() to use xfs_free_extent_later() to queue XEFIs and hence defer processing of the extent frees to a context that can be safely restarted if a deadlock condition is detected. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
7dfee17b |
|
04-Jun-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: validate block number being freed before adding to xefi Bad things happen in defered extent freeing operations if it is passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno being passed to __xfs_free_extent_later(). Either way, it's very difficult to diagnose where a null perag oops in EFI creation is coming from when the operation that queued the xefi has already been completed and there's no longer any trace of it around.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b82a5c42 |
|
01-May-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof xfs/170 on a filesystem with su=128k,sw=4 produces this splat: BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP CPU: 1 PID: 4022907 Comm: dd Tainted: G W 6.3.0-xfsx #2 6ebeeffbe9577d32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-bu RIP: 0010:xfs_perag_rele+0x10/0x70 [xfs] RSP: 0018:ffffc90001e43858 EFLAGS: 00010217 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100 RDX: ffffffffa054e717 RSI: 0000000000000005 RDI: 0000000000000000 RBP: ffff888194eea000 R08: 0000000000000000 R09: 0000000000000037 R10: ffff888100ac1cb0 R11: 0000000000000018 R12: 0000000000000000 R13: ffffc90001e43a38 R14: ffff888194eea000 R15: ffff888194eea000 FS: 00007f93d1a0e740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000018a34f000 CR4: 00000000003506e0 Call Trace: <TASK> xfs_bmap_btalloc+0x1a7/0x5d0 [xfs f85291d6841cbb3dc740083f1f331c0327394518] xfs_bmapi_allocate+0xee/0x470 [xfs f85291d6841cbb3dc740083f1f331c0327394518] xfs_bmapi_write+0x539/0x9e0 [xfs f85291d6841cbb3dc740083f1f331c0327394518] xfs_iomap_write_direct+0x1bb/0x2b0 [xfs f85291d6841cbb3dc740083f1f331c0327394518] xfs_direct_write_iomap_begin+0x51c/0x710 [xfs f85291d6841cbb3dc740083f1f331c0327394518] iomap_iter+0x132/0x2f0 __iomap_dio_rw+0x2f8/0x840 iomap_dio_rw+0xe/0x30 xfs_file_dio_write_aligned+0xad/0x180 [xfs f85291d6841cbb3dc740083f1f331c0327394518] xfs_file_write_iter+0xfb/0x190 [xfs f85291d6841cbb3dc740083f1f331c0327394518] vfs_write+0x2eb/0x410 ksys_write+0x65/0xe0 do_syscall_64+0x2b/0x80 This crash occurs under the "out_low_space" label. We grabbed a perag reference, passed it via args->pag into xfs_bmap_btalloc_at_eof, and afterwards args->pag is NULL. Fix the second function not to clobber args->pag if the caller had passed one in. Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
9419092f |
|
26-Apr-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix livelock in delayed allocation at ENOSPC On a filesystem with a non-zero stripe unit and a large sequential write, delayed allocation will set a minimum allocation length of the stripe unit. If allocation fails because there are no extents long enough for an aligned minlen allocation, it is supposed to fall back to unaligned allocation which allows single block extents to be allocated. When the allocator code was rewritting in the 6.3 cycle, this fallback was broken - the old code used args->fsbno as the both the allocation target and the allocation result, the new code passes the target as a separate parameter. The conversion didn't handle the aligned->unaligned fallback path correctly - it reset args->fsbno to the target fsbno on failure which broke allocation failure detection in the high level code and so it never fell back to unaligned allocations. This resulted in a loop in writeback trying to allocate an aligned block, getting a false positive success, trying to insert the result in the BMBT. This did nothing because the extent already was in the BMBT (merge results in an unchanged extent) and so it returned the prior extent to the conversion code as the current iomap. Because the iomap returned didn't cover the offset we tried to map, xfs_convert_blocks() then retries the allocation, which fails in the same way and now we have a livelock. Reported-and-tested-by: Brian Foster <bfoster@redhat.com> Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
c95356ca |
|
11-Apr-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done While fuzzing the data fork extent count on a btree-format directory with xfs/375, I observed the following (excerpted) splat: XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs] Call Trace: <TASK> xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd] __x64_sys_ioctl+0x82/0xa0 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The cause of this is a race condition in xfs_ilock_data_map_shared, which performs an unlocked access to the data fork to guess which lock mode it needs: Thread 0 Thread 1 xfs_need_iread_extents <observe no iext tree> xfs_ilock(..., ILOCK_EXCL) xfs_iread_extents <observe no iext tree> <check ILOCK_EXCL> <load bmbt extents into iext> <notice iext size doesn't match nextents> xfs_need_iread_extents <observe iext tree> xfs_ilock(..., ILOCK_SHARED) <tear down iext tree> xfs_iunlock(..., ILOCK_EXCL) xfs_iread_extents <observe no iext tree> <check ILOCK_EXCL> *BOOM* Fix this race by adding a flag to the xfs_ifork structure to indicate that we have not yet read in the extent records and changing the predicate to look at the flag state, not if_height. The memory barrier ensures that the flag will not be set until the very end of the function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6a3bd8fc |
|
11-Apr-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: complain about bad file mapping records in the ondisk bmbt Similar to what we've just done for the other btrees, create a function to log corrupt bmbt records and call it whenever we encounter a bad record in the ondisk btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
774a99b4 |
|
11-Apr-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: give xfs_bmap_intent its own perag reference Give the xfs_bmap_intent an active reference to the perag structure data. This reference will be used to enable scrub intent draining functionality in subsequent patches. Later, shrink will use these passive references to know if an AG is quiesced or not. The reason why we take a passive ref for a file mapping operation is simple: we're committing to some sort of action involving space in an AG, so we want to indicate our interest in that AG. The space is already allocated, so we need to be able to operate on AGs that are offline or being shrunk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
f8f1ed1a |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: return a referenced perag from filestreams allocator Now that the filestreams AG selection tracks active perags, we need to return an active perag to the core allocator code. This is because the file allocation the filestreams code will run are AG specific allocations and so need to pin the AG until the allocations complete. We cannot rely on the filestreams item reference to do this - the filestreams association can be torn down at any time, hence we need to have a separate reference for the allocation process to pin the AG after it has been selected. This means there is some perag juggling in allocation failure fallback paths as they will do all AG scans in the case the AG specific allocation fails. Hence we need to track the perag reference that the filestream allocator returned to make sure we don't leak it on repeated allocation failure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
8f7747ad |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c xfs_bmap_btalloc_filestreams() calls two filestreams functions to select the AG to allocate from. Both those functions end up in the same selection function that iterates all AGs multiple times. Worst case, xfs_bmap_btalloc_filestreams() can iterate all AGs 4 times just to select the initial AG to allocate in. Move the AG selection to fs/xfs/xfs_filestreams.c as a single interface so that the inefficient AG interation is contained entirely within the filestreams code. This will allow the implementation to be simplified and made more efficient in future patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
05cf492a |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: use xfs_bmap_longest_free_extent() in filestreams The code in xfs_bmap_longest_free_extent() is open coded in xfs_filestream_pick_ag(). Export xfs_bmap_longest_free_extent and call it from the filestreams code instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
6b637ad0 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: get rid of notinit from xfs_bmap_longest_free_extent It is only set if reading the AGF gets a EAGAIN error. Just return the EAGAIN error and handle that error in the callers. This means we can remove the not_init parameter from xfs_bmap_select_minlen(), too, because the use of not_init there is pessimistic. If we can't read the agf, it won't increase blen. The only time we actually care whether we checked all the AGFs for contiguous free space is when the best length is less than the minimum allocation length. If not_init is set, then we ignore blen and set the minimum alloc length to the absolute minimum, not the best length we know already is present. However, if blen is less than the minimum we're going to ignore it anyway, regardless of whether we scanned all the AGFs or not. Hence not_init can go away, because we only use if blen is good from the scanned AGs otherwise we ignore it altogether and use minlen. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
89563e7d |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor out filestreams from xfs_bmap_btalloc_nullfb There's many if (filestreams) {} else {} branches in this function. Split it out into a filestreams specific function so that we can then work directly on cleaning up the filestreams code without impacting the rest of the allocation algorithms. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
230e8fe8 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: fold xfs_alloc_ag_vextent() into callers We don't need the multiplexing xfs_alloc_ag_vextent() provided anymore - we can just call the exact/near/size variants directly. This allows us to remove args->type completely and stop using args->fsbno as an input to the allocator algorithms. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
5f36b2ce |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_alloc_vextent_exact_bno() Two of the callers to xfs_alloc_vextent_this_ag() actually want exact block number allocation, not anywhere-in-ag allocation. Split this out from _this_ag() as a first class citizen so no external extent allocation code needs to care about args->type anymore. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
db4710fd |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_alloc_vextent_near_bno() The remaining callers of xfs_alloc_vextent() are all doing NEAR_BNO allocations. We can replace that function with a new xfs_alloc_vextent_near_bno() function that does this explicitly. We also multiplex NEAR_BNO allocations through xfs_alloc_vextent_this_ag via args->type. Replace all of these with direct calls to xfs_alloc_vextent_near_bno(), too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
2a7f6d41 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: use xfs_alloc_vextent_start_bno() where appropriate Change obvious callers of single AG allocation to use xfs_alloc_vextent_start_bno(). Callers no long need to specify XFS_ALLOCTYPE_START_BNO, and so the type can be driven inward and removed. While doing this, also pass the allocation target fsb as a parameter rather than encoding it in args->fsbno. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
319c9e87 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: use xfs_alloc_vextent_first_ag() where appropriate Change obvious callers of single AG allocation to use xfs_alloc_vextent_first_ag(). This gets rid of XFS_ALLOCTYPE_FIRST_AG as the type used within xfs_alloc_vextent_first_ag() during iteration is _THIS_AG. Hence we can remove the setting of args->type from all the callers of _first_ag() and remove the alloctype. While doing this, pass the allocation target fsb as a parameter rather than encoding it in args->fsbno. This starts the process of making args->fsbno an output only variable rather than input/output. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
85843327 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor xfs_bmap_btalloc() There are several different contexts xfs_bmap_btalloc() handles, and large chunks of the code execute independent allocation contexts. Try to untangle this mess a bit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
74c36a86 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: use xfs_alloc_vextent_this_ag() where appropriate Change obvious callers of single AG allocation to use xfs_alloc_vextent_this_ag(). Drive the per-ag grabbing out to the callers, too, so that callers with active references don't need to do new lookups just for an allocation in a context that already has a perag reference. The only remaining caller that does single AG allocation through xfs_alloc_vextent() is xfs_bmap_btalloc() with XFS_ALLOCTYPE_NEAR_BNO. That is going to need more untangling before it can be converted cleanly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
76257a15 |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_for_each_perag_wrap() In several places we iterate every AG from a specific start agno and wrap back to the first AG when we reach the end of the filesystem to continue searching. We don't have a primitive for this iteration yet, so add one for conversion of these algorithms to per-ag based iteration. The filestream AG select code is a mess, and this initially makes it worse. The per-ag selection needs to be driven completely into the filestream code to clean this up and it will be done in a future patch that makes the filestream allocator use active per-ag references correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
7ac2ff8b |
|
12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: perags need atomic operational state We currently don't have any flags or operational state in the xfs_perag except for the pagf_init and pagi_init flags. And the agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok flags, too. For controlling per-ag operations, we are going to need some atomic state flags. Hence add an opstate field similar to what we already have in the mount and log, and convert all these state flags across to atomic bit operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
692b6cdd |
|
10-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: t_firstblock is tracking AGs not blocks The tp->t_firstblock field is now raelly tracking the highest AG we have locked, not the block number of the highest allocation we've made. It's purpose is to prevent AGF locking deadlocks, so rename it to "highest AG" and simplify the implementation to just track the agno rather than a fsbno. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
36b6ad2d |
|
10-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: drop firstblock constraints from allocation setup Now that xfs_alloc_vextent() does all the AGF deadlock prevention filtering for multiple allocations in a single transaction, we no longer need the allocation setup code to care about what AGs we might already have locked. Hence we can remove all the "nullfb" conditional logic in places like xfs_bmap_btalloc() and instead have them focus simply on setting up locality constraints. If the allocation fails due to AGF lock filtering in xfs_alloc_vextent, then we just fall back as we normally do to more relaxed allocation constraints. As a result, any allocation that allows AG scanning (i.e. not confined to a single AG) and does not force a worst case full filesystem scan will now be able to attempt allocation from AGs lower than that defined by tp->t_firstblock. This is because xfs_alloc_vextent() allows try-locking of the AGFs and hence enables low space algorithms to at least -try- to get space from AGs lower than the one that we have currently locked and allocated from. This is a significant improvement in the low space allocation algorithm. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
d5753847 |
|
10-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: block reservation too large for minleft allocation When we enter xfs_bmbt_alloc_block() without having first allocated a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we are doing something like unwritten extent conversion, the transaction block reservation is used as the minleft value. This works for operations like unwritten extent conversion, but it assumes that the block reservation is only for a BMBT split. THis is not always true, and sometimes results in larger than necessary minleft values being set. We only actually need enough space for a btree split, something we already handle correctly in xfs_bmapi_write() via the xfs_bmapi_minleft() calculation. We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to calculate the number of blocks a BMBT split on this inode is going to require, not use the transaction block reservation that contains the maximum number of blocks this transaction may consume in it... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
1dd0510f |
|
10-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix low space alloc deadlock I've recently encountered an ABBA deadlock with g/476. The upcoming changes seem to make this much easier to hit, but the underlying problem is a pre-existing one. Essentially, if we select an AG for allocation, then lock the AGF and then fail to allocate for some reason (e.g. minimum length requirements cannot be satisfied), then we drop out of the allocation with the AGF still locked. The caller then modifies the allocation constraints - usually loosening them up - and tries again. This can result in trying to access AGFs that are lower than the AGF we already have locked from the failed attempt. e.g. the failed attempt skipped several AGs before failing, so we have locks an AG higher than the start AG. Retrying the allocation from the start AG then causes us to violate AGF lock ordering and this can lead to deadlocks. The deadlock exists even if allocation succeeds - we can do a followup allocations in the same transaction for BMBT blocks that aren't guaranteed to be in the same AG as the original, and can move into higher AGs. Hence we really need to move the tp->t_firstblock tracking down into xfs_alloc_vextent() where it can be set when we exit with a locked AG. xfs_alloc_vextent() can also check there if the requested allocation falls within the allow range of AGs set by tp->t_firstblock. If we can't allocate within the range set, we have to fail the allocation. If we are allowed to to non-blocking AGF locking, we can ignore the AG locking order limitations as we can use try-locks for the first iteration over requested AG range. This invalidates a set of post allocation asserts that check that the allocation is always above tp->t_firstblock if it is set. Because we can use try-locks to avoid the deadlock in some circumstances, having a pre-existing locked AGF doesn't always prevent allocation from lower order AGFs. Hence those ASSERTs need to be removed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
ddccb81b |
|
01-Feb-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: pass the xfs_bmbt_irec directly through the log intent code Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ddfdd530 |
|
01-Dec-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: invalidate xfs_bufs when allocating cow extents While investigating test failures in xfs/17[1-3] in alwayscow mode, I noticed through code inspection that xfs_bmap_alloc_userdata isn't setting XFS_ALLOC_USERDATA when allocating extents for a file's CoW fork. COW staging extents should be flagged as USERDATA, since user data are persisted to these blocks before being remapped into a file. This mis-classification has a few impacts on the behavior of the system. First, the filestreams allocator is supposed to keep allocating from a chosen AG until it runs out of space in that AG. However, it only does that for USERDATA allocations, which means that COW allocations aren't tied to the filestreams AG. Fortunately, few people use filestreams, so nobody's noticed. A more serious problem is that xfs_alloc_ag_vextent_small looks for a buffer to invalidate *if* the USERDATA flag is set and the AG is so full that the allocation had to come from the AGFL because the cntbt is empty. The consequences of not invalidating the buffer are severe -- if the AIL incorrectly checkpoints a buffer that is now being used to store user data, that action will clobber the user's written data. Fix filestreams and yet another data corruption vector by flagging COW allocations as USERDATA. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
304a68b9 |
|
28-Nov-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: use iomap_valid method to detect stale cached iomaps Now that iomap supports a mechanism to validate cached iomaps for buffered write operations, hook it up to the XFS buffered write ops so that we can avoid data corruptions that result from stale cached iomaps. See: https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/ or the ->iomap_valid() introduction commit for exact details of the corruption vector. The validity cookie we store in the iomap is based on the type of iomap we return. It is expected that the iomap->flags we set in xfs_bmbt_to_iomap() is not perturbed by the iomap core and are returned to us in the iomap passed via the .iomap_valid() callback. This ensures that the validity cookie is always checking the correct inode fork sequence numbers to detect potential changes that affect the extent cached by the iomap. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
78b0f58b |
|
18-Sep-2022 |
Zeng Heng <zengheng4@huawei.com> |
xfs: clean up "%Ld/%Lu" which doesn't meet C standard The "%Ld" specifier, which represents long long unsigned, doesn't meet C language standard, and even more, it makes people easily mistake with "%ld", which represent long unsigned. So replace "%Ld" with "lld". Do the same with "%Lu". Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c01147d9 |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: replace inode fork size macros with functions Replace the shouty macros here with typechecked helper functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
932b42c6 |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: replace XFS_IFORK_Q with a proper predicate function Replace this shouty macro with a real C function that has a more descriptive name. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
e45d7cb2 |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use XFS_IFORK_Q to determine the presence of an xattr fork Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the attribute fork but i_forkoff is zero. This eliminates the ambiguity between i_forkoff and i_af.if_present, which should make it easier to understand the lifetime of attr forks. While we're at it, remove the if_present checks around calls to xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr forks that have already been torn down. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
2ed5b09b |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: make inode attribute forks a permanent part of struct xfs_inode Syzkaller reported a UAF bug a while back: ================================================================== BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127 Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958 CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted 5.15.0-0.30.3-20220406_1406 #3 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106 print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459 xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127 xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159 xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36 __vfs_getxattr+0xdf/0x13d fs/xattr.c:399 cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300 security_inode_need_killpriv+0x4c/0x97 security/security.c:1408 dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912 dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908 do_truncate+0xc3/0x1e0 fs/open.c:56 handle_truncate fs/namei.c:3084 [inline] do_open fs/namei.c:3432 [inline] path_openat+0x30ab/0x396d fs/namei.c:3561 do_filp_open+0x1c4/0x290 fs/namei.c:3588 do_sys_openat2+0x60d/0x98c fs/open.c:1212 do_sys_open+0xcf/0x13c fs/open.c:1228 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 RIP: 0033:0x7f7ef4bb753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0 RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0 </TASK> Allocated by task 2953: kasan_save_stack+0x19/0x38 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467 kasan_slab_alloc include/linux/kasan.h:254 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3213 [inline] slab_alloc mm/slub.c:3221 [inline] kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226 kmem_cache_zalloc include/linux/slab.h:711 [inline] xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287 xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098 xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59 __vfs_setxattr+0x11b/0x177 fs/xattr.c:180 __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214 __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275 vfs_setxattr+0x154/0x33d fs/xattr.c:301 setxattr+0x216/0x29f fs/xattr.c:575 __do_sys_fsetxattr fs/xattr.c:632 [inline] __se_sys_fsetxattr fs/xattr.c:621 [inline] __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 Freed by task 2949: kasan_save_stack+0x19/0x38 mm/kasan/common.c:38 kasan_set_track+0x1c/0x21 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:1700 [inline] slab_free_freelist_hook mm/slub.c:1726 [inline] slab_free mm/slub.c:3492 [inline] kmem_cache_free+0xdc/0x3ce mm/slub.c:3508 xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773 xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822 xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413 xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684 xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59 __vfs_removexattr+0x106/0x16a fs/xattr.c:468 cap_inode_killpriv+0x24/0x47 security/commoncap.c:324 security_inode_killpriv+0x54/0xa1 security/security.c:1414 setattr_prepare+0x1a6/0x897 fs/attr.c:146 xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682 xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065 xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093 notify_change+0xae5/0x10a1 fs/attr.c:410 do_truncate+0x134/0x1e0 fs/open.c:64 handle_truncate fs/namei.c:3084 [inline] do_open fs/namei.c:3432 [inline] path_openat+0x30ab/0x396d fs/namei.c:3561 do_filp_open+0x1c4/0x290 fs/namei.c:3588 do_sys_openat2+0x60d/0x98c fs/open.c:1212 do_sys_open+0xcf/0x13c fs/open.c:1228 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 The buggy address belongs to the object at ffff88802cec9188 which belongs to the cache xfs_ifork of size 40 The buggy address is located 20 bytes inside of 40-byte region [ffff88802cec9188, ffff88802cec91b0) The buggy address belongs to the page: page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2cec9 flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80 raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb ^ ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb ================================================================== The root cause of this bug is the unlocked access to xfs_inode.i_afp from the getxattr code paths while trying to determine which ILOCK mode to use to stabilize the xattr data. Unfortunately, the VFS does not acquire i_rwsem when vfs_getxattr (or listxattr) call into the filesystem, which means that getxattr can race with a removexattr that's tearing down the attr fork and crash: xfs_attr_set: xfs_attr_get: xfs_attr_fork_remove: xfs_ilock_attr_map_shared: xfs_idestroy_fork(ip->i_afp); kmem_cache_free(xfs_ifork_cache, ip->i_afp); if (ip->i_afp && ip->i_afp = NULL; xfs_need_iread_extents(ip->i_afp)) <KABOOM> ip->i_forkoff = 0; Regrettably, the VFS is much more lax about i_rwsem and getxattr than is immediately obvious -- not only does it not guarantee that we hold i_rwsem, it actually doesn't guarantee that we *don't* hold it either. The getxattr system call won't acquire the lock before calling XFS, but the file capabilities code calls getxattr with and without i_rwsem held to determine if the "security.capabilities" xattr is set on the file. Fixing the VFS locking requires a treewide investigation into every code path that could touch an xattr and what i_rwsem state it expects or sets up. That could take years or even prove impossible; fortunately, we can fix this UAF problem inside XFS. An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to ensure that i_forkoff is always zeroed before i_afp is set to null and changed the read paths to use smp_rmb before accessing i_forkoff and i_afp, which avoided these UAF problems. However, the patch author was too busy dealing with other problems in the meantime, and by the time he came back to this issue, the situation had changed a bit. On a modern system with selinux, each inode will always have at least one xattr for the selinux label, so it doesn't make much sense to keep incurring the extra pointer dereference. Furthermore, Allison's upcoming parent pointer patchset will also cause nearly every inode in the filesystem to have extended attributes. Therefore, make the inode attribute fork structure part of struct xfs_inode, at a cost of 40 more bytes. This patch adds a clunky if_present field where necessary to maintain the existing logic of xattr fork null pointer testing in the existing codebase. The next patch switches the logic over to XFS_IFORK_Q and it all goes away. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
732436ef |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: convert XFS_IFORK_PTR to a static inline helper We're about to make this logic do a bit more, so convert the macro to a static inline function for better typechecking and fewer shouty macros. No functional changes here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
08d3e84f |
|
07-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: pass perag to xfs_alloc_read_agf() xfs_alloc_read_agf() initialises the perag if it hasn't been done yet, so it makes sense to pass it the perag rather than pull a reference from the buffer. This allows callers to be per-ag centric rather than passing mount/agno pairs everywhere. Whilst modifying the xfs_reflink_find_shared() function definition, declare it static and remove the extern declaration as it is an internal function only these days. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
76b47e52 |
|
07-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: kill xfs_alloc_pagf_init() Trivial wrapper around xfs_alloc_read_agf(), can be easily replaced by passing a NULL agfbp to xfs_alloc_read_agf(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
4ed6435c |
|
25-Apr-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: stop artificially limiting the length of bunmap calls In commit e1a4e37cc7b6, we clamped the length of bunmapi calls on the data forks of shared files to avoid two failure scenarios: one where the extent being unmapped is so sparsely shared that we exceed the transaction reservation with the sheer number of refcount btree updates and EFI intent items; and the other where we attach so many deferred updates to the transaction that we pin the log tail and later the log head meets the tail, causing the log to livelock. We avoid triggering the first problem by tracking the number of ops in the refcount btree cursor and forcing a requeue of the refcount intent item any time we think that we might be close to overflowing. This has been baked into XFS since before the original e1a4 patch. A recent patchset fixed the second problem by changing the deferred ops code to finish all the work items created by each round of trying to complete a refcount intent item, which eliminates the long chains of deferred items (27dad); and causing long-running transactions to relog their intent log items when space in the log gets low (74f4d). Because this clamp affects /any/ unmapping request regardless of the sharing factors of the component blocks, it degrades the performance of all large unmapping requests -- whereas with an unshared file we can unmap millions of blocks in one go, shared files are limited to unmapping a few thousand blocks at a time, which causes the upper level code to spin in a bunmapi loop even if it wasn't needed. This also eliminates one more place where log recovery behavior can differ from online behavior, because bunmapi operations no longer need to requeue. The fstest generic/447 was created to test the old fix, and it still passes with this applied. Partial-revert-of: e1a4e37cc7b6 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent") Depends: 27dada070d59 ("xfs: change the order in which child and parent defer ops ar finished") Depends: 74f4d6a1e065 ("xfs: only relog deferred intent items if free space in the log gets low") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e7d410ac |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert bmapi flags to unsigned. 5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
0e5b8e45 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert bmap extent type flags to unsigned. 5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4f86bb4b |
|
09-Mar-2022 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Conditionally upgrade existing inodes to use large extent counters This commit enables upgrading existing inodes to use large extent counters provided that underlying filesystem's superblock has large extent counter feature enabled. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
83a21c18 |
|
29-Mar-2022 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Directory's data fork extent counter can never overflow The maximum file size that can be represented by the data fork extent counter in the worst case occurs when all extents are 1 block in length and each block is 1KB in size. With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with 1KB sized blocks, a file can reach upto, (2^31) * 1KB = 2TB This is much larger than the theoretical maximum size of a directory i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB. Since a directory's inode can never overflow its data fork extent counter, this commit removes all the overflow checks associated with it. xfs_dinode_verify() now performs a rough check to verify if a diretory's data fork is larger than 96GB. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
df9ad5cc |
|
16-Nov-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce macros to represent new maximum extent counts for data/attr forks This commit defines new macros to represent maximum extent counts allowed by filesystems which have support for large per-inode extent counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
0c35e7ba |
|
16-Nov-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Use uint64_t to count maximum blocks that can be used by BMBT Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
755c38ff |
|
16-Nov-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively A future commit will introduce a 64-bit on-disk data extent counter and a 32-bit on-disk attr extent counter. This commit promotes xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits in order to correctly handle in-core versions of these quantities. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
bb1d5049 |
|
25-Feb-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Use xfs_extnum_t instead of basic data types xfs_extnum_t is the type to use to declare variables which have values obtained from xfs_dinode->di_[a]nextents. This commit replaces basic types (e.g. uint32_t) with xfs_extnum_t for such variables. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
9feb8f19 |
|
27-Aug-2020 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce xfs_iext_max_nextents() helper xfs_iext_max_nextents() returns the maximum number of extents possible for one of data, cow or attribute fork. This helper will be extended further in a future commit when maximum extent counts associated with data/attribute forks are increased. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
95f0b95e |
|
08-Aug-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Define max extent length based on on-disk format definition The maximum extent length depends on maximum block count that can be stored in a BMBT record. Hence this commit defines MAXEXTLEN based on BMBT_BLOCKCOUNT_BITLEN. While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN. Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
740fd671 |
|
29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: pass the mapping flags to xfs_bmbt_to_iomap To prepare for looking at the IOMAP_DAX flag in xfs_bmbt_to_iomap pass in the input mapping flags to xfs_bmbt_to_iomap. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-24-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c201d9ca |
|
12-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename xfs_bmap_add_free to xfs_free_extent_later xfs_bmap_add_free isn't a block mapping function; it schedules deferred freeing operations for a later point in a compound transaction chain. While it's primarily used by bunmapi, its use has expanded beyond that. Move it to xfs_alloc.c and rename the function since it's now general freeing functionality. Bring the slab cache bits in line with the way we handle the other intent items. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
f3c799c2 |
|
12-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create slab caches for frequently-used deferred items Create slab caches for the high-level structures that coordinate deferred intent items, since they're used fairly heavily. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
182696fb |
|
12-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename _zone variables to _cache Now that we've gotten rid of the kmem_zone_t typedef, rename the variables to _cache since that's what they are. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
e7720afa |
|
27-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remove kmem_zone typedef Remove these typedefs by referencing kmem_cache directly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
0ed5f735 |
|
23-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: compute absolute maximum nlevels for each btree type Add code for all five btree types so that we can compute the absolute maximum possible btree height for each btree type. This is a setup for the next patch, which makes every btree type have its own cursor cache. The functions are exported so that we can have xfs_db report the absolute maximum btree heights for each btree type, rather than making everyone run their own ad-hoc computations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
c0643f6f |
|
16-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: encode the max btree height in the cursor Encode the maximum btree height in the cursor, since we're soon going to allow smaller cursors for AG btrees and larger cursors for file btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
6ca444cf |
|
16-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: prepare xfs_btree_cur for dynamic cursor heights Split out the btree level information into a separate struct and put it at the end of the cursor structure as a VLA. Files with huge data forks (and in the future, the realtime rmap btree) will require the ability to support many more levels than a per-AG btree cursor, which means that we're going to create per-btree type cursor caches to conserve memory for the more common case. Note that a subsequent patch actually introduces dynamic cursor heights. This one merely rearranges the structure to prepare for that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
ae127f08 |
|
16-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remove xfs_btree_cur_t typedef Get rid of this old typedef before we start changing other things. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
9343ee76 |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert bp->b_bn references to xfs_buf_daddr() Stop directly referencing b_bn in code outside the buffer cache, as b_bn is supposed to be used only as an internal cache index. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
04fcad80 |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_buf_daddr() Introduce a helper function xfs_buf_daddr() to extract the disk address of the buffer from the struct xfs_buf. This will replace direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as the XFS_BUF_ADDR() macro. This patch introduces the helper function and replaces all uses of XFS_BUF_ADDR() as this is just a simple sed replacement. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
75c8c50f |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown Remove the shouty macro and instead use the inline function that matches other state/feature check wrapper naming. This conversion was done with sed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
0560f31a |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert mount flags to features Replace m_flags feature checks with xfs_has_<feature>() calls and rework the setup code to set flags in m_features. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
38c26bfd |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: replace xfs_sb_version checks with feature flag checks Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
5a981e4e |
|
02-Jun-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: mark xfs_bmap_set_attrforkoff static xfs_bmap_set_attrforkoff is only used inside of xfs_bmap.c, so mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
9bbafc71 |
|
01-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: move xfs_perag_get/put to xfs_ag.[ch] They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
0fe0bbe0 |
|
27-May-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: bunmapi has unnecessary AG lock ordering issues large directory block size operations are assert failing because xfs_bunmapi() is not completely removing fragmented directory blocks like so: XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677 .... Call Trace: xfs_dir2_shrink_inode+0x1a8/0x210 xfs_dir2_block_to_sf+0x2ae/0x410 xfs_dir2_block_removename+0x21a/0x280 xfs_dir_removename+0x195/0x1d0 xfs_rename+0xb79/0xc50 ? avc_has_perm+0x8d/0x1a0 ? avc_has_perm_noaudit+0x9a/0x120 xfs_vn_rename+0xdb/0x150 vfs_rename+0x719/0xb50 ? __lookup_hash+0x6a/0xa0 do_renameat2+0x413/0x5e0 __x64_sys_rename+0x45/0x50 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae We are aborting the bunmapi() pass because of this specific chunk of code: /* * Make sure we don't touch multiple AGF headers out of order * in a single transaction, as that could cause AB-BA deadlocks. */ if (!wasdel && !isrt) { agno = XFS_FSB_TO_AGNO(mp, del.br_startblock); if (prev_agno != NULLAGNUMBER && prev_agno > agno) break; prev_agno = agno; } This is designed to prevent deadlocks in AGF locking when freeing multiple extents by ensuring that we only ever lock in increasing AG number order. Unfortunately, this also violates the "bunmapi will always succeed" semantic that some high level callers depend on, such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and xfs_inactive_symlink_rmt(). This AG lock ordering was introduced back in 2017 to fix deadlocks triggered by generic/299 as reported here: https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/ This codebase is old enough that it was before we were defering all AG based extent freeing from within xfs_bunmapi(). THat is, we never actually lock AGs in xfs_bunmapi() any more - every non-rt based extent free is added to the defer ops list, as is all BMBT block freeing. And RT extents are not RT based, so there's no lock ordering issues associated with them. Hence this AGF lock ordering code is both broken and dead. Let's just remove it so that the large directory block code works reliably again. Tested against xfs/538 and generic/299 which is the original test that exposed the deadlocks that this code fixed. Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
991c2c59 |
|
26-May-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: btree format inode forks can have zero extents xfs/538 is assert failing with this trace when testing with directory block sizes of 64kB: XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608 .... Call Trace: xfs_bmap_btree_to_extents+0x2a9/0x470 ? kmem_cache_alloc+0xe7/0x220 __xfs_bunmapi+0x4ca/0xdf0 xfs_bunmapi+0x1a/0x30 xfs_dir2_shrink_inode+0x71/0x210 xfs_dir2_block_to_sf+0x2ae/0x410 xfs_dir2_block_removename+0x21a/0x280 xfs_dir_removename+0x195/0x1d0 xfs_remove+0x244/0x460 xfs_vn_unlink+0x53/0xa0 ? selinux_inode_unlink+0x13/0x20 vfs_unlink+0x117/0x220 do_unlinkat+0x1a2/0x2d0 __x64_sys_unlink+0x42/0x60 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae This is a check to ensure that the extents have been read into memory before we are doing a ifork btree manipulation. This assert is bogus in the above case. We have a fragmented directory block that has more extents in it than can fit in extent format, so the inode data fork is in btree format. xfs_dir2_shrink_inode() asks to remove all remaining 16 filesystem blocks from the inode so it can convert to short form, and __xfs_bunmapi() removes all the extents. We now have a data fork in btree format but have zero extents in the fork. This incorrectly trips the xfs_need_iread_extents() assert because it assumes that an empty extent btree means the extent tree has not been read into memory yet. This is clearly not the case with xfs_bunmapi(), as it has an explicit call to xfs_iread_extents() in it to pull the extents into memory before it starts unmapping. Also, the assert directly after this bogus one is: ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE); Which covers the context in which it is legal to call xfs_bmap_btree_to_extents just fine. Hence we should just remove the bogus assert as it is clearly wrong and causes a regression. The returns the test behaviour to the pre-existing assert failure in xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to remove all the extents in the range it was asked to unmap. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b2197a36 |
|
13-Apr-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: remove XFS_IFEXTENTS The in-memory XFS_IFEXTENTS is now only used to check if an inode with extents still needs the extents to be read into memory before doing operations that need the extent map. Add a new xfs_need_iread_extents helper that returns true for btree format forks that do not have any entries in the in-memory extent btree, and use that instead of checking the XFS_IFEXTENTS flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
0779f4a6 |
|
13-Apr-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: remove XFS_IFINLINE Just check for an inline format fork instead of the using the equivalent in-memory XFS_IFINLINE flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ac1e0672 |
|
13-Apr-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: remove XFS_IFBROOT Just check for a btree format fork instead of the using the equivalent in-memory XFS_IFBROOT flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
2ac131df |
|
13-Apr-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: rename and simplify xfs_bmap_one_block xfs_bmap_one_block is only called for the attribute fork. Move it to xfs_attr.c, drop the unused whichfork argument and code only executed for the data fork and rename the result to xfs_attr_is_leaf. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
862a804a |
|
13-Apr-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the XFS_IFEXTENTS check into xfs_iread_extents Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b2941046 |
|
06-Apr-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: precalculate default inode attribute offset Default attr fork offset is based on inode size, so is a fixed geometry parameter of the inode. Move it to the xfs_ino_geometry structure and stop calculating it on every call to xfs_default_attroffset(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
#
683ec9ba |
|
06-Apr-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: default attr fork size does not handle device inodes Device inodes have a non-default data fork size of 8 bytes as checked/enforced by xfs_repair. xfs_default_attroffset() doesn't handle this, so lets do a minor refactor so it does. Fixes: e6a688c33238 ("xfs: initialise attr fork on inode create") Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
#
b6785e27 |
|
02-Apr-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Use struct xfs_bmdr_block instead of struct xfs_btree_block to calculate root node size The incore data fork of an inode stores the bmap btree root node as 'struct xfs_btree_block'. However, the ondisk version of the inode stores the bmap btree root node as a 'struct xfs_bmdr_block'. xfs_bmap_add_attrfork_btree() checks if the btree root node fits inside the data fork of the inode. However, it incorrectly uses 'struct xfs_btree_block' to compute the size of the bmap btree root node. Since size of 'struct xfs_btree_block' is larger than that of 'struct xfs_bmdr_block', xfs_bmap_add_attrfork_btree() could end up unnecessarily demoting the current root node as the child of newly allocated root node. This commit optimizes space usage by modifying xfs_bmap_add_attrfork_btree() to use 'struct xfs_bmdr_block' to check if the bmap btree root node fits inside the data fork of the inode. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
7821ea30 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_forkoff field to struct xfs_inode In preparation of removing the historic icinode struct, move the forkoff field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
031474c2 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_extsize field to struct xfs_inode In preparation of removing the historic icinode struct, move the extsize field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
6e73a545 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_nblocks field to struct xfs_inode In preparation of removing the historic icinode struct, move the nblocks field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
13d2c10b |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_size field to struct xfs_inode In preparation of removing the historic icinode struct, move the on-disk size field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
6e8bd39d |
|
25-Mar-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Initialize xfs_alloc_arg->total correctly when allocating minlen extents xfs/538 can cause the following call trace to be printed when executing on a multi-block directory configuration, WARNING: CPU: 1 PID: 2578 at fs/xfs/libxfs/xfs_bmap.c:717 xfs_bmap_extents_to_btree+0x520/0x5d0 Call Trace: ? xfs_buf_rele+0x4f/0x450 xfs_bmap_add_extent_hole_real+0x747/0x960 xfs_bmapi_allocate+0x39a/0x440 xfs_bmapi_write+0x507/0x9e0 xfs_da_grow_inode_int+0x1cd/0x330 ? up+0x12/0x60 xfs_dir2_grow_inode+0x62/0x110 ? xfs_trans_log_inode+0x234/0x2d0 xfs_dir2_sf_to_block+0x103/0x940 ? xfs_dir2_sf_check+0x8c/0x210 ? xfs_da_compname+0x19/0x30 ? xfs_dir2_sf_lookup+0xd0/0x3d0 xfs_dir2_sf_addname+0x10d/0x910 xfs_dir_createname+0x1ad/0x210 xfs_create+0x404/0x620 xfs_generic_create+0x24c/0x320 path_openat+0xda6/0x1030 do_filp_open+0x88/0x130 ? kmem_cache_alloc+0x50/0x210 ? __cond_resched+0x16/0x40 ? kmem_cache_alloc+0x50/0x210 do_sys_openat2+0x97/0x150 __x64_sys_creat+0x49/0x70 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae This occurs because xfs_bmap_exact_minlen_extent_alloc() initializes xfs_alloc_arg->total to xfs_bmalloca->minlen. In the context of xfs_bmap_exact_minlen_extent_alloc(), xfs_bmalloca->minlen has a value of 1 and hence the space allocator could choose an AG which has less than xfs_bmalloca->total number of free blocks available. As the transaction proceeds, one of the future space allocation requests could fail due to non-availability of free blocks in the AG that was originally chosen. This commit fixes the bug by assigning xfs_alloc_arg->total to the value of xfs_bmalloca->total. Fixes: 301519674699 ("xfs: Introduce error injection to allocate only minlen size extents for files") Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e6a688c3 |
|
22-Mar-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: initialise attr fork on inode create When we allocate a new inode, we often need to add an attribute to the inode as part of the create. This can happen as a result of needing to add default ACLs or security labels before the inode is made visible to userspace. This is highly inefficient right now. We do the create transaction to allocate the inode, then we do an "add attr fork" transaction to modify the just created empty inode to set the inode fork offset to allow attributes to be stored, then we go and do the attribute creation. This means 3 transactions instead of 1 to allocate an inode, and this greatly increases the load on the CIL commit code, resulting in excessive contention on the CIL spin locks and performance degradation: 18.99% [kernel] [k] __pv_queued_spin_lock_slowpath 3.57% [kernel] [k] do_raw_spin_lock 2.51% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock 2.48% [kernel] [k] memcpy 2.34% [kernel] [k] xfs_log_commit_cil The typical profile resulting from running fsmark on a selinux enabled filesytem is adds this overhead to the create path: - 15.30% xfs_init_security - 15.23% security_inode_init_security - 13.05% xfs_initxattrs - 12.94% xfs_attr_set - 6.75% xfs_bmap_add_attrfork - 5.51% xfs_trans_commit - 5.48% __xfs_trans_commit - 5.35% xfs_log_commit_cil - 3.86% _raw_spin_lock - do_raw_spin_lock __pv_queued_spin_lock_slowpath - 0.70% xfs_trans_alloc 0.52% xfs_trans_reserve - 5.41% xfs_attr_set_args - 5.39% xfs_attr_set_shortform.constprop.0 - 4.46% xfs_trans_commit - 4.46% __xfs_trans_commit - 4.33% xfs_log_commit_cil - 2.74% _raw_spin_lock - do_raw_spin_lock __pv_queued_spin_lock_slowpath 0.60% xfs_inode_item_format 0.90% xfs_attr_try_sf_addname - 1.99% selinux_inode_init_security - 1.02% security_sid_to_context_force - 1.00% security_sid_to_context_core - 0.92% sidtab_entry_to_string - 0.90% sidtab_sid2str_get 0.59% sidtab_sid2str_put.part.0 - 0.82% selinux_determine_inode_label - 0.77% security_transition_sid 0.70% security_compute_sid.part.0 And fsmark creation rate performance drops by ~25%. The key point to note here is that half the additional overhead comes from adding the attribute fork to the newly created inode. That's crazy, considering we can do this same thing at inode create time with a couple of lines of code and no extra overhead. So, if we know we are going to add an attribute immediately after creating the inode, let's just initialise the attribute fork inside the create transaction and chop that whole chunk of code out of the create fast path. This completely removes the performance drop caused by enabling SELinux, and the profile looks like: - 8.99% xfs_init_security - 9.00% security_inode_init_security - 6.43% xfs_initxattrs - 6.37% xfs_attr_set - 5.45% xfs_attr_set_args - 5.42% xfs_attr_set_shortform.constprop.0 - 4.51% xfs_trans_commit - 4.54% __xfs_trans_commit - 4.59% xfs_log_commit_cil - 2.67% _raw_spin_lock - 3.28% do_raw_spin_lock 3.08% __pv_queued_spin_lock_slowpath 0.66% xfs_inode_item_format - 0.90% xfs_attr_try_sf_addname - 0.60% xfs_trans_alloc - 2.35% selinux_inode_init_security - 1.25% security_sid_to_context_force - 1.21% security_sid_to_context_core - 1.19% sidtab_entry_to_string - 1.20% sidtab_sid2str_get - 0.86% sidtab_sid2str_put.part.0 - 0.62% _raw_spin_lock_irqsave - 0.77% do_raw_spin_lock __pv_queued_spin_lock_slowpath - 0.84% selinux_determine_inode_label - 0.83% security_transition_sid 0.86% security_compute_sid.part.0 Which indicates the XFS overhead of creating the selinux xattr has been halved. This doesn't fix the CIL lock contention problem, just means it's not a limiting factor for this workload. Lock contention in the security subsystems is going to be an issue soon, though... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [djwong: fix compilation error when CONFIG_SECURITY=n] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
|
#
3de4eb10 |
|
26-Jan-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: allow reservation of rtblocks with xfs_trans_alloc_inode Make it so that we can reserve rt blocks with the xfs_trans_alloc_inode wrapper function, then convert a few more callsites. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
3a1af6c3 |
|
26-Jan-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: refactor common transaction/inode/quota allocation idiom Create a new helper xfs_trans_alloc_inode that allocates a transaction, locks and joins an inode to it, and then reserves the appropriate amount of quota against that transction. Then replace all the open-coded idioms with a single call to this helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
02b7ee4e |
|
26-Jan-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: reserve data and rt quota at the same time Modify xfs_trans_reserve_quota_nblks so that we can reserve data and realtime blocks from the dquot at the same time. This change has the theoretical side effect that for allocations to realtime files we will reserve from the dquot both the number of rtblocks being allocated and the number of bmbt blocks that might be needed to add the mapping. However, since the mount code disables quota if it finds a realtime device, this should not result in any behavior changes. Now that we've moved the inode creation callers away from using the _nblks function, we can repurpose the (now unused) ninos argument for realtime blocks, so make that change. This also replaces the flags argument with a boolean parameter to force the reservation since we don't need to distinguish between data and rt quota reservations any more, and the only flag being passed in was FORCE_RES. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
85546500 |
|
22-Jan-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create convenience wrappers for incore quota block reservations Create a couple of convenience wrappers for creating and deleting quota block reservations against future changes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
4abe21ad |
|
22-Jan-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: clean up quota reservation callsites Convert a few xfs_trans_*reserve* callsites that are open-coding other convenience functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
560ab6c0 |
|
27-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Fix 'set but not used' warning in xfs_bmap_compute_alignments() With both CONFIG_XFS_DEBUG and CONFIG_XFS_WARN disabled, the only reference to local variable "error" in xfs_bmap_compute_alignments() gets eliminated during pre-processing stage of the compilation process. This causes the compiler to generate a "set but not used" warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
30151967 |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Introduce error injection to allocate only minlen size extents for files This commit adds XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag which helps userspace test programs to get xfs_bmap_btalloc() to always allocate minlen sized extents. This is required for test programs which need a guarantee that minlen extents allocated for a file do not get merged with their existing neighbours in the inode's BMBT. "Inode fork extent overflow check" for Directories, Xattrs and extension of realtime inodes need this since the file offset at which the extents are being allocated cannot be explicitly controlled from userspace. One way to use this error tag is to, 1. Consume all of the free space by sequentially writing to a file. 2. Punch alternate blocks of the file. This causes CNTBT to contain sufficient number of one block sized extent records. 3. Inject XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag. After step 3, xfs_bmap_btalloc() will issue space allocation requests for minlen sized extents only. ENOSPC error code is returned to userspace when there aren't any "one block sized" extents left in any of the AGs. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
07c72e55 |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Process allocated extent in a separate function This commit moves over the code in xfs_bmap_btalloc() which is responsible for processing an allocated extent to a new function. Apart from xfs_bmap_btalloc(), the new function will be invoked by another function introduced in a future commit. Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0961fddf |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Compute bmap extent alignments in a separate function This commit moves over the code which computes stripe alignment and extent size hint alignment into a separate function. Apart from xfs_bmap_btalloc(), the new function will be used by another function introduced in a future commit. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
aff4db57 |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Remove duplicate assert statement in xfs_bmap_btalloc() The check for verifying if the allocated extent is from an AG whose index is greater than or equal to that of tp->t_firstblock is already done a couple of statements earlier in the same function. Hence this commit removes the redundant assert statement. Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
02092a2f |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Check for extent overflow when renaming dir entries A rename operation is essentially a directory entry remove operation from the perspective of parent directory (i.e. src_dp) of rename's source. Hence the only place where we check for extent count overflow for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns -ENOSPC when it detects a possible extent count overflow and in response, the higher layers of directory handling code do the following: 1. Data/Free blocks: XFS lets these blocks linger until a future remove operation removes them. 2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf space and unmaps the last block. For target_dp, there are two cases depending on whether the destination directory entry exists or not. When destination directory entry does not exist (i.e. target_ip == NULL), extent count overflow check is performed only when transaction has a non-zero sized space reservation associated with it. With a zero-sized space reservation, XFS allows a rename operation to continue only when the directory has sufficient free space in its data/leaf/free space blocks to hold the new entry. When destination directory entry exists (i.e. target_ip != NULL), all we need to do is change the inode number associated with the already existing entry. Hence there is no need to perform an extent count overflow check. Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0dbc5cb1 |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Check for extent overflow when removing dir entries Directory entry removal must always succeed; Hence XFS does the following during low disk space scenario: 1. Data/Free blocks linger until a future remove operation. 2. Dabtree blocks would be swapped with the last block in the leaf space and then the new last block will be unmapped. This facility is reused during low inode extent count scenario i.e. this commit causes xfs_bmap_del_extent_real() to return -ENOSPC error code so that the above mentioned behaviour is exercised causing no change to the directory's extent count. Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
727e1acd |
|
22-Jan-2021 |
Chandan Babu R <chandanrlinux@gmail.com> |
xfs: Check for extent overflow when trivally adding a new extent When adding a new data extent (without modifying an inode's existing extents) the extent count increases only by 1. This commit checks for extent count overflow in such cases. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e8222613 |
|
16-Dec-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove xfs_buf_t typedef Prepare for kernel xfs_buf alignment by getting rid of the xfs_buf_t typedef from userspace. [darrick: This patch is a port of a userspace patch removing the xfs_buf_t typedef in preparation to make the userspace xfs_buf code behave more like its kernel counterpart.] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
33005fd0 |
|
04-Dec-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor file range validation Refactor all the open-coded validation of file block ranges into a single helper, and teach the bmap scrubber to check the ranges. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
18695ad4 |
|
04-Dec-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor realtime volume extent validation Refactor all the open-coded validation of realtime device extents into a single helper. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
67457eb0 |
|
04-Dec-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor data device extent validation Refactor all the open-coded validation of non-static data device extents into a single helper. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
acf104c2 |
|
02-Dec-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: detect overflows in bmbt records Detect file block mappings with a blockcount that's either so large that integer overflows occur or are zero, because neither are valid in the filesystem. Worse yet, attempting directory modifications causes the iext code to trip over the bmbt key handling and takes the filesystem down. We can fix most of this by preventing the bad metadata from entering the incore structures in the first place. Found by setting blockcount=0 in a directory data fork mapping and watching the fireworks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
8df0fa39 |
|
21-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't free rt blocks when we're doing a REMAP bunmapi call When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent to be unmapped from the given file fork without the extent being freed. We do this for non-rt files, but we forgot to do this for realtime files. So far this isn't a big deal since nobody makes a bunmapi call to a rt file with the REMAP flag set, but don't leave a logic bomb. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
d0c20d38 |
|
02-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files The realtime flag only applies to the data fork, so don't use the realtime block number checks on the attr fork of a realtime file. Fixes: 30b0984d9117 ("xfs: refactor bmap record validation") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
|
#
32a2b11f |
|
22-Jul-2020 |
Carlos Maiolino <cmaiolino@redhat.com> |
xfs: Remove kmem_zone_zalloc() usage Use kmem_cache_zalloc() directly. With the exception of xlog_ticket_alloc() which will be dealt on the next patch for readability. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
3050bd0b |
|
22-Jul-2020 |
Carlos Maiolino <cmaiolino@redhat.com> |
xfs: Remove kmem_zone_alloc() usage Use kmem_cache_alloc() directly. All kmem_zone_alloc() users pass 0 as flags, which are translated into: GFP_KERNEL | __GFP_NOWARN, and kmem_zone_alloc() loops forever until the allocation succeeds. We can use __GFP_NOFAIL to tell the allocator to loop forever rather than doing it ourself, and because the allocation will never fail, we do not need to use __GFP_NOWARN anymore. Hence, all callers can be converted to use GFP_KERNEL | __GFP_NOFAIL Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: add a comment back in about nofail] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
a5949d3f |
|
23-May-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: force writes to delalloc regions to unwritten When writing to a delalloc region in the data fork, commit the new allocations (of the da reservation) as unwritten so that the mappings are only marked written once writeback completes successfully. This fixes the problem of stale data exposure if the system goes down during targeted writeback of a specific region of a file, as tested by generic/042. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
f7e67b20 |
|
18-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: move the fork format fields into struct xfs_ifork Both the data and attr fork have a format that is stored in the legacy idinode. Move it into the xfs_ifork structure instead, where it uses up padding. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
daf83964 |
|
18-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: move the per-fork nextents fields into struct xfs_ifork There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4b516ff4 |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the NULL fork handling in xfs_bmapi_read Now that we fully verify the inode forks before they are added to the inode cache, the crash reported in https://bugzilla.kernel.org/show_bug.cgi?id=204031 can't happen anymore, as we'll never let an inode that has inconsistent nextents counts vs the presence of an in-core attr fork leak into the inactivate code path. So remove the work around to try to handle the case, and just return an error and warn if the fork is not present. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1a1c57b2 |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the special COW fork handling in xfs_bmapi_read We don't call xfs_bmapi_read for the COW fork anymore, so remove the special casing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e9e2eae8 |
|
18-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: only check the superblock version for dinode size calculation The size of the dinode structure is only dependent on the file system version, so instead of checking the individual inode version just use the newly added xfs_sb_version_has_large_dinode helper, and simplify various calling conventions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8ef54797 |
|
10-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: rename btree cursor private btree member flags BPRV is not longer appropriate because bc_private is going away. Script: $ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch] With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
92219c29 |
|
10-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert btree cursor inode-private member names bc_private.b -> bc_ino conversion via script: $ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch] And then revert the change to the bc_ino #define in fs/xfs/libxfs/xfs_btree.h manually. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: tweak the subject line slightly] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
b73df17e |
|
26-Feb-2020 |
Brian Foster <bfoster@redhat.com> |
xfs: open code insert range extent split helper The insert range operation currently splits the extent at the target offset in a separate transaction and lock cycle from the one that shifts extents. In preparation for reworking insert range into an atomic operation, lift the code into the caller so it can be easily condensed to a single rolling transaction and lock cycle and eliminate the helper. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f48e2df8 |
|
23-Jan-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers Refactor xfs_read_agf and xfs_alloc_read_agf to return EAGAIN if the caller passed TRYLOCK and we weren't able to get the lock; and change the callers to recognize this. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
ee647f85 |
|
23-Jan-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove the xfs_btree_get_buf[ls] functions Remove the xfs_btree_get_bufs and xfs_btree_get_bufl functions, since they're pretty trivial oneliners. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
af952aeb |
|
16-Dec-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
libxfs: resync with the userspace libxfs Prepare to resync the userspace libxfs with the kernel libxfs. There were a few things I missed -- a couple of static inline directory functions that have to be exported for xfs_repair; a couple of directory naming functions that make porting much easier if they're /not/ static inline; and a u16 usage that should have been uint16_t. None of these things are bugs in their own right; this just makes porting xfsprogs easier. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
|
#
d0c22041 |
|
11-Dec-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: stabilize insert range start boundary to avoid COW writeback race generic/522 (fsx) occasionally fails with a file corruption due to an insert range operation. The primary characteristic of the corruption is a misplaced insert range operation that differs from the requested target offset. The reason for this behavior is a race between the extent shift sequence of an insert range and a COW writeback completion that causes a front merge with the first extent in the shift. The shift preparation function flushes and unmaps from the target offset of the operation to the end of the file to ensure no modifications can be made and page cache is invalidated before file data is shifted. An insert range operation then splits the extent at the target offset, if necessary, and begins to shift the start offset of each extent starting from the end of the file to the start offset. The shift sequence operates at extent level and so depends on the preparation sequence to guarantee no changes can be made to the target range during the shift. If the block immediately prior to the target offset was dirty and shared, however, it can undergo writeback and move from the COW fork to the data fork at any point during the shift. If the block is contiguous with the block at the start offset of the insert range, it can front merge and alter the start offset of the extent. Once the shift sequence reaches the target offset, it shifts based on the latest start offset and silently changes the target offset of the operation and corrupts the file. To address this problem, update the shift preparation code to stabilize the start boundary along with the full range of the insert. Also update the existing corruption check to fail if any extent is shifted with a start offset behind the target offset of the insert range. This prevents insert from racing with COW writeback completion and fails loudly in the event of an unexpected extent shift. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
69ffe596 |
|
26-Nov-2019 |
Omar Sandoval <osandov@fb.com> |
xfs: don't check for AG deadlock for realtime files in bunmapi Commit 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi") added a check in __xfs_bunmapi() to stop early if we would touch multiple AGs in the wrong order. However, this check isn't applicable for realtime files. In most cases, it just makes us do unnecessary commits. However, without the fix from the previous commit ("xfs: fix realtime file data space leak"), if the last and second-to-last extents also happen to have different "AG numbers", then the break actually causes __xfs_bunmapi() to return without making any progress, which sends xfs_itruncate_extents_flags() into an infinite loop. Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0c4da70c |
|
26-Nov-2019 |
Omar Sandoval <osandov@fb.com> |
xfs: fix realtime file data space leak Realtime files in XFS allocate extents in rextsize units. However, the written/unwritten state of those extents is still tracked in blocksize units. Therefore, a realtime file can be split up into written and unwritten extents that are not necessarily aligned to the realtime extent size. __xfs_bunmapi() has some logic to handle these various corner cases. Consider how it handles the following case: 1. The last extent is unwritten. 2. The last extent is smaller than the realtime extent size. 3. startblock of the last extent is not aligned to the realtime extent size, but startblock + blockcount is. In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real() to set the second-to-last extent to unwritten. This should merge the last and second-to-last extents, so __xfs_bunmapi() moves on to the second-to-last extent. However, if the size of the last and second-to-last extents combined is greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not merge the two extents. When that happens, __xfs_bunmapi() skips past the last extent without unmapping it, thus leaking the space. Fix it by only unwriting the minimum amount needed to align the last extent to the realtime extent size, which is guaranteed to merge with the last extent. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a71895c5 |
|
11-Nov-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: convert open coded corruption check to use XFS_IS_CORRUPT Convert the last of the open coded corruption check and report idioms to use the XFS_IS_CORRUPT macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f9e03706 |
|
11-Nov-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: kill the XFS_WANT_CORRUPT_* macros The XFS_WANT_CORRUPT_* macros conceal subtle side effects such as the creation of local variables and redirections of the code flow. This is pretty ugly, so replace them with explicit XFS_IS_CORRUPT tests that remove both of those ugly points. The change was performed with the following coccinelle script: @@ expression mp, test; identifier label; @@ - XFS_WANT_CORRUPTED_GOTO(mp, test, label); + if (XFS_IS_CORRUPT(mp, !test)) { error = -EFSCORRUPTED; goto label; } @@ expression mp, test; @@ - XFS_WANT_CORRUPTED_RETURN(mp, test); + if (XFS_IS_CORRUPT(mp, !test)) return -EFSCORRUPTED; @@ expression mp, lval, rval; @@ - XFS_IS_CORRUPT(mp, !(lval == rval)) + XFS_IS_CORRUPT(mp, lval != rval) @@ expression mp, e1, e2; @@ - XFS_IS_CORRUPT(mp, !(e1 && e2)) + XFS_IS_CORRUPT(mp, !e1 || !e2) @@ expression e1, e2; @@ - !(e1 == e2) + e1 != e2 @@ expression e1, e2, e3, e4, e5, e6; @@ - !(e1 == e2 && e3 == e4) || e5 != e6 + e1 != e2 || e3 != e4 || e5 != e6 @@ expression e1, e2, e3, e4, e5, e6; @@ - !(e1 == e2 || (e3 <= e4 && e5 <= e6)) + e1 != e2 && (e3 > e4 || e5 > e6) @@ expression mp, e1, e2; @@ - XFS_IS_CORRUPT(mp, !(e1 <= e2)) + XFS_IS_CORRUPT(mp, e1 > e2) @@ expression mp, e1, e2; @@ - XFS_IS_CORRUPT(mp, !(e1 < e2)) + XFS_IS_CORRUPT(mp, e1 >= e2) @@ expression mp, e1; @@ - XFS_IS_CORRUPT(mp, !!e1) + XFS_IS_CORRUPT(mp, e1) @@ expression mp, e1, e2; @@ - XFS_IS_CORRUPT(mp, !(e1 || e2)) + XFS_IS_CORRUPT(mp, !e1 && !e2) @@ expression mp, e1, e2, e3, e4; @@ - XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 == e4)) + XFS_IS_CORRUPT(mp, e1 != e2 && e3 != e4) @@ expression mp, e1, e2, e3, e4; @@ - XFS_IS_CORRUPT(mp, !(e1 <= e2) || !(e3 >= e4)) + XFS_IS_CORRUPT(mp, e1 > e2 || e3 < e4) @@ expression mp, e1, e2, e3, e4; @@ - XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 <= e4)) + XFS_IS_CORRUPT(mp, e1 != e2 && e3 > e4) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
2fe4f928 |
|
07-Nov-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor "does this fork map blocks" predicate Replace the open-coded checks for whether or not an inode fork maps blocks with a macro that will implant the code for us. This helps us declutter the bmap code a bit. Note that I had to use a macro instead of a static inline function because of C header dependency problems between xfs_inode.h and xfs_inode_fork.h. Conversion was performed with the following Coccinelle script: @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE + xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE + !xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS + xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS + !xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - (xfs_ifork_has_extents(ip, w)) + xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - (!xfs_ifork_has_extents(ip, w)) + !xfs_ifork_has_extents(ip, w) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f5be0844 |
|
06-Nov-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: null out bma->prev if no previous extent Coverity complains that we don't check the return value of xfs_iext_peek_prev_extent like we do nearly all of the time. If there is no previous extent then just null out bma->prev like we do elsewhere in the bmap code. Coverity-id: 1424057 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a5155b87 |
|
02-Nov-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: always log corruption errors Make sure we log something to dmesg whenever we return -EFSCORRUPTED up the call stack. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
c34d570d |
|
30-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup use of the XFS_ALLOC_ flags Always set XFS_ALLOC_USERDATA for data fork allocations, and check it in xfs_alloc_is_userdata instead of the current obsfucated check. Also remove the xfs_alloc_is_userdata and xfs_alloc_allow_busy_reuse helpers to make the code a little easier to understand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
fd638f1d |
|
30-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: move extent zeroing to xfs_bmapi_allocate Move the extent zeroing case there for the XFS_BMAPI_ZERO flag outside the low-level allocator and into xfs_bmapi_allocate, where is still is in transaction context, but outside the very lowlevel code where it doesn't belong. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
be6cacbe |
|
30-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_bmapi_allocate Avoid duplicate userdata and data fork checks by restructuring the code so we only have a helper for userdata allocations that combines these checks in a straight foward way. That also helps to obsoletes the comments explaining what the code does as it is now clearly obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e992ae8a |
|
28-Oct-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor xfs_iread_extents to use xfs_btree_visit_blocks xfs_iread_extents open-codes everything in xfs_btree_visit_blocks, so refactor the btree helper to be able to iterate only the records on level 0, then port iread_extents to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
c2414ad6 |
|
28-Oct-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: replace -EIO with -EFSCORRUPTED for corrupt metadata There are a few places where we return -EIO instead of -EFSCORRUPTED when we find corrupt metadata. Fix those places. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
da781e64 |
|
21-Oct-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: don't set bmapi total block req where minleft is xfs_bmapi_write() takes a total block requirement parameter that is passed down to the block allocation code and is used to specify the total block requirement of the associated transaction. This is used to try and select an AG that can not only satisfy the requested extent allocation, but can also accommodate subsequent allocations that might be required to complete the transaction. For example, additional bmbt block allocations may be required on insertion of the resulting extent to an inode data fork. While it's important for callers to calculate and reserve such extra blocks in the transaction, it is not necessary to pass the total value to xfs_bmapi_write() in all cases. The latter automatically sets minleft to ensure that sufficient free blocks remain after the allocation attempt to expand the format of the associated inode (i.e., such as extent to btree conversion, btree splits, etc). Therefore, any callers that pass a total block requirement of the bmap mapping length plus worst case bmbt expansion essentially specify the additional reservation requirement twice. These callers can pass a total of zero to rely on the bmapi minleft policy. Beyond being superfluous, the primary motivation for this change is that the total reservation logic in the bmbt code is dubious in scenarios where minlen < maxlen and a maxlen extent cannot be allocated (which is more common for data extent allocations where contiguity is not required). The total value is based on maxlen in the xfs_bmapi_write() caller. If the bmbt code falls back to an allocation between minlen and maxlen, that allocation will not succeed until total is reset to minlen, which essentially throws away any additional reservation included in total by the caller. In addition, the total value is not reset until after alignment is dropped, which means that such callers drop alignment far too aggressively than necessary. Update all callers of xfs_bmapi_write() that pass a total block value of the mapping length plus bmbt reservation to instead pass zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation requirement. This trades off slightly less conservative AG selection for the ability to preserve alignment in more scenarios. xfs_bmapi_write() callers that incorporate unrelated or additional reservations in total beyond what is already included in minleft must continue to use the former. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1c743574 |
|
21-Oct-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: cap longest free extent to maximum allocatable Cap longest extent to the largest we can allocate based on limits calculated at mount time. Dynamic state (such as finobt blocks) can result in the longest free extent exceeding the size we can allocate, and that results in failure to align full AG allocations when the AG is empty. Result: xfs_io-4413 [003] 426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff minlen and maxlen are now separated by the alignment size, and allocation fails because args.total > free space in the AG. [bfoster: Added xfs_bmap_btalloc() changes.] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4e087a3b |
|
17-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: use a struct iomap in xfs_writepage_ctx In preparation for moving the XFS writeback code to fs/iomap.c, switch it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
aeea4b75 |
|
07-Oct-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: move local to extent inode logging into bmap helper The callers of xfs_bmap_local_to_extents_empty() log the inode external to the function, yet this function is where the on-disk format value is updated. Push the inode logging down into the function itself to help prevent future mistakes. Note that internal bmap callers track the inode logging flags independently and thus may log the inode core twice due to this change. This is harmless, so leave this code around for consistency with the other attr fork conversion functions. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ce840429 |
|
23-Sep-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: revert 1baa2800e62d ("xfs: remove the unused XFS_ALLOC_USERDATA flag") Revert this commit, as it caused periodic regressions in xfs/173 w/ 1k blocks. [1] https://lore.kernel.org/lkml/20190919014602.GN15734@shao2-debian/ Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
e20e174c |
|
18-Sep-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: convert inode to extent format after extent merge due to shift The collapse range operation can merge extents if two newly adjacent extents are physically contiguous. If the extent count is reduced on a btree format inode, a change to extent format might be necessary. This format change currently occurs as a side effect of the file size update after extents have been shifted for the collapse. This codepath ultimately calls xfs_bunmapi(), which happens to check for and execute the format conversion even if there were no blocks removed from the mapping. While this ultimately puts the inode into the correct state, the fact the format conversion occurs in a separate transaction from the change that called for it is a problem. If an extent shift transaction commits and the filesystem happens to crash before the format conversion, the inode fork is left in a corrupted state after log recovery. The inode fork verifier fails and xfs_repair ultimately nukes the inode. This problem was originally reproduced by generic/388. Similar to how the insert range extent split code handles extent to btree conversion, update the collapse range extent merge code to handle btree to extent format conversion in the same transaction that merges the extents. This ensures that the inode fork format remains consistent if the filesystem happens to crash in the middle of a collapse range operation that changes the inode fork format. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
eb77b23b |
|
03-Sep-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: add a xfs_valid_startblock helper Add a helper that validates the startblock is valid. This checks for a non-zero block on the main device, but skips that check for blocks on the realtime device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1baa2800 |
|
30-Aug-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the unused XFS_ALLOC_USERDATA flag Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3e08f42a |
|
26-Aug-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove unnecessary int returns from deferred bmap functions Remove the return value from the functions that schedule deferred bmap operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
74b4c5d4 |
|
26-Aug-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove unnecessary int returns from deferred refcount functions Remove the return value from the functions that schedule deferred refcount operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
bc46ac64 |
|
26-Aug-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove unnecessary int returns from deferred rmap functions Remove the return value from the functions that schedule deferred rmap operations since they never fail and do not return status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
707e0dda |
|
26-Aug-2019 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
fs: xfs: Remove KM_NOSLEEP and KM_SLEEP. Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8612de3f |
|
11-Aug-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't crash on null attr fork xfs_bmapi_read Zorro Lang reported a crash in generic/475 if we try to inactivate a corrupt inode with a NULL attr fork (stack trace shortened somewhat): RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs] RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51 RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012 RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004 R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001 FS: 00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0 Call Trace: xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs] xfs_da_read_buf+0xf5/0x2c0 [xfs] xfs_da3_node_read+0x1d/0x230 [xfs] xfs_attr_inactive+0x3cc/0x5e0 [xfs] xfs_inactive+0x4c8/0x5b0 [xfs] xfs_fs_destroy_inode+0x31b/0x8e0 [xfs] destroy_inode+0xbc/0x190 xfs_bulkstat_one_int+0xa8c/0x1200 [xfs] xfs_bulkstat_one+0x16/0x20 [xfs] xfs_bulkstat+0x6fa/0xf20 [xfs] xfs_ioc_bulkstat+0x182/0x2b0 [xfs] xfs_file_ioctl+0xee0/0x12a0 [xfs] do_vfs_ioctl+0x193/0x1000 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x9f/0x4d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f11d39a3e5b The "obvious" cause is that the attr ifork is null despite the inode claiming an attr fork having at least one extent, but it's not so obvious why we ended up with an inode in that state. Reported-by: Zorro Lang <zlang@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
|
#
250d4b4c |
|
28-Jun-2019 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: remove unused header files There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f5b999c0 |
|
12-Jun-2019 |
Eric Sandeen <sandeen@redhat.com> |
xfs: remove unused flag arguments There are several functions which take a flag argument that is only ever passed as "0," so remove these arguments. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9fe82b8c |
|
25-Apr-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: track delayed allocation reservations across the filesystem Add a percpu counter to track the number of blocks directly reserved for delayed allocations on the data device. This counter (in contrast to i_delayed_blks) does not track allocated CoW staging extents or anything going on with the realtime device. It will be used in the upcoming summary counter scrub function to check the free block counts without having to freeze the filesystem or walk all the inodes to find the delayed allocations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
4b0bce30 |
|
19-Mar-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: always init bma in xfs_bmapi_write Always init the tp/ip fields of bma in xfs_bmapi_write so that the bmapi_finish at the bottom never trips over null transaction or inode pointers. Coverity-id: 1443964 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
6958d11f |
|
17-Mar-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: don't trip over uninitialized buffer on extent read of corrupted inode We've had rather rare reports of bmap btree block corruption where the bmap root block has a level count of zero. The root cause of the corruption is so far unknown. We do have verifier checks to detect this form of on-disk corruption, but this doesn't cover a memory corruption variant of the problem. The latter is a reasonable possibility because the root block is part of the inode fork and can reside in-core for some time before inode extents are read. If this occurs, it leads to a system crash such as the following: BUG: unable to handle kernel paging request at ffffffff00000221 PF error: [normal kernel read fault] ... RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs] ... Call Trace: xfs_iread_extents+0x379/0x540 [xfs] xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs] ? xfs_attr_get+0xd1/0x120 [xfs] ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_iomap_begin+0x4c4/0x6d0 [xfs] ? __vfs_getxattr+0x53/0x70 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_apply+0x63/0x130 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_file_buffered_write+0x62/0x90 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs] __vfs_write+0x150/0x1b0 vfs_write+0xba/0x1c0 ksys_pwrite64+0x64/0xa0 do_syscall_64+0x5a/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The crash occurs because xfs_iread_extents() attempts to release an uninitialized buffer pointer as the level == 0 value prevented the buffer from ever being allocated or read. Change the level > 0 assert to an explicit error check in xfs_iread_extents() to avoid crashing the kernel in the event of localized, in-core inode corruption. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
26b91c72 |
|
18-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: make COW fork unwritten extent conversions more robust If we have racing buffered and direct I/O COW fork extents under writeback can have been moved to the data fork by the time we call xfs_reflink_convert_cow from xfs_submit_ioend. This would be mostly harmless as the block numbers don't change by this move, except for the fact that xfs_bmapi_write will crash or trigger asserts when not finding existing extents, even despite trying to paper over this with the XFS_BMAPI_CONVERT_ONLY flag. Instead of special casing non-transaction conversions in the already way too complicated xfs_bmapi_write just add a new helper for the much simpler non-transactional COW fork case, which simplify ignores not found extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
125851ac |
|
15-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: move stat accounting to xfs_bmapi_convert_delalloc This way we can actually count how many bytes got converted and how many calls we need, unlike in the caller which doesn't have the detailed view. Note that this includes a slight change in behavior as the xs_xstrat_quick is now bumped for every allocation instead of just the one covering the requested writeback offset, which makes a lot more sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
491ce61e |
|
15-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: move transaction handling to xfs_bmapi_convert_delalloc No need to deal with the transaction and the inode locking in the caller. Note that we also switch to passing whichfork as the second paramter, matching what most related functions do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d8ae82e3 |
|
15-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: split XFS_BMAPI_DELALLOC handling from xfs_bmapi_write Delalloc conversion has traditionally been part of our function to allocate blocks on disk (first xfs_bmapi, then xfs_bmapi_write), but delalloc conversion is a little special as we really do not want to allocate blocks over holes, for which we don't have reservations. Split the delalloc conversions into a separate helper to keep the code simple and structured. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c8b54673 |
|
15-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: factor out two helpers from xfs_bmapi_write We want to be able to reuse them for the upcoming dedidcated delalloc convert routine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b101e334 |
|
15-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: simplify the xfs_bmap_btree_to_extents calling conventions Move boilerplate code from the callers into xfs_bmap_btree_to_extents: - exit early without failure if we don't need to convert to the extent format - assert that we have a btree cursor - don't reinitialize the passed in logflags argument Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c2b31643 |
|
01-Feb-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: use the latest extent at writeback delalloc conversion time The writeback delalloc conversion code is racy with respect to changes in the currently cached file mapping outside of the current page. This is because the ilock is cycled between the time the caller originally looked up the mapping and across each real allocation of the provided file range. This code has collected various hacks over the years to help combat the symptoms of these races (i.e., truncate race detection, allocation into hole detection, etc.), but none address the fundamental problem that the imap may not be valid at allocation time. Rather than continue to use race detection hacks, update writeback delalloc conversion to a model that explicitly converts the delalloc extent backing the current file offset being processed. The current file offset is the only block we can trust to remain once the ilock is dropped because any operation that can remove the block (truncate, hole punch, etc.) must flush and discard pagecache pages first. Modify xfs_iomap_write_allocate() to use the xfs_bmapi_delalloc() mechanism to request allocation of the entire delalloc extent backing the current offset instead of assuming the extent passed by the caller is unchanged. Record the range specified by the caller and apply it to the resulting allocated extent so previous checks by the caller for COW fork overlap are not lost. Finally, overload the bmapi delalloc flag with the range reval flag behavior since this is the only use case for both. This ensures that writeback always picks up the correct and current extent associated with the page, regardless of races with other extent modifying operations. If operating on a data fork and the COW overlap state has changed since the ilock was cycled, the caller revalidates against the COW fork sequence number before using the imap for the next block. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
627209fb |
|
01-Feb-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: create delalloc bmapi wrapper for full extent allocation The writeback delalloc conversion code is racy with respect to changes in the currently cached file mapping. This stems from the fact that the bmapi allocation code requires a file range to allocate and the writeback conversion code assumes the range of the currently cached mapping is still valid with respect to the fork. It may not be valid, however, because the ilock is cycled (potentially multiple times) between the time the cached mapping was populated and the delalloc conversion occurs. To facilitate a solution to this problem, create a new xfs_bmapi_delalloc() wrapper to xfs_bmapi_write() that takes a file (FSB) offset and attempts to allocate whatever delalloc extent backs the offset. Use a new bmapi flag to cause xfs_bmapi_write() to set the range based on the extent backing the bno parameter unless bno lands in a hole. If bno does land in a hole, fall back to the current behavior (which may result in an error or quietly skipping holes in the specified range depending on other parameters). This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3b350898 |
|
01-Feb-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: remove superfluous writeback mapping eof trimming Now that the cached writeback mapping is explicitly invalidated on data fork changes, the EOF trimming band-aid is no longer necessary. Remove xfs_trim_extent_eof() as well since it has no other users. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7280feda |
|
12-Dec-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove xfs_rmap_ag_owner and friends Owner information for static fs metadata can be defined readonly at build time because it never changes across filesystems. This enables us to reduce stack usage (particularly in scrub) because we can use the statically defined oinfo structures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
66e3237e |
|
12-Dec-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: const-ify xfs_owner_info arguments Only certain functions actually change the contents of an xfs_owner_info; the rest can accept a const struct pointer. This will enable us to save stack space by hoisting static owner info types to be const global variables. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
9230a0b6 |
|
19-Nov-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: delalloc -> unwritten COW fork allocation can go wrong Long saga. There have been days spent following this through dead end after dead end in multi-GB event traces. This morning, after writing a trace-cmd wrapper that enabled me to be more selective about XFS trace points, I discovered that I could get just enough essential tracepoints enabled that there was a 50:50 chance the fsx config would fail at ~115k ops. If it didn't fail at op 115547, I stopped fsx at op 115548 anyway. That gave me two traces - one where the problem manifested, and one where it didn't. After refining the traces to have the necessary information, I found that in the failing case there was a real extent in the COW fork compared to an unwritten extent in the working case. Walking back through the two traces to the point where the CWO fork extents actually diverged, I found that the bad case had an extra unwritten extent in it. This is likely because the bug it led me to had triggered multiple times in those 115k ops, leaving stray COW extents around. What I saw was a COW delalloc conversion to an unwritten extent (as they should always be through xfs_iomap_write_allocate()) resulted in a /written extent/: xfs_writepage: dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0 xfs_iext_remove: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real xfs_bmap_pre_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex Basically, Cow fork before: 0 1 32 52 +H+DDDDDDDDDDDD+UUUUUUUUUUU+ PREV RIGHT COW delalloc conversion allocates: 1 32 +uuuuuuuuuuuu+ NEW And the result according to the xfs_bmap_post_update trace was: 0 1 32 52 +H+wwwwwwwwwwwwwwwwwwwwwwww+ PREV Which is clearly wrong - it should be a merged unwritten extent, not an unwritten extent. That lead me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG case in xfs_bmap_add_extent_delay_real(), and sure enough, there's the bug. It takes the old delalloc extent (PREV) and adds the length of the RIGHT extent to it, takes the start block from NEW, removes the RIGHT extent and then updates PREV with the new extent. What it fails to do is update PREV.br_state. For delalloc, this is always XFS_EXT_NORM, while in this case we are converting the delayed allocation to unwritten, so it needs to be updated to XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so the resultant extent is always written. And that's the bug I've been chasing for a week - a bmap btree bug, not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced with the recent in core extent tree scalability enhancements. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2f3cd809 |
|
18-Oct-2018 |
Allison Henderson <allison.henderson@oracle.com> |
xfs: Add attibute set and helper functions This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff. These sub-routines set the attributes specified in @args. We will use this later for setting parent pointers as a deferred attribute operation. [dgc: remove attr fork init code from xfs_attr_set_args().] [dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.] [dgc: correct sf add error handling.] Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
daa79bae |
|
18-Oct-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: remove suport for filesystems without unwritten extent flag The option to enable unwritten extents was made default in 2003, removed from mkfs in 2007, and cannot be disabled in v5. We also rely on it for a lot of common functionality, so filesystems without it will run a completely untested and buggy code path. Enabling the support also is a simple bit flip using xfs_db, so legacy file systems can still be brought forward. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
e55ec4dd |
|
30-Sep-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix error handling in xfs_bmap_extents_to_btree Commit 01239d77b9dd ("xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree") attempted to fix a null pointer dreference when a fuzzing corruption of some kind was found. This fix was flawed, resulting in assert failures like: XFS: Assertion failed: ifp->if_broot == NULL, file: fs/xfs/libxfs/xfs_bmap.c, line: 715 ..... Call Trace: xfs_bmap_extents_to_btree+0x6b9/0x7b0 __xfs_bunmapi+0xae7/0xf00 ? xfs_log_reserve+0x1c8/0x290 xfs_reflink_remap_extent+0x20b/0x620 xfs_reflink_remap_blocks+0x7e/0x290 xfs_reflink_remap_range+0x311/0x530 vfs_dedupe_file_range_one+0xd7/0xe0 vfs_dedupe_file_range+0x15b/0x1a0 do_vfs_ioctl+0x267/0x6c0 The problem is that the error handling code now asserts that the inode fork is not in btree format before the error handling code undoes the modifications that put the fork back in extent format. Fix this by moving the assert back to after the xfs_iroot_realloc() call that returns the fork to extent format, and clean up the jump labels to be meaningful. Also, returning ENOSPC when xfs_btree_get_bufl() fails to instantiate the buffer that was allocated (the actual fix in the commit mentioned above) is incorrect. This is a fatal error - only an invalid block address or a filesystem shutdown can result in failing to get a buffer here. Hence change this to EFSCORRUPTED so that the higher layer knows this was a corruption related failure and should not treat it as an ENOSPC error. This should result in a shutdown (via cancelling a dirty transaction) which is necessary as we do not attempt to clean up the (invalid) block that we have already allocated. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
01239d77 |
|
10-Aug-2018 |
Shan Hai <shan.hai@oracle.com> |
xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree Fuzzing tool reports a write to null pointer error in the xfs_bmap_extents_to_btree, fix it by bailing out on encountering a null pointer. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9d9e6233 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: fold dfops into the transaction struct xfs_defer_ops has now been reduced to a single list_head. The external dfops mechanism is unused and thus everywhere a (permanent) transaction is accessible the associated dfops structure is as well. Remove the xfs_defer_ops structure and fold the list_head into the transaction. Also remove the last remnant of external dfops in xfs_trans_dup(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0f37d178 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: pass transaction to xfs_defer_add() The majority of remaining references to struct xfs_defer_ops in XFS are associated with xfs_defer_add(). At this point, there are no more external xfs_defer_ops users left. All instances of xfs_defer_ops are embedded in the transaction, which means we can safely pass the transaction down to the dfops add interface. Update xfs_defer_add() to receive the transaction as a parameter. Various subsystems implement wrappers to allocate and construct the context specific data structures for the associated deferred operation type. Update these to also carry the transaction down as needed and clean up unused dfops parameters along the way. This removes most of the remaining references to struct xfs_defer_ops throughout the code and facilitates removal of the structure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix unused variable warnings with ftrace disabled] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7dbddbac |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: drop dop param from xfs_defer_op_type ->finish_item() callback The dfops infrastructure ->finish_item() callback passes the transaction and dfops as separate parameters. Since dfops is always part of a transaction, the latter parameter is no longer necessary. Remove it from the various callbacks. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a8198666 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: automatic dfops inode relogging Inodes that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While inodes are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for inodes with ili_lock_flags == 0. Replace the xfs_defer_ijoin() infrastructure with such detection and automatic relogging of held inodes. This eliminates the need for the per-dfops inode list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
488c919a |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: add missing defer ijoins for held inodes Log items that require relogging during deferred operations processing are explicitly joined to the associated dfops via the xfs_defer_*join() helpers. These calls imply that the associated object is "held" by the transaction such that when rolled, the item can be immediately joined to a follow up transaction. For buffers, this means the buffer remains locked and held after each roll. For inodes, this means that the inode remains locked. Failure to join a held item to the dfops structure means the associated object pins the tail of the log while dfops processing completes, because the item never relogs and is not unlocked or released until deferred processing completes. Currently, all buffers that are held in transactions (XFS_BLI_HOLD) with deferred operations are explicitly joined to the dfops. This is not the case for inodes, however, as various contexts defer operations to transactions with held inodes without explicit joins to the associated dfops (and thus not relogging). While this is not a catastrophic problem, it is not ideal. Given that we want to eventually relog such items automatically during dfops processing, start by explicitly adding these missing xfs_defer_ijoin() calls. A call is added everywhere an inode is joined to a transaction without transferring lock ownership and said transaction runs deferred operations. All xfs_defer_ijoin() calls will eventually be replaced by automatic dfops inode relogging. This patch essentially implements the behavior change that would otherwise occur due to automatic inode dfops relogging. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1214f1cf |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: replace dop_low with transaction flag The dop_low field enables the low free space allocation mode when a previous allocation has detected difficulty allocating blocks. It has historically been part of the xfs_defer_ops structure, which means if enabled, it remains enabled across a set of transactions until the deferred operations have completed and the dfops is reset. Now that the dfops is embedded in the transaction, we can save a bit more space by using a transaction flag rather than a standalone boolean. Drop the ->dop_low field and replace it with a transaction flag that is set at the same points, carried across rolling transactions and cleared on completion of deferred operations. This essentially emulates the behavior of ->dop_low and so should not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3ba738df |
|
17-Jul-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the xfs_ifork_t typedef We only have a few more callers left, so seize the opportunity and kill it off. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c8eac49e |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove all boilerplate defer init/finish code At this point, the transaction subsystem completely manages deferred items internally such that the common and boilerplate xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() -> xfs_trans_commit() sequence can be replaced with a simple transaction allocation and commit. Remove all such boilerplate deferred ops code. In doing so, we change each case over to use the dfops in the transaction and specifically eliminate: - The on-stack dfops and associated xfs_defer_init() call, as the internal dfops is initialized on transaction allocation. - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of a transaction. - xfs_defer_cancel() calls in error handlers that precede a transaction cancel. The only deferred ops calls that remain are those that are non-deterministic with respect to the final commit of the associated transaction or are open-coded due to special handling. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0b04b6b8 |
|
19-Jul-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: trivial xfs_btree_del_cursor cleanups The error argument to xfs_btree_del_cursor already understands the "nonzero for error" semantics, so remove pointless error testing in the callers and pass it directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
5fdd9794 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_defer_init() firstblock param All but one caller of xfs_defer_init() passes in the ->t_firstblock of the associated transaction. The one outlier is xlog_recover_process_intents(), which simply passes a dummy value because a valid pointer is required. This firstblock variable can simply be removed. At this point we could remove the xfs_defer_init() firstblock parameter and initialize ->t_firstblock directly. Even that is not necessary, however, because ->t_firstblock is automatically reinitialized in the new transaction on a transaction roll. Since xfs_defer_init() should never occur more than once on a particular transaction (since the corresponding finish will roll it), replace the reinit from xfs_defer_init() with an assert that verifies the transaction has a NULLFSBLOCK firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
64396ff2 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_alloc_arg firstblock field The xfs_alloc_arg.firstblock field is used to control the starting agno for an allocation. The structure already carries a pointer to the transaction, which carries the current firstblock value. Remove the field and access ->t_firstblock directly in the allocation code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
cf612de7 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_btree_cur private firstblock field The bmbt cursor private structure has a firstblock field that is used to maintain locking order on bmbt allocations. The field holds an actual firstblock value (as opposed to a pointer), so it is initialized on cursor creation, updated on allocation and then the value is transferred back to the source before the cursor is destroyed. This value is always transferred from and back to the ->t_firstblock field. Since xfs_btree_cur already carries a reference to the transaction, we can remove this field from xfs_btree_cur and the associated copying. The bmbt allocations will update the value in the transaction directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
280253d2 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove bmap format helpers firstblock params The bmap format helpers receive firstblock via ->t_firstblock. Drop the param and access it directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
92f9da30 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove bmap extent add helper firstblock params The add extent helpers all receive firstblock via ->t_firstblock. Drop the parameter and access it directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
94c07b4d |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_bmalloca firstblock field The xfs_bmalloca.firstblock field carries the firstblock value from the transaction into the bmap infrastructure. It's initialized in one place from ->t_firstblock, so drop the field and access ->t_firstblock directly throughout the bmap code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4b77a088 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_firstblock in bmap extent split Also remove the unnecessary xfs_bmap_split_extent_at() parameter. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
333f950c |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove bmap insert/collapse firstblock param The only callers pass ->t_firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2af52842 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_bunmapi() firstblock param All callers pass ->t_firstblock from the current transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a7beabea |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_bmapi_write() firstblock param All callers pass ->t_firstblock from the current transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
580c4ff9 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_firstblock in xfs_bmapi_remap() Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
37283797 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_firstblock for all xfs_bunmapi() callers Convert all xfs_bunmapi() callers to ->t_firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
76613903 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_firstblock in xattr ops Similar to the dirops code, the xattr code uses an on-stack firstblock variable for the various operations. This code rolls the underlying transaction in various places, however, which means we cannot simply replace the local firstblock vars with ->t_firstblock. Doing so (without further changes) would invalidate the memory pointed to by xfs_da_args.firstblock as soon as the first transaction rolls. To avoid this problem, remove xfs_da_args.firstblock and replace all such accesses with ->t_firstblock at the same time. This ensures that accesses to the current firstblock always occur through the current transaction rather than a potentially invalid xfs_da_args pointer. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
825d75cd |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_firstblock in attrfork add Note that this codepath is a user of struct xfs_da_args. Switch it over to ->t_firstblock in preparation to remove xfs_da_args.firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3ae2d891 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: allow null firstblock in xfs_bmapi_write() when tp is null xfs_bmapi_write() always expects a valid firstblock pointer. It immediately dereferences the pointer to help determine how to initialize the bma.minleft field. The remaining accesses are related to modifying btree format forks, which is only relevant for !COW fork callers. The reflink code passes a NULL transaction to xfs_bmapi_write() in a couple places that do COW fork unwritten conversion. The purpose of the firstblock field is to track the first block allocation in the current transaction, so technically firstblock should not be required for these callers either. Tweak xfs_bmapi_write() to initialize the bma correctly without accessing the firstblock pointer if no transaction is provided in the first place. Update the reflink callers to pass NULL instead of otherwise unused firstblock references. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bcd2c9f3 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: refactor dfops init to attach to transaction Most callers of xfs_defer_init() immediately attach the dfops structure to a transaction. Add a transaction parameter to eliminate much of this boilerplate code. This also helps self-document the fact that many codepaths now expect a dfops pointer implicitly via xfs_trans->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
42b394a9 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_btree_cur bmbt dfops field All assignments of xfs_btree_cur.bc_private.b.dfops originate from ->t_dfops. Replace accesses of the former with the latter and remove the unnecessary field. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
81ba8f3e |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove dfops param from internal bmap extent helpers All callers of the various bmap extent helpers now use ->t_dfops. Remove the unnecessary dfops params and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f4a9cf97 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_dfops for collapse/insert range operations Use ->t_dfops for the collapse and insert range transactions. These are the only callers of the respective bmap helpers, so replace the unnecessary dfops parameters with direct accesses to ->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3e3673e3 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove struct xfs_bmalloca dfops field Now that bma.dfops is only assigned from ->t_dfops, replace all accesses to the former with the latter and remove the unnecessary field. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ff3edf25 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_bmapi_remap() dfops param All xfs_bmapi_remap() callers already use ->t_dfops. Note that deferred completion context unconditionally sets ->t_dfops if it hasn't already been set by the caller. Remove the unnecessary parameter and access ->t_dfops directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ccd9d911 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_bunmapi() dfops param Now that all xfs_bunmapi() callers use ->t_dfops, remove the unnecessary parameter and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6e702a5d |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_bmapi_write() dfops param Now that all callers use ->t_dfops, the xfs_bmapi_write() dfops parameter is no longer necessary. Remove it and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
32a9b7c6 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: replace xfs_da_args->dfops accesses with ->t_dfops and remove Now that xfs_da_args->dfops is always assigned from a ->t_dfops pointer (or one that is immediately attached), replace all downstream accesses of the former with the latter and remove the field from struct xfs_da_args. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d76e6ce8 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_dfops in extent split tx and remove param Attach the local dfops to ->t_dfops of the extent split transaction. Since this is the only caller of xfs_bmap_split_extent_at(), remove the dfops parameter as well. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0bd6207f |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove dfops param in attr fork add path Now that the attribute fork add tx carries dfops along with the transaction, it is unnecessary to pass it down the stack. Remove the dfops parameter and access ->t_dfops directly where necessary. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
40d03ac6 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: use ->t_dfops for attr set/remove operations Attach the local dfops to the transaction allocated for xattr add and remove operations. Add an earlier initialization in xfs_attr_remove() to ensure the structure is valid if it remains unused at transaction commit time. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c3a2f9ff |
|
11-Jul-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the now unused XFS_BMAPI_IGSTATE flag Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f62cb48e |
|
22-Jun-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't allow insert-range to shift extents past the maximum offset Zorro Lang reports that generic/485 blows an assert on a filesystem with 512 byte blocks. The test tries to fallocate a post-eof extent at the maximum file size and calls insert range to shift the extents right by two blocks. On a 512b block filesystem this causes startoff to overflow the 54-bit startoff field, leading to the assert. Therefore, always check the rightmost extent to see if it would overflow prior to invoking the insert range machinery. Reported-by: zlang@redhat.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
0703a8e1 |
|
08-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: replace do_mod with native operations do_mod() is a hold-over from when we have different sizes for file offsets and and other internal values for 40 bit XFS filesystems. Hence depending on build flags variables passed to do_mod() could change size. We no longer support those small format filesystems and hence everything is of fixed size theses days, even on 32 bit platforms. As such, we can convert all the do_mod() callers to platform optimised modulus operations as defined by linux/math64.h. Individual conversions depend on the types of variables being used. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9bb54cb5 |
|
07-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: clean up MIN/MAX Get rid of the MIN/MAX macros and just use the native min/max macros directly in the XFS code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0b61f8a4 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert to SPDX license tags Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
17ba2cc7 |
|
03-Jun-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't assert when reporting on-disk corruption while loading btree Don't bother ASSERTing when we're already going to log and return the corruption status. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
7644bd98 |
|
14-May-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: teach xfs_bmapi_remap to accept some bmapi flags Teach xfs_bmapi_remap how to map in unwritten extent and to skip rmap updates. This enables us to rebuild real and unwritten extents from the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
7cf199ba |
|
14-May-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: make xfs_bmapi_remapi work with attribute forks Add a new flags argument to xfs_bmapi_remapi so that we can pass BMAPI flags into the function. This enables us to pass in BMAPI_ATTRFORK so that we can remap things into the attribute fork. Eventually the online repair code will use this to rebuild attribute forks, so make it non-static. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
4e529339 |
|
10-May-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: factor out nodiscard helpers The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
95eb308c |
|
09-May-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmapbt Add a new flag, XFS_BMAPI_NORMAP, which will perform file block remapping without updating the rmapbt. This will be used by the repair code to reconstruct bmbts from the rmapbt, in which case we don't want the rmapbt update. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
84ca484e |
|
09-May-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: don't discard on free of unwritten extents Unwritten extents by definition have not been written to until they are converted to normal written extents. If unwritten extents are freed from a file, it is therefore guaranteed that the blocks have not been written to since allocation (note that zero range punches and reallocates blocks). To cut down on online discards generated from workloads that make use of preallocation, skip discards of extents if they are in the unwritten state when the extent is freed. Note that this optimization does not apply to log recovery, during which all freed extents are discarded if online discard is enabled. Also note that it may be possible for a filesystem crash to occur after write completion of an unwritten extent but before unwritten conversion such that the extent remains unwritten after log recovery. Since this pseudo-inconsistency may already be possible after a crash (consider writing to recently allocated blocks where the allocation transaction is lost after a crash), this change shouldn't introduce any fundamental limitations that don't already exist. In short, on storage stacks where discards are important, it's good practice to run an occasional fstrim even with online discard enabled in the filesystem, particularly after a crash. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
fcb762f5 |
|
09-May-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: add bmapi nodiscard flag Freed extents are unconditionally discarded when online discard is enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass discards when unnecessary. For example, this will be useful for eofblocks trimming. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e6631f85 |
|
09-May-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: get rid of the log item descriptor It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
cec57256 |
|
04-May-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: bmap debugging should never panic the system Don't panic() the system if the bmap records are garbage, just call ASSERT which gives us the same backtrace but enables developers to control if the system goes down or not. This makes debugging with generic/388 much easier because it won't reboot the machine midway through a run just because btree_read_bufl returns EIO when the fs has already shut down. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
2c4306f7 |
|
17-Apr-2018 |
Eric Sandeen <sandeen@redhat.com> |
xfs: set format back to extents if xfs_bmap_extents_to_btree If xfs_bmap_extents_to_btree fails in a mode where we call xfs_iroot_realloc(-1) to de-allocate the root, set the format back to extents. Otherwise we can assume we can dereference ifp->if_broot based on the XFS_DINODE_FMT_BTREE format, and crash. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a1f69417 |
|
06-Apr-2018 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: non-scrub - remove unused function parameters Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
90a58f95 |
|
23-Mar-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor inode verifier error logging Refactor some of the inode verifier failure logging call sites to use the new xfs_inode_verifier_error method which dumps the offending buffer as well as the code location of the failed check. This trims the output, makes it clearer to the admin that repair must be run, and gives the developers more details to work from. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
30b0984d |
|
23-Mar-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor bmap record validation Refactor the bmap validator into a more complete helper that looks for extents that run off the end of the device, overflow into the next AG, or have invalid flag states. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
6d8a45ce |
|
19-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't screw up direct writes when freesp is fragmented xfs_bmap_btalloc is given a range of file offset blocks that must be allocated to some data/attr/cow fork. If the fork has an extent size hint associated with it, the request will be enlarged on both ends to try to satisfy the alignment hint. If free space is fragmentated, sometimes we can allocate some blocks but not enough to fulfill any of the requested range. Since bmapi_allocate always trims the new extent mapping to match the originally requested range, this results in bmapi_write returning zero and no mapping. The consequences of this vary -- buffered writes will simply re-call bmapi_write until it can satisfy at least one block from the original request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC through the dio mechanism out to userspace with the weird result that writes fail even when we have enough space because the ENOSPC return overrides any partial write status. For direct CoW writes the situation was disastrous because nobody notices us returning an invalid zero-length wrong-offset mapping to iomap and the write goes off into space. Therefore, if free space is so fragmented that we managed to allocate some space but not enough to map into even a single block of the original allocation request range, we should break the alignment hint in order to guarantee at least some forward progress for the direct write. If we return a short allocation to iomap_apply it'll call back about the remaining blocks. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4b4c1326 |
|
19-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: treat CoW fork operations as delalloc for quota accounting Since the CoW fork only exists in memory, it is incorrect to update the on-disk quota block counts when we modify the CoW fork. Unlike the data fork, even real extents in the CoW fork are only delalloc-style reservations (on-disk they're owned by the refcountbt) so they must not be tracked in the on disk quota info. Ensure the i_delayed_blks accounting reflects this too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
751f3767 |
|
25-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor accounting updates out of xfs_bmap_btalloc Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
6ca30729 |
|
23-Jan-2018 |
Shan Hai <shan.hai@oracle.com> |
xfs: bmap code cleanup Remove the extent size hint and realtime inode relevant code from the xfs_bmapi_reserve_delalloc since it is not called on the inode with extent size hint set or on a realtime inode. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
be78ff0e |
|
16-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: recheck reflink / dirty page status before freeing CoW reservations Eryu Guan reported seeing occasional hangs when running generic/269 with a new fsstress that supports clonerange/deduperange. The cause of this hang is an infinite loop when we convert the CoW fork extents from unwritten to real just prior to writing the pages out; the infinite loop happens because there's nothing in the CoW fork to convert, and so it spins forever. The fundamental issue here is that when we go to perform these CoW fork conversions, we're supposed to have an extent waiting for us, but the low space CoW reaper has snuck in and blown them away! There are four conditions that can dissuade the reaper from touching our file -- no reflink iflag; dirty page cache; writeback in progress; or directio in progress. We check the four conditions prior to taking the locks, but we neglect to recheck them once we have the locks, which is how we end up whacking the writeback that's in progress. Therefore, refactor the four checks into a helper function and call it once again once we have the locks to make sure we really want to reap the inode. While we're at it, add an ASSERT for this weird condition so that we'll fail noisily if we ever screw this up again. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
59f6fec3 |
|
08-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove XFS_FSB_SANITY_CHECK We already have a function to verify fsb pointers, so get rid of the last users of the (less robust) macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
8c57b886 |
|
10-Dec-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: account for null transactions in bunmapi In e1a4e37cc7b665 ("xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent"), we try to constrain the amount of real extents we unmap from the data fork in a given call so that we don't blow out transaction reservations. However, not all bunmapi operations require a transaction -- if we're only removing a delalloc extent, no transaction is needed, so we have to code against that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
d41c6172 |
|
27-Nov-2017 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: fix leaks on corruption errors in xfs_bmap.c Use _GOTO instead of _RETURN so we can free the allocated cursor on error. Fixes: bf80628 ("xfs: remove xfs_bmse_shift_one") Fixes-coverity-id: 1423813, 1423676 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
dac9c9b1 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: pass struct xfs_bmbt_irec to xfs_bmbt_validate_extent This removed an unaligned load per extent, as well as the manual poking into the on-disk extent format. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c38ccf59 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the nr_extents argument to xfs_iext_remove We only have two places that remove 2 extents at the same time, so unroll the loop there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0254c2f2 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the nr_extents argument to xfs_iext_insert We only have two places that insert 2 extents at the same time, so unroll the loop there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6bdcf26a |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use a b+tree for the in-core extent list Replace the current linear list and the indirection array for the in-core extent list with a b+tree to avoid the need for larger memory allocations for the indirection array when lots of extents are present. The current extent list implementations leads to heavy pressure on the memory allocator when modifying files with a high extent count, and can lead to high latencies because of that. The replacement is a b+tree with a few quirks. The leaf nodes directly store the extent record in two u64 values. The encoding is a little bit different from the existing in-core extent records so that the start offset and length which are required for lookups can be retreived with simple mask operations. The inner nodes store a 64-bit key containing the start offset in the first half of the node, and the pointers to the next lower level in the second half. In either case we walk the node from the beginninig to the end and do a linear search, as that is more efficient for the low number of cache lines touched during a search (2 for the inner nodes, 4 for the leaf nodes) than a binary search. We store termination markers (zero length for the leaf nodes, an otherwise impossible high bit for the inner nodes) to terminate the key list / records instead of storing a count to use the available cache lines as efficiently as possible. One quirk of the algorithm is that while we normally split a node half and half like usual btree implementations we just spill over entries added at the very end of the list to a new node on its own. This means we get a 100% fill grade for the common cases of bulk insertion when reading an inode into memory, and when only sequentially appending to a file. The downside is a slightly higher chance of splits on the first random insertions. Both insert and removal manually recurse into the lower levels, but the bulk deletion of the whole tree is still implemented as a recursive function call, although one limited by the overall depth and with very little stack usage in every iteration. For the first few extents we dynamically grow the list from a single extent to the next powers of two until we have a first full leaf block and that building the actual tree. The code started out based on the generic lib/btree.c code from Joern Engel based on earlier work from Peter Zijlstra, but has since been rewritten beyond recognition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b121459c |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: simplify xfs_reflink_convert_cow Instead of looking up extents to convert and calling xfs_bmapi_write on each of them just let xfs_bmapi_write handle the full range. To make this robust add a new XFS_BMAPI_CONVERT_ONLY that only converts ranges and never allocates blocks. [darrick: shorten the stringified CONVERT_ONLY trace flag] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b2b1712a |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: introduce the xfs_iext_cursor abstraction Add a new xfs_iext_cursor structure to hide the direct extent map index manipulations. In addition to the existing lookup/get/insert/ remove and update routines new primitives to get the first and last extent cursor, as well as moving up and down by one extent are provided. Also new are convenience to increment/decrement the cursor and retreive the new extent, as well as to peek into the previous/next extent without updating the cursor and last but not least a macro to iterate over all extents in a fork. [darrick: rename for_each_iext to for_each_xfs_iext] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
906abed5 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: iterate over extents in xfs_bmap_extents_to_btree This actually makes the function very slightly less efficient for now as we detour through the expanded irect format between the in-core extent format and the on-disk one instead of just endian swapping them. But with the incore extent btree the in-core one will use a different format and the representation will be entirely hidden. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f36bc228 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: pass an on-disk extent to xfs_bmbt_validate_extent This prepares for getting rid of the current in-memory extent format. At the end of the series we will change the calling convention again to pass the xfs_bmbt_irec structure once it is available everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
42630361 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: treat idx as a cursor in xfs_bmap_collapse_extents Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
657fcb23 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: treat idx as a cursor in xfs_bmap_del_extent_* Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a6818477 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: treat idx as a cursor in xfs_bmap_add_extent_unwritten_real Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1d2e0089 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: treat idx as a cursor in xfs_bmap_add_extent_hole_real Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
41d196f4 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: treat idx as a cursor in xfs_bmap_add_extent_hole_delay Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0d045540 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: treat idx as a cursor in xfs_bmap_add_extent_delay_real Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bf99971c |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove a duplicate assignment in xfs_bmap_add_extent_delay_real Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1bfd7618 |
|
03-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: don't create overlapping extents in xfs_bmap_add_extent_delay_real Two cases in xfs_bmap_add_extent_delay_real currently insert a new extent before updating the existing one that is being split. While this works fine with a simple extent list, a more complex tree can't easily cope with overlapping extent. Reshuffle the code a bit to update the slot of the existing delalloc extent to the new real extent before inserting the shortened delalloc extent before or after it. This avoids the overlapping extents while still allowing to update the br_startblock field of the delalloc extent with the updated indirect block reservation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e9e899a2 |
|
31-Oct-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move error injection tags into their own file Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
dc56015f |
|
23-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: add a new xfs_iext_lookup_extent_before helper This helper looks up the last extent the covers space before the passed in block number. This is useful for truncate and similar operations that operate backwards over the extent list. For xfs_bunmapi it also is a slight optimization as we can return early if there are not extents at or below the end of the to be truncated range. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
211e95bb |
|
23-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_bmap_read_extents into xfs_iread_extents xfs_iread_extents is just a trivial wrapper, there is no good reason to keep the two separate. [darrick: minor fixups having left xfs_bmbt_validate_extent intact] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
29b3e94a |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: rewrite xfs_bmap_first_unused to make better use of xfs_iext_get_extent Look at the return value of xfs_iext_get_extent instead of figuring out the extent count first and looping up to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
5936dc54 |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: don't rely on extent indices in xfs_bmap_insert_extents Rewrite xfs_bmap_insert_extents so that we don't rely on extent indices except for iterating over them. Not being able to iterate to the previous extent or finding the extent that stop_fsb is in are sufficient exit conditions, and we don't need to do any extent count games given that: a) we already flushed all delalloc extents past our start offset before doing the operation b) xfs_iext_count() includes delalloc extents anyway Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
40591bdb |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: don't rely on extent indices in xfs_bmap_collapse_extents Rewrite xfs_bmap_collapse_extents so that we don't rely on extent indices except for iterating over them. Not being able to iterate to the next extent is a sufficient exit condition, and we don't need to do any extent count games given that: a) we already flushed all delalloc extents past our start offset before doing the operation b) xfs_iext_count() includes delalloc extents anyway Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
11f75b3b |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: update got in xfs_bmap_shift_update_extent This way the caller gets the proper updated extent returned in got. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bf806280 |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_bmse_shift_one Instead do the actual left and right shift work in the callers, and just keep a helper to update the bmap and rmap btrees as well as the in-core extent list. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ecfea3f0 |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: split xfs_bmap_shift_extents Have a separate helper for insert vs collapse, as this prepares us for simplifying the code in the next patches. Also changed the done output argument to a bool intead of int for both new functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6b18af0d |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove XFS_BMAP_MAX_SHIFT_EXTENTS The define was always set to 1, which means looping until we reach is was dead code from the start. Also remove an initialization of next_fsb for the done case that doesn't fit the new code flow - it was never checked by the caller in the done case to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
42b67dc6 |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the never fully implemented UUID fork format Remove the dead code dealing with the UUID fork format that was never implemented in Linux (and neither in IRIX as far as I know). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e8e0e170 |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove XFS_BMAP_TRACE_EXLIST Instead of looping over all extents in some debug-only helper just insert trace points into the loops that already exist in the calling functions. Also split the xfs_extlist trace point into one each for reading and writing extents from disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ca5d8e5b |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: move pre/post-bmap tracing into xfs_iext_update_extent xfs_iext_update_extent already has basically all the information needed to centralize the bmap pre/post tracing. We just need to pass inode + bmap state instead of the inode fork pointer to get all trace annotations. In addition to covering all the existing trace points this gives us tracing coverage for the extent shifting operations for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d138604f |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove post-bmap tracing in xfs_bmap_local_to_extents Now that we use xfs_iext_insert this is already covered by the tracing in that function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
35e62da5 |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: make better use of the 'state' variable in xfs_bmap_del_extent_real We already have all the information about the fork a=D1=95 well as additional tracing information, so pass that to xfs_iext_remove(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
060ea65b |
|
19-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: add a xfs_bmap_fork_to_state helper This creates the right initial bmap state from the passed in inode fork enum. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f135761a |
|
17-Oct-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor btree pointer checks Refactor the btree pointer checks so that we can call them from the scrub code without logging errors to dmesg. Preserve the existing error reporting for regular operations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
b5cfbc22 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: replace xfs_bmbt_lookup_ge with xfs_bmbt_lookup_first We only use xfs_bmbt_lookup_ge to look up the first bmap record in an inode, so replace xfs_bmbt_lookup_ge with a special purpose helper that is a bit more descriptive. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e16cf9b0 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: pass a struct xfs_bmbt_irec to xfs_bmbt_lookup_eq Now that we've massaged the callers into the right form we can always pass the actual extent record instead of the individual fields. As an additional benefit the btree cursor will now be prepoulated with the correct extent state instead of having to fix it up later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a67d00a5 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: pass a struct xfs_bmbt_irec to xfs_bmbt_update Now that we've massaged the callers into the right form we can always pass the actual extent record instead of the individual fields. With that xfs_bmbt_disk_set_allf can go away, and xfs_bmbt_disk_set_all can be merged into the former implementation of xfs_bmbt_disk_set_allf. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
79fa6143 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_bmap_add_extent_unwritten_real Use xfs_iext_get_extent to find, and xfs_iext_update_extent to update entries in the in-core extent list. This isolates the function from the detailed layout of the extent list, and generally makes the code a lot more readable. Also get rid of the oldext and newext variables as using the extent records is a lot more descriptive. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ca1862b0 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor delalloc accounting in xfs_bmap_add_extent_delay_real Account for all changes to the delalloc reservation in da_new, and use a single call xfs_mod_fdblocks to reserve/free blocks, including always checking for an error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4dcb8869 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_bmap_add_extent_delay_real Use xfs_iext_get_extent to find, and xfs_iext_update_extent to update entries in the in-core extent list. This isolates the function from the detailed layout of the extent list, and generally makes the code a lot more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1abb9e55 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_bmap_add_extent_hole_real Use xfs_iext_update_extent to update entries in the in-core extent list. This isolates the function from the detailed layout of the extent list, and generally makes the code a lot more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3ffc18ec |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_bmap_add_extent_hole_delay Use xfs_iext_get_extent to find, and xfs_iext_update_extent to update entries in the in-core extent list. This isolates the function from the detailed layout of the extent list, and generally makes the code a lot more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
48fd52b1 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_del_extent_real Use xfs_iext_update_extent to update entries in the in-core extent list. This isolates the function from the detailed layout of the extent list, and generally makes the code a lot more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
491f6f8a |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use the state defines in xfs_bmap_del_extent_real Use the same defines as the other extent add and delete helpers, which both improves code readability and trace point output. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0173c689 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use correct state defines in xfs_bmap_del_extent_{cow,delay} Use the _FILLING values to match the usage in the xfs_bmap_add_extent_* helpers. No change in behavior, just better naming in the code and tracepoint output. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1b24b633 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: move some more code into xfs_bmap_del_extent_real Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e1d7553f |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use xfs_bmap_del_extent_delay for the data fork as well And remove the delalloc code from xfs_bmap_del_extent, which gets renamed to xfs_bmap_del_extent_real to fit the naming scheme used by the other xfs_bmap_{add,del}_extent_* routines. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8280f6ed |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: rename bno to end in __xfs_bunmapi Rename the bno variable that's used as the end of the range in __xfs_bunmapi to end, which better describes it. Additionally change the start variable which takes the initial value of bno to be the function parameter itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b213d692 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: don't set XFS_BTCUR_BPRV_WASDEL in xfs_bunmapi The XFS_BTCUR_BPRV_WASDEL flag is supposed to indicate that we are converting a delayed allocation to a real one, which isn't the case in xfs_bunmapi. Setting it could theoretically lead to misaccounting here, but it's unlikely that we ever hit it in practice. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e3f0f756 |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use xfs_iext_get_extent instead of open coding it This avoids exposure to details of the extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
5e422f5e |
|
17-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real There was one spot in xfs_bmap_add_extent_unwritten_real that didn't use the passed in new extent state but always converted to normal, leading to wrong behavior when converting from normal to unwritten. Only found by code inspection, it seems like this code path to move partial extent from written to unwritten while merging it with the next extent is rarely exercised. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
40214d12 |
|
13-Oct-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: trim writepage mapping to within eof The writeback rework in commit fbcc02561359 ("xfs: Introduce writeback context for writepages") introduced a subtle change in behavior with regard to the block mapping used across the ->writepages() sequence. The previous xfs_cluster_write() code would only flush pages up to EOF at the time of the writepage, thus ensuring that any pages due to file-extending writes would be handled on a separate cycle and with a new, updated block mapping. The updated code establishes a block mapping in xfs_writepage_map() that could extend beyond EOF if the file has post-eof preallocation. Because we now use the generic writeback infrastructure and pass the cached mapping to each writepage call, there is no implicit EOF limit in place. If eofblocks trimming occurs during ->writepages(), any post-eof portion of the cached mapping becomes invalid. The eofblocks code has no means to serialize against writeback because there are no pages associated with post-eof blocks. Therefore if an eofblocks trim occurs and is followed by a file-extending buffered write, not only has the mapping become invalid, but we could end up writing a page to disk based on the invalid mapping. Consider the following sequence of events: - A buffered write creates a delalloc extent and post-eof speculative preallocation. - Writeback starts and on the first writepage cycle, the delalloc extent is converted to real blocks (including the post-eof blocks) and the mapping is cached. - The file is closed and xfs_release() trims post-eof blocks. The cached writeback mapping is now invalid. - Another buffered write appends the file with a delalloc extent. - The concurrent writeback cycle picks up the just written page because the writeback range end is LLONG_MAX. xfs_writepage_map() attributes it to the (now invalid) cached mapping and writes the data to an incorrect location on disk (and where the file offset is still backed by a delalloc extent). This problem is reproduced by xfstests test generic/464, which triggers racing writes, appends, open/closes and writeback requests. To address this problem, trim the mapping used during writeback to within EOF when the mapping is validated. This ensures the mapping is revalidated for any pages encountered beyond EOF as of the time the current mapping was cached or last validated. Reported-by: Eryu Guan <eguan@redhat.com> Diagnosed-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
749f24f3 |
|
09-Oct-2017 |
Thomas Meyer <thomas@m3y3r.de> |
xfs: Fix bool initialization/comparison Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
5e5c943c |
|
18-Sep-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: revert "xfs: factor rmap btree size into the indlen calculations" In commit fd26a88093ba we added a worst case estimate for rmapbt blocks needed to satisfy the block mapping request. Since then, we added the ability to reserve enough space in each AG such that we should never run out of blocks to grow the rmapbt, which makes this calculation unnecessary. Revert the commit because it makes the extra delalloc indlen accounting unnecessary and incorrect. Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7bf7a193 |
|
31-Aug-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix compiler warnings Fix up all the compiler warnings that have crept in. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4cc1ee5e |
|
30-Aug-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: simplify the rmap code in xfs_bmse_merge In Christoph's patch to refactor xfs_bmse_merge, the updated rmap code does more work than it needs to (because map-extent auto-merges records). Remove the unnecessary unmap and save ourselves a deferred op. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4c35445b |
|
29-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at This abstracts the function away from details of the low-level extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4da6b514 |
|
29-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents This abstracts the function away from details of the low-level extent list implementation. Note that it seems like the previous implementation of rmap for the merge case was completely broken, but it no seems appear to trigger that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
05b7c8ab |
|
29-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: move some code around inside xfs_bmap_shift_extents For the first right move we need to look up next_fsb. That means our last fsb that contains next_fsb must also be the current extent, so take advantage of that by moving the code around a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f2285c14 |
|
29-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: use xfs_iext_get_extent in xfs_bmap_first_unused Use the bmap abstraction instead of open-coding bmbt details here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
50bb44c2 |
|
29-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert Use the helper instead of open coding it, to provide a better abstraction for the scalable extent list work. This also gets an additional assert and trace point for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
67e4e69c |
|
29-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: add a xfs_iext_update_extent helper This helper is used to update an extent record based on the extent index, and can be used to provide a level of abstractions between callers that want to modify in-core extent records and the details of the extent list implementation. Also switch all users of the xfs_bmbt_set_all(xfs_iext_get_ext(...)) pattern to this new helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8ad7c629 |
|
28-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the ip argument to xfs_defer_finish And instead require callers to explicitly join the inode using xfs_defer_ijoin. Also consolidate the defer error handling in a few places using a goto label. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
882d8785 |
|
28-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: rename xfs_defer_join to xfs_defer_ijoin Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
5b094d6d |
|
18-Jul-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: fix multi-AG deadlock in xfs_bunmapi Just like in the allocator we must avoid touching multiple AGs out of order when freeing blocks, as freeing still locks the AGF and can cause the same AB-BA deadlocks as in the allocation path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4c1a67bd |
|
17-Jul-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write We must initialize the firstfsb parameter to _bmapi_write so that it doesn't incorrectly treat stack garbage as a restriction on which AGs it can search for free space. Fixes-coverity-id: 1402025 Fixes-coverity-id: 1415167 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
9e24cfd0 |
|
20-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove unneeded parameter from XFS_TEST_ERROR Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
e1a4e37c |
|
14-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent In a pathological scenario where we are trying to bunmapi a single extent in which every other block is shared, it's possible that trying to unmap the entire large extent in a single transaction can generate so many EFIs that we overflow the transaction reservation. Therefore, use a heuristic to guess at the number of blocks we can safely unmap from a reflink file's data fork in an single transaction. This should prevent problems such as the log head slamming into the tail and ASSERTs that trigger because we've exceeded the transaction reservation. Note that since bunmapi can fail to unmap the entire range, we must also teach the deferred unmap code to roll into a new transaction whenever we get low on reservation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: random edits, all bugs are my fault] Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
6e747506 |
|
12-May-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix warnings about unused stack variables Reduce stack usage and get rid of compiler warnings by eliminating unused variables. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
0daaecac |
|
12-May-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: fix indlen accounting error on partial delalloc conversion The delalloc -> real block conversion path uses an incorrect calculation in the case where the middle part of a delalloc extent is being converted. This is documented as a rare situation because XFS generally attempts to maximize contiguity by converting as much of a delalloc extent as possible. If this situation does occur, the indlen reservation for the two new delalloc extents left behind by the conversion of the middle range is calculated and compared with the original reservation. If more blocks are required, the delta is allocated from the global block pool. This delta value can be characterized as the difference between the new total requirement (temp + temp2) and the currently available reservation minus those blocks that have already been allocated (startblockval(PREV.br_startblock) - allocated). The problem is that the current code does not account for previously allocated blocks correctly. It subtracts the current allocation count from the (new - old) delta rather than the old indlen reservation. This means that more indlen blocks than have been allocated end up stashed in the remaining extents and free space accounting is broken as a result. Fix up the calculation to subtract the allocated block count from the original extent indlen and thus correctly allocate the reservation delta based on the difference between the new total requirement and the unused blocks from the original reservation. Also remove a bogus assert that contradicts the fact that the new indlen reservation can be larger than the original indlen reservation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0c1d9e4a |
|
20-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: simplify validation of the unwritten extent bit XFS only supports the unwritten extent bit in the data fork, and only if the file system has a version 5 superblock or the unwritten extent feature bit. We currently have two routines that validate the invariant: xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met, and xfs_validate_extent that triggers and assert in debug build. Both of them iterate over all extents of an inode fork when called, which isn't very efficient. This patch instead adds a new helper that verifies the invariant one extent at a time, and calls it from the places where we iterate over all extents to converted them from or two the in-memory format. The callers then return -EFSCORRUPTED when reading invalid extents from disk, or trigger an assert when writing them to disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4f1adf33 |
|
19-Apr-2017 |
Eric Sandeen <sandeen@redhat.com> |
xfs: more do_div cleanups On some architectures do_div does the pointer compare trick to make sure that we've sent it an unsigned 64-bit number. (Why unsigned? I don't know.) Fix up the few places that squawk about this; in xfs_bmap_wants_extents() we just used a bare int64_t so change that to unsigned. In xfs_adjust_extent_unmap_boundaries() all we wanted was the mod, and we have an xfs-specific function to handle that w/o side effects, which includes proper casting for do_div. In xfs_daddr_to_ag[b]no, we were using the wrong type anyway; XFS_BB_TO_FSBT returns a block in the filesystem, so use xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness from that type as a bonus. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7590632a |
|
11-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove bmap block allocation retries Now that reflink operations don't set the firstblock value we don't need the workarounds for non-NULL firstblock values without a prior allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bf8eadba |
|
11-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_bmap_remap_alloc The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar to what we do in the space allocator. But the reflink code doesn't touch the allocation btree unlike the normal space allocator, so we couldn't care less about the state of the AGFL. So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6ebd5a44 |
|
11-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: introduce xfs_bmapi_remap Add a new helper to be used for reflink extent list additions instead of funneling them through xfs_bmapi_write and overloading the firstblock member in struct xfs_bmalloca and struct xfs_alloc_args. With some small changes to xfs_bmap_remap_alloc this also means we do not need a xfs_bmalloca structure for this case at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6d04558f |
|
11-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: pass individual arguments to xfs_bmap_add_extent_hole_real For the reflink case we'd much rather pass the required arguments than faking up a struct xfs_bmalloca. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
39e07daa |
|
11-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove attr fork handling in xfs_bmap_finish_one We never do COW operations for the attr fork, so don't pretend we handle them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
52813fb1 |
|
11-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: fix integer truncation in xfs_bmap_remap_alloc bno should be a xfs_fsblock_t, which is 64-bit wides instead of a xfs_aglock_t, which truncates the value to 32 bits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2fcc319d |
|
08-Mar-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: try any AG when allocating the first btree block when reflinking When a reflink operation causes the bmap code to allocate a btree block we're currently doing single-AG allocations due to having ->firstblock set and then try any higher AG due a little reflink quirk we've put in when adding the reflink code. But given that we do not have a minleft reservation of any kind in this AG we can still not have any space in the same or higher AG even if the file system has enough free space. To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back path instead. [And yes, we need to redo this properly instead of piling hacks over hacks. I'm working on that, but it's not going to be a small series. In the meantime this fixes the customer reported issue] Also add a warning for failing allocations to make it easier to debug. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f65e6fad |
|
08-Mar-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: use iomap new flag for newly allocated delalloc blocks Commit fa7f138 ("xfs: clear delalloc and cache on buffered write failure") fixed one regression in the iomap error handling code and exposed another. The fundamental problem is that if a buffered write is a rewrite of preexisting delalloc blocks and the write fails, the failure handling code can punch out preexisting blocks with valid file data. This was reproduced directly by sub-block writes in the LTP kernel/syscalls/write/write03 test. A first 100 byte write allocates a single block in a file. A subsequent 100 byte write fails and punches out the block, including the data successfully written by the previous write. To address this problem, update the ->iomap_begin() handler to distinguish newly allocated delalloc blocks from preexisting delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the ->iomap_end() handler to decide when a failed or short write should punch out delalloc blocks. This introduces the subtle requirement that ->iomap_begin() should never combine newly allocated delalloc blocks with existing blocks in the resulting iomap descriptor. This can occur when a new delalloc reservation merges with a neighboring extent that is part of the current write, for example. Therefore, drop the post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and just return the record inserted into the fork. This ensures only new blocks are returned and thus that preexisting delalloc blocks are always handled as "found" blocks and not punched out on a failed rewrite. Reported-by: Xiong Zhou <xzhou@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
410d17f6 |
|
16-Feb-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: tune down agno asserts in the bmap code In various places we currently assert that xfs_bmap_btalloc allocates from the same as the firstblock value passed in, unless it's either NULLAGNO or the dop_low flag is set. But the reflink code does not fully follow this convention as it passes in firstblock purely as a hint for the allocator without actually having previous allocations in the transaction, and without having a minleft check on the current AG, leading to the assert firing on a very full and heavily used file system. As even the reflink code only allocates from equal or higher AGs for now we can simply the check to always allow for equal or higher AGs. Note that we need to eventually split the two meanings of the firstblock value. At that point we can also allow the reflink code to allocate from any AG instead of limiting it in any way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
75d65361 |
|
13-Feb-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: split indlen reservations fairly when under reserved Certain workoads that punch holes into speculative preallocation can cause delalloc indirect reservation splits when the delalloc extent is split in two. If further splits occur, an already short-handed extent can be split into two in a manner that leaves zero indirect blocks for one of the two new extents. This occurs because the shortage is large enough that the xfs_bmap_split_indlen() algorithm completely drains the requested indlen of one of the extents before it honors the existing reservation. This ultimately results in a warning from xfs_bmap_del_extent(). This has been observed during file copies of large, sparse files using 'cp --sparse=always.' To avoid this problem, update xfs_bmap_split_indlen() to explicitly apply the reservation shortage fairly between both extents. This smooths out the overall indlen shortage and defers the situation where we end up with a delalloc extent with zero indlen reservation to extreme circumstances. Reported-by: Patrick Dung <mpatdung@gmail.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0e339ef8 |
|
13-Feb-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: handle indlen shortage on delalloc extent merge When a delalloc extent is created, it can be merged with pre-existing, contiguous, delalloc extents. When this occurs, xfs_bmap_add_extent_hole_delay() merges the extents along with the associated indirect block reservations. The expectation here is that the combined worst case indlen reservation is always less than or equal to the indlen reservation for the individual extents. This is not always the case, however, as existing extents can less than the expected indlen reservation if the extent was previously split due to a hole punch. If a new extent merges with such an extent, the total indlen requirement may be larger than the sum of the indlen reservations held by both extents. xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen reservation is always available and assigns it to the merged extent without consideration for the indlen held by the pre-existing extent. As a result, the subsequent xfs_mod_fdblocks() call can attempt an unintentional allocation rather than a free (indicated by an ASSERT() failure). Further, if the allocation happens to fail in this context, the failure goes unhandled and creates a filesystem wide block accounting inconsistency. Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the indlen reservation assigned to the merged extent to the sum of the indlen reservations held by each of the individual extents. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a14234c7 |
|
06-Feb-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: go straight to real allocations for direct I/O COW writes When we allocate COW fork blocks for direct I/O writes we currently first create a delayed allocation, and then convert it to a real allocation once we've got the delayed one. As there is no good reason for that this patch instead makes use call xfs_bmapi_write from the COW allocation path. The only interesting bits are a few tweaks the low-level allocator to allow for this, most notably the need to remove the call to xfs_bmap_extsize_align for the cowextsize in xfs_bmap_btalloc - for the existing convert case it's a no-op, but for the direct allocation case it would blow up our block reservation way beyond what we reserved for the transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
05a630d7 |
|
02-Feb-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: allow unwritten extents in the CoW fork In the data fork, we only allow extents to perform the following state transitions: delay -> real <-> unwritten There's no way to move directly from a delalloc reservation to an /unwritten/ allocated extent. However, for the CoW fork we want to be able to do the following to each extent: delalloc -> unwritten -> written -> remapped to data fork This will help us to avoid a race in the speculative CoW preallocation code between a first thread that is allocating a CoW extent and a second thread that is remapping part of a file after a write. In order to do this, however, we need two things: first, we have to be able to transition from da to unwritten, and second the function that converts between real and unwritten has to be made aware of the cow fork. Do both of those things. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
d5a91bae |
|
02-Feb-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: filter out obviously bad btree pointers Don't let anybody load an obviously bad btree pointer. Since the values come from disk, we must return an error, not just ASSERT. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
|
#
b6f41e44 |
|
28-Jan-2017 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: remove boilerplate around xfs_btree_init_block Now that xfs_btree_init_block_int is able to determine crc status from the passed-in mp, we can determine the proper magic as well if we are given a btree number, rather than an explicit magic value. Change xfs_btree_init_block[_int] callers to pass in the btree number, and let xfs_btree_init_block_int use the xfs_magics array via the xfs_btree_magic macro to determine which magic value is needed. This makes all of the if (crc) / else stanzas identical, and the if/else can be removed, leading to a single, common init_block call. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f88ae46b |
|
28-Jan-2017 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: glean crc status from mp not flags in xfs_btree_init_block_int xfs_btree_init_block_int() can determine whether crcs are in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag; the mp argument allows us to determine this from the superblock. Remove the flag from callers, and use xfs_sb_version_hascrc(&mp->m_sb) internally instead. This removes one difference between the if & else cases in the callers. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
493611eb |
|
25-Jan-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: extsize hints are not unlikely in xfs_bmap_btalloc With COW files they are the hotpath, just like for files with the extent size hint attribute. We really shouldn't micro-manage anything but failure cases with unlikely. Additionally Arnd Bergmann recently reported that one of these two unlikely annotations causes link failures together with an upcoming kernel instrumentation patch, so let's get rid of it ASAP. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d2b3964a |
|
20-Jan-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: fix COW writeback race Due to the way how xfs_iomap_write_allocate tries to convert the whole found extents from delalloc to real space we can run into a race condition with multiple threads doing writes to this same extent. For the non-COW case that is harmless as the only thing that can happen is that we call xfs_bmapi_write on an extent that has already been converted to a real allocation. For COW writes where we move the extent from the COW to the data fork after I/O completion the race is, however, not quite as harmless. In the worst case we are now calling xfs_bmapi_write on a region that contains hole in the COW work, which will trip up an assert in debug builds or lead to file system corruption in non-debug builds. This seems to be reproducible with workloads of small O_DSYNC write, although so far I've not managed to come up with a with an isolated reproducer. The fix for the issue is relatively simple: tell xfs_bmapi_write that we are only asked to convert delayed allocations and skip holes in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
255c5162 |
|
09-Jan-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: fix bogus minleft manipulations We can't just set minleft to 0 when we're low on space - that's exactly what we need minleft for: to protect space in the AG for btree block allocations when we are low on free space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0f352f8e |
|
04-Dec-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: error out if trying to add attrs and anextents > 0 We shouldn't assert if somehow we end up trying to add an attr fork to an inode that apparently already has attr extents because this is an indication of on-disk corruption. Instead, return an error code to userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
356a3225 |
|
04-Dec-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: complain if we don't get nextents bmap records When reading into memory all extents of a btree-format inode fork, complain if the number of extents we find is not the same as the number of extents reported in the inode core. This is needed to stop an IO action from accessing the garbage areas of the in-core fork. [dchinner: removed redundant assert] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c44a1f22 |
|
04-Dec-2016 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: handle cow fork in xfs_bmap_trace_exlist By inspection, xfs_bmap_trace_exlist isn't handling cow forks, and will trace the data fork instead. Fix this by setting state appropriately if whichfork == XFS_COW_FORK. ()___() < @ @ > | | {o_o} (|) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
7710517f |
|
04-Dec-2016 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: pass state not whichfork to trace_xfs_extlist When xfs_bmap_trace_exlist called trace_xfs_extlist, it sent in the "whichfork" var instead of the bmap "state" as expected (even though state was already set up for this purpose). As a result, the xfs_bmap_class in tracing code used "whichfork" not state in xfs_iext_state_to_fork(), and got the wrong ifork pointer. It all goes downhill from there, including an ASSERT when ifp_bytes is empty by the time it reaches xfs_iext_get_ext(): XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
974ae922 |
|
27-Nov-2016 |
Brian Foster <bfoster@redhat.com> |
xfs: track preallocation separately in xfs_bmapi_reserve_delalloc() Speculative preallocation is currently processed entirely by the callers of xfs_bmapi_reserve_delalloc(). The caller determines how much preallocation to include, adjusts the extent length and passes down the resulting request. While this works fine for post-eof speculative preallocation, it is not as reliable for COW fork preallocation. COW fork preallocation is implemented via the cowextszhint, which aligns the start offset as well as the length of the extent. Further, it is difficult for the caller to accurately identify when preallocation occurs because the returned extent could have been merged with neighboring extents in the fork. To simplify this situation and facilitate further COW fork preallocation enhancements, update xfs_bmapi_reserve_delalloc() to take a separate preallocation parameter to incorporate into the allocation request. The preallocation blocks value is tacked onto the end of the request and adjusted to accommodate neighboring extents and extent size limits. Since xfs_bmapi_reserve_delalloc() now knows precisely how much preallocation was included in the allocation, it can also tag the inodes appropriately to support preallocation reclaim. Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to use the preallocation mechanism. This patch should not change behavior outside of correctly tagging reflink inodes when start offset preallocation occurs (which the caller does not handle correctly). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
fd26a880 |
|
27-Nov-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: factor rmap btree size into the indlen calculations When we're estimating the amount of space it's going to take to satisfy a delalloc reservation, we need to include the space that we might need to grow the rmapbt. This helps us to avoid running out of space later when _iomap_write_allocate needs more space than we reserved. Eryu Guan observed this happening on generic/224 when sunit/swidth were set. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
0e8d630b |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: remove NULLEXTNUM We only ever set a field to this constant for an impossible to reach error case in xfs_bmap_search_extents. That functions has been removed, so we can remove the constant as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6edc977f |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_bmap_search_extents Now that all users are gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
65c5f419 |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: remove prev argument to xfs_bmapi_reserve_delalloc We can easily lookup the previous extent for the cases where we need it, which saves the callers from looking it up for us later in the series. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
7efc7945 |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: use new extent lookup helpers in __xfs_bunmapi Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2d58f6ef |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: use new extent lookup helpers in xfs_bmapi_write Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
334f3423 |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: use new extent lookup helpers in xfs_bmapi_read Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
86685f7b |
|
23-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup xfs_bmap_last_before Rewrite the function using xfs_iext_lookup_extent and xfs_iext_get_extent, and massage the flow into something easily understandable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
5d829300 |
|
07-Nov-2016 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: provide helper for counting extents from if_bytes The open-coded pattern: ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) is all over the xfs code; provide a new helper xfs_iext_count(ifp) to count the number of inline extents in an inode fork. [dchinner: pick up several missed conversions] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4fd29ec4 |
|
07-Nov-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: check return value of _trans_reserve_quota_nblks Check the return value of xfs_trans_reserve_quota_nblks for errors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
64e6428d |
|
19-Oct-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_bunmapi_cow Since no one uses it anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
fa5c836c |
|
19-Oct-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_bunmapi_cow Split out two helpers for deleting delayed or real extents from the COW fork. This allows to call them directly from xfs_reflink_cow_end_io once that function is refactored to iterate the extent tree. It will also allow to reuse the delalloc deletion from xfs_bunmapi in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
0a0af28c |
|
19-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add xfs_trim_extent This helpers allows to trim an extent to a subset of it's original range while making sure the block numbers in it remain valid, In the future xfs_trim_extent and xfs_bmapi_trim_map should probably be merged in some form. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: split from a previous patch from Darrick, moved around and added support for "raw" delayed extents"] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
fe23759e |
|
19-Oct-2016 |
Eric Sandeen <sandeen@redhat.com> |
xfs: remove pointless error goto in xfs_bmap_remap_alloc The commit: f65306ea xfs: map an inode's offset to an exact physical block added a pointless error0: target; remove it. Addresses-Coverity-Id: 1373865 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
1d55a4bf |
|
19-Oct-2016 |
Colin Ian King <colin.king@canonical.com> |
xfs: remove redundant assignment of ifp Remove redundant ifp = ifp statement, it does nothing. Found with static analysis by CoverityScan. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
90e2056d |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: try other AGs to allocate a BMBT block Prior to the introduction of reflink, allocating a block and mapping it into a file was performed in a single transaction with a single block reservation, and the allocator was supposed to find enough blocks to allocate the extent and any BMBT blocks that might be necessary (unless we're low on space). However, due to the way copy on write works, allocation and mapping have been split into two transactions, which means that we must be able to handle the case where we allocate an extent for CoW but that AG runs out of free space before the blocks can be mapped into a file, and the mapping requires a new BMBT block. When this happens, look in one of the other AGs for a BMBT block instead of taking the FS down. The same applies to the functions that convert a data fork to extents and later btree format. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f7ca3522 |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: create a separate cow extent size hint for the allocator Create a per-inode extent size allocator hint for copy-on-write. This hint is separate from the existing extent size hint so that CoW can take advantage of the fragmentation-reducing properties of extent size hints without disabling delalloc for regular writes. The extent size hint that's fed to the allocator during a copy on write operation is the greater of the cowextsize and regular extsize hint. During reflink, if we're sharing the entire source file to the entire destination file and the destination file doesn't already have a cowextsize hint, propagate the source file's cowextsize hint to the destination file. Furthermore, zero the bulkstat buffer prior to setting the fields so that we don't copy kernel memory contents into userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
174edb0e |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: store in-progress CoW allocations in the refcount btree Due to the way the CoW algorithm in XFS works, there's an interval during which blocks allocated to handle a CoW can be lost -- if the FS goes down after the blocks are allocated but before the block remapping takes place. This is exacerbated by the cowextsz hint -- allocated reservations can sit around for a while, waiting to get used. Since the refcount btree doesn't normally store records with refcount of 1, we can use it to record these in-progress extents. In-progress blocks cannot be shared because they're not user-visible, so there shouldn't be any conflicts with other programs. This is a better solution than holding EFIs during writeback because (a) EFIs can't be relogged currently, (b) even if they could, EFIs are bound by available log space, which puts an unnecessary upper bound on how much CoW we can have in flight, and (c) we already have a mechanism to track blocks. At mount time, read the refcount records and free anything we find with a refcount of 1 because those were in-progress when the FS went down. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4862cfe8 |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: support removing extents from CoW fork Create a helper method to remove extents from the CoW fork without any of the side effects (rmapbt/bmbt updates) of the regular extent deletion routine. We'll eventually use this to clear out the CoW fork during ioend processing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
60b4984f |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: support allocating delayed extents in CoW fork Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed allocation extents in the CoW fork to real allocations, and wire this up all the way back to xfs_iomap_write_allocate(). In a subsequent patch, we'll modify the writepage handler to call this. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
be51f811 |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: support bmapping delalloc extents in the CoW fork Allow the creation of delayed allocation extents in the CoW fork. In a subsequent patch we'll wire up iomap_begin to actually do this via reflink helper functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3993baeb |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: introduce the CoW fork Introduce a new in-core fork for storing copy-on-write delalloc reservations and allocated extents that are in the process of being written out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4453593b |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: return work remaining at the end of a bunmapi operation Return the range of file blocks that bunmapi didn't free. This hint is used by CoW and reflink to figure out what part of an extent actually got freed so that it can set up the appropriate atomic remapping of just the freed range. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
9f3afb57 |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: implement deferred bmbt map/unmap operations Implement deferred versions of the inode block map/unmap functions. These will be used in subsequent patches to make reflink operations atomic. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4847acf8 |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: pass bmapi flags through to bmap_del_extent Pass BMAPI_ flags from bunmapi into bmap_del_extent and extend BMAPI_REMAP (which means "don't touch the allocator or the quota accounting") to apply to bunmapi as well. This will be used to implement the unmap operation, which will be used by swapext. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f65306ea |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: map an inode's offset to an exact physical block Teach the bmap routine to know how to map a range of file blocks to a specific range of physical blocks, instead of simply allocating fresh blocks. This enables reflink to map a file to blocks that are already in use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
62aab20f |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: adjust refcount when unmapping file blocks When we're unmapping blocks from a reflinked file, decrease the refcount of the affected blocks and free the extents that are no longer in use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
292378ed |
|
25-Sep-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: remote attribute blocks aren't really userdata When adding a new remote attribute, we write the attribute to the new extent before the allocation transaction is committed. This means we cannot reuse busy extents as that violates crash consistency semantics. Hence we currently treat remote attribute extent allocation like userdata because it has the same overwrite ordering constraints as userdata. Unfortunately, this also allows the allocator to incorrectly apply extent size hints to the remote attribute extent allocation. This results in interesting failures, such as transaction block reservation overruns and in-memory inode attribute fork corruption. To fix this, we need to separate the busy extent reuse configuration from the userdata configuration. This changes the definition of XFS_BMAPI_METADATA slightly - it now means that allocation is metadata and reuse of busy extents is acceptible due to the metadata ordering semantics of the journal. If this flag is not set, it means the allocation is that has unordered data writeback, and hence busy extent reuse is not allowed. It no longer implies the allocation is for user data, just that the data write will not be strictly ordered. This matches the semantics for both user data and remote attribute block allocation. As such, This patch changes the "userdata" field to a "datatype" field, and adds a "no busy reuse" flag to the field. When we detect an unordered data extent allocation, we immediately set the no reuse flag. We then set the "user data" flags based on the inode fork we are allocating the extent to. Hence we only set userdata flags on data fork allocations now and consider attribute fork remote extents to be an unordered metadata extent. The result is that remote attribute extents now have the expected allocation semantics, and the data fork allocation behaviour is completely unchanged. It should be noted that there may be other ways to fix this (e.g. use ordered metadata buffers for the remote attribute extent data write) but they are more invasive and difficult to validate both from a design and implementation POV. Hence this patch takes the simple, obvious route to fixing the problem... Reported-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
51446f5b |
|
18-Sep-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: rewrite and optimize the delalloc write path Currently xfs_iomap_write_delay does up to lookups in the inode extent tree, which is rather costly especially with the new iomap based write path and small write sizes. But it turns out that the low-level xfs_bmap_search_extents gives us all the information we need in the regular delalloc buffered write path: - it will return us an extent covering the block we are looking up if it exists. In that case we can simply return that extent to the caller and are done - it will tell us if we are beyoned the last current allocated block with an eof return parameter. In that case we can create a delalloc reservation and use the also returned information about the last extent in the file as the hint to size our delalloc reservation. - it can tell us that we are writing into a hole, but that there is an extent beyoned this hole. In this case we can create a delalloc reservation that covers the requested size (possible capped to the next existing allocation). All that can be done in one single routine instead of bouncing up and down a few layers. This reduced the CPU overhead of the block mapping routines and also simplified the code a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
3fd129b6 |
|
18-Sep-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: set up per-AG free space reservations One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem. Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails. Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
9c194644 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: propagate bmap updates to rmapbt When we map, unmap, or convert an extent in a file's data or attr fork, schedule a respective update in the rmapbt. Previous versions of this patch required a 1:1 correspondence between bmap and rmap, but this is no longer true as we now have ability to make interval queries against the rmapbt. We use the deferred operations code to handle redo operations atomically and deadlock free. This plumbs in all five rmap actions (map, unmap, convert extent, alloc, free); we'll use the first three now for file data, and reflink will want the last two. We also add an error injection site to test log recovery. Finally, we need to fix the bmap shift extent code to adjust the rmaps correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
52548852 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: rmap btree requires more reserved free space Originally-From: Dave Chinner <dchinner@redhat.com> The rmap btree is allocated from the AGFL, which means we have to ensure ENOSPC is reported to userspace before we run out of free space in each AG. The last allocation in an AG can cause a full height rmap btree split, and that means we have to reserve at least this many blocks *in each AG* to be placed on the AGFL at ENOSPC. Update the various space calculation functions to handle this. Also, because the macros are now executing conditional code and are called quite frequently, convert them to functions that initialise variables in the struct xfs_mount, use the new variables everywhere and document the calculations better. [darrick.wong@oracle.com: don't reserve blocks if !rmap] [dchinner@redhat.com: update m_ag_max_usable after growfs] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
340785cc |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add owner field to extent allocation and freeing For the rmap btree to work, we have to feed the extent owner information to the the allocation and freeing functions. This information is what will end up in the rmap btree that tracks allocated extents. While we technically don't need the owner information when freeing extents, passing it allows us to validate that the extent we are removing from the rmap btree actually belonged to the owner we expected it to belong to. We also define a special set of owner values for internal metadata that would otherwise have no owner. This allows us to tell the difference between metadata owned by different per-ag btrees, as well as static fs metadata (e.g. AG headers) and internal journal blocks. There are also a couple of special cases we need to take care of - during EFI recovery, we don't actually know who the original owner was, so we need to pass a wildcard to indicate that we aren't checking the owner for validity. We also need special handling in growfs, as we "free" the space in the last AG when extending it, but because it's new space it has no actual owner... While touching the xfs_bmap_add_free() function, re-order the parameters to put the struct xfs_mount first. Extend the owner field to include both the owner type and some sort of index within the owner. The index field will be used to support reverse mappings when reflink is enabled. When we're freeing extents from an EFI, we don't have the owner information available (rmap updates have their own redo items). xfs_free_extent therefore doesn't need to do an rmap update. Make sure that the log replay code signals this correctly. This is based upon a patch originally from Dave Chinner. It has been extended to add more owner information with the intent of helping recovery operations when things go wrong (e.g. offset of user data block in a file). [dchinner: de-shout the xfs_rmap_*_owner helpers] [darrick: minor style fixes suggested by Christoph Hellwig] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ba9e7802 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add tracepoints and error injection for deferred extent freeing Add a couple of tracepoints for the deferred extent free operation and a site for injecting errors while finishing the operation. This makes it easier to debug deferred ops and test log redo. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2c3234d1 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: rename flist/free_list to dfops Mechanical change of flist/free_list to dfops, since they're now deferred ops, not just a freeing list. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
310a75a3 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_* Drop the compatibility shims that we were using to integrate the new deferred operation mechanism into the existing code. No new code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
3ab78df2 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: rework xfs_bmap_free callers to use xfs_defer_ops Restructure everything that used xfs_bmap_free to use xfs_defer_ops instead. For now we'll just remove the old symbols and play some cpp magic to make it work; in the next patch we'll actually rename everything. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f4a0660d |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix locking of the rt bitmap/summary inodes When we're deleting realtime extents, we need to lock the summary inode in case we need to update the summary info to prevent an assert on the rsumip inode lock on a debug kernel. While we're at it, fix the locking annotations so that we avoid triggering lockdep warnings. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
e66a4c67 |
|
20-Jun-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: convert list of extents to free into a regular list In struct xfs_bmap_free, convert the open-coded free extent list to a regular list, then use list_sort to sort it prior to processing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
59bad075 |
|
20-Jun-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: rearrange xfs_bmap_add_free parameters This is already in xfsprogs' libxfs, so port it to the kernel. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
253f4911 |
|
05-Apr-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: better xfs_trans_alloc interface Merge xfs_trans_reserve and xfs_trans_alloc into a single function call that returns a transaction with all the required log and block reservations, and which allows passing transaction flags directly to avoid the cumbersome _xfs_trans_alloc interface. While we're at it we also get rid of the transaction type argument that has been superflous since we stopped supporting the non-CIL logging mode. The guts of it will be removed in another patch. [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
09cbfeaf |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d34999c9 |
|
14-Mar-2016 |
Brian Foster <bfoster@redhat.com> |
xfs: borrow indirect blocks from freed extent when available xfs_bmap_del_extent() handles extent removal from the in-core and on-disk extent lists. When removing a delalloc range, it updates the indirect block reservation appropriately based on the removal. It currently enforces that the new indirect block reservation is less than or equal to the original. This is normally the case in all situations except for in certain cases when the removed range creates a hole in a single delalloc extent, thus splitting a single delalloc extent in two. It is possible with small enough extents to split an indlen==1 extent into two such slightly smaller extents. This leaves one extent with 0 indirect blocks and leads to assert failures in other areas (e.g., xfs_bunmapi() if the extent happens to be removed). Update the indlen distribution code to steal blocks from the deleted extent, if necessary, to satisfy the worst case total indirect reservation for the new extents. This is safe as the caller does not update the fdblocks counters until the extent is removed. Blocks stolen in this manner simply remain accounted as allocated, having ownership transferred from the data extent to an indirect reservation. As a precaution, fall back to the original reservation algorithm if the new indlen requirement is not met and warn if we end up with extents without any reservation at all to detect this more easily in the future. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a9bd24ac |
|
14-Mar-2016 |
Brian Foster <bfoster@redhat.com> |
xfs: refactor delalloc indlen reservation split into helper The delayed allocation indirect reservation splitting code is not sufficient in some cases where a delalloc extent is split in two. In preparation for enhancements to this code, refactor the current indlen distribution algorithm into a new helper function. [dchinner: rename temp, temp2 variables] Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b2706a05 |
|
14-Mar-2016 |
Brian Foster <bfoster@redhat.com> |
xfs: update freeblocks counter after extent deletion xfs_bunmapi() currently updates the fdblocks counter, unreserves quota, etc. before the extent is deleted by xfs_bmap_del_extent(). The function has problems dividing up the indirect reserved blocks for scenarios where a single delalloc extent is split in two. Particularly, there aren't always enough blocks reserved for multiple extents in a single extent reservation. The solution to this problem is to allow the extent removal code to steal from the deleted extent to meet indirect reservation requirements. Move the block of code in xfs_bmapi() that updates the fdblocks counter to after the call to xfs_bmap_del_extent() to allow the codepath to update the extent record before the free blocks are accounted. Also, reshuffle the code slightly so the delalloc accounting occurs near the xfs_bmap_del_extent() call to provide context for the comments. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a5fd276b |
|
08-Mar-2016 |
Luis de Bethencourt <luisbg@osg.samsung.com> |
xfs: remove impossible condition bp_release is set to 0 just before the breakpoint of the for loop before the conditional check (in line 458). The other breakpoint is a goto that skips the dead code. Addresses-Coverity-Id: 102338 Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a7e5d03b |
|
01-Mar-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_trans_get_block_res Just use the t_blk_res field directly instead of obsfucating the reference by a macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c19b3b05 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: mode di_mode to vfs inode Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f6106efa |
|
10-Jan-2016 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: eliminate committed arg from xfs_bmap_finish Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the associated comments were replicated several times across the attribute code, all dealing with what to do if the transaction was or wasn't committed. And in that replicated code, an ASSERT() test of an uninitialized variable occurs in several locations: error = xfs_attr_thing(&args); if (!error) { error = xfs_bmap_finish(&args.trans, args.flist, &committed); } if (error) { ASSERT(committed); If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish, never set "committed", and then test it in the ASSERT. Fix this up by moving the committed state internal to xfs_bmap_finish, and add a new inode argument. If an inode is passed in, it is passed through to __xfs_trans_roll() and joined to the transaction there if the transaction was committed. xfs_qm_dqalloc() was a little unique in that it called bjoin rather than ijoin, but as Dave points out we can detect the committed state but checking whether (*tpp != tp). Addresses-Coverity-Id: 102360 Addresses-Coverity-Id: 102361 Addresses-Coverity-Id: 102363 Addresses-Coverity-Id: 102364 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
e3543819 |
|
07-Jan-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: bmapbt checking on debug kernels too expensive For large sparse or fragmented files, checking every single entry in the bmapbt on every operation is prohibitively expensive. Especially as such checks rarely discover problems during normal operations on high extent coutn files. Our regression tests don't tend to exercise files with hundreds of thousands to millions of extents, so mostly this isn't noticed. However, trying to run things like xfs_mdrestore of large filesystem dumps on a debug kernel quickly becomes impossible as the CPU is completely burnt up repeatedly walking the sparse file bmapbt that is generated for every allocation that is made. Hence, if the file has more than 10,000 extents, just don't bother with walking the tree to check it exhaustively. The btree code has checks that ensure that the newly inserted/removed/modified record is correctly ordered, so the entrie tree walk in thses cases has limited additional value. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6d3eb1ec |
|
03-Jan-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
libxfs: use a convenience variable instead of open-coding the fork Use a convenience variable instead of open-coding the inode fork. This isn't really needed for now, but will become important when we add the copy-on-write fork later. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f1f96c49 |
|
03-Jan-2016 |
Eric Sandeen <sandeen@redhat.com> |
xfs: get mp from bma->ip in xfs_bmap code In my earlier commit c29aad4 xfs: pass mp to XFS_WANT_CORRUPTED_GOTO I added some local mp variables with code which indicates that mp might be NULL. Coverity doesn't like this now, because the updated per-fs XFS_STATS macros dereference mp. I don't think this is actually a problem; from what I can tell, we cannot get to these functions with a null bma->tp, so my NULL check was probably pointless. Still, it's not super obvious. So switch this code to get mp from the inode on the xfs_bmalloca structure, with no conditional, because the functions are already using bmap->ip directly. Addresses-Coverity-Id: 1339552 Addresses-Coverity-Id: 1339553 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
3fbbbea3 |
|
02-Nov-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce BMAPI_ZERO for allocating zeroed extents To enable DAX to do atomic allocation of zeroed extents, we need to drive the block zeroing deep into the allocator. Because xfs_bmapi_write() can return merged extents on allocation that were only partially allocated (i.e. requested range spans allocated and hole regions, allocation into the hole was contiguous), we cannot zero the extent returned from xfs_bmapi_write() as that can overwrite existing data with zeros. Hence we have to drive the extent zeroing into the allocation code, prior to where we merge the extents into the BMBT and return the resultant map. This means we need to propagate this need down to the xfs_alloc_vextent() and issue the block zeroing at this point. While this functionality is being introduced for DAX, there is no reason why it is specific to DAX - we can per-zero blocks during the allocation transaction on any type of device. It's just slow (and usually slower than unwritten allocation and conversion) on traditional block devices so doesn't tend to get used. We can, however, hook hardware zeroing optimisations via sb_issue_zeroout() to this operation, so it may be useful in future and hence the "allocate zeroed blocks" API needs to be implementation neutral. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ff6d6af2 |
|
12-Oct-2015 |
Bill O'Donnell <billodo@redhat.com> |
xfs: per-filesystem stats counter implementation This patch modifies the stats counting macros and the callers to those macros to properly increment, decrement, and add-to the xfs stats counts. The counts for global and per-fs stats are correctly advanced, and cleared by writing a "1" to the corresponding clear file. global counts: /sys/fs/xfs/stats/stats per-fs counts: /sys/fs/xfs/sda*/stats/stats global clear: /sys/fs/xfs/stats/stats_clear per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around stats structures and macros. ] Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b7cdc66b |
|
11-Oct-2015 |
Brian Foster <bfoster@redhat.com> |
xfs: log local to remote symlink conversions correctly on v5 supers A local format symlink inode is converted to extent format when an extended attribute is set on an inode as part of the attribute fork creation. This means a block is allocated, the local symlink target name is copied to the block and the block is logged. Currently, xfs_bmap_local_to_extents() handles logging the remote block data based on the size of the data fork prior to the conversion. This is not correct on v5 superblock filesystems, which add an additional header to remote symlink blocks that is nonexistent in local format inodes. As a result, the full length of the remote symlink block content is not logged. This can lead to corruption should a crash occur and log recovery replay this transaction. Since a callout is already used to initialize the new remote symlink block, update the local-to-extents conversion mechanism to make the callout also responsible for logging the block. It is already required to set the log buffer type and format the block appropriately based on the superblock version. This ensures the remote symlink is always logged correctly. Note that xfs_bmap_local_to_extents() is only called for symlinks so there are no other callouts that require modification. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
d4a97a04 |
|
18-Aug-2015 |
Brian Foster <bfoster@redhat.com> |
xfs: add missing bmap cancel calls in error paths If a failure occurs after the bmap free list is populated and before xfs_bmap_finish() completes successfully (which returns a partial list on failure), the bmap free list must be cancelled. Otherwise, the extent items on the list are never freed and a memory leak occurs. Several random error paths throughout the code suffer this problem. Fix these up such that xfs_bmap_cancel() is always called on error. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
496817b4 |
|
21-Jun-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: clean up XFS_MIN_FREELIST macros We no longer calculate the minimum freelist size from the on-disk AGF, so we don't need the macros used for this. That means the nested macros can be cleaned up, and turn this into an actual function so the logic is clear and concise. This will make it much easier to add support for the rmap btree when the time comes. This also gets rid of the XFS_AG_MAXLEVELS macro used by these freelist macros as it is simply a wrapper around a single variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
50adbcb4 |
|
21-Jun-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: xfs_alloc_fix_freelist() can use incore perag structures At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based access and agf buffer based access to freelist and space usage information. However, once the AGF buffer is locked inside this function, it is guaranteed that both the in-memory and on-disk values are identical. xfs_alloc_fix_freelist() doesn't modify the values in the structures directly, so it is a read-only user of the infomration, and hence can use the per-ag structure exclusively for determining what it should do. This opens up an avenue for cleaning up a lot of duplicated logic whose only difference is the structure it gets the data from, and in doing so removes a lot of needless byte swapping overhead when fixing up the free list. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
70393313 |
|
03-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: saner xfs_trans_commit interface The flags argument to xfs_trans_commit is not useful for most callers, as a commit of a transaction without a permanent log reservation must pass 0 here, and all callers for a transaction with a permanent log reservation except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove the flags argument from the public xfs_trans_commit interfaces, and introduce low-level __xfs_trans_commit variant just for xfs_trans_roll that regrants a log reservation instead of releasing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4906e215 |
|
03-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the flags argument to xfs_trans_cancel xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and XFS_TRANS_ABORT. Both of them are a direct product of the transaction state, and can be deducted: - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled, and XFS_TRANS_ABORT is a noop for a transaction that is not dirty. - any transaction with a permanent log reservation needs XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent log reservation is invalid. So just remove the flags argument and do the right thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2e588a46 |
|
31-May-2015 |
Brian Foster <bfoster@redhat.com> |
xfs: always log the inode on unwritten extent conversion The fsync() requirements for crash consistency on XFS are to flush file data and force any in-core inode updates to the log. We currently check whether the inode is pinned to identify whether the log needs to be forced, since a non-zero pin count generally represents an inode that has transactions awaiting a flush to the on-disk log. This is not sufficient in all cases, however. Reports of xfstests test generic/311 failures on ppc64/s390x hosts have identified failures to fsync outstanding inode modifications due to the inode not being pinned at the time of the fsync. This occurs because certain bmap updates can complete by logging bmapbt buffers but without ever dirtying (and thus pinning) the core inode. The following is a specific incarnation of this problem: $ mount $dev /mnt -o noatime,nobarrier $ for i in $(seq 0 2 31); do \ xfs_io -f -c "falloc $((i * 32768)) 32k" -c fsync /mnt/file; \ done $ xfs_io -c "pwrite -S 0 80k 16k" -c fsync -c "pwrite 76k 4k" -c fsync /mnt/file; \ hexdump /mnt/file; \ ./xfstests-dev/src/godown /mnt ... 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0013000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0014000 0000 0000 0000 0000 0000 0000 0000 0000 * 00f8000 $ umount /mnt; mount ... $ hexdump /mnt/file 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 00f8000 In short, the unwritten extent conversion for the last write is lost despite the fact that an fsync executed before the filesystem was shutdown. Note that this is impossible to reproduce on v5 supers due to unconditional time callbacks for di_changecount and highly difficult to reproduce on CONFIG_HZ=1000 kernels due to those same callbacks frequently updating cmtime prior to the bmap update. CONFIG_HZ=100 reduces timer granularity enough to increase the odds that time updates are skipped and allows this to reproduce within a handful of attempts. To deal with this problem, unconditionally log the core in the unwritten extent conversion path. Fix up logflags after the extent conversion to keep the extent update code consistent with the other extent update helpers. This fixup is not necessary for the other (hole, delay) extent helpers because they execute in the block allocation codepath, which already logs the inode for other reasons (e.g., for di_nblocks). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6dea405e |
|
28-May-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: extent size hints can round up extents past MAXEXTLEN This results in BMBT corruption, as seen by this test: # mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc .... # mount /dev/vdc /mnt/scratch # xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo which results in this failure on a debug kernel: XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211 .... Call Trace: [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100 [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20 [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120 [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70 [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70 [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0 [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170 [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400 [<ffffffff81ddcc29>] ? down_write+0x29/0x60 [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310 [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120 [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570 [<ffffffff811cd680>] vfs_fallocate+0x140/0x260 [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80 [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17 The tracepoint that indicates the extent that triggered the assert failure is: xfs_iext_insert: idx 0 offset 0 block 16777224 count 2097152 flag 1 Clearly indicating that the extent length is greater than MAXEXTLEN, which is 2097151. A prior trace point shows the allocation was an exact size match and that a length greater than MAXEXTLEN was asked for: xfs_alloc_size_done: agno 1 agbno 8 minlen 2097152 maxlen 2097152 ^^^^^^^ ^^^^^^^ We don't see this problem with extent size hints through the IO path because we can't do single IOs large enough to trigger MAXEXTLEN allocation. fallocate(), OTOH, is not limited in it's allocation sizes and so needs help here. The issue is that the extent size hint alignment is rounding up the extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking into account extent size hints when calculating the maximum extent length to allocate. xfs_bmapi_reserve_delalloc() is already doing this, but direct extent allocation is not. Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is wrong, and it works only because delayed allocation extents are not limited in size to MAXEXTLEN in the in-core extent tree. hence this calculation does not work for direct allocation, and the delalloc code needs fixing. This may, in fact be the underlying bug that occassionally causes transaction overruns in delayed allocation extent conversion, so now we know it's wrong we should fix it, too. Many thanks to Brian Foster for finding this problem during review of this patch. Hence the fix, after much code reading, is to allow xfs_bmap_extsize_align() to align partial extents when full alignment would extend the alignment past MAXEXTLEN. We can safely do this because all callers have higher layer allocation loops that already handle short allocations, and so will simply run another allocation to cover the remainder of the requested allocation range that we ignored during alignment. The advantage of this approach is that it also removes the need for callers to do anything other than limit their requests to MAXEXTLEN - they don't really need to be aware of extent size hints at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a904b1ca |
|
24-Mar-2015 |
Namjae Jeon <namjae.jeon@samsung.com> |
xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS. 1) Make sure that both offset and len are block size aligned. 2) Update the i_size of inode by len bytes. 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent. 4) Shift all the extents which are lying bewteen [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
d64588ca |
|
24-Mar-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove xfs_bmap_sanity_check() This code is redundant now that we have verifiers that sanity check the buffers as they are read from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
5fb5aeee |
|
23-Feb-2015 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: pass mp to XFS_WANT_CORRUPTED_RETURN Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any information about which filesystem hit it. Passing in the mp allows us to print the filesystem (device) name, which is a pretty critical piece of information. Tested by running fsfuzzer 'til I hit some. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c29aad41 |
|
23-Feb-2015 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: pass mp to XFS_WANT_CORRUPTED_GOTO Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any information about which filesystem hit it. Passing in the mp allows us to print the filesystem (device) name, which is a pretty critical piece of information. Tested by running fsfuzzer 'til I hit some. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
bab98bbe |
|
23-Feb-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_mod_frextents Add a new helper to modify the incore counter of free realtime extents. This matches the helpers used for inode and data block counters, and removes a significant users of the xfs_mod_incore_sb() interface. Based on a patch originally from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
0d485ada |
|
23-Feb-2015 |
Dave Chinner <david@fromorbit.com> |
xfs: use generic percpu counters for free block counter XFS has hand-rolled per-cpu counters for the superblock since before there was any generic implementation. The free block counter is special in that it is used for ENOSPC detection outside transaction contexts for for delayed allocation. This means that the counter needs to be accurate at zero. The current per-cpu counter code jumps through lots of hoops to ensure we never run past zero, but we don't need to make all those jumps with the generic counter implementation. The generic counter implementation allows us to pass a "batch" threshold at which the addition/subtraction to the counter value will be folded back into global value under lock. We can use this feature to reduce the batch size as we approach 0 in a very similar manner to the existing counters and their rebalance algorithm. If we use a batch size of 1 as we approach 0, then every addition and subtraction will be done against the global value and hence allow accurate detection of zero threshold crossing. Hence we can replace the handrolled, accurate-at-zero counters with generic percpu counters. Note: this removes just enough of the icsb infrastructure to compile without warnings. The rest will go in subsequent commits. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
fe22d552 |
|
21-Jan-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: set buf types when converting extent formats Conversion from local to extent format does not set the buffer type correctly on the new extent buffer when a symlink data is moved out of line. Fix the symlink code and leave a comment in the generic bmap code reminding us that the format-specific data copy needs to set the destination buffer type appropriately. cc: <stable@vger.kernel.org> # 3.10 to current Tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
61e63ecb |
|
21-Jan-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: consolidate superblock logging functions We now have several superblock loggin functions that are identical except for the transaction reservation and whether it shoul dbe a synchronous transaction or not. Consolidate these all into a single function, a single reserveration and a sync flag and call it xfs_sync_sb(). Also, xfs_mod_sb() is not really a modification function - it's the operation of logging the superblock buffer. hence change the name of it to reflect this. Note that we have to change the mp->m_update_flags that are passed around at mount time to a boolean simply to indicate a superblock update is needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4d11a402 |
|
21-Jan-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove bitfield based superblock updates When we log changes to the superblock, we first have to write them to the on-disk buffer, and then log that. Right now we have a complex bitfield based arrangement to only write the modified field to the buffer before we log it. This used to be necessary as a performance optimisation because we logged the superblock buffer in every extent or inode allocation or freeing, and so performance was extremely important. We haven't done this for years, however, ever since the lazy superblock counters pulled the superblock logging out of the transaction commit fast path. Hence we have a bunch of complexity that is not necessary that makes writing the in-core superblock to disk much more complex than it needs to be. We only need to log the superblock now during management operations (e.g. during mount, unmount or quota control operations) so it is not a performance critical path anymore. As such, remove the complex field based logging mechanism and replace it with a simple conversion function similar to what we use for all other on-disk structures. This means we always log the entirity of the superblock, but again because we rarely modify the superblock this is not an issue for log bandwidth or CPU time. Indeed, if we do log the superblock frequently, delayed logging will minimise the impact of this overhead. [Fixed gquota/pquota inode sharing regression noticed by bfoster.] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
32296f86 |
|
03-Dec-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix set-but-unused warnings The kernel compile doesn't turn on these checks by default, so it's only when I do a kernel-user sync that I find that there are lots of compiler warnings waiting to be fixed. Fix up these set-but-unused warnings. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4db431f5 |
|
03-Dec-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: cleanup xfs_bmse_merge returns Signed-off-by: Dave Chinner <dchinner@redhat.com> xfs_bmse_merge() has a jump label for return that just returns the error value. Convert all the code to just return the error directly and use XFS_WANT_CORRUPTED_RETURN. This also allows the final call to xfs_bmbt_update() to return directly. Noticed while reviewing coccinelle return cleanup patches and wondering why the same return pattern as in xfs_bmse_shift_one() wasn't picked up by the checker pattern... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b11bd671 |
|
03-Dec-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: cleanup xfs_bmse_shift_one goto mess xfs_bmse_shift_one() jumps around determining whether to shift or merge, making the code flow difficult to follow. Clean it up and use direct error returns (including XFS_WANT_CORRUPTED_RETURN) to make the code flow better and be easier to read. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
d254aaec |
|
30-Nov-2014 |
kbuild test robot <fengguang.wu@intel.com> |
xfs: fix simple_return.cocci warning in xfs_bmse_shift_one fs/xfs/libxfs/xfs_bmap.c:5591:1-6: WARNING: end returns can be simpified Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci CC: Brian Foster <bfoster@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
|
#
508b6b3b |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_inum.h into xfs_format.h Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4fb6e8ad |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_ag.h into xfs_format.h More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6d3ebaae |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_dinode.h into xfs_format.h More consolidatation for the on-disk format defintions. Note that the XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related to the on disk format, but depends on a CONFIG_ option. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f71721d0 |
|
22-Sep-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: writeback and inval. file range to be shifted by collapse The collapse range operation currently writes the entire file before starting the collapse to avoid changes in the in-core extent list due to writeback causing the extent count to change. Now that collapse range is fsb based rather than extent index based it can sustain changes in the extent list during the shift sequence without disruption. Modify xfs_collapse_file_space() to writeback and invalidate pages associated with the range of the file to be shifted. xfs_free_file_space() currently has similar behavior, but the space free need only affect the region of the file that is freed and this could change in the future. Also update the comments to reflect the current implementation. We retain the eofblocks trim permanently as a best option for dealing with delalloc extents. We don't shift delalloc extents because this scenario only occurs with post-eof preallocation (since data must be flushed such that the cache can be invalidated and data can be shifted). That means said space must also be initialized before being shifted into the accessible region of the file only to be immediately truncated off as the last part of the collapse. In other words, the eofblocks trim will happen anyways, we just run it first to ensure the file remains in a consistent state throughout the collapse. Finally, detect and fail explicitly in the event of a delalloc extent during the extent shift. The implementation does not support delalloc extents and the caller is expected to prevent this scenario in advance as is done by collapse. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a979bdfe |
|
22-Sep-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: refactor single extent shift into xfs_bmse_shift_one() helper xfs_bmap_shift_extents() has a variety of conditions and error checks that make the logic difficult to follow and indent heavy. Refactor the loop body of this function into a new xfs_bmse_shift_one() helper. This simplifies the error checks, eliminates index decrement on merge hack by pushing the index increment down into the helper, and makes the code more readable by reducing multiple levels of indentation. This is a code refactor only. The behavior of extent shift and collapse range is not modified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ddb19e31 |
|
22-Sep-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: refactor shift-by-merge into xfs_bmse_merge() helper The extent shift mechanism in xfs_bmap_shift_extents() is complicated and handles several different, non-deterministic scenarios. These include extent shifts, extent merges and potential btree updates in either of the former scenarios. Refactor the code to be more linear and readable. The loop logic in xfs_bmap_shift_extents() and some initial error checking is adjusted slightly. The associated btree lookup and update/delete operations are condensed into single blocks of code. This reduces the number of btree-specific blocks and facilitates the separation of the merge operation into a new xfs_bmse_merge() and xfs_bmse_can_merge() helpers. This is a code refactor only. The behavior of extent shift and collapse range is not modified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2c845f5a |
|
22-Sep-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: track collapse via file offset rather than extent index The collapse range implementation uses a transaction per extent shift. The progress of the overall operation is tracked via the current extent index of the in-core extent list. This is racy because the ilock must be dropped and reacquired for each transaction according to locking and log reservation rules. Therefore, writeback to prior regions of the file is possible and can change the extent count. This changes the extent to which the current index refers and causes the collapse to fail mid operation. To avoid this problem, the entire file is currently written back before the collapse operation starts. To eliminate the need to flush the entire file, use the file offset (fsb) to track the progress of the overall extent shift operation rather than the extent index. Modify xfs_bmap_shift_extents() to unconditionally convert the start_fsb parameter to an extent index and return the file offset of the extent where the shift left off, if further extents exist. The bulk of ths function can remain based on extent index as ilock is held by the caller. xfs_collapse_file_space() now uses the fsb output as the starting point for the subsequent shift. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ca446d88 |
|
01-Sep-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: don't log inode unless extent shift makes extent modifications The file collapse mechanism uses xfs_bmap_shift_extents() to collapse all subsequent extents down into the specified, previously punched out, region. This function performs some validation, such as whether a sufficient hole exists in the target region of the collapse, then shifts the remaining exents downward. The exit path of the function currently logs the inode unconditionally. While we must log the inode (and abort) if an error occurs and the transaction is dirty, the initial validation paths can generate errors before the transaction has been dirtied. This creates an unnecessary filesystem shutdown scenario, as the caller will cancel a transaction that has been marked dirty. Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications are made to the inode bmap. Only log the inode in the exit path if logflags has been set. This ensures we only have to cancel a dirty transaction if modifications have been made and prevents an unnecessary filesystem shutdown otherwise. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
d5cf09ba |
|
29-Jul-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: require 64-bit sector_t Trying to support tiny disks only and saving a bit memory might have made sense on an SGI O2 15 years ago, but is pretty pointless today. Remove the rarely tested codepath that uses various smaller in-memory types to reduce our test matrix and make the codebase a little bit smaller and less complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2451337d |
|
24-Jun-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: global error sign conversion Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
30f712c9 |
|
24-Jun-2014 |
Dave Chinner <dchinner@redhat.com> |
libxfs: move source files Move all the source files that are shared with userspace into libxfs/. This is done as one big chunk simpy to get it done quickly Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|