#
3fed24ff |
|
19-Feb-2024 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
xfs: Replace xfs_isilocked with xfs_assert_ilocked To use the new rwsem_assert_held()/rwsem_assert_held_write(), we can't use the existing ASSERT macro. Add a new xfs_assert_ilocked() and convert all the callers. Fix an apparent bug in xfs_isilocked(): If the caller specifies XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL, xfs_assert_ilocked() will check both the IOLOCK and the ILOCK are held for write. xfs_isilocked() only checked that the ILOCK was held for write. xfs_assert_ilocked() is always on, even if DEBUG or XFS_WARN aren't defined. It's a cheap check, so I don't think it's worth defining it away. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
57b98393 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: use xfs_defer_alloc a bit more Noticed by inspection, simple factoring allows the same allocation routine to be used for both transaction and recovery contexts. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
0b3a76e9 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: use GFP_KERNEL in pure transaction contexts When running in a transaction context, memory allocations are scoped to GFP_NOFS. Hence we don't need to use GFP_NOFS contexts in pure transaction context allocations - GFP_KERNEL will automatically get converted to GFP_NOFS as appropriate. Go through the code and convert all the obvious GFP_NOFS allocations in transaction context to use GFP_KERNEL. This further reduces the explicit use of GFP_NOFS in XFS. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
d4c75a1b |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert remaining kmem_free() to kfree() The remaining callers of kmem_free() are freeing heap memory, so we can convert them directly to kfree() and get rid of kmem_free() altogether. This conversion was done with: $ for f in `git grep -l kmem_free fs/xfs`; do > sed -i s/kmem_free/kfree/ $f > done $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
10634530 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert kmem_zalloc() to kzalloc() There's no reason to keep the kmem_zalloc() around anymore, it's just a thin wrapper around kmalloc(), so lets get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
bcdfae6e |
|
28-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: use the op name in trace_xlog_intent_recovery_failed Instead of tracing the address of the recovery handler, use the name in the defer op, similar to other defer ops related tracepoints. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
4f6ac47b |
|
28-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: fix a use after free in xfs_defer_finish_recovery dfp will be freed by ->recover_work and thus the tracepoint in case of an error can lead to a use after free. Store the defer ops in a local variable to avoid that. Fixes: 7f2f7531e0d4 ("xfs: store an ops pointer in struct xfs_defer_pending") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
603ce8ab |
|
13-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: pass the defer ops directly to xfs_defer_add Pass a pointer to the xfs_defer_op_type structure to xfs_defer_add and remove the indirection through the xfs_defer_ops_type enum and a global table of all possible operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
dc22af64 |
|
13-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: pass the defer ops instead of type to xfs_defer_start_recovery xfs_defer_start_recovery is only called from xlog_recover_intent_item, and the callers of that all have the actual xfs_defer_ops_type operation vector at hand. Pass that directly instead of looking it up from the defer_op_types table. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
7f2f7531 |
|
13-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: store an ops pointer in struct xfs_defer_pending The dfp_type field in struct xfs_defer_pending is only used to either look up the operations associated with the pending word or in trace points. Replace it with a direct pointer to the operations vector, and store a pretty name in the vector for tracing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
3f3cec03 |
|
06-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: force small EFIs for reaping btree extents Introduce the concept of a defer ops barrier to separate consecutively queued pending work items of the same type. With a barrier in place, the two work items will be tracked separately, and receive separate log intent items. The goal here is to prevent reaping of old metadata blocks from creating unnecessarily huge EFIs that could then run the risk of overflowing the scrub transaction. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4dffb2cb |
|
06-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: allow pausing of pending deferred work items Traditionally, all pending deferred work attached to a transaction is finished when one of the xfs_defer_finish* functions is called. However, online repair wants to be able to allocate space for a new data structure, format a new metadata structure into the allocated space, and commit that into the filesystem. As a hedge against system crashes during repairs, we also want to log some EFI items for the allocated space speculatively, and cancel them if we elect to commit the new data structure. Therefore, introduce the idea of pausing a pending deferred work item. Log intent items are still created for paused items and relogged as necessary. However, paused items are pushed onto a side list before we start calling ->finish_item, and the whole list is reattach to the transaction afterwards. New work items are never attached to paused pending items. Modify xfs_defer_cancel to clean up pending deferred work items holding a log intent item but not a log intent done item, since that is now possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
6b126139 |
|
06-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't append work items to logged xfs_defer_pending objects When someone tries to add a deferred work item to xfs_defer_add, it will try to attach the work item to the most recently added xfs_defer_pending object attached to the transaction. However, it doesn't check if the pending object has a log intent item attached to it. This is incorrect behavior because we cannot add more work to an object that has already been committed to the ondisk log. Therefore, change the behavior not to append to pending items with a non null dfp_intent. In practice this has not been an issue because the only way xfs_defer_add gets called after log intent items have been committed is from the defer ops ->finish_item functions themselves, and the @dop_pending isolation in xfs_defer_finish_noroll protects the pending items that have already been logged. However, the next patch will add the ability to pause a deferred extent free object during online btree rebuilding, and any new extfree work items need to have their own pending event. While we're at it, hoist the predicate to its own static inline function for readability. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
9c07bca7 |
|
04-Dec-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: elide ->create_done calls for unlogged deferred work Extended attribute updates use the deferred work machinery to manage state across a chain of smaller transactions. All previous deferred work users have employed log intent items and log done items to manage restarting of interrupted operations, which means that ->create_intent sets dfp_intent to a log intent item and ->create_done uses that item to create a log intent done item. However, xattrs have used the INCOMPLETE flag to deal with the lack of recovery support for an interrupted transaction chain. Log items are optional if the xattr update caller didn't set XFS_DA_OP_LOGGED to require a restartable sequence. In other words, ->create_intent can return NULL to say that there's no log intent item. If that's the case, no log intent done item should be created. Clean up xfs_defer_create_done not to do this, so that the ->create_done functions don't have to check for non-null dfp_intent themselves. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a49c708f |
|
30-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move ->iop_relog to struct xfs_defer_op_type The only log items that need relogging are the ones created for deferred work operations, and the only part of the code base that relogs log items is the deferred work machinery. Move the function pointers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
b28852a5 |
|
30-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: hoist xfs_trans_add_item calls to defer ops functions Remove even more repeated boilerplate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3e0958be |
|
30-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog Hoist this dirty flag setting to the ->iop_relog callsite to reduce boilerplate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
bd3a88f6 |
|
30-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use xfs_defer_create_done for the relogging operation Now that we have a helper to handle creating a log intent done item and updating all the necessary state flags, use it to reduce boilerplate in the ->iop_relog implementations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f3fd7f6f |
|
30-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: hoist ->create_intent boilerplate to its callsite Hoist the dirty flag setting code out of each ->create_intent implementation up to the callsite to reduce boilerplate further. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
db7ccc0b |
|
22-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move ->iop_recover to xfs_defer_op_type Finish off the series by moving the intent item recovery function pointer to the xfs_defer_op_type struct, since this is really a deferred work function now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3dd75c8d |
|
30-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: hoist intent done flag setting to ->finish_item callsite Each log intent item's ->finish_item call chain inevitably includes some code to set the dirty flag of the transaction. If there's an associated log intent done item, it also sets the item's dirty flag and the transaction's INTENT_DONE flag. This is repeated throughout the codebase. Reduce the LOC by moving all that to xfs_defer_finish_one. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e5f1a514 |
|
22-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use xfs_defer_finish_one to finish recovered work items Get rid of the open-coded calls to xfs_defer_finish_one. This also means that the recovery transaction takes care of cleaning up the dfp, and we have solved (I hope) all the ownership issues in recovery. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e70fb328 |
|
22-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: recreate work items when recovering intent items Recreate work items for each xfs_defer_pending object when we are recovering intent items. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
03f7767c |
|
22-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use xfs_defer_pending objects to recover intent items One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed. This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done. Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function. However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed. This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f8f9d952 |
|
31-Jul-2023 |
Long Li <leo.lilong@huawei.com> |
xfs: abort intent items when recovery intents fail When recovering intents, we capture newly created intent items as part of committing recovered intent items. If intent recovery fails at a later point, we forget to remove those newly created intent items from the AIL and hang: [root@localhost ~]# cat /proc/539/stack [<0>] xfs_ail_push_all_sync+0x174/0x230 [<0>] xfs_unmount_flush_inodes+0x8d/0xd0 [<0>] xfs_mountfs+0x15f7/0x1e70 [<0>] xfs_fs_fill_super+0x10ec/0x1b20 [<0>] get_tree_bdev+0x3c8/0x730 [<0>] vfs_get_tree+0x89/0x2c0 [<0>] path_mount+0xecf/0x1800 [<0>] do_mount+0xf3/0x110 [<0>] __x64_sys_mount+0x154/0x1f0 [<0>] do_syscall_64+0x39/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd When newly created intent items fail to commit via transaction, intent recovery hasn't created done items for these newly created intent items, so the capture structure is the sole owner of the captured intent items. We must release them explicitly or else they leak: unreferenced object 0xffff888016719108 (size 432): comm "mount", pid 529, jiffies 4294706839 (age 144.463s) hex dump (first 32 bytes): 08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff ..q.......q..... 18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff ..q.......q..... backtrace: [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0 [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150 [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340 [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0 [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980 [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70 [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0 [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0 [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70 [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20 [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0 [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0 [<ffffffff81a9f35f>] path_mount+0xecf/0x1800 [<ffffffff81a9fd83>] do_mount+0xf3/0x110 [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0 [<ffffffff83968739>] do_syscall_64+0x39/0x80 Fix the problem above by abort intent items that don't have a done item when recovery intents fail. Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
2a5db859 |
|
31-Jul-2023 |
Long Li <leo.lilong@huawei.com> |
xfs: factor out xfs_defer_pending_abort Factor out xfs_defer_pending_abort() from xfs_defer_trans_abort(), which not use transaction parameter, so it can be used after the transaction life cycle. Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
d5c88131 |
|
11-Apr-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: allow queued AG intents to drain before scrubbing When a writer thread executes a chain of log intent items, the AG header buffer locks will cycle during a transaction roll to get from one intent item to the next in a chain. Although scrub takes all AG header buffer locks, this isn't sufficient to guard against scrub checking an AG while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding allocation groups. When there's a collision, cross-referencing between data structures (e.g. rmapbt and refcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the perag structure the count of active intents and make scrub wait until it has both AG header buffer locks and the intent counter reaches zero. One quirk of the drain code is that deferred bmap updates also bump and drop the intent counter. A fundamental decision made during the design phase of the reverse mapping feature is that updates to the rmapbt records are always made by the same code that updates the primary metadata. In other words, callers of bmapi functions expect that the bmapi functions will queue deferred rmap updates. Some parts of the reflink code queue deferred refcount (CUI) and bmap (BUI) updates in the same head transaction, but the deferred work manager completely finishes the CUI before the BUI work is started. As a result, the CUI drops the intent count long before the deferred rmap (RUI) update even has a chance to bump the intent count. The only way to keep the intent count elevated between the CUI and RUI is for the BUI to bump the counter until the RUI has been created. A second quirk of the intent drain code is that deferred work items must increment the intent counter as soon as the work item is added to the transaction. When a BUI completes and queues an RUI, the RUI must increment the counter before the BUI decrements it. The only way to accomplish this is to require that the counter be bumped as soon as the deferred work item is created in memory. In the next patches we'll improve on this facility, but this patch provides the basic functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
4183e4f2 |
|
22-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: share xattr name and value buffers when logging xattr updates While running xfs/297 and generic/642, I noticed a crash in xfs_attri_item_relog when it tries to copy the attr name to the new xattri log item. I think what happened here was that we called ->iop_commit on the old attri item (which nulls out the pointers) as part of a log force at the same time that a chained attr operation was ongoing. The system was busy enough that at some later point, the defer ops operation decided it was necessary to relog the attri log item, but as we've detached the name buffer from the old attri log item, we can't copy it to the new one, and kaboom. I think there's a broader refcounting problem with LARP mode -- the setxattr code can return to userspace before the CIL actually formats and commits the log item, which results in a UAF bug. Therefore, the xattr log item needs to be able to retain a reference to the name and value buffers until the log items have completely cleared the log. Furthermore, each time we create an intent log item, we allocate new memory and (re)copy the contents; sharing here would be very useful. Solve the UAF and the unnecessary memory allocations by having the log code create a single refcounted buffer to contain the name and value contents. This buffer can be passed from old to new during a relog operation, and the logging code can (optionally) attach it to the xfs_attr_item for reuse when LARP mode is enabled. This also fixes a problem where the xfs_attri_log_item objects weren't being freed back to the same cache where they came from. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4136e38a |
|
21-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: put attr[id] log item cache init with the others Initialize and destroy the xattr log item caches in the same places that we do all the other log item caches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
e2c78949 |
|
21-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use a separate slab cache for deferred xattr work state Create a separate slab cache for struct xfs_attr_item objects, since we can pack the (104-byte) intent items more tightly than we can with the general slab cache objects. On x86, this means 39 intents per memory page instead of 32. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
e0c41089 |
|
11-May-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: separate out initial attr_set states We current use XFS_DAS_UNINIT for several steps in the attr_set state machine. We use it for setting shortform xattrs, converting from shortform to leaf, leaf add, leaf-to-node and leaf add. All of these things are essentially known before we start the state machine iterating, so we really should separate them out: XFS_DAS_SF_ADD: - tries to do a shortform add - on success -> done - on ENOSPC converts to leaf, -> XFS_DAS_LEAF_ADD - on error, dies. XFS_DAS_LEAF_ADD: - tries to do leaf add - on success: - inline attr -> done - remote xattr || REPLACE -> XFS_DAS_FOUND_LBLK - on ENOSPC converts to node, -> XFS_DAS_NODE_ADD - on error, dies XFS_DAS_NODE_ADD: - tries to do node add - on success: - inline attr -> done - remote xattr || REPLACE -> XFS_DAS_FOUND_NBLK - on error, dies This makes it easier to understand how the state machine starts up and sets us up on the path to further state machine simplifications. This also converts the DAS state tracepoints to use strings rather than numbers, as converting between enums and numbers requires manual counting rather than just reading the name. This also introduces a XFS_DAS_DONE state so that we can trace successful operation completions easily. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson<allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
1d08e11d |
|
09-May-2022 |
Allison Henderson <allison.henderson@oracle.com> |
xfs: Implement attr logging and replay This patch adds the needed routines to create, log and recover logged extended attribute intents. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
fd920008 |
|
03-May-2022 |
Allison Henderson <allison.henderson@oracle.com> |
xfs: Set up infrastructure for log attribute replay Currently attributes are modified directly across one or more transactions. But they are not logged or replayed in the event of an error. The goal of log attr replay is to enable logging and replaying of attribute operations using the existing delayed operations infrastructure. This will later enable the attributes to become part of larger multi part operations that also must first be recorded to the log. This is mostly of interest in the scheme of parent pointers which would need to maintain an attribute containing parent inode information any time an inode is moved, created, or removed. Parent pointers would then be of interest to any feature that would need to quickly derive an inode path from the mount point. Online scrub, nfs lookups and fs grow or shrink operations are all features that could take advantage of this. This patch adds two new log item types for setting or removing attributes as deferred operations. The xfs_attri_log_item will log an intent to set or remove an attribute. The corresponding xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is freed once the transaction is done. Both log items use a generic xfs_attr_log_format structure that contains the attribute name, value, flags, inode, and an op_flag that indicates if the operations is a set or remove. [dchinner: added extra little bits needed for intent whiteouts] Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
7b3ec2b2 |
|
03-May-2022 |
Allison Henderson <allison.henderson@oracle.com> |
xfs: Fix double unlock in defer capture code The new deferred attr patch set uncovered a double unlock in the recent port of the defer ops capture and continue code. During log recovery, we're allowed to hold buffers to a transaction that's being used to replay an intent item. When we capture the resources as part of scheduling a continuation of an intent chain, we call xfs_buf_hold to retain our reference to the buffer beyond the transaction commit, but we do /not/ call xfs_trans_bhold to maintain the buffer lock. This means that xfs_defer_ops_continue needs to relock the buffers before xfs_defer_restore_resources joins then tothe new transaction. Additionally, the buffers should not be passed back via the dres structure since they need to remain locked unlike the inodes. So simply set dr_bufs to zero after populating the dres structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
5ddd658e |
|
03-May-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: don't commit the first deferred transaction without intents If the first operation in a string of defer ops has no intents, then there is no reason to commit it before running the first call to xfs_defer_finish_one(). This allows the defer ops to be used effectively for non-intent based operations without requiring an unnecessary extra transaction commit when first called. This fixes a regression in per-attribute modification transaction count when delayed attributes are not being used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c201d9ca |
|
12-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename xfs_bmap_add_free to xfs_free_extent_later xfs_bmap_add_free isn't a block mapping function; it schedules deferred freeing operations for a later point in a compound transaction chain. While it's primarily used by bunmapi, its use has expanded beyond that. Move it to xfs_alloc.c and rename the function since it's now general freeing functionality. Bring the slab cache bits in line with the way we handle the other intent items. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
f3c799c2 |
|
12-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create slab caches for frequently-used deferred items Create slab caches for the high-level structures that coordinate deferred intent items, since they're used fairly heavily. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
512edfac |
|
16-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: port the defer ops capture and continue to resource capture When log recovery tries to recover a transaction that had log intent items attached to it, it has to save certain parts of the transaction state (reservation, dfops chain, inodes with no automatic unlock) so that it can finish single-stepping the recovered transactions before finishing the chains. This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue functions. Right now they open-code this functionality, so let's port this to the formalized resource capture structure that we introduced in the previous patch. This enables us to hold up to two inodes and two buffers during log recovery, the same way we do for regular runtime. With this patch applied, we'll be ready to support atomic extent swap which holds two inodes; and logged xattrs which holds one inode and one xattr leaf buffer. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
#
c5db9f93 |
|
16-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: formalize the process of holding onto resources across a defer roll Transaction users are allowed to flag up to two buffers and two inodes for ownership preservation across a deferred transaction roll. Hoist the variables and code responsible for this out of xfs_defer_trans_roll so that we can use it for the defer capture mechanism. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
#
74f4d6a1 |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: only relog deferred intent items if free space in the log gets low Now that we have the ability to ask the log how far the tail needs to be pushed to maintain its free space targets, augment the decision to relog an intent item so that we only do it if the log has hit the 75% full threshold. There's no point in relogging an intent into the same checkpoint, and there's no need to relog if there's plenty of free space in the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
4e919af7 |
|
27-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: periodically relog deferred intent items There's a subtle design flaw in the deferred log item code that can lead to pinning the log tail. Taking up the defer ops chain examples from the previous commit, we can get trapped in sequences like this: Caller hands us a transaction t0 with D0-D3 attached. The defer ops chain will look like the following if the transaction rolls succeed: t1: D0(t0), D1(t0), D2(t0), D3(t0) t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10) In transaction 9, we finish d9 and try to roll to t10 while holding onto an intent item for D3 that we logged in t0. The previous commit changed the order in which we place new defer ops in the defer ops processing chain to reduce the maximum chain length. Now make xfs_defer_finish_noroll capable of relogging the entire chain periodically so that we can always move the log tail forward. Most chains will never get relogged, except for operations that generate very long chains (large extents containing many blocks with different sharing levels) or are on filesystems with small logs and a lot of ongoing metadata updates. Callers are now required to ensure that the transaction reservation is large enough to handle logging done items and new intent items for the maximum possible chain length. Most callers are careful to keep the chain lengths low, so the overhead should be minimal. The decision to relog an intent item is made based on whether the intent was logged in a previous checkpoint, since there's no point in relogging an intent into the same checkpoint. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
27dada07 |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: change the order in which child and parent defer ops are finished The defer ops code has been finishing items in the wrong order -- if a top level defer op creates items A and B, and finishing item A creates more defer ops A1 and A2, we'll put the new items on the end of the chain and process them in the order A B A1 A2. This is kind of weird, since it's convenient for programmers to be able to think of A and B as an ordered sequence where all the sub-tasks for A must finish before we move on to B, e.g. A A1 A2 D. Right now, our log intent items are not so complex that this matters, but this will become important for the atomic extent swapping patchset. In order to maintain correct reference counting of extents, we have to unmap and remap extents in that order, and we want to complete that work before moving on to the next range that the user wants to swap. This patch fixes defer ops to satsify that requirement. The primary symptom of the incorrect order was noticed in an early performance analysis of the atomic extent swap code. An astonishingly large number of deferred work items accumulated when userspace requested an atomic update of two very fragmented files. The cause of this was traced to the same ordering bug in the inner loop of xfs_defer_finish_noroll. If the ->finish_item method of a deferred operation queues new deferred operations, those new deferred ops are appended to the tail of the pending work list. To illustrate, say that a caller creates a transaction t0 with four deferred operations D0-D3. The first thing defer ops does is roll the transaction to t1, leaving us with: t1: D0(t0), D1(t0), D2(t0), D3(t0) Let's say that finishing each of D0-D3 will create two new deferred ops. After finish D0 and roll, we'll have the following chain: t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1) d4 and d5 were logged to t1. Notice that while we're about to start work on D1, we haven't actually completed all the work implied by D0 being finished. So far we've been careful (or lucky) to structure the dfops callers such that D1 doesn't depend on d4 or d5 being finished, but this is a potential logic bomb. There's a second problem lurking. Let's see what happens as we finish D1-D3: t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2) t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3) t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) Let's say that d4-d11 are simple work items that don't queue any other operations, which means that we can complete each d4 and roll to t6: t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) ... t11: d10(t4), d11(t4) t12: d11(t4) <done> When we try to roll to transaction #12, we're holding defer op d11, which we logged way back in t4. This means that the tail of the log is pinned at t4. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot get roll to t11 because there isn't enough space left before we'd run into t4. Let's shift back to the original failure. I mentioned before that I discovered this flaw while developing the atomic file update code. In that scenario, we have a defer op (D0) that finds a range of file blocks to remap, creates a handful of new defer ops to do that, and then asks to be continued with however much work remains. So, D0 is the original swapext deferred op. The first thing defer ops does is rolls to t1: t1: D0(t0) We try to finish D0, logging d1 and d2 in the process, but can't get all the work done. We log a done item and a new intent item for the work that D0 still has to do, and roll to t2: t2: D0'(t1), d1(t1), d2(t1) We roll and try to finish D0', but still can't get all the work done, so we log a done item and a new intent item for it, requeue D0 a second time, and roll to t3: t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2) If it takes 48 more rolls to complete D0, then we'll finally dispense with D0 in t50: t50: D<fifty primes>(t49), d1(t1), ..., d102(t50) We then try to roll again to get a chain like this: t51: d1(t1), d2(t1), ..., d101(t50), d102(t50) ... t152: d102(t50) <done> Notice that in rolling to transaction #51, we're holding on to a log intent item for d1 that was logged in transaction #1. This means that the tail of the log is pinned at t1. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot roll to t51 because there isn't enough space left before we'd run into t1. This is of course problem #2 again. But notice the third problem with this scenario: we have 102 defer ops tied to this transaction! Each of these items are backed by pinned kernel memory, which means that we risk OOM if the chains get too long. Yikes. Problem #1 is a subtle logic bomb that could hit someone in the future; problem #2 applies (rarely) to the current upstream, and problem #3 applies to work under development. This is not how incremental deferred operations were supposed to work. The dfops design of logging in the same transaction an intent-done item and a new intent item for the work remaining was to make it so that we only have to juggle enough deferred work items to finish that one small piece of work. Deferred log item recovery will find that first unfinished work item and restart it, no matter how many other intent items might follow it in the log. Therefore, it's ok to put the new intents at the start of the dfops chain. For the first example, the chains look like this: t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10) For the second example, the chains look like this: t1: D0(t0) t2: d1(t1), d2(t1), D0'(t1) t3: d2(t1), D0'(t1) t4: D0'(t1) t5: d1(t4), d2(t4), D0''(t4) ... t148: D0<50 primes>(t147) t149: d101(t148), d102(t148) t150: d102(t148) <done> This actually sucks more for pinning the log tail (we try to roll to t10 while holding an intent item that was logged in t1) but we've solved problem #1. We've also reduced the maximum chain length from: sum(all the new items) + nr_original_items to: max(new items that each original item creates) + nr_original_items This solves problem #3 by sharply reducing the number of defer ops that can be attached to a transaction at any given time. The change makes the problem of log tail pinning worse, but is improvement we need to solve problem #2. Actually solving #2, however, is left to the next patch. Note that a subsequent analysis of some hard-to-trigger reflink and COW livelocks on extremely fragmented filesystems (or systems running a lot of IO threads) showed the same symptoms -- uncomfortably large numbers of incore deferred work items and occasional stalls in the transaction grant code while waiting for log reservations. I think this patch and the next one will also solve these problems. As originally written, the code used list_splice_tail_init instead of list_splice_init, so change that, and leave a short comment explaining our actions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
ff4ab5e0 |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix an incore inode UAF in xfs_bui_recover In xfs_bui_item_recover, there exists a use-after-free bug with regards to the inode that is involved in the bmap replay operation. If the mapping operation does not complete, we call xfs_bmap_unmap_extent to create a deferred op to finish the unmapping work, and we retain a pointer to the incore inode. Unfortunately, the very next thing we do is commit the transaction and drop the inode. If reclaim tears down the inode before we try to finish the defer ops, we dereference garbage and blow up. Therefore, create a way to join inodes to the defer ops freezer so that we can maintain the xfs_inode reference until we're done with the inode. Note: This imposes the requirement that there be enough memory to keep every incore inode in memory throughout recovery. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
929b92f6 |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: xfs_defer_capture should absorb remaining transaction reservation When xfs_defer_capture extracts the deferred ops and transaction state from a transaction, it should record the transaction reservation type from the old transaction so that when we continue the dfops chain, we still use the same reservation parameters. Doing this means that the log item recovery functions get to determine the transaction reservation instead of abusing tr_itruncate in yet another part of xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
4f9a60c4 |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: xfs_defer_capture should absorb remaining block reservations When xfs_defer_capture extracts the deferred ops and transaction state from a transaction, it should record the remaining block reservations so that when we continue the dfops chain, we can reserve the same number of blocks to use. We capture the reservations for both data and realtime volumes. This adds the requirement that every log intent item recovery function must be careful to reserve enough blocks to handle both itself and all defer ops that it can queue. On the other hand, this enables us to do away with the handwaving block estimation nonsense that was going on in xlog_finish_defer_ops. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
e6fff81e |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: proper replay of deferred ops queued during log recovery When we replay unfinished intent items that have been recovered from the log, it's possible that the replay will cause the creation of more deferred work items. As outlined in commit 509955823cc9c ("xfs: log recovery should replay deferred ops in order"), later work items have an implicit ordering dependency on earlier work items. Therefore, recovery must replay the items (both recovered and created) in the same order that they would have been during normal operation. For log recovery, we enforce this ordering by using an empty transaction to collect deferred ops that get created in the process of recovering a log intent item to prevent them from being committed before the rest of the recovered intent items. After we finish committing all the recovered log items, we allocate a transaction with an enormous block reservation, splice our huge list of created deferred ops into that transaction, and commit it, thereby finishing all those ops. This is /really/ hokey -- it's the one place in XFS where we allow nested transactions; the splicing of the defer ops list is is inelegant and has to be done twice per recovery function; and the broken way we handle inode pointers and block reservations cause subtle use-after-free and allocator problems that will be fixed by this patch and the two patches after it. Therefore, replace the hokey empty transaction with a structure designed to capture each chain of deferred ops that are created as part of recovering a single unfinished log intent. Finally, refactor the loop that replays those chains to do so using one transaction per chain. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
b80b29d6 |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove xfs_defer_reset Remove this one-line helper since the assert is trivially true in one call site and the rest obscures a bitmask operation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
93293bcb |
|
21-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: log new intent items created as part of finishing recovered intent items During a code inspection, I found a serious bug in the log intent item recovery code when an intent item cannot complete all the work and decides to requeue itself to get that done. When this happens, the item recovery creates a new incore deferred op representing the remaining work and attaches it to the transaction that it allocated. At the end of _item_recover, it moves the entire chain of deferred ops to the dummy parent_tp that xlog_recover_process_intents passed to it, but fail to log a new intent item for the remaining work before committing the transaction for the single unit of work. xlog_finish_defer_ops logs those new intent items once recovery has finished dealing with the intent items that it recovered, but this isn't sufficient. If the log is forced to disk after a recovered log item decides to requeue itself and the system goes down before we call xlog_finish_defer_ops, the second log recovery will never see the new intent item and therefore has no idea that there was more work to do. It will finish recovery leaving the filesystem in a corrupted state. The same logic applies to /any/ deferred ops added during intent item recovery, not just the one handling the remaining work. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
78bba5c8 |
|
13-May-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: use ordered buffers to initialize dquot buffers during quotacheck While QAing the new xfs_repair quotacheck code, I uncovered a quota corruption bug resulting from a bad interaction between dquot buffer initialization and quotacheck. The bug can be reproduced with the following sequence: # mkfs.xfs -f /dev/sdf # mount /dev/sdf /opt -o usrquota # su nobody -s /bin/bash -c 'touch /opt/barf' # sync # xfs_quota -x -c 'report -ahi' /opt User quota on /opt (/dev/sdf) Inodes User ID Used Soft Hard Warn/Grace ---------- --------------------------------- root 3 0 0 00 [------] nobody 1 0 0 00 [------] # xfs_io -x -c 'shutdown' /opt # umount /opt # mount /dev/sdf /opt -o usrquota # touch /opt/man2 # xfs_quota -x -c 'report -ahi' /opt User quota on /opt (/dev/sdf) Inodes User ID Used Soft Hard Warn/Grace ---------- --------------------------------- root 1 0 0 00 [------] nobody 1 0 0 00 [------] # umount /opt Notice how the initial quotacheck set the root dquot icount to 3 (rootino, rbmino, rsumino), but after shutdown -> remount -> recovery, xfs_quota reports that the root dquot has only 1 icount. We haven't deleted anything from the filesystem, which means that quota is now under-counting. This behavior is not limited to icount or the root dquot, but this is the shortest reproducer. I traced the cause of this discrepancy to the way that we handle ondisk dquot updates during quotacheck vs. regular fs activity. Normally, when we allocate a disk block for a dquot, we log the buffer as a regular (dquot) buffer. Subsequent updates to the dquots backed by that block are done via separate dquot log item updates, which means that they depend on the logged buffer update being written to disk before the dquot items. Because individual dquots have their own LSN fields, that initial dquot buffer must always be recovered. However, the story changes for quotacheck, which can cause dquot block allocations but persists the final dquot counter values via a delwri list. Because recovery doesn't gate dquot buffer replay on an LSN, this means that the initial dquot buffer can be replayed over the (newer) contents that were delwritten at the end of quotacheck. In effect, this re-initializes the dquot counters after they've been updated. If the log does not contain any other dquot items to recover, the obsolete dquot contents will not be corrected by log recovery. Because quotacheck uses a transaction to log the setting of the CHKD flags in the superblock, we skip quotacheck during the second mount call, which allows the incorrect icount to remain. Fix this by changing the ondisk dquot initialization function to use ordered buffers to write out fresh dquot blocks if it detects that we're running quotacheck. If the system goes down before quotacheck can complete, the CHKD flags will not be set in the superblock and the next mount will run quotacheck again, which can fix uninitialized dquot buffers. This requires amending the defer code to maintaine ordered buffer state across defer rolls for the sake of the dquot allocation code. For regular operations we preserve the current behavior since the dquot items require properly initialized ondisk dquot records. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3ec1b26c |
|
30-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: use a xfs_btree_cur for the ->finish_cleanup state Given how XFS is all based around btrees it doesn't make much sense to offer a totally generic state when we can just use the btree cursor. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f09d167c |
|
30-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: turn dfp_done into a xfs_log_item All defer op instance place their own extension of the log item into the dfp_done field. Replace that with a xfs_log_item to improve type safety and make the code easier to follow. Also use the opportunity to improve the ->finish_item calling conventions to place the done log item as the higher level structure before the list_entry used for the individual items. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bb47d797 |
|
30-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_defer_finish_noroll Split out a helper that operates on a single xfs_defer_pending structure to untangle the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d367a868 |
|
30-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: merge the ->diff_items defer op into ->create_intent This avoids a per-item indirect call, and also simplifies the interface a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c1f09188 |
|
30-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: merge the ->log_item defer op into ->create_intent These are aways called together, and my merging them we reduce the amount of indirect calls, improve type safety and in general clean up the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e046e949 |
|
30-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: factor out a xfs_defer_create_intent helper Create a helper that encapsulates the whole logic to create a defer intent. This reorders some of the work that was done, but none of that has an affect on the operation as only fields that don't directly interact are affected. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
707e0dda |
|
26-Aug-2019 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
fs: xfs: Remove KM_NOSLEEP and KM_SLEEP. Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
250d4b4c |
|
28-Jun-2019 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: remove unused header files There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
710d707d |
|
24-Apr-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: always rejoin held resources during defer roll During testing of xfs/141 on a V4 filesystem, I observed some inconsistent behavior with regards to resources that are held (i.e. remain locked) across a defer roll. The transaction roll always gives the defer roll function a new transaction, even if committing the old transaction fails. However, the defer roll function only rejoins the held resources if the transaction commit succeedied. This means that callers of defer roll have to figure out whether the held resources are attached to the transaction being passed back. Worse yet, if the defer roll was part of a defer finish call, we have a third possibility: the defer finish could pass back a dirty transaction with dirty held resources and an error code. The only sane way to handle all of these scenarios is to require that the code that held the resource either cancel the transaction before unlocking and releasing the resources, or use functions that detach resources from a transaction properly (e.g. xfs_trans_brelse) if they need to drop the reference before committing or cancelling the transaction. In order to make this so, change the defer roll code to join held resources to the new transaction unconditionally and fix all the bhold callers to release the held buffers correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
02b100fb |
|
12-Dec-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: streamline defer op type handling There's no need to bundle a pointer to the defer op type into the defer op control structure. Instead, store the defer op type enum, which enables us to shorten some of the lines. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
bc9f2b7c |
|
12-Dec-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: idiotproof defer op type configuration Recently, we forgot to port a new defer op type to xfsprogs, which caused us some userspace pain. Reorganize the way we make libxfs clients supply defer op type information so that all type information has to be provided at build time instead of risky runtime dynamic configuration. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
9d9e6233 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: fold dfops into the transaction struct xfs_defer_ops has now been reduced to a single list_head. The external dfops mechanism is unused and thus everywhere a (permanent) transaction is accessible the associated dfops structure is as well. Remove the xfs_defer_ops structure and fold the list_head into the transaction. Also remove the last remnant of external dfops in xfs_trans_dup(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0f37d178 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: pass transaction to xfs_defer_add() The majority of remaining references to struct xfs_defer_ops in XFS are associated with xfs_defer_add(). At this point, there are no more external xfs_defer_ops users left. All instances of xfs_defer_ops are embedded in the transaction, which means we can safely pass the transaction down to the dfops add interface. Update xfs_defer_add() to receive the transaction as a parameter. Various subsystems implement wrappers to allocate and construct the context specific data structures for the associated deferred operation type. Update these to also carry the transaction down as needed and clean up unused dfops parameters along the way. This removes most of the remaining references to struct xfs_defer_ops throughout the code and facilitates removal of the structure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix unused variable warnings with ftrace disabled] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1ae093cb |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: replace xfs_defer_ops ->dop_pending with on-stack list The xfs_defer_ops ->dop_pending list is used to track active deferred operations once intents are logged. These items must be aborted in the event of an error. The list is populated as intents are logged and items are removed as they complete (or are aborted). Now that xfs_defer_finish() cancels on error, there is no need to ever access ->dop_pending outside of xfs_defer_finish(). The list is only ever populated after xfs_defer_finish() begins and is either completed or cancelled before it returns. Remove ->dop_pending from xfs_defer_ops and replace it with a local list in the xfs_defer_finish() path. Pass the local list to the various helpers now that it is not accessible via dfops. Note that we have to check for NULL in the abort case as the final tx roll occurs outside of the scope of the new local list (once the dfops has completed and thus drained the list). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9b1f4e98 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: cancel dfops on xfs_defer_finish() error The current semantics of xfs_defer_finish() require the caller to call xfs_defer_cancel() on error. This is slightly inconsistent with transaction commit error handling where a failed commit cleans up the transaction before returning. More significantly, the only requirement for exposure of ->dop_pending outside of xfs_defer_finish() is so that xfs_defer_cancel() can drain it on error. Since the only recourse of xfs_defer_finish() errors is cancellation, mirror the transaction logic and cancel remaining dfops before returning from xfs_defer_finish() with an error. Beside simplifying xfs_defer_finish() semantics, this ensures that xfs_defer_finish() always returns with an empty ->dop_pending and thus facilitates removal of the list from xfs_defer_ops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
60f31a60 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: clean out superfluous dfops dop params/vars The dfops code still passes around the xfs_defer_ops pointer superfluously in a few places. Clean this up wherever the transaction will suffice. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7dbddbac |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: drop dop param from xfs_defer_op_type ->finish_item() callback The dfops infrastructure ->finish_item() callback passes the transaction and dfops as separate parameters. Since dfops is always part of a transaction, the latter parameter is no longer necessary. Remove it from the various callbacks. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a8198666 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: automatic dfops inode relogging Inodes that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While inodes are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for inodes with ili_lock_flags == 0. Replace the xfs_defer_ijoin() infrastructure with such detection and automatic relogging of held inodes. This eliminates the need for the per-dfops inode list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
82ff27bc |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: automatic dfops buffer relogging Buffers that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While buffers are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for held buffers. Replace the xfs_defer_bjoin() infrastructure with such detection and automatic relogging of held buffers. This eliminates the need for the per-dfops buffer list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1214f1cf |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: replace dop_low with transaction flag The dop_low field enables the low free space allocation mode when a previous allocation has detected difficulty allocating blocks. It has historically been part of the xfs_defer_ops structure, which means if enabled, it remains enabled across a set of transactions until the deferred operations have completed and the dfops is reset. Now that the dfops is embedded in the transaction, we can save a bit more space by using a transaction flag rather than a standalone boolean. Drop the ->dop_low field and replace it with a transaction flag that is set at the same points, carried across rolling transactions and cleared on completion of deferred operations. This essentially emulates the behavior of ->dop_low and so should not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ce356d64 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: pass transaction to dfops reset/move helpers All callers pass ->t_dfops of the associated transactions. Refactor the helpers to receive the transactions and facilitate further cleanups between xfs_defer_ops and xfs_trans. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7279aa13 |
|
01-Aug-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove unused __xfs_defer_cancel() internal helper With no more external dfops users, there is no need for an xfs_defer_ops cancel wrapper. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b277c37f |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: bypass final dfops roll in trans commit path Once xfs_defer_finish() has completed all deferred operations, it checks the dirty state of the transaction and rolls it once more to return a clean transaction for the caller. This primarily to cover the case where repeated xfs_defer_finish() calls are made in a loop and we need to make sure that the caller starts the next iteration with a clean transaction. Otherwise we risk transaction reservation overrun. This final transaction roll is not required in the transaction commit path, however, because the transaction is immediately committed and freed after dfops completion. Refactor the final roll into a separate helper such that we can avoid it in the transaction commit path. Lift the dfops reset as well so dfops remains valid until after the last call to xfs_defer_trans_roll(). The reset is also unnecessary in the transaction commit path because the transaction is about to complete. This eliminates unnecessary regrants of transactions where the associated transaction roll can be replaced by a transaction commit. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9e28a242 |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: drop unnecessary xfs_defer_finish() dfops parameter Every caller of xfs_defer_finish() now passes the transaction and its associated ->t_dfops. The xfs_defer_ops parameter is therefore no longer necessary and can be removed. Since most xfs_defer_finish() callers also have to consider xfs_defer_cancel() on error, update the latter to also receive the transaction for consistency. The log recovery code contains an outlier case that cancels a dfops directly without an available transaction. Retain an internal wrapper to support this outlier case for the time being. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e021a2e5 |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: support embedded dfops in transaction The dfops structure used by multi-transaction operations is typically stored on the stack and carried around by the associated transaction. The lifecycle of dfops does not quite match that of the transaction, but they are tightly related in that the former depends on the latter. The relationship of these objects is tight enough that we can avoid the cumbersome boilerplate code required in most cases to manage them separately by just embedding an xfs_defer_ops in the transaction itself. This means that a transaction allocation returns with an initialized dfops, a transaction commit finishes pending deferred items before the tx commit, a transaction cancel cancels the dfops before the transaction and a transaction dup operation transfers the current dfops state to the new transaction. The dup operation is slightly complicated by the fact that we can no longer just copy a dfops pointer from the old transaction to the new transaction. This is solved through a dfops move helper that transfers the pending items and other dfops state across the transactions. This also requires that transaction rolling code always refer to the transaction for the current dfops reference. Finally, to facilitate incremental conversion to the internal dfops and continue to support the current external dfops mode of operation, create the new ->t_dfops_internal field with a layer of indirection. On allocation, ->t_dfops points to the internal dfops. This state is overridden by callers who re-init a local dfops on the transaction. Once ->t_dfops is overridden, the external dfops reference is maintained as the transaction rolls. This patch adds the fundamental ability to support an internal dfops. All codepaths that perform deferred processing continue to override the internal dfops until they are converted over in subsequent patches. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
509308b4 |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: reset dfops to initial state after finish xfs_defer_init() is currently used in two particular situations. The first and most obvious case is raw initialization of an xfs_defer_ops struct. The other case is partial reinit of xfs_defer_ops on reuse due to iteration. Most instances of the first case will be replaced by a single init of a dfops embedded in the transaction. Init calls are still technically required for the second case because the dfops may have low space mode enabled or have joined items that need to be reset before the dfops should be reused. Since the current dfops usage expects either a final transaction commit after xfs_defer_finish() or xfs_defer_init() if dfops is to be reused, we can shift some of the init logic into xfs_defer_finish() such that the latter returns with a reinitialized dfops. This eliminates the second dependency noted above such that a dfops is immediately ready for reuse after an xfs_defer_finish() without the need to change any calling code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
83200bfa |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove unused deferred ops committed field dop_committed is set when deferred item processing rolls the transaction at least once, but is only ever accessed in tracepoints. The transaction roll/commit events are already available via independent tracepoints, so remove the otherwise unused field. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
03f4e4b2 |
|
24-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: make deferred processing safe for embedded dfops xfs_defer_finish() has a couple quirks that are not safe with respect to the upcoming internal dfops functionality. First, xfs_defer_finish() attaches the passed in dfops structure to ->t_dfops and caches and restores the original value. Second, it continues to use the initial dfops reference before and after the transaction roll. These behaviors assume that dop is an independent memory allocation from the transaction itself, which may not always be true once transactions begin to use an embedded dfops structure. In the latter model, dfops processing creates a new xfs_defer_ops structure with each transaction and the associated state is migrated across to the new transaction. Fix up xfs_defer_finish() to handle the possibility of the current dfops changing after a transaction roll. Since ->t_dfops is used unconditionally in this path, it is no longer necessary to attach/restore ->t_dfops and pass it explicitly down to xfs_defer_trans_roll(). Update dop in the latter function and the caller to ensure that it always refers to the current dfops structure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
81b549aa |
|
19-Jul-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: return from _defer_finish with a clean transaction The following assertion was seen on generic/051: XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4 RIP: 0010:assfail+0x23/0x30 Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000 R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014 FS: 00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0 Call Trace: xfs_defer_init+0xff/0x160 xfs_reflink_remap_extent+0x31b/0xa00 xfs_reflink_remap_blocks+0xec/0x4a0 xfs_reflink_remap_range+0x3a1/0x650 xfs_file_dedupe_range+0x39/0x50 vfs_dedupe_file_range+0x218/0x260 do_vfs_ioctl+0x262/0x6a0 ? __se_sys_newfstat+0x3c/0x60 ksys_ioctl+0x35/0x60 __x64_sys_ioctl+0x11/0x20 do_syscall_64+0x4b/0x190 entry_SYSCALL_64_after_hwframe+0x49/0xbe The root cause of the assertion failure is that xfs_defer_finish doesn't roll the transaction after processing all the deferred items. Therefore it returns a dirty transaction to the caller, which leaves the caller at risk of exceeding the transaction reservation if it logs more items. Brian Foster's patchset to move the defer_ops firstblock into the transaction requires t_firstblock == NULLFSBLOCK upon defer_ops initialization, which is how this was noticed at all. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
5fdd9794 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: remove xfs_defer_init() firstblock param All but one caller of xfs_defer_init() passes in the ->t_firstblock of the associated transaction. The one outlier is xlog_recover_process_intents(), which simply passes a dummy value because a valid pointer is required. This firstblock variable can simply be removed. At this point we could remove the xfs_defer_init() firstblock parameter and initialize ->t_firstblock directly. Even that is not necessary, however, because ->t_firstblock is automatically reinitialized in the new transaction on a transaction roll. Since xfs_defer_init() should never occur more than once on a particular transaction (since the corresponding finish will roll it), replace the reinit from xfs_defer_init() with an assert that verifies the transaction has a NULLFSBLOCK firstblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bcd2c9f3 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: refactor dfops init to attach to transaction Most callers of xfs_defer_init() immediately attach the dfops structure to a transaction. Add a transaction parameter to eliminate much of this boilerplate code. This also helps self-document the fact that many codepaths now expect a dfops pointer implicitly via xfs_trans->t_dfops. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6aa67184 |
|
11-Jul-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: rename xfs_trans ->t_agfl_dfops to ->t_dfops The ->t_agfl_dfops field is currently used to defer agfl block frees from associated transaction contexts. While all known problematic contexts have already been updated to use ->t_agfl_dfops, the broader goal is defer agfl frees from all callers that already use a deferred operations structure. Further, the transaction field facilitates a good amount of code clean up where the transaction and dfops have historically been passed down through the stack separately. Rename the field to something more generic to prepare to use it as such throughout XFS. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0b61f8a4 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert to SPDX license tags Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e632a569 |
|
09-May-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: adder caller IP to xfs_defer* tracepoints So it's clear in the trace where they are being called from. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2bc5eba8 |
|
07-May-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: defer agfl block frees from deferred ops processing context Now that AGFL block frees are deferred when dfops is set in the transaction, start deferring AGFL block frees from contexts that are known to push the limits of existing log reservations. The first such context is deferred operation processing itself. This primarily targets deferred extent frees (such as file extents and inode chunks), but in doing so covers all allocation operations that occur in deferred operation processing context. Update xfs_defer_finish() to set and reset ->t_agfl_dfops across the processing sequence. This means that any AGFL block frees due to allocation events result in the addition of new EFIs to the dfops rather than being processed immediately. xfs_defer_finish() rolls the transaction at least once more to process the frees of the AGFL blocks back to the allocation btrees and returns once the AGFL is rectified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b7b2846f |
|
07-Dec-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add the ability to join a held buffer to a defer_ops In certain cases, defer_ops callers will lock a buffer and want to hold the lock across transaction rolls. Similar to ijoined inodes, we want to dirty & join the buffer with each transaction roll in defer_finish so that afterwards the caller still owns the buffer lock and we haven't inadvertently pinned the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
8ad7c629 |
|
28-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the ip argument to xfs_defer_finish And instead require callers to explicitly join the inode using xfs_defer_ijoin. Also consolidate the defer error handling in a few places using a goto label. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
882d8785 |
|
28-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: rename xfs_defer_join to xfs_defer_ijoin Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
411350df |
|
28-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_trans_roll Split xfs_trans_roll into a low-level helper that just rolls the actual transaction and a new higher level xfs_trans_roll_inode that takes care of logging and rejoining the inode. This gets rid of the NULL inode case, and allows to simplify the special cases in the deferred operation code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b77428b1 |
|
23-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: defer should abort intent items if the trans roll fails If the deferred ops transaction roll fails, we need to abort the intent items if we haven't already logged a done item for it, regardless of whether or not the deferred ops has had a transaction committed. Dave found this while running generic/388. Move the tracepoint to make it easier to track object lifetimes. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
385d6558 |
|
18-Sep-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: defer should allow ->finish_item to request a new transaction When xfs_defer_finish calls ->finish_item, it's possible that (refcount) won't be able to finish all the work in a single transaction. When this happens, the ->finish_item handler should shorten the log done item's list count, update the work item to reflect where work should continue, and return -EAGAIN so that defer_finish knows to retain the pending item on the pending list, roll the transaction, and restart processing where we left off. Plumb in the code and document how this mechanism is supposed to work. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ea78d808 |
|
29-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: track log done items directly in the deferred pending work item Christoph reports slab corruption when a deferred refcount update aborts during _defer_finish(). The cause of this was broken log item state tracking in xfs_defer_pending -- upon an abort, _defer_trans_abort() will call abort_intent on all intent items, including the ones that have already had a done item attached. This is incorrect because each intent item has 2 refcount: the first is released when the intent item is committed to the log; and the second is released when the _done_ item is committed to the log, or by the intent creator if there is no done item. In other words, once we log the done item, responsibility for releasing the intent item's second refcount is transferred to the done item and /must not/ be performed by anything else. The dfp_committed flag should have been tracking whether or not we had a done item so that _defer_trans_abort could decide if it needs to abort the intent item, but due to a thinko this was not the case. Rip it out and track the done item directly so that we do the right thing w.r.t. intent item freeing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
3cd48abc |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add tracepoints for the deferred ops mechanism Add tracepoints for the internals of the deferred ops mechanism and tracepoint classes for clients of the dops, to make debugging easier. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4e0cc29b |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move deferred operations into a separate file All the code around struct xfs_bmap_free basically implements a deferred operation framework through which we can roll transactions (to unlock buffers and avoid violating lock order rules) while managing all the necessary log redo items. Previously we only used this code to free extents after some sort of mapping operation, but with the advent of rmap and reflink, we suddenly need to do more than that. With that in mind, xfs_bmap_free really becomes a deferred ops control structure. Rename the structure and move the deferred ops into their own file to avoid further bloating of the bmap code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|