History log of /linux-master/fs/xfs/libxfs/xfs_bmap_btree.c
Revision Date Author Comments
# 5ef819c3 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: rename btree helpers that depends on the block number representation

All these helpers hardcode fsblocks or agblocks and not just the pointer
size. Rename them so that the names are still fitting when we add the
long format in-memory blocks and adjust the checks when calling them to
check the btree types and not just pointer length.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# ec793e69 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_btnum_t

The last checks for bc_btnum can be replaced with helpers that check
the btree ops. This allows adding new btrees to XFS without having
to update a global enum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: complete the ops predicates]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 77953b97 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: add a name field to struct xfs_btree_ops

The btnum in struct xfs_btree_ops is often used for printing a symbolic
name for the btree. Add a name field to the ops structure and use that
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 02f7ebf5 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_bmbt_stage_cursor

Just open code the two calls in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 802f91f7 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: fold xfs_bmbt_init_common into xfs_bmbt_init_cursor

Make the levels initialization in xfs_bmbt_init_cursor conditional
and merge the two helpers.

This requires the fakeroot case to now pass a -1 whichfork directly
into xfs_bmbt_init_cursor, and some special casing for that, but
at least this scheme to deal with the fake btree root is handled and
documented in once place now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: tidy up a multline ternary]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 42e357c8 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: make staging file forks explicit

Don't open-code "-1" for whichfork when we're creating a staging btree
for a repair; let's define an actual symbol to make grepping and
understanding easier.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 579d7022 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: make full use of xfs_btree_stage_ifakeroot in xfs_bmbt_stage_cursor

Remove the duplicate cur->bc_nlevels assignment in xfs_bmbt_stage_cursor,
and move the cur->bc_ino.forksize assignment into
xfs_btree_stage_ifakeroot as it is part of setting up the fake btree
root.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 2b9e7f26 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: don't override bc_ops for staging btrees

Add a few conditionals for staging btrees to the core btree code instead
of overloading the bc_ops vector.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# f9c18129 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: add a xfs_btree_init_ptr_from_cur

Inode-rooted btrees don't need to initialize the root pointer in the
->init_ptr_from_cur method as the root is found by the
xfs_btree_get_iroot method later. Make ->init_ptr_from_cur option
for inode rooted btrees by providing a helper that does the right
thing for the given btree type and also documents the semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 88ee2f48 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: split the per-btree union in struct xfs_btree_cur

Split up the union that encodes btree-specific fields in struct
xfs_btree_cur. Most fields in there are specific to the btree type
encoded in xfs_btree_ops.type, and we can use the obviously named union
for that. But one field is specific to the bmapbt and two are shared by
the refcount and rtrefcountbt. Move those to a separate union to make
the usage clear and not need a separate struct for the refcount-related
fields.

This will also make unnecessary some very awkward btree cursor
refc/rtrefc switching logic in the rtrefcount patchset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 4f0cd5a5 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: split out a btree type from the btree ops geometry flags

Two of the btree cursor flags are always used together and encode
the fundamental btree type. There currently are two such types:

1) an on-disk AG-rooted btree with 32-bit pointers
2) an on-disk inode-rooted btree with 64-bit pointers

and we're about to add:

3) an in-memory btree with 64-bit pointers

Introduce a new enum and a new type field in struct xfs_btree_geom
to encode this type directly instead of using flags and change most
code to switch on this enum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: make the pointer lengths explicit]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 1a9d2629 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: store the btree pointer length in struct xfs_btree_ops

Make the pointer length an explicit field in the btree operations
structure so that the next patch (which introduces an explicit btree
type enum) doesn't have to play a bunch of awkward games with inferring
the pointer length from the enumeration.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 07b7f2e3 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: move the btree stats offset into struct btree_ops

The statistics offset is completely static, move it into the btree_ops
structure instead of the cursor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 90cfae81 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: move lru refs to the btree ops structure

Move the btree buffer LRU refcount to the btree ops structure so that we
can eliminate the last bc_btnum switch in the generic btree code. We're
about to create repair-specific btree types, and we don't want that
stuff cluttering up libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 11388f65 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: remove the unnecessary daddr paramter to _init_block

Now that all of the callers pass XFS_BUF_DADDR_NULL as the daddr
parameter, we can elide that too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 3c68858b 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: rename btree block/buffer init functions

Rename xfs_btree_init_block_int to xfs_btree_init_block, and
xfs_btree_init_block to xfs_btree_init_buf so that the name suggests the
type that caller are supposed to pass in.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# c87e3bf7 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: initialize btree blocks using btree_ops structure

Notice now that the btree ops structure encodes btree geometry flags and
the magic number through the buffer ops. Refactor the btree block
initialization functions to use the btree ops so that we no longer have
to open code all that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# d8d6df42 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: extern some btree ops structures

Expose these static btree ops structures so that we can reference them
in the AG initialization code in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# e9e66df8 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove bc_ino.flags

Just move the two flags into bc_flags where there is plenty of space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# fd9c7f77 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: encode the btree geometry flags in the btree ops structure

Certain btree flags never change for the life of a btree cursor because
they describe the geometry of the btree itself. Encode these in the
btree ops structure and reduce the amount of code required in each btree
type's init_cursor functions. This also frees up most of the bits in
bc_flags.

A previous version of this patch also converted the open-coded flags
logic to helpers. This was removed due to the pending refactoring (that
follows this patch) to eliminate most of the state flags.

Conversion script:

sed \
-e 's/XFS_BTREE_LONG_PTRS/XFS_BTGEO_LONG_PTRS/g' \
-e 's/XFS_BTREE_ROOT_IN_INODE/XFS_BTGEO_ROOT_IN_INODE/g' \
-e 's/XFS_BTREE_LASTREC_UPDATE/XFS_BTGEO_LASTREC_UPDATE/g' \
-e 's/XFS_BTREE_OVERLAPPING/XFS_BTGEO_OVERLAPPING/g' \
-e 's/cur->bc_flags & XFS_BTGEO_/cur->bc_ops->geom_flags \& XFS_BTGEO_/g' \
-i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# f9e325bf 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: drop XFS_BTREE_CRC_BLOCKS

All existing btree types set XFS_BTREE_CRC_BLOCKS when running against a
V5 filesystem. All currently proposed btree types are V5 only and use
the richer XFS_BTREE_CRC_BLOCKS format. Therefore, we can drop this
flag and change the conditional to xfs_has_crc.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 056d22c8 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: set the btree cursor bc_ops in xfs_btree_alloc_cursor

This is a precursor to putting more static data in the btree ops structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 8f71bede 15-Dec-2023 Darrick J. Wong <djwong@kernel.org>

xfs: repair inode fork block mapping data structures

Use the reverse-mapping btree information to rebuild an inode block map.
Update the btree bulk loading code as necessary to support inode rooted
btrees and fix some bitrot problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 4c88fef3 06-Dec-2023 Darrick J. Wong <djwong@kernel.org>

xfs: remove __xfs_free_extent_later

xfs_free_extent_later is a trivial helper, so remove it to reduce the
amount of thinking required to understand the deferred freeing
interface. This will make it easier to introduce automatic reaping of
speculative allocations in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# b742d7b4 28-Jun-2023 Dave Chinner <dchinner@redhat.com>

xfs: use deferred frees for btree block freeing

Btrees that aren't freespace management trees use the normal extent
allocation and freeing routines for their blocks. Hence when a btree
block is freed, a direct call to xfs_free_extent() is made and the
extent is immediately freed. This puts the entire free space
management btrees under this path, so we are stacking btrees on
btrees in the call stack. The inobt, finobt and refcount btrees
all do this.

However, the bmap btree does not do this - it calls
xfs_free_extent_later() to defer the extent free operation via an
XEFI and hence it gets processed in deferred operation processing
during the commit of the primary transaction (i.e. via intent
chaining).

We need to change xfs_free_extent() to behave in a non-blocking
manner so that we can avoid deadlocks with busy extents near ENOSPC
in transactions that free multiple extents. Inserting or removing a
record from a btree can cause a multi-level tree merge operation and
that will free multiple blocks from the btree in a single
transaction. i.e. we can call xfs_free_extent() multiple times, and
hence the btree manipulation transaction is vulnerable to this busy
extent deadlock vector.

To fix this, convert all the remaining callers of xfs_free_extent()
to use xfs_free_extent_later() to queue XEFIs and hence defer
processing of the extent frees to a context that can be safely
restarted if a deadlock condition is detected.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>


# 7dfee17b 04-Jun-2023 Dave Chinner <dchinner@redhat.com>

xfs: validate block number being freed before adding to xefi

Bad things happen in defered extent freeing operations if it is
passed a bad block number in the xefi. This can come from a bogus
agno/agbno pair from deferred agfl freeing, or just a bad fsbno
being passed to __xfs_free_extent_later(). Either way, it's very
difficult to diagnose where a null perag oops in EFI creation
is coming from when the operation that queued the xefi has already
been completed and there's no longer any trace of it around....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4a200a09 11-Apr-2023 Darrick J. Wong <djwong@kernel.org>

xfs: implement masked btree key comparisons for _has_records scans

For keyspace fullness scans, we want to be able to mask off the parts of
the key that we don't care about. For most btree types we /do/ want the
full keyspace, but for checking that a given space usage also has a full
complement of rmapbt records (even if different/multiple owners) we need
this masking so that we only track sparseness of rm_startblock, not the
whole keyspace (which is extremely sparse).

Augment the ->diff_two_keys and ->keys_contiguous helpers to take a
third union xfs_btree_key argument, and wire up xfs_rmap_has_records to
pass this through. This third "mask" argument should contain a nonzero
value in each structure field that should be used in the key comparisons
done during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 6abc7aef 11-Apr-2023 Darrick J. Wong <djwong@kernel.org>

xfs: replace xfs_btree_has_record with a general keyspace scanner

The current implementation of xfs_btree_has_record returns true if it
finds /any/ record within the given range. Unfortunately, that's not
sufficient for scrub. We want to be able to tell if a range of keyspace
for a btree is devoid of records, is totally mapped to records, or is
somewhere in between. By forcing this to be a boolean, we conflated
sparseness and fullness, which caused scrub to return incorrect results.
Fix the API so that we can tell the caller which of those three is the
current state.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 2a7f6d41 12-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: use xfs_alloc_vextent_start_bno() where appropriate

Change obvious callers of single AG allocation to use
xfs_alloc_vextent_start_bno(). Callers no long need to specify
XFS_ALLOCTYPE_START_BNO, and so the type can be driven inward and
removed.

While doing this, also pass the allocation target fsb as a parameter
rather than encoding it in args->fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 74c36a86 12-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: use xfs_alloc_vextent_this_ag() where appropriate

Change obvious callers of single AG allocation to use
xfs_alloc_vextent_this_ag(). Drive the per-ag grabbing out to the
callers, too, so that callers with active references don't need
to do new lookups just for an allocation in a context that already
has a perag reference.

The only remaining caller that does single AG allocation through
xfs_alloc_vextent() is xfs_bmap_btalloc() with
XFS_ALLOCTYPE_NEAR_BNO. That is going to need more untangling before
it can be converted cleanly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 692b6cdd 10-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: t_firstblock is tracking AGs not blocks

The tp->t_firstblock field is now raelly tracking the highest AG we
have locked, not the block number of the highest allocation we've
made. It's purpose is to prevent AGF locking deadlocks, so rename it
to "highest AG" and simplify the implementation to just track the
agno rather than a fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 36b6ad2d 10-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: drop firstblock constraints from allocation setup

Now that xfs_alloc_vextent() does all the AGF deadlock prevention
filtering for multiple allocations in a single transaction, we no
longer need the allocation setup code to care about what AGs we
might already have locked.

Hence we can remove all the "nullfb" conditional logic in places
like xfs_bmap_btalloc() and instead have them focus simply on
setting up locality constraints. If the allocation fails due to
AGF lock filtering in xfs_alloc_vextent, then we just fall back as
we normally do to more relaxed allocation constraints.

As a result, any allocation that allows AG scanning (i.e. not
confined to a single AG) and does not force a worst case full
filesystem scan will now be able to attempt allocation from AGs
lower than that defined by tp->t_firstblock. This is because
xfs_alloc_vextent() allows try-locking of the AGFs and hence enables
low space algorithms to at least -try- to get space from AGs lower
than the one that we have currently locked and allocated from. This
is a significant improvement in the low space allocation algorithm.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# d5753847 10-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: block reservation too large for minleft allocation

When we enter xfs_bmbt_alloc_block() without having first allocated
a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we
are doing something like unwritten extent conversion, the transaction
block reservation is used as the minleft value.

This works for operations like unwritten extent conversion, but it
assumes that the block reservation is only for a BMBT split. THis is
not always true, and sometimes results in larger than necessary
minleft values being set. We only actually need enough space for a
btree split, something we already handle correctly in
xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.

We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to
calculate the number of blocks a BMBT split on this inode is going to
require, not use the transaction block reservation that contains the
maximum number of blocks this transaction may consume in it...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# c01147d9 09-Jul-2022 Darrick J. Wong <djwong@kernel.org>

xfs: replace inode fork size macros with functions

Replace the shouty macros here with typechecked helper functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 732436ef 09-Jul-2022 Darrick J. Wong <djwong@kernel.org>

xfs: convert XFS_IFORK_PTR to a static inline helper

We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# df9ad5cc 16-Nov-2021 Chandan Babu R <chandan.babu@oracle.com>

xfs: Introduce macros to represent new maximum extent counts for data/attr forks

This commit defines new macros to represent maximum extent counts allowed by
filesystems which have support for large per-inode extent counters.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>


# c201d9ca 12-Oct-2021 Darrick J. Wong <djwong@kernel.org>

xfs: rename xfs_bmap_add_free to xfs_free_extent_later

xfs_bmap_add_free isn't a block mapping function; it schedules deferred
freeing operations for a later point in a compound transaction chain.
While it's primarily used by bunmapi, its use has expanded beyond that.
Move it to xfs_alloc.c and rename the function since it's now general
freeing functionality. Bring the slab cache bits in line with the
way we handle the other intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>


# e7720afa 27-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: remove kmem_zone typedef

Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>


# 9fa47bdc 23-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: use separate btree cursor cache for each btree type

Now that we have the infrastructure to track the max possible height of
each btree type, we can create a separate slab cache for cursors of each
type of btree. For smaller indices like the free space btrees, this
means that we can pack more cursors into a slab page, improving slab
utilization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 0ed5f735 23-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: compute absolute maximum nlevels for each btree type

Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type. This is a setup for
the next patch, which makes every btree type have its own cursor cache.

The functions are exported so that we can have xfs_db report the
absolute maximum btree heights for each btree type, rather than making
everyone run their own ad-hoc computations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# c940a0c5 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: dynamically allocate cursors based on maxlevels

To support future btree code, we need to be able to size btree cursors
dynamically for very large btrees. Switch the maxlevels computation to
use the precomputed values in the superblock, and create cursors that
can handle a certain height. For now, we retain the btree cursor cache
that can handle up to 9-level btrees, though a subsequent patch
introduces separate caches for each btree type, where each cache's
objects will be exactly tall enough to handle the specific btree type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 56370ea6 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: refactor btree cursor allocation function

Refactor btree allocation to a common helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# cc411740 23-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: remove xfs_btree_cur.bc_blocklog

This field isn't used by anyone, so get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 04fcad80 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: introduce xfs_buf_daddr()

Introduce a helper function xfs_buf_daddr() to extract the disk
address of the buffer from the struct xfs_buf. This will replace
direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
the XFS_BUF_ADDR() macro.

This patch introduces the helper function and replaces all uses of
XFS_BUF_ADDR() as this is just a simple sed replacement.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# ebd9027d 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: convert xfs_sb_version_has checks to use mount features

This is a conversion of the remaining xfs_sb_version_has..(sbp)
checks to use xfs_has_..(mp) feature checks.

This was largely done with a vim replacement macro that did:

:0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>

A couple of other variants were also used, and the rest touched up
by hand.

$ size -t fs/xfs/built-in.a
text data bss dec hex filename
before 1127533 311352 484 1439369 15f689 (TOTALS)
after 1125360 311352 484 1437196 15ee0c (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 38c26bfd 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: replace xfs_sb_version checks with feature flag checks

Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
text data bss dec hex filenam
before 1130866 311352 484 1442702 16038e (TOTALS)
after 1127727 311352 484 1439563 15f74b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# deb06b9a 12-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: make the start pointer passed to btree alloc_block functions const

The @start pointer passed to each per-AG btree type's ->alloc_block
function isn't supposed to be modified, since it's a hint about the
location of the btree block being split that is to be fed to the
allocator, so mark the parameter const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 22ece4e8 10-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: mark the record passed into xchk_btree functions as const

xchk_btree calls a user-supplied function to validate each btree record
that it finds. Those functions are not supposed to change the record
data, so mark the parameter const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 8e38dc88 10-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: make the keys and records passed to btree inorder functions const

The inorder functions are simple predicates, which means that they don't
modify the parameters. Mark them all const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 23825cd1 10-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: mark the record passed into btree init_key functions as const

These functions initialize a key from a record, but they aren't supposed
to modify the record. Mark it const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# d29d5577 10-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: make the key parameters to all btree key comparison functions const

The btree key comparison functions are not allowed to change the keys
that are passed in, so mark them const. We'll need this for the next
patch, which adds const to the btree range query functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# db07349d 29-Mar-2021 Christoph Hellwig <hch@lst.de>

xfs: move the di_flags field to struct xfs_inode

In preparation of removing the historic icinode struct, move the flags
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 6e73a545 29-Mar-2021 Christoph Hellwig <hch@lst.de>

xfs: move the di_nblocks field to struct xfs_inode

In preparation of removing the historic icinode struct, move the nblocks
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 2e984bad 04-Dec-2020 Joseph Qi <joseph.qi@linux.alibaba.com>

xfs: remove unneeded return value check for *init_cursor()

Since *init_cursor() can always return a valid cursor, the NULL check
in caller is unneeded. So clean them up.
This also keeps the behavior consistent with other callers.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 32a2b11f 22-Jul-2020 Carlos Maiolino <cmaiolino@redhat.com>

xfs: Remove kmem_zone_zalloc() usage

Use kmem_cache_zalloc() directly.

With the exception of xlog_ticket_alloc() which will be dealt on the
next patch for readability.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# f7e67b20 18-May-2020 Christoph Hellwig <hch@lst.de>

xfs: move the fork format fields into struct xfs_ifork

Both the data and attr fork have a format that is stored in the legacy
idinode. Move it into the xfs_ifork structure instead, where it uses
up padding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 8ef54797 10-Mar-2020 Dave Chinner <dchinner@redhat.com>

xfs: rename btree cursor private btree member flags

BPRV is not longer appropriate because bc_private is going away.
Script:

$ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 92219c29 10-Mar-2020 Dave Chinner <dchinner@redhat.com>

xfs: convert btree cursor inode-private member names

bc_private.b -> bc_ino conversion via script:

$ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

And then revert the change to the bc_ino #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: tweak the subject line slightly]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# b521c890 26-Aug-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: fix sign handling problem in xfs_bmbt_diff_two_keys

In xfs_bmbt_diff_two_keys, we perform a signed int64_t subtraction with
two unsigned 64-bit quantities. If the second quantity is actually the
"maximum" key (all ones) as used in _query_all, the subtraction
effectively becomes addition of two positive numbers and the function
returns incorrect results. Fix this with explicit comparisons of the
unsigned values. Nobody needs this now, but the online repair patches
will need this to work properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 250d4b4c 28-Jun-2019 Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused header files

There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# dbd329f1 28-Jun-2019 Christoph Hellwig <hch@lst.de>

xfs: add struct xfs_mount pointer to struct xfs_buf

We need to derive the mount pointer from a buffer in a lot of place.
Add a direct pointer to short cut the pointer chasing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 39708c20 07-Feb-2019 Brian Foster <bfoster@redhat.com>

xfs: miscellaneous verifier magic value fixups

Most buffer verifiers have hardcoded magic value checks
conditionalized on the version of the filesystem. The magic value
field of the verifier structure facilitates abstraction of some of
this code. Populate the ->magic field of various verifiers to take
advantage of this abstraction. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 0f37d178 01-Aug-2018 Brian Foster <bfoster@redhat.com>

xfs: pass transaction to xfs_defer_add()

The majority of remaining references to struct xfs_defer_ops in XFS
are associated with xfs_defer_add(). At this point, there are no
more external xfs_defer_ops users left. All instances of
xfs_defer_ops are embedded in the transaction, which means we can
safely pass the transaction down to the dfops add interface.

Update xfs_defer_add() to receive the transaction as a parameter.
Various subsystems implement wrappers to allocate and construct the
context specific data structures for the associated deferred
operation type. Update these to also carry the transaction down as
needed and clean up unused dfops parameters along the way.

This removes most of the remaining references to struct
xfs_defer_ops throughout the code and facilitates removal of the
structure.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix unused variable warnings with ftrace disabled]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 1214f1cf 01-Aug-2018 Brian Foster <bfoster@redhat.com>

xfs: replace dop_low with transaction flag

The dop_low field enables the low free space allocation mode when a
previous allocation has detected difficulty allocating blocks. It
has historically been part of the xfs_defer_ops structure, which
means if enabled, it remains enabled across a set of transactions
until the deferred operations have completed and the dfops is reset.

Now that the dfops is embedded in the transaction, we can save a bit
more space by using a transaction flag rather than a standalone
boolean. Drop the ->dop_low field and replace it with a transaction
flag that is set at the same points, carried across rolling
transactions and cleared on completion of deferred operations. This
essentially emulates the behavior of ->dop_low and so should not
change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 0b04b6b8 19-Jul-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: trivial xfs_btree_del_cursor cleanups

The error argument to xfs_btree_del_cursor already understands the
"nonzero for error" semantics, so remove pointless error testing in the
callers and pass it directly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 64396ff2 11-Jul-2018 Brian Foster <bfoster@redhat.com>

xfs: remove xfs_alloc_arg firstblock field

The xfs_alloc_arg.firstblock field is used to control the starting
agno for an allocation. The structure already carries a pointer to
the transaction, which carries the current firstblock value.

Remove the field and access ->t_firstblock directly in the
allocation code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# cf612de7 11-Jul-2018 Brian Foster <bfoster@redhat.com>

xfs: remove xfs_btree_cur private firstblock field

The bmbt cursor private structure has a firstblock field that is
used to maintain locking order on bmbt allocations. The field holds
an actual firstblock value (as opposed to a pointer), so it is
initialized on cursor creation, updated on allocation and then the
value is transferred back to the source before the cursor is
destroyed.

This value is always transferred from and back to the ->t_firstblock
field. Since xfs_btree_cur already carries a reference to the
transaction, we can remove this field from xfs_btree_cur and the
associated copying. The bmbt allocations will update the value in
the transaction directly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 42b394a9 11-Jul-2018 Brian Foster <bfoster@redhat.com>

xfs: remove xfs_btree_cur bmbt dfops field

All assignments of xfs_btree_cur.bc_private.b.dfops originate from
->t_dfops. Replace accesses of the former with the latter and remove
the unnecessary field. This patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 0b61f8a4 05-Jun-2018 Dave Chinner <dchinner@redhat.com>

xfs: convert to SPDX license tags

Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
echo $f
cat $f | awk -f hdr.awk > $f.new
mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
hdr = 1.0
tag = "GPL-2.0"
str = ""
}

/^ \* This program is free software/ {
hdr = 2.0;
next
}

/any later version./ {
tag = "GPL-2.0+"
next
}

/^ \*\// {
if (hdr > 0.0) {
print "// SPDX-License-Identifier: " tag
print str
print $0
str=""
hdr = 0.0
next
}
print $0
next
}

/^ \* / {
if (hdr > 1.0)
next
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
next
}

/^ \*/ {
if (hdr > 0.0)
next
print $0
next
}

// {
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 14861c47 09-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add helpers to calculate btree size

Add a bunch of helper functions that calculate the sizes of various
btrees. These will be used to repair btrees and btree headers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# e157ebdc 06-Mar-2018 Carlos Maiolino <cmaiolino@redhat.com>

Cleanup old XFS_BTREE_* traces

Remove unused legacy btree traces from IRIX era.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# b5572597 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: create a new buf_ops pointer to verify structure metadata

Expose all metadata structure buffer verifier functions via buf_ops.
These will be used by the online scrub mechanism to look for problems
with buffers that are already sitting around in memory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# bc1a09b8 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor verifier callers to print address of failing check

Refactor the callers of verifiers to print the instruction address of a
failing check.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# a6a781a5 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: have buffer verifier functions report failing address

Modify each function that checks the contents of a metadata buffer to
return the instruction address of the failing test so that we can report
more precise failure errors to the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 31ca03c9 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor xfs_verifier_error and xfs_buf_ioerror

Since all verification errors also mark the buffer as having an error,
we can combine these two calls. Later we'll add a xfs_failaddr_t
parameter to promote the idea of reporting corruption errors and the
address of the failing check to enable better debugging reports.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 8368a601 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor long-format btree header verification routines

Create two helper functions to verify the headers of a long format
btree block. We'll use this later for the realtime rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 59f6fec3 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove XFS_FSB_SANITY_CHECK

We already have a function to verify fsb pointers, so get rid of the
last users of the (less robust) macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 6bdcf26a 03-Nov-2017 Christoph Hellwig <hch@lst.de>

xfs: use a b+tree for the in-core extent list

Replace the current linear list and the indirection array for the in-core
extent list with a b+tree to avoid the need for larger memory allocations
for the indirection array when lots of extents are present. The current
extent list implementations leads to heavy pressure on the memory
allocator when modifying files with a high extent count, and can lead
to high latencies because of that.

The replacement is a b+tree with a few quirks. The leaf nodes directly
store the extent record in two u64 values. The encoding is a little bit
different from the existing in-core extent records so that the start
offset and length which are required for lookups can be retreived with
simple mask operations. The inner nodes store a 64-bit key containing
the start offset in the first half of the node, and the pointers to the
next lower level in the second half. In either case we walk the node
from the beginninig to the end and do a linear search, as that is more
efficient for the low number of cache lines touched during a search
(2 for the inner nodes, 4 for the leaf nodes) than a binary search.
We store termination markers (zero length for the leaf nodes, an
otherwise impossible high bit for the inner nodes) to terminate the key
list / records instead of storing a count to use the available cache
lines as efficiently as possible.

One quirk of the algorithm is that while we normally split a node half and
half like usual btree implementations we just spill over entries added at
the very end of the list to a new node on its own. This means we get a
100% fill grade for the common cases of bulk insertion when reading an
inode into memory, and when only sequentially appending to a file. The
downside is a slightly higher chance of splits on the first random
insertions.

Both insert and removal manually recurse into the lower levels, but
the bulk deletion of the whole tree is still implemented as a recursive
function call, although one limited by the overall depth and with very
little stack usage in every iteration.

For the first few extents we dynamically grow the list from a single
extent to the next powers of two until we have a first full leaf block
and that building the actual tree.

The code started out based on the generic lib/btree.c code from Joern
Engel based on earlier work from Peter Zijlstra, but has since been
rewritten beyond recognition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 135dcc10 03-Nov-2017 Christoph Hellwig <hch@lst.de>

xfs: allow unaligned extent records in xfs_bmbt_disk_set_all

To make life a little simpler make xfs_bmbt_set_all unaligned access
aware so that we can use it directly on the destination buffer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# f0387501 17-Oct-2017 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_bmbt_get_state

Unused after the big bmap refactor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 9b150709 17-Oct-2017 Christoph Hellwig <hch@lst.de>

xfs: remove all xfs_bmbt_set_* helpers except for xfs_bmbt_set_all

Unused after the big bmap refactor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# a67d00a5 17-Oct-2017 Christoph Hellwig <hch@lst.de>

xfs: pass a struct xfs_bmbt_irec to xfs_bmbt_update

Now that we've massaged the callers into the right form we can always
pass the actual extent record instead of the individual fields.

With that xfs_bmbt_disk_set_allf can go away, and xfs_bmbt_disk_set_all
can be merged into the former implementation of xfs_bmbt_disk_set_allf.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 99c794c6 29-Aug-2017 Brian Foster <bfoster@redhat.com>

xfs: skip bmbt block ino validation during owner change

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 118bb47e 16-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: plumb in needed functions for range querying of various btrees

Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the inode space and block mapping btrees and to extract
raw btree records. This will eventually be used by the inobt and bmbt
scrubbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 38dee376 16-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: always compile the btree inorder check functions

The btree record and key inorder check functions will be used by the
btree scrubber code, so make sure they're always built.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# c8ce540d 16-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove double-underscore integer types

This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:

s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 0c1d9e4a 20-Apr-2017 Christoph Hellwig <hch@lst.de>

xfs: simplify validation of the unwritten extent bit

XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.

We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.

Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.

This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format. The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 7590632a 11-Apr-2017 Christoph Hellwig <hch@lst.de>

xfs: remove bmap block allocation retries

Now that reflink operations don't set the firstblock value we don't
need the workarounds for non-NULL firstblock values without a prior
allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 2fcc319d 08-Mar-2017 Christoph Hellwig <hch@lst.de>

xfs: try any AG when allocating the first btree block when reflinking

When a reflink operation causes the bmap code to allocate a btree block
we're currently doing single-AG allocations due to having ->firstblock
set and then try any higher AG due a little reflink quirk we've put in
when adding the reflink code. But given that we do not have a minleft
reservation of any kind in this AG we can still not have any space in
the same or higher AG even if the file system has enough free space.
To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back
path instead.

[And yes, we need to redo this properly instead of piling hacks over
hacks. I'm working on that, but it's not going to be a small series.
In the meantime this fixes the customer reported issue]

Also add a warning for failing allocations to make it easier to debug.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# b6f41e44 28-Jan-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: remove boilerplate around xfs_btree_init_block

Now that xfs_btree_init_block_int is able to determine crc
status from the passed-in mp, we can determine the proper
magic as well if we are given a btree number, rather than
an explicit magic value.

Change xfs_btree_init_block[_int] callers to pass in the
btree number, and let xfs_btree_init_block_int use the
xfs_magics array via the xfs_btree_magic macro to determine
which magic value is needed. This makes all of the
if (crc) / else stanzas identical, and the if/else can be
removed, leading to a single, common init_block call.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# f88ae46b 28-Jan-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: glean crc status from mp not flags in xfs_btree_init_block_int

xfs_btree_init_block_int() can determine whether crcs are
in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag;
the mp argument allows us to determine this from the
superblock. Remove the flag from callers, and use
xfs_sb_version_hascrc(&mp->m_sb) internally instead.

This removes one difference between the if & else cases
in the callers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 255c5162 09-Jan-2017 Christoph Hellwig <hch@lst.de>

xfs: fix bogus minleft manipulations

We can't just set minleft to 0 when we're low on space - that's exactly
what we need minleft for: to protect space in the AG for btree block
allocations when we are low on free space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# b24a978c 08-Dec-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: use GPF_NOFS when allocating btree cursors

Use NOFS for allocating btree cursors, since they can be called
under the ilock.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 11ef38af 04-Dec-2016 Dave Chinner <dchinner@redhat.com>

xfs: make xfs btree stats less huge

Embedding a switch statement in every btree stats inc/add adds a lot
of code overhead to the core btree infrastructure paths. Stats are
supposed to be small and lightweight, but the btree stats have
become big and bloated as we've added more btrees. It needs fixing
because the reflink code will just add more overhead again.

Convert the v2 btree stats to arrays instead of independent
variables, and instead use the type to index the specific btree
array via an enum. This allows us to use array based indexing
to update the stats, rather than having to derefence variables
specific to the btree type.

If we then wrap the xfsstats structure in a union and place uint32_t
array beside it, and calculate the correct btree stats array base
array index when creating a btree cursor, we can easily access
entries in the stats structure without having to switch names based
on the btree type.

We then replace with the switch statement with a simple set of stats
wrapper macros, resulting in a significant simplification of the
btree stats code, and:

text data bss dec hex filename
48905 144 8 49057 bfa1 fs/xfs/libxfs/xfs_btree.o.old
36793 144 8 36945 9051 fs/xfs/libxfs/xfs_btree.o

it reduces the core btree infrastructure code size by close to 25%!

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 90e2056d 03-Oct-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: try other AGs to allocate a BMBT block

Prior to the introduction of reflink, allocating a block and mapping
it into a file was performed in a single transaction with a single
block reservation, and the allocator was supposed to find enough
blocks to allocate the extent and any BMBT blocks that might be
necessary (unless we're low on space).

However, due to the way copy on write works, allocation and mapping
have been split into two transactions, which means that we must be
able to handle the case where we allocate an extent for CoW but that
AG runs out of free space before the blocks can be mapped into a file,
and the mapping requires a new BMBT block. When this happens, look in
one of the other AGs for a BMBT block instead of taking the FS down.

The same applies to the functions that convert a data fork to extents
and later btree format.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 3993baeb 03-Oct-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce the CoW fork

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 973b8319 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove the get*keys and update_keys btree ops pointers

These are internal btree functions; we don't need them to be
dispatched via function pointers. Make them static again and
just check the overlapped flag to figure out what we need to
do. The strategy behind this patch was suggested by Christoph.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 340785cc 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add owner field to extent allocation and freeing

For the rmap btree to work, we have to feed the extent owner
information to the the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Extend the owner field to include both the owner type and some sort
of index within the owner. The index field will be used to support
reverse mappings when reflink is enabled.

When we're freeing extents from an EFI, we don't have the owner
information available (rmap updates have their own redo items).
xfs_free_extent therefore doesn't need to do an rmap update. Make
sure that the log replay code signals this correctly.

This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).

[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 2c3234d1 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rename flist/free_list to dfops

Mechanical change of flist/free_list to dfops, since they're now
deferred ops, not just a freeing list.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 3ab78df2 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rework xfs_bmap_free callers to use xfs_defer_ops

Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead. For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 70b22659 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add function pointers for get/update keys to the btree

Add some function pointers to bc_ops to get the btree keys for
leaf and node blocks, and to update parent keys of a block.
Convert the _btree_updkey calls to use our new pointer, and
modify the tree shape changing code to call the appropriate
get_*_keys pointer instead of _btree_copy_keys because the
overlapping btree has to calculate high key values.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# e5821e57 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: during btree split, save new block key & ptr for future insertion

When a btree block has to be split, we pass the new block's ptr from
xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
however, we pass the block's key through the cursor's record. It is a
little weird to "initialize" a record from a key since the non-key
attributes will have garbage values.

When we go to add support for interval queries, we have to be able to
pass the lowest and highest keys accessible via a pointer. There's no
clean way to pass this back through the cursor's record field.
Therefore, pass the key directly back to xfs_btree_insert() the same
way that we pass the btree_ptr.

As a bonus, we no longer need init_rec_from_key and can drop it from the
codebase.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 59bad075 20-Jun-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rearrange xfs_bmap_add_free parameters

This is already in xfsprogs' libxfs, so port it to the kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# a7e5d03b 01-Mar-2016 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_trans_get_block_res

Just use the t_blk_res field directly instead of obsfucating the reference
by a macro.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# edfd9dd5 07-Feb-2016 Christoph Hellwig <hch@lst.de>

xfs: move buffer invalidation to xfs_btree_free_block

... instead of leaving it in the methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 233135b7 03-Jan-2016 Eric Sandeen <sandeen@redhat.com>

xfs: print name of verifier if it fails

This adds a name to each buf_ops structure, so that if
a verifier fails we can print the type of verifier that
failed it. Should be a slight debugging aid, I hope.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# ce748eaa 28-Jul-2015 Eric Sandeen <sandeen@sandeen.net>

xfs: create new metadata UUID field and incompat flag

This adds a new superblock field, sb_meta_uuid. If set, along with
a new incompat flag, the code will use that field on a V5 filesystem
to compare to metadata UUIDs, which allows us to change the user-
visible UUID at will. Userspace handles the setting and clearing
of the incompat flag as appropriate, as the UUID gets changed; i.e.
setting the user-visible UUID back to the original UUID (as stored in
the new field) will remove the incompatible feature flag.

If the incompat flag is not set, this copies the user-visible UUID into
into the meta_uuid slot in memory when the superblock is read from disk;
the meta_uuid field is not written back to disk in this case.

The remainder of this patch simply switches verifiers, initializers,
etc to use the new sb_meta_uuid field.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# bb58e618 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: move most of xfs_sb.h to xfs_format.h

More on-disk format consolidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4fb6e8ad 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: merge xfs_ag.h into xfs_format.h

More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 6d3ebaae 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: merge xfs_dinode.h into xfs_format.h

More consolidatation for the on-disk format defintions. Note that the
XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related
to the on disk format, but depends on a CONFIG_ option.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# d5cf09ba 29-Jul-2014 Christoph Hellwig <hch@lst.de>

xfs: require 64-bit sector_t

Trying to support tiny disks only and saving a bit memory might have
made sense on an SGI O2 15 years ago, but is pretty pointless today.

Remove the rarely tested codepath that uses various smaller in-memory
types to reduce our test matrix and make the codebase a little bit
smaller and less complicated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 2451337d 24-Jun-2014 Dave Chinner <dchinner@redhat.com>

xfs: global error sign conversion

Convert all the errors the core XFs code to negative error signs
like the rest of the kernel and remove all the sign conversion we
do in the interface layers.

Errors for conversion (and comparison) found via searches like:

$ git grep " E" fs/xfs
$ git grep "return E" fs/xfs
$ git grep " E[A-Z].*;$" fs/xfs

Negation points found via searches like:

$ git grep "= -[a-z,A-Z]" fs/xfs
$ git grep "return -[a-z,A-D,F-Z]" fs/xfs
$ git grep " -[a-z].*;" fs/xfs

[ with some bits I missed from Brian Foster ]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 30f712c9 24-Jun-2014 Dave Chinner <dchinner@redhat.com>

libxfs: move source files

Move all the source files that are shared with userspace into
libxfs/. This is done as one big chunk simpy to get it done
quickly

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>