History log of /linux-master/fs/xfs/libxfs/xfs_btree.c
Revision Date Author Comments
# a095686a 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: support in-memory btrees

Adapt the generic btree cursor code to be able to create a btree whose
buffers come from a (presumably in-memory) buftarg with a header block
that's specific to in-memory btrees. We'll connect this to other parts
of online scrub in the next patches.

Note that in-memory btrees always have a block size matching the system
memory page size for efficiency reasons. There are also a few things we
need to do to finalize a btree update; that's covered in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 8c1771c4 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: add a xfs_btree_ptrs_equal helper

This only has a single caller and thus might be a bit questionable,
but I think it really improves the readability of
xfs_btree_visit_block.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 6a701eb8 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: move and rename xfs_btree_read_bufl

Despite its name, xfs_btree_read_bufl doesn't contain any btree-related
functionaliy and isn't used by the btree code. Move it to xfs_bmap.c,
hard code the refval and ops arguments and rename it to
xfs_bmap_read_buf.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 6324b00c 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_btree_reada_bufs

xfs_btree_reada_bufl just wraps xfs_btree_readahead and a agblock
to daddr conversion. Just open code it's three callsites in the
two callers (One of which isn't even btree related).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 5eec8fa3 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_btree_reada_bufl

xfs_btree_reada_bufl just wraps xfs_btree_readahead and a fsblock
to daddr conversion. Just open code it's two callsites in the only
caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 79e72304 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: factor out a __xfs_btree_check_lblock_hdr helper

This will allow sharing code with the in-memory block checking helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 5ef819c3 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: rename btree helpers that depends on the block number representation

All these helpers hardcode fsblocks or agblocks and not just the pointer
size. Rename them so that the names are still fitting when we add the
long format in-memory blocks and adjust the checks when calling them to
check the btree types and not just pointer length.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 4ce0c711 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: consolidate btree block verification

Add a __xfs_btree_check_block helper that can be called by the scrub code
to validate a btree block of any form, and move the duplicate error
handling code from xfs_btree_check_sblock and xfs_btree_check_lblock into
xfs_btree_check_block and thus remove these two helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# d477f174 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: tighten up validation of root block in inode forks

Check that root blocks that sit in the inode fork and thus have a NULL
bp don't have siblings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# bd45019d 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove the crc variable in __xfs_btree_check_lblock

crc is only used once, just use the xfs_has_crc check directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 43be0919 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: misc cleanups for __xfs_btree_check_sblock

Remove the local crc variable that is only used once and remove the bp
NULL checking as it can't ever be NULL for short form blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 57982d6c 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: consolidate btree ptr checking

Merge xfs_btree_check_sptr and xfs_btree_check_lptr into a single
__xfs_btree_check_ptr that can be shared between xfs_btree_check_ptr
and the scrub code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 8b8ada97 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: simplify xfs_btree_check_lblock_siblings

Stop using xfs_btree_check_lptr in xfs_btree_check_lblock_siblings,
as it only duplicates the xfs_verify_fsbno call in the other leg of
if / else besides adding a tautological level check.

With this the cur and level arguments can be removed as they are
now unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 4bc94bf6 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: simplify xfs_btree_check_sblock_siblings

Stop using xfs_btree_check_sptr in xfs_btree_check_sblock_siblings,
as it only duplicates the xfs_verify_agbno call in the other leg of
if / else besides adding a tautological level check.

With this the cur and level arguments can be removed as they are
now unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# ec793e69 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove xfs_btnum_t

The last checks for bc_btnum can be replaced with helpers that check
the btree ops. This allows adding new btrees to XFS without having
to update a global enum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: complete the ops predicates]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 77953b97 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: add a name field to struct xfs_btree_ops

The btnum in struct xfs_btree_ops is often used for printing a symbolic
name for the btree. Add a name field to the ops structure and use that
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 2b9e7f26 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: don't override bc_ops for staging btrees

Add a few conditionals for staging btrees to the core btree code instead
of overloading the bc_ops vector.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# f9c18129 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: add a xfs_btree_init_ptr_from_cur

Inode-rooted btrees don't need to initialize the root pointer in the
->init_ptr_from_cur method as the root is found by the
xfs_btree_get_iroot method later. Make ->init_ptr_from_cur option
for inode rooted btrees by providing a helper that does the right
thing for the given btree type and also documents the semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# f73def90 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: create predicate to determine if cursor is at inode root level

Create a predicate to decide if the given cursor and level point to the
root block in the inode immediate area instead of a disk block, and get
rid of the open-coded logic everywhere.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 88ee2f48 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: split the per-btree union in struct xfs_btree_cur

Split up the union that encodes btree-specific fields in struct
xfs_btree_cur. Most fields in there are specific to the btree type
encoded in xfs_btree_ops.type, and we can use the obviously named union
for that. But one field is specific to the bmapbt and two are shared by
the refcount and rtrefcountbt. Move those to a separate union to make
the usage clear and not need a separate struct for the refcount-related
fields.

This will also make unnecessary some very awkward btree cursor
refc/rtrefc switching logic in the rtrefcount patchset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 4f0cd5a5 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: split out a btree type from the btree ops geometry flags

Two of the btree cursor flags are always used together and encode
the fundamental btree type. There currently are two such types:

1) an on-disk AG-rooted btree with 32-bit pointers
2) an on-disk inode-rooted btree with 64-bit pointers

and we're about to add:

3) an in-memory btree with 64-bit pointers

Introduce a new enum and a new type field in struct xfs_btree_geom
to encode this type directly instead of using flags and change most
code to switch on this enum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: make the pointer lengths explicit]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 1a9d2629 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: store the btree pointer length in struct xfs_btree_ops

Make the pointer length an explicit field in the btree operations
structure so that the next patch (which introduces an explicit btree
type enum) doesn't have to play a bunch of awkward games with inferring
the pointer length from the enumeration.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 186f20c0 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: factor out a btree block owner check

Hoist the btree block owner check into a separate helper so that we
don't have an ugly multiline if statement.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 2054cf05 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: factor out a xfs_btree_owner helper

Split out a helper to calculate the owner for a given btree instead of
duplicating the logic in two places. While we're at it, make the
bc_ag/bc_ino switch logic depend on the correct geometry flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: break this up into two patches for the owner check]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 90cfae81 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: move lru refs to the btree ops structure

Move the btree buffer LRU refcount to the btree ops structure so that we
can eliminate the last bc_btnum switch in the generic btree code. We're
about to create repair-specific btree types, and we don't want that
stuff cluttering up libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# ad065ef0 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: set btree block buffer ops in _init_buf

Set the btree block buffer ops in xfs_btree_init_buf since we already
have access to that information through the btree ops.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 11388f65 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: remove the unnecessary daddr paramter to _init_block

Now that all of the callers pass XFS_BUF_DADDR_NULL as the daddr
parameter, we can elide that too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 7771f703 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: btree convert xfs_btree_init_block to xfs_btree_init_buf calls

Convert any place we call xfs_btree_init_block with a buffer to use the
_init_buf function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 3c68858b 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: rename btree block/buffer init functions

Rename xfs_btree_init_block_int to xfs_btree_init_block, and
xfs_btree_init_block to xfs_btree_init_buf so that the name suggests the
type that caller are supposed to pass in.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# c87e3bf7 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: initialize btree blocks using btree_ops structure

Notice now that the btree ops structure encodes btree geometry flags and
the magic number through the buffer ops. Refactor the btree block
initialization functions to use the btree ops so that we no longer have
to open code all that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# e9e66df8 22-Feb-2024 Christoph Hellwig <hch@lst.de>

xfs: remove bc_ino.flags

Just move the two flags into bc_flags where there is plenty of space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# fd9c7f77 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: encode the btree geometry flags in the btree ops structure

Certain btree flags never change for the life of a btree cursor because
they describe the geometry of the btree itself. Encode these in the
btree ops structure and reduce the amount of code required in each btree
type's init_cursor functions. This also frees up most of the bits in
bc_flags.

A previous version of this patch also converted the open-coded flags
logic to helpers. This was removed due to the pending refactoring (that
follows this patch) to eliminate most of the state flags.

Conversion script:

sed \
-e 's/XFS_BTREE_LONG_PTRS/XFS_BTGEO_LONG_PTRS/g' \
-e 's/XFS_BTREE_ROOT_IN_INODE/XFS_BTGEO_ROOT_IN_INODE/g' \
-e 's/XFS_BTREE_LASTREC_UPDATE/XFS_BTGEO_LASTREC_UPDATE/g' \
-e 's/XFS_BTREE_OVERLAPPING/XFS_BTGEO_OVERLAPPING/g' \
-e 's/cur->bc_flags & XFS_BTGEO_/cur->bc_ops->geom_flags \& XFS_BTGEO_/g' \
-i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# f9e325bf 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: drop XFS_BTREE_CRC_BLOCKS

All existing btree types set XFS_BTREE_CRC_BLOCKS when running against a
V5 filesystem. All currently proposed btree types are V5 only and use
the richer XFS_BTREE_CRC_BLOCKS format. Therefore, we can drop this
flag and change the conditional to xfs_has_crc.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 2ed0b2c7 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: consolidate btree block allocation tracepoints

Don't waste tracepoint segment memory on per-btree block allocation
tracepoints when we can do it from the generic btree code.

With this patch applied, two tracepoints are collapsed into one
tracepoint, with the following effects on objdump -hx xfs.ko output:

Before:

10 __tracepoints_ptrs 00000b38 0000000000000000 0000000000000000 001412f0 2**2
14 __tracepoints_strings 00005433 0000000000000000 0000000000000000 001689a0 2**5
29 __tracepoints 00010d30 0000000000000000 0000000000000000 0023fe00 2**5

After:

10 __tracepoints_ptrs 00000b34 0000000000000000 0000000000000000 001417b0 2**2
14 __tracepoints_strings 00005413 0000000000000000 0000000000000000 00168e80 2**5
29 __tracepoints 00010cd0 0000000000000000 0000000000000000 00240760 2**5

Column 3 is the section size in bytes; removing these two tracepoints
reduces the size of the ELF segments by 132 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 78067b92 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: consolidate btree block freeing tracepoints

Don't waste memory on extra per-btree block freeing tracepoints when we
can do it from the generic btree code.

With this patch applied, two tracepoints are collapsed into one
tracepoint, with the following effects on objdump -hx xfs.ko output:

Before:

10 __tracepoints_ptrs 00000b3c 0000000000000000 0000000000000000 00140eb0 2**2
14 __tracepoints_strings 00005453 0000000000000000 0000000000000000 00168540 2**5
29 __tracepoints 00010d90 0000000000000000 0000000000000000 0023f5e0 2**5

After:

10 __tracepoints_ptrs 00000b38 0000000000000000 0000000000000000 001412f0 2**2
14 __tracepoints_strings 00005433 0000000000000000 0000000000000000 001689a0 2**5
29 __tracepoints 00010d30 0000000000000000 0000000000000000 0023fe00 2**5

Column 3 is the section size in bytes; removing these two tracepoints
reduces the size of the ELF segments by 132 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 989d5ec3 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: report XFS_IS_CORRUPT errors to the health system

Whenever we encounter XFS_IS_CORRUPT failures, we should report that to
the health monitoring system for later reporting.

I started with this semantic patch and massaged everything until it
built:

@@
expression mp, test;
@@

- if (XFS_IS_CORRUPT(mp, test)) return -EFSCORRUPTED;
+ if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; }

@@
expression mp, test;
identifier label, error;
@@

- if (XFS_IS_CORRUPT(mp, test)) { error = -EFSCORRUPTED; goto label; }
+ if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; }

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# a78d10f4 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: report btree block corruption errors to the health system

Whenever we encounter corrupt btree blocks, we should report that to the
health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# d4c75a1b 15-Jan-2024 Dave Chinner <dchinner@redhat.com>

xfs: convert remaining kmem_free() to kfree()

The remaining callers of kmem_free() are freeing heap memory, so
we can convert them directly to kfree() and get rid of kmem_free()
altogether.

This conversion was done with:

$ for f in `git grep -l kmem_free fs/xfs`; do
> sed -i s/kmem_free/kfree/ $f
> done
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# 9099cd38 15-Dec-2023 Darrick J. Wong <djwong@kernel.org>

xfs: repair refcount btrees

Reconstruct the refcount data from the rmap btree.

Link: https://docs.kernel.org/filesystems/xfs-online-fsck-design.html#case-study-rebuilding-the-space-reference-counts
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 26de6462 15-Dec-2023 Darrick J. Wong <djwong@kernel.org>

xfs: read leaf blocks when computing keys for bulkloading into node blocks

When constructing a new btree, xfs_btree_bload_node needs to read the
btree blocks for level N to compute the keyptrs for the blocks that will
be loaded into level N+1. The level N blocks must be formatted at that
point.

A subsequent patch will change the btree bulkloader to write new btree
blocks in 256K chunks to moderate memory consumption if the new btree is
very large. As a consequence of that, it's possible that the buffers
for lower level blocks might have been reclaimed by the time the node
builder comes back to the block.

Therefore, change xfs_btree_bload_node to read the lower level blocks
to handle the reclaimed buffer case. As a side effect, the read will
increase the LRU refs, which will bias towards keeping new btree buffers
in memory after the new btree commits.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 4a200a09 11-Apr-2023 Darrick J. Wong <djwong@kernel.org>

xfs: implement masked btree key comparisons for _has_records scans

For keyspace fullness scans, we want to be able to mask off the parts of
the key that we don't care about. For most btree types we /do/ want the
full keyspace, but for checking that a given space usage also has a full
complement of rmapbt records (even if different/multiple owners) we need
this masking so that we only track sparseness of rm_startblock, not the
whole keyspace (which is extremely sparse).

Augment the ->diff_two_keys and ->keys_contiguous helpers to take a
third union xfs_btree_key argument, and wire up xfs_rmap_has_records to
pass this through. This third "mask" argument should contain a nonzero
value in each structure field that should be used in the key comparisons
done during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 6abc7aef 11-Apr-2023 Darrick J. Wong <djwong@kernel.org>

xfs: replace xfs_btree_has_record with a general keyspace scanner

The current implementation of xfs_btree_has_record returns true if it
finds /any/ record within the given range. Unfortunately, that's not
sufficient for scrub. We want to be able to tell if a range of keyspace
for a btree is devoid of records, is totally mapped to records, or is
somewhere in between. By forcing this to be a boolean, we conflated
sparseness and fullness, which caused scrub to return incorrect results.
Fix the API so that we can tell the caller which of those three is the
current state.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# bd7e7951 11-Apr-2023 Darrick J. Wong <djwong@kernel.org>

xfs: refactor ->diff_two_keys callsites

Create wrapper functions around ->diff_two_keys so that we don't have to
remember what the return values mean, and adjust some of the code
comments to reflect the longtime code behavior. We're going to
introduce more uses of ->diff_two_keys in the next patch, so reduce the
cognitive load for readers by doing this refactoring now.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# ee5fe8ff 11-Apr-2023 Darrick J. Wong <djwong@kernel.org>

xfs: refactor converting btree irec to btree key

We keep doing these conversions to support btree queries, so refactor
this into a helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 692b6cdd 10-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: t_firstblock is tracking AGs not blocks

The tp->t_firstblock field is now raelly tracking the highest AG we
have locked, not the block number of the highest allocation we've
made. It's purpose is to prevent AGF locking deadlocks, so rename it
to "highest AG" and simplify the implementation to just track the
agno rather than a fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# c85007e2 05-Feb-2023 Dave Chinner <dchinner@redhat.com>

xfs: don't use BMBT btree split workers for IO completion

When we split a BMBT due to record insertion, we offload it to a
worker thread because we can be deep in the stack when we try to
allocate a new block for the BMBT. Allocation can use several
kilobytes of stack (full memory reclaim, swap and/or IO path can
end up on the stack during allocation) and we can already be several
kilobytes deep in the stack when we need to split the BMBT.

A recent workload demonstrated a deadlock in this BMBT split
offload. It requires several things to happen at once:

1. two inodes need a BMBT split at the same time, one must be
unwritten extent conversion from IO completion, the other must be
from extent allocation.

2. there must be a no available xfs_alloc_wq worker threads
available in the worker pool.

3. There must be sustained severe memory shortages such that new
kworker threads cannot be allocated to the xfs_alloc_wq pool for
both threads that need split work to be run

4. The split work from the unwritten extent conversion must run
first.

5. when the BMBT block allocation runs from the split work, it must
loop over all AGs and not be able to either trylock an AGF
successfully, or each AGF is is able to lock has no space available
for a single block allocation.

6. The BMBT allocation must then attempt to lock the AGF that the
second task queued to the rescuer thread already has locked before
it finds an AGF it can allocate from.

At this point, we have an ABBA deadlock between tasks queued on the
xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task
holding the AGF lock can't be run by the rescuer thread until the
task the rescuer thread is runing gets the AGF lock....

This is a highly improbably series of events, but there it is.

There's a couple of ways to fix this, but the easiest way to ensure
that we only punt tasks with a locked AGF that holds enough space
for the BMBT block allocations to the worker thread.

This works for unwritten extent conversion in IO completion (which
doesn't have a locked AGF and space reservations) because we have
tight control over the IO completion stack. It is typically only 6
functions deep when xfs_btree_split() is called because we've
already offloaded the IO completion work to a worker thread and
hence we don't need to worry about stack overruns here.

The other place we can be called for a BMBT split without a
preceeding allocation is __xfs_bunmapi() when punching out the
center of an existing extent. We don't remove extents in the IO
path, so these operations don't tend to be called with a lot of
stack consumed. Hence we don't really need to ship the split off to
a worker thread in these cases, either.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# c0f399ff 26-Dec-2022 Darrick J. Wong <djwong@kernel.org>

xfs: fix off-by-one error in xfs_btree_space_to_height

Lately I've been stress-testing extreme-sized rmap btrees by using the
(new) xfs_db bmap_inflate command to clone bmbt mappings billions of
times and then using xfs_repair to build new rmap and refcount btrees.
This of course is /much/ faster than actually FICLONEing a file billions
of times.

Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with
EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not
sufficiently large for the test scenario. For a 1TB filesystem (~67
million AG blocks, 4 AGs) the btheight command reports:

$ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda
rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node)
level 0: 4400801200 records, 52390491 blocks
level 1: 52390491 records, 1164234 blocks
level 2: 1164234 records, 25872 blocks
level 3: 25872 records, 575 blocks
level 4: 575 records, 13 blocks
level 5: 13 records, 1 block
6 levels, 53581186 blocks total

The AG is sufficiently large to build this rmap btree. Unfortunately,
m_rmap_maxlevels is 5. Augmenting the loop in the space->height
function to report height, node blocks, and blocks remaining produces
this:

ht 1 node_blocks 45 blockleft 67108863
ht 2 node_blocks 2025 blockleft 67108818
ht 3 node_blocks 91125 blockleft 67106793
ht 4 node_blocks 4100625 blockleft 67015668
final height: 5

The goal of this function is to compute the maximum height btree that
can be stored in the given number of ondisk fsblocks. Starting with the
top level of the tree, each iteration through the loop adds the fanout
factor of the next level down until we run out of blocks. IOWs, maximum
height is achieved by using the smallest fanout factor that can apply
to that level.

However, the loop setup is not correct. Top level btree blocks are
allowed to contain fewer than minrecs items, so the computation is
incorrect because the first time through the loop it should be using a
fanout factor of 2. With this corrected, the above becomes:

ht 1 node_blocks 2 blockleft 67108863
ht 2 node_blocks 90 blockleft 67108861
ht 3 node_blocks 4050 blockleft 67108771
ht 4 node_blocks 182250 blockleft 67104721
ht 5 node_blocks 8201250 blockleft 66922471
final height: 6

Fixes: 9ec691205e7d ("xfs: compute the maximum height of the rmap btree when reflink enabled")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 732436ef 09-Jul-2022 Darrick J. Wong <djwong@kernel.org>

xfs: convert XFS_IFORK_PTR to a static inline helper

We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 0800169e 07-Jul-2022 Dave Chinner <dchinner@redhat.com>

xfs: Pre-calculate per-AG agbno geometry

There is a lot of overhead in functions like xfs_verify_agbno() that
repeatedly calculate the geometry limits of an AG. These can be
pre-calculated as they are static and the verification context has
a per-ag context it can quickly reference.

In the case of xfs_verify_agbno(), we now always have a perag
context handy, so we can store the AG length and the minimum valid
block in the AG in the perag. This means we don't have to calculate
it on every call and it can be inlined in callers if we move it
to xfs_ag.h.

Move xfs_ag_block_count() to xfs_ag.c because it's really a
per-ag function and not an XFS type function. We need a little
bit of rework that is specific to xfs_initialise_perag() to allow
growfs to calculate the new perag sizes before we've updated the
primary superblock during the grow (chicken/egg situation).

Note that we leave the original xfs_verify_agbno in place in
xfs_types.c as a static function as other callers in that file do
not have per-ag contexts so still need to go the long way. It's been
renamed to xfs_verify_agno_agbno() to indicate it takes both an agno
and an agbno to differentiate it from new function.

Future commits will make similar changes for other per-ag geometry
validation functions.

Further:

$ size --totals fs/xfs/built-in.a
text data bss dec hex filename
before 1483006 329588 572 1813166 1baaae (TOTALS)
after 1482185 329588 572 1812345 1ba779 (TOTALS)

This rework reduces the binary size by ~820 bytes, indicating
that much less work is being done to bounds check the agbno values
against on per-ag geometry information.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# a54f78de 26-May-2022 Darrick J. Wong <djwong@kernel.org>

xfs: don't leak btree cursor when insrec fails after a split

The recent patch to improve btree cycle checking caused a regression
when I rebased the in-memory btree branch atop the 5.19 for-next branch,
because in-memory short-pointer btrees do not have AG numbers. This
produced the following complaint from kmemleak:

unreferenced object 0xffff88803d47dde8 (size 264):
comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
hex dump (first 32 bytes):
90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff .M..............
e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff .D:.............
backtrace:
[<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
[<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
[<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
[<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
[<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
[<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
[<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
[<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
[<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
[<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
[<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
[<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
[<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
[<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
[<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
[<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]

I noticed that xfs_btree_insrec has a bunch of debug code that return
out of the function immediately, without freeing the "new" btree cursor
that can be returned when _make_block_unfull calls xfs_btree_split. Fix
the error return in this function to free the btree cursor.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 56486f30 26-May-2022 Dave Chinner <dchinner@redhat.com>

xfs: assert in xfs_btree_del_cursor should take into account error

xfs/538 on a 1kB block filesystem failed with this assert:

XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448

The problem was that an allocation failed unexpectedly in
xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error
injections, resulting in an EFSCORRUPTED error being returned to
xfs_bmapi_write(). The error occurred on extent-to-btree format
conversion allocating the new root block:

RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
Call Trace:
<TASK>
xfs_btree_new_iroot+0xdf/0x520
xfs_btree_make_block_unfull+0x10d/0x1c0
xfs_btree_insrec+0x364/0x790
xfs_btree_insert+0xaa/0x210
xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
xfs_bmapi_allocate+0x34c/0x420
xfs_bmapi_write+0x53c/0x9c0
xfs_alloc_file_space+0xee/0x320
xfs_file_fallocate+0x36b/0x450
vfs_fallocate+0x148/0x340
__x64_sys_fallocate+0x3c/0x70
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa

Why the allocation failed at this point is unknown, but is likely
that we ran the transaction out of reserved space and filesystem out
of space with bmbt blocks because of all the minlen allocations
being done causing worst case fragmentation of a large allocation.

Regardless of the cause, we've then called xfs_bmapi_finish() which
calls xfs_btree_del_cursor(cur, error) to tear down the cursor.

So we have a failed operation, error != 0, cur->bc_ino.allocated > 0
and the filesystem is still up. The assert fails to take into
account that allocation can fail with an error and the transaction
teardown will shut the filesystem down if necessary. i.e. the
assert needs to check "|| error != 0" as well, because at this point
shutdown is pending because the current transaction is dirty....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 5672225e 26-May-2022 Dave Chinner <dchinner@redhat.com>

xfs: avoid unnecessary runtime sibling pointer endian conversions

Commit dc04db2aa7c9 has caused a small aim7 regression, showing a
small increase in CPU usage in __xfs_btree_check_sblock() as a
result of the extra checking.

This is likely due to the endian conversion of the sibling poitners
being unconditional instead of relying on the compiler to endian
convert the NULL pointer at compile time and avoiding the runtime
conversion for this common case.

Rework the checks so that endian conversion of the sibling pointers
is only done if they are not null as the original code did.

.... and these need to be "inline" because the compiler completely
fails to inline them automatically like it should be doing.

$ size fs/xfs/libxfs/xfs_btree.o*
text data bss dec hex filename
51874 240 0 52114 cb92 fs/xfs/libxfs/xfs_btree.o.orig
51562 240 0 51802 ca5a fs/xfs/libxfs/xfs_btree.o.inline

Just when you think the tools have advanced sufficiently we don't
have to care about stuff like this anymore, along comes a reminder
that *our tools still suck*.

Fixes: dc04db2aa7c9 ("xfs: detect self referencing btree sibling pointers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# dc04db2a 03-May-2022 Dave Chinner <dchinner@redhat.com>

xfs: detect self referencing btree sibling pointers

To catch the obvious graph cycle problem and hence potential endless
looping.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 722db70f 20-Apr-2022 Dave Chinner <david@fromorbit.com>

xfs: convert btree buffer log flags to unsigned.

5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
fields to be unsigned.

We also pass the fields to log to xfs_btree_offsets() as a uint32_t
all cases now. I have no idea why we made that parameter a int64_t
in the first place, but while we are fixing this up change it to
a uint32_t field, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# b698f0a1 22-Mar-2022 Hugh Dickins <hughd@google.com>

mm/fs: delete PF_SWAPWRITE

PF_SWAPWRITE has been redundant since v3.2 commit ee72886d8ed5 ("mm:
vmscan: do not writeback filesystem pages in direct reclaim").

Coincidentally, NeilBrown's current patch "remove inode_congested()"
deletes may_write_to_inode(), which appeared to be the one function which
took notice of PF_SWAPWRITE. But if you study the old logic, and the
conditions under which may_write_to_inode() was called, you discover that
flag and function have been pointless for a decade.

Link: https://lkml.kernel.org/r/75e80e7-742d-e3bd-531-614db8961e4@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4a6b35b3 09-Nov-2021 Darrick J. Wong <djwong@kernel.org>

xfs: sync xfs_btree_split macros with userspace libxfs

Sync this one last bit of discrepancy between kernel and userspace
libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>


# 9fa47bdc 23-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: use separate btree cursor cache for each btree type

Now that we have the infrastructure to track the max possible height of
each btree type, we can create a separate slab cache for cursors of each
type of btree. For smaller indices like the free space btrees, this
means that we can pack more cursors into a slab page, improving slab
utilization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 9ec69120 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: compute the maximum height of the rmap btree when reflink enabled

Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
enough to handle the maximally tall rmap btree when all blocks are in
use and maximally shared, let's compute the maximum height assuming the
rmapbt consumes as many blocks as possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 1b236ad7 13-Oct-2021 Darrick J. Wong <djwong@kernel.org>

xfs: clean up xfs_btree_{calc_size,compute_maxlevels}

During review of the next patch, Dave remarked that he found these two
btree geometry calculation functions lacking in documentation and that
they performed more work than was really necessary.

These functions take the same parameters and have nearly the same logic;
the only real difference is in the return values. Reword the function
comment to make it clearer what each function does, and move them to be
adjacent to reinforce their relation.

Clean up both of them to stop opencoding the howmany functions, stop
using the uint typedefs, and make them both support computations for
more than 2^32 leaf records, since we're going to need all of the above
for files with large data forks and large rmap btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# c0643f6f 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: encode the max btree height in the cursor

Encode the maximum btree height in the cursor, since we're soon going to
allow smaller cursors for AG btrees and larger cursors for file btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 6ca444cf 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: prepare xfs_btree_cur for dynamic cursor heights

Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA. Files with huge data forks
(and in the future, the realtime rmap btree) will require the ability to
support many more levels than a per-AG btree cursor, which means that
we're going to create per-btree type cursor caches to conserve memory
for the more common case.

Note that a subsequent patch actually introduces dynamic cursor heights.
This one merely rearranges the structure to prepare for that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 4c175af2 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: check that bc_nlevels never overflows

Warn if we ever bump nlevels higher than the allowed maximum cursor
height.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# ae127f08 16-Sep-2021 Darrick J. Wong <djwong@kernel.org>

xfs: remove xfs_btree_cur_t typedef

Get rid of this old typedef before we start changing other things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 9343ee76 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: convert bp->b_bn references to xfs_buf_daddr()

Stop directly referencing b_bn in code outside the buffer cache, as
b_bn is supposed to be used only as an internal cache index.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 04fcad80 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: introduce xfs_buf_daddr()

Introduce a helper function xfs_buf_daddr() to extract the disk
address of the buffer from the struct xfs_buf. This will replace
direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
the XFS_BUF_ADDR() macro.

This patch introduces the helper function and replaces all uses of
XFS_BUF_ADDR() as this is just a simple sed replacement.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# ebd9027d 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: convert xfs_sb_version_has checks to use mount features

This is a conversion of the remaining xfs_sb_version_has..(sbp)
checks to use xfs_has_..(mp) feature checks.

This was largely done with a vim replacement macro that did:

:0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>

A couple of other variants were also used, and the rest touched up
by hand.

$ size -t fs/xfs/built-in.a
text data bss dec hex filename
before 1127533 311352 484 1439369 15f689 (TOTALS)
after 1125360 311352 484 1437196 15ee0c (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 75c8c50f 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown

Remove the shouty macro and instead use the inline function that
matches other state/feature check wrapper naming. This conversion
was done with sed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 38c26bfd 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: replace xfs_sb_version checks with feature flag checks

Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
text data bss dec hex filenam
before 1130866 311352 484 1442702 16038e (TOTALS)
after 1127727 311352 484 1439563 15f74b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 32816fd7 12-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: constify btree function parameters that are not modified

Constify the rest of the btree functions that take structure and union
pointers and are not supposed to modify them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 159eb69d 10-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: make the record pointer passed to query_range functions const

The query_range functions are supposed to call a caller-supplied
function on each record found in the dataset. These functions don't
own the memory storing the record, so don't let them change the record.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 04dcb474 10-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: make the key parameters to all btree query range functions const

Range query functions are not supposed to modify the query keys that are
being passed in, so mark them all const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 50f02fe3 01-Jun-2021 Dave Chinner <dchinner@redhat.com>

xfs: remove agno from btree cursor

Now that everything passes a perag, the agno is not needed anymore.
Convert all the users to use pag->pag_agno instead and remove the
agno from the cursor. This was largely done as an automated search
and replace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# be9fb17d 01-Jun-2021 Dave Chinner <dchinner@redhat.com>

xfs: add a perag to the btree cursor

Which will eventually completely replace the agno in it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 07b6403a 01-Jun-2021 Dave Chinner <dchinner@redhat.com>

xfs: move perag structure and setup to libxfs/xfs_ag.[ch]

Move the xfs_perag infrastructure to the libxfs files that contain
all the per AG infrastructure. This helps set up for passing perags
around all the code instead of bare agnos with minimal extra
includes for existing files.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 756b1c34 23-Feb-2021 Dave Chinner <dchinner@redhat.com>

xfs: use current->journal_info for detecting transaction recursion

Because the iomap code using PF_MEMALLOC_NOFS to detect transaction
recursion in XFS is just wrong. Remove it from the iomap code and
replace it with XFS specific internal checks using
current->journal_info instead.

[djwong: This change also realigns the lifetime of NOFS flag changes to
match the incore transaction, instead of the inconsistent scheme we have
now.]

Fixes: 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 1cd738b1 11-Feb-2021 Brian Foster <bfoster@redhat.com>

xfs: consider shutdown in bmapbt cursor delete assert

The assert in xfs_btree_del_cursor() checks that the bmapbt block
allocation field has been handled correctly before the cursor is
freed. This field is used for accurate calculation of indirect block
reservation requirements (for delayed allocations), for example.
generic/019 reproduces a scenario where this assert fails because
the filesystem has shutdown while in the middle of a bmbt record
insertion. This occurs after a bmbt block has been allocated via the
cursor but before the higher level bmap function (i.e.
xfs_bmap_add_extent_hole_real()) completes and resets the field.

Update the assert to accommodate the transient state if the
filesystem has shutdown. While here, clean up the indentation and
comments in the function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# e8222613 16-Dec-2020 Dave Chinner <dchinner@redhat.com>

xfs: remove xfs_buf_t typedef

Prepare for kernel xfs_buf alignment by getting rid of the
xfs_buf_t typedef from userspace.

[darrick: This patch is a port of a userspace patch removing the
xfs_buf_t typedef in preparation to make the userspace xfs_buf code
behave more like its kernel counterpart.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 1189686e 11-Dec-2020 Zheng Yongjun <zhengyongjun3@huawei.com>

fs/xfs: convert comma to semicolon

Replace a comma between expression statements by a semicolon.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 60e3d707 11-Mar-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: support bulk loading of staged btrees

Add a new btree function that enables us to bulk load a btree cursor.
This will be used by the upcoming online repair patches to generate new
btrees. This avoids the programmatic inefficiency of calling
xfs_btree_insert in a loop (which generates a lot of log traffic) in
favor of stamping out new btree blocks with ordered buffers, and then
committing both the new root and scheduling the removal of the old btree
blocks in a single transaction commit.

The design of this new generic code is based off the btree rebuilding
code in xfs_repair's phase 5 code, with the explicit goal of enabling us
to share that code between scrub and repair. It has the additional
feature of being able to control btree block loading factors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 349e1c03 11-Mar-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce fake roots for inode-rooted btrees

Create an in-core fake root for inode-rooted btree types so that callers
can generate a whole new btree using the upcoming btree bulk load
function without making the new tree accessible from the rest of the
filesystem. It is up to the individual btree type to provide a function
to create a staged cursor (presumably with the appropriate callouts to
update the fakeroot) and then commit the staged root back into the
filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# e06536a6 11-Mar-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce fake roots for ag-rooted btrees

Create an in-core fake root for AG-rooted btree types so that callers
can generate a whole new btree using the upcoming btree bulk load
function without making the new tree accessible from the rest of the
filesystem. It is up to the individual btree type to provide a function
to create a staged cursor (presumably with the appropriate callouts to
update the fakeroot) and then commit the staged root back into the
filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 8ef54797 10-Mar-2020 Dave Chinner <dchinner@redhat.com>

xfs: rename btree cursor private btree member flags

BPRV is not longer appropriate because bc_private is going away.
Script:

$ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 92219c29 10-Mar-2020 Dave Chinner <dchinner@redhat.com>

xfs: convert btree cursor inode-private member names

bc_private.b -> bc_ino conversion via script:

$ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

And then revert the change to the bc_ino #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: tweak the subject line slightly]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 576af732 10-Mar-2020 Dave Chinner <dchinner@redhat.com>

xfs: convert btree cursor ag-private member name

bc_private.a -> bc_ag conversion via script:

`sed -i 's/bc_private\.a/bc_ag/g' fs/xfs/*[ch] fs/xfs/*/*[ch]`

And then revert the change to the bc_ag #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 8d57c216 11-Mar-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add a function to deal with corrupt buffers post-verifiers

Add a helper function to get rid of buffers that we have decided are
corrupt after the verifiers have run. This function is intended to
handle metadata checks that can't happen in the verifiers, such as
inter-block relationship checking. Note that we now mark the buffer
stale so that it will not end up on any LRU and will be purged on
release.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# ee647f85 23-Jan-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove the xfs_btree_get_buf[ls] functions

Remove the xfs_btree_get_bufs and xfs_btree_get_bufl functions, since
they're pretty trivial oneliners.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# ce92464c 23-Jan-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: make xfs_trans_get_buf return an error code

Convert xfs_trans_get_buf() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 5a57c05b 02-Jan-2020 Eric Sandeen <sandeen@redhat.com>

xfs: remove shadow variable in xfs_btree_lshift

Sparse warns about a shadow variable in this function after the
Fixed: commit added another int i; with larger scope. It's safe
to remove the one with the smaller scope to fix this shadow,
although the shadow itself is harmless.

Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 377bcd5f 14-Nov-2019 Carlos Maiolino <cmaiolino@redhat.com>

xfs: Remove kmem_zone_free() wrapper

We can remove it now, without needing to rework the KM_ flags.

Use kmem_cache_free() directly.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# a71895c5 11-Nov-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: convert open coded corruption check to use XFS_IS_CORRUPT

Convert the last of the open coded corruption check and report idioms to
use the XFS_IS_CORRUPT macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# f9e03706 11-Nov-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: kill the XFS_WANT_CORRUPT_* macros

The XFS_WANT_CORRUPT_* macros conceal subtle side effects such as the
creation of local variables and redirections of the code flow. This is
pretty ugly, so replace them with explicit XFS_IS_CORRUPT tests that
remove both of those ugly points. The change was performed with the
following coccinelle script:

@@
expression mp, test;
identifier label;
@@

- XFS_WANT_CORRUPTED_GOTO(mp, test, label);
+ if (XFS_IS_CORRUPT(mp, !test)) { error = -EFSCORRUPTED; goto label; }

@@
expression mp, test;
@@

- XFS_WANT_CORRUPTED_RETURN(mp, test);
+ if (XFS_IS_CORRUPT(mp, !test)) return -EFSCORRUPTED;

@@
expression mp, lval, rval;
@@

- XFS_IS_CORRUPT(mp, !(lval == rval))
+ XFS_IS_CORRUPT(mp, lval != rval)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 && e2))
+ XFS_IS_CORRUPT(mp, !e1 || !e2)

@@
expression e1, e2;
@@

- !(e1 == e2)
+ e1 != e2

@@
expression e1, e2, e3, e4, e5, e6;
@@

- !(e1 == e2 && e3 == e4) || e5 != e6
+ e1 != e2 || e3 != e4 || e5 != e6

@@
expression e1, e2, e3, e4, e5, e6;
@@

- !(e1 == e2 || (e3 <= e4 && e5 <= e6))
+ e1 != e2 && (e3 > e4 || e5 > e6)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 <= e2))
+ XFS_IS_CORRUPT(mp, e1 > e2)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 < e2))
+ XFS_IS_CORRUPT(mp, e1 >= e2)

@@
expression mp, e1;
@@

- XFS_IS_CORRUPT(mp, !!e1)
+ XFS_IS_CORRUPT(mp, e1)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 || e2))
+ XFS_IS_CORRUPT(mp, !e1 && !e2)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 == e4))
+ XFS_IS_CORRUPT(mp, e1 != e2 && e3 != e4)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 <= e2) || !(e3 >= e4))
+ XFS_IS_CORRUPT(mp, e1 > e2 || e3 < e4)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 <= e4))
+ XFS_IS_CORRUPT(mp, e1 != e2 && e3 > e4)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 27d9ee57 06-Nov-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock

Coverity points out that xfs_btree_islastblock doesn't check the return
value of xfs_btree_check_block. Since the question "Does the cursor
point to the last block in this level?" only makes sense if the caller
previously performed a lookup or seek operation, the block should
already have been checked.

Therefore, check the return value in an ASSERT and turn the whole thing
into a static inline predicate.

Coverity-id: 114069
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# a5155b87 02-Nov-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: always log corruption errors

Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# e992ae8a 28-Oct-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor xfs_iread_extents to use xfs_btree_visit_blocks

xfs_iread_extents open-codes everything in xfs_btree_visit_blocks, so
refactor the btree helper to be able to iterate only the records on
level 0, then port iread_extents to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# e7ee96df 28-Aug-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove all *_ITER_ABORT values

Use -ECANCELED to signal "stop iterating" instead of these magical
*_ITER_ABORT values, since it's duplicative.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 71912e08 26-Aug-2019 zhengbin <zhengbin13@huawei.com>

xfs: remove excess function parameter description in 'xfs_btree_sblock_v5hdr_verify'

Fixes gcc warning:

fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify'
fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify'

Fixes: c5ab131ba0df ("libxfs: refactor short btree block verification")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 250d4b4c 28-Jun-2019 Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused header files

There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# dbd329f1 28-Jun-2019 Christoph Hellwig <hch@lst.de>

xfs: add struct xfs_mount pointer to struct xfs_buf

We need to derive the mount pointer from a buffer in a lot of place.
Add a direct pointer to short cut the pointer chasing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# f5b999c0 12-Jun-2019 Eric Sandeen <sandeen@redhat.com>

xfs: remove unused flag arguments

There are several functions which take a flag argument that is
only ever passed as "0," so remove these arguments.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 7d048df4 30-Nov-2018 Eric Sandeen <sandeen@redhat.com>

xfs: fix inverted return from xfs_btree_sblock_verify_crc

xfs_btree_sblock_verify_crc is a bool so should not be returning
a failaddr_t; worse, if xfs_log_check_lsn fails it returns
__this_address which looks like a boolean true (i.e. success)
to the caller.

(interestingly xfs_btree_lblock_verify_crc doesn't have the issue)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 0b61f8a4 05-Jun-2018 Dave Chinner <dchinner@redhat.com>

xfs: convert to SPDX license tags

Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
echo $f
cat $f | awk -f hdr.awk > $f.new
mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
hdr = 1.0
tag = "GPL-2.0"
str = ""
}

/^ \* This program is free software/ {
hdr = 2.0;
next
}

/any later version./ {
tag = "GPL-2.0+"
next
}

/^ \*\// {
if (hdr > 0.0) {
print "// SPDX-License-Identifier: " tag
print str
print $0
str=""
hdr = 0.0
next
}
print $0
next
}

/^ \* / {
if (hdr > 1.0)
next
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
next
}

/^ \*/ {
if (hdr > 0.0)
next
print $0
next
}

// {
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 2551a530 04-Jun-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: explicitly pass buffer size to xfs_corruption_error

Explicitly pass the buffer length to xfs_corruption_error() instead of
assuming XFS_CORRUPTION_DUMP_LEN so that we avoid dumping off the end
of the buffer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 85ae0109 03-Jun-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: don't assert when on-disk btree pointers are garbage

Don't ASSERT when we encounter bad on-disk btree pointers in the debug
check functions. Log the error to leave breadcrumbs and let the upper
layers deal with it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# e63a1008 04-Jun-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: strengthen btree pointer checks before use

Instead of ASSERTing on null btree pointers in xfs_btree_ptr_to_daddr,
use the new block number verifiers to ensure that the btree pointer
doesn't point to any sensitive areas (AG headers, past-EOFS) and return
-EFSCORRUPTED if this is the case. Remove the ASSERT because on-disk
corruptions shouldn't trigger ASSERTs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 4cbae4b8 03-Jun-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce xfs_btree_debug_check_ptr

Make xfs_btree_check_ptr a non-debug function and introduce a new _debug
version that only runs when #ifdef DEBUG. This will enable us to reuse
the checking logic with other parts of the btree code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# eeee0d6a 03-Jun-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: btree lookup shouldn't ASSERT on empty btree nodes

If a btree lookup encounters an empty btree node or an empty btree leaf
on a multi-level btree, that's evidence of a corrupt on-disk btree.
Therefore, we should return -EFSCORRUPTED to the upper levels, not an
ASSERT failure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 08daa3cc 09-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add repair helpers for the reference count btree

Add a couple of functions to the refcount btree and generic btree code
that will be used to repair the refcountbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 14861c47 09-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add helpers to calculate btree size

Add a bunch of helper functions that calculate the sizes of various
btrees. These will be used to repair btrees and btree headers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# a1f69417 06-Apr-2018 Eric Sandeen <sandeen@sandeen.net>

xfs: non-scrub - remove unused function parameters

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# e157ebdc 06-Mar-2018 Carlos Maiolino <cmaiolino@redhat.com>

Cleanup old XFS_BTREE_* traces

Remove unused legacy btree traces from IRIX era.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# fb1755a6 24-Jan-2018 Carlos Maiolino <cmaiolino@redhat.com>

Split buffer's b_fspriv field

By splitting the b_fspriv field into two different fields (b_log_item
and b_li_list). It's possible to get rid of an old ABI workaround, by
using the new b_log_item field to store xfs_buf_log_item separated from
the log items attached to the buffer, which will be linked in the new
b_li_list field.

This way, there is no more need to reorder the log items list to place
the buf_log_item at the beginning of the list, simplifying a bit the
logic to handle buffer IO.

This also opens the possibility to change buffer's log items list into a
proper list_head.

b_log_item field is still defined as a void *, because it is still used
by the log buffers to store xlog_in_core structures, and there is no
need to add an extra field on xfs_buf just for xlog_in_core.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: minor style changes]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# ce1d802e 16-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add scrub cross-referencing helpers for the free space btrees

Add a couple of functions to the free space btrees that will be used
to cross-reference metadata against the bnobt/cntbt, and a generic
btree function that provides the real implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# a6a781a5 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: have buffer verifier functions report failing address

Modify each function that checks the contents of a metadata buffer to
return the instruction address of the failing test so that we can report
more precise failure errors to the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# e1e55aaf 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor short form btree pointer verification

Now that we have xfs_verify_agbno, use it to verify short form btree
pointers instead of open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 8368a601 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor long-format btree header verification routines

Create two helper functions to verify the headers of a long format
btree block. We'll use this later for the realtime rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 59f6fec3 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove XFS_FSB_SANITY_CHECK

We already have a function to verify fsb pointers, so get rid of the
last users of the (less robust) macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 4483eb56 06-Nov-2017 Christoph Hellwig <hch@lst.de>

xfs: mark xfs_btree_check_lblock and xfs_btree_check_ptr static

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# e9e899a2 31-Oct-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: move error injection tags into their own file

Move the error injection tag names into a libxfs header so that we can
share it between kernel and userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 2fdbec5c 25-Oct-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: compare btree block keys to parent block's keys during scrub

When we're done checking all the records/keys in a btree block, compute
the low and high key of the block and compare them to the associated key
in the parent btree block.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# cc3e0948 17-Oct-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: scrub the shape of a metadata btree

Create a function that can check the shape of a btree -- each block
passes basic inspection and all the pointers look ok. In the next patch
we'll add the ability to check the actual keys and records stored within
the btree. Add some helper functions so that we report detailed scrub
errors in a uniform manner in dmesg. These are helper functions for
subsequent patches.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 52c732ee 17-Oct-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor btree block header checking functions

Refactor the btree block header checks to have an internal function that
returns the address of the failing check without logging errors. The
scrubber will call the internal function, while the external version
will maintain the current logging behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# f135761a 17-Oct-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor btree pointer checks

Refactor the btree pointer checks so that we can call them from the
scrub code without logging errors to dmesg. Preserve the existing error
reporting for regular operations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 2dd3d709 29-Aug-2017 Brian Foster <bfoster@redhat.com>

xfs: relog dirty buffers during swapext bmbt owner change

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 99c794c6 29-Aug-2017 Brian Foster <bfoster@redhat.com>

xfs: skip bmbt block ino validation during owner change

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 8dc518df 29-Aug-2017 Brian Foster <bfoster@redhat.com>

xfs: don't log dirty ranges for ordered buffers

Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 1e86eabe 17-Jul-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: check _btree_check_block value

Check the _btree_check_block return value for the firstrec and lastrec
functions, since we have the ability to signal that the repositioning
did not succeed.

Fixes-coverity-id: 114067
Fixes-coverity-id: 114068
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 9e24cfd0 20-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove unneeded parameter from XFS_TEST_ERROR

Since we moved the injected error frequency controls to the mountpoint,
we can get rid of the last argument to XFS_TEST_ERROR.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>


# 26788097 16-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: export various function for the online scrubber

Export various internal functions so that the online scrubber can use
them to check the state of metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# c8ce540d 16-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove double-underscore integer types

This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:

s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 5a4c7334 16-Jun-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: optimize _btree_query_all

Don't bother wandering our way through the leaf nodes when the caller
issues a query_all; just zoom down the left side of the tree and walk
rightwards along level zero.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# a4d768e7 22-May-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: fix unaligned access in xfs_btree_visit_blocks

This structure copy was throwing unaligned access warnings on sparc64:

Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]

xfs_btree_copy_ptrs does a memcpy, which avoids it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 9070733b 03-May-2017 Michal Hocko <mhocko@suse.com>

xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS

xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite
some time ago. We would like to make this concept more generic and use
it for other filesystems as well. Let's start by giving the flag a more
generic name PF_MEMALLOC_NOFS which is in line with an exiting
PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
contexts. Replace all PF_FSTRANS usage from the xfs code in the first
step before we introduce a full API for it as xfs uses the flag directly
anyway.

This patch doesn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e9a2599a 28-Mar-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: create a function to query all records in a btree

Create a helper function that will query all records in a btree.
This will be used by the online repair functions to examine every
record in a btree to rebuild a second btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# d5a91bae 02-Feb-2017 Darrick J. Wong <darrick.wong@oracle.com>

xfs: filter out obviously bad btree pointers

Don't let anybody load an obviously bad btree pointer. Since the values
come from disk, we must return an error, not just ASSERT.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>


# b6f41e44 28-Jan-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: remove boilerplate around xfs_btree_init_block

Now that xfs_btree_init_block_int is able to determine crc
status from the passed-in mp, we can determine the proper
magic as well if we are given a btree number, rather than
an explicit magic value.

Change xfs_btree_init_block[_int] callers to pass in the
btree number, and let xfs_btree_init_block_int use the
xfs_magics array via the xfs_btree_magic macro to determine
which magic value is needed. This makes all of the
if (crc) / else stanzas identical, and the if/else can be
removed, leading to a single, common init_block call.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# af7d20fd 28-Jan-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: make xfs_btree_magic more generic

Right now the xfs_btree_magic() define takes only a cursor;
change this to take crc and btnum args to make it more generically
useful, and move to a function.

This will allow xfs_btree_init_block_int callers which don't
have a cursor to make use of the xfs_magics array, which will
happen in the next patch.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# f88ae46b 28-Jan-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: glean crc status from mp not flags in xfs_btree_init_block_int

xfs_btree_init_block_int() can determine whether crcs are
in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag;
the mp argument allows us to determine this from the
superblock. Remove the flag from callers, and use
xfs_sb_version_hascrc(&mp->m_sb) internally instead.

This removes one difference between the if & else cases
in the callers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# bb3be7e7 04-Dec-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: check for bogus values in btree block headers

When we're reading a btree block, make sure that what we retrieved
matches the owner and level; and has a plausible number of records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# f1b8243c 19-Oct-2016 Eric Biggers <ebiggers@google.com>

xfs: add some 'static' annotations

sparse reported that several variables and a function were not
forward-declared anywhere and therefore should be 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/xfs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 1946b91c 03-Oct-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: define the on-disk refcount btree format

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 46eeb521 03-Oct-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce refcount btree definitions

Add new per-AG refcount btree definitions to the per-AG structures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# c611cc03 18-Sep-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: count the blocks in a btree

Provide a helper method to count the number of blocks in a short form
btree. The refcount and rmap btrees need to know the number of blocks
already in use to set up their per-AG block reservations during mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4ed3f687 18-Sep-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: create a standard btree size calculator code

Create a helper to generate AG btree height calculator functions.
This will be used (much) later when we get to the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# a1d46cff 18-Sep-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove xfs_btree_bigkey

Remove the xfs_btree_bigkey mess and simply make xfs_btree_key big enough
to hold both keys in-core.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 5b5c2dbd 26-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: simple btree query range should look right if LE lookup fails

If the initial LOOKUP_LE in the simple query range fails to find
anything, we should attempt to increment the btree cursor to see
if there actually /are/ records for what we're trying to find.
Without this patch, a bnobt range query of (0, $agsize) returns
no results because the leftmost record never has a startblock
of zero.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 72227899 25-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: fix some key handling problems in _btree_simple_query_range

We only need the record's high key for the first record that we look
at; for all records, we /definitely/ need the regular record key.
Therefore, fix how the simple range query function gets its keys.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# ed150e1a 25-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: don't perform lookups on zero-height btrees

If the caller passes in a cursor to a zero-height btree (which is
impossible), we never set block to anything but NULL, which causes the
later dereference of it to crash. Instead, just return -EFSCORRUPTED.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# c1d22ae8 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: in btree_lshift, only allocate temporary cursor when needed

We only need the temporary cursor in _btree_lshift if we're shifting
in an overlapped btree. Therefore, factor that into a single block
of code so we avoid unnecessary cursor duplication.

Also fix use of the wrong cursor when checking for corruption in
xfs_btree_rshift().

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 1f704b2b 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove unnecesary lshift/rshift key initialization

In the lshift/rshift functions we don't use the key variable for
anything now, so remove the variable and its initializer. The
update_keys functions figure out the key for a block on their own.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 973b8319 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove the get*keys and update_keys btree ops pointers

These are internal btree functions; we don't need them to be
dispatched via function pointers. Make them static again and
just check the overlapped flag to figure out what we need to
do. The strategy behind this patch was suggested by Christoph.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 035e00ac 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: define the on-disk rmap btree format

Originally-From: Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# b8704944 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce rmap btree definitions

Originally-From: Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 3ab78df2 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rework xfs_bmap_free callers to use xfs_defer_ops

Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead. For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 28a89567 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor btree owner change into a separate visit-blocks function

Refactor the btree_change_owner function into a more generic apparatus
which visits all blocks in a btree. We'll use this in a subsequent
patch for counting btree blocks for AG reservations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 105f7d83 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: introduce interval queries on btrees

Create a function to enable querying of btree records mapping to a
range of keys. This will be used in subsequent patches to allow
querying the reverse mapping btree to find the extents mapped to a
range of physical blocks, though the generic code can be used for
any range query.

The overlapped query range function needs to use the btree get_block
helper because the root block could be an inode, in which case
bc_bufs[nlevels-1] will be NULL.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 2c813ad6 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: support btrees with overlapping intervals for keys

On a filesystem with both reflink and reverse mapping enabled, it's
possible to have multiple rmap records referring to the same blocks on
disk. When overlapping intervals are possible, querying a classic
btree to find all records intersecting a given interval is inefficient
because we cannot use the left side of the search interval to filter
out non-matching records the same way that we can use the existing
btree key to filter out records coming after the right side of the
search interval. This will become important once we want to use the
rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl.

(For the non-overlapping case, we can perform such queries trivially
by starting at the left side of the interval and walking the tree
until we pass the right side.)

Therefore, extend the btree code to come closer to supporting
intervals as a first-class record attribute. This involves widening
the btree node's key space to store both the lowest key reachable via
the node pointer (as the btree does now) and the highest key reachable
via the same pointer and teaching the btree modifying functions to
keep the highest-key records up to date.

This behavior can be turned on via a new btree ops flag so that btrees
that cannot store overlapping intervals don't pay the overhead costs
in terms of extra code and disk format changes.

When we're deleting a record in a btree that supports overlapped
interval records and the deletion results in two btree blocks being
joined, we defer updating the high/low keys until after all possible
joining (at higher levels in the tree) have finished. At this point,
the btree pointers at all levels have been updated to remove the empty
blocks and we can update the low and high keys.

When we're doing this, we must be careful to update the keys of all
node pointers up to the root instead of stopping at the first set of
keys that don't need updating. This is because it's possible for a
single deletion to cause joining of multiple levels of tree, and so
we need to update everything going back to the root.

The diff_two_keys functions return < 0, 0, or > 0 if key1 is less than,
equal to, or greater than key2, respectively. This is consistent
with the rest of the kernel and the C library.

In btree_updkeys(), we need to evaluate the force_all parameter before
running the key diff to avoid reading uninitialized memory when we're
forcing a key update. This happens when we've allocated an empty slot
at level N + 1 to point to a new block at level N and we're in the
process of filling out the new keys.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 70b22659 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: add function pointers for get/update keys to the btree

Add some function pointers to bc_ops to get the btree keys for
leaf and node blocks, and to update parent keys of a block.
Convert the _btree_updkey calls to use our new pointer, and
modify the tree shape changing code to call the appropriate
get_*_keys pointer instead of _btree_copy_keys because the
overlapping btree has to calculate high key values.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# e5821e57 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: during btree split, save new block key & ptr for future insertion

When a btree block has to be split, we pass the new block's ptr from
xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
however, we pass the block's key through the cursor's record. It is a
little weird to "initialize" a record from a key since the non-key
attributes will have garbage values.

When we go to add support for interval queries, we have to be able to
pass the lowest and highest keys accessible via a pointer. There's no
clean way to pass this back through the cursor's record field.
Therefore, pass the key directly back to xfs_btree_insert() the same
way that we pass the btree_ptr.

As a bonus, we no longer need init_rec_from_key and can drop it from the
codebase.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 0d309791 02-Aug-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: set *stat=1 after iroot realloc

If we make the inode root block of a btree unfull by expanding the
root, we must set *stat to 1 to signal success, rather than leaving
it uninitialized.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# fbfb24bf 19-Jul-2016 Kaho Ng <ngkaho1234@gmail.com>

xfs: indentation fix in xfs_btree_get_iroot()

The indentation in this function is different from the other functions.
Those spacebars are converted to tabs to improve readability.

Signed-off-by: Kaho Ng <ngkaho1234@gmail.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 19b54ee6 20-Jun-2016 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor btree maxlevels computation

Create a common function to calculate the maximum height of a per-AG
btree. This will eventually be used by the rmapbt and refcountbt
code to calculate appropriate maxlevels values for each. This is
important because the verifiers and the transaction block
reservations depend on accurate estimates of how many blocks are
needed to satisfy a btree split.

We were mistakenly using the max bnobt height for all the btrees,
which creates a dangerous situation since the larger records and
keys in an rmapbt make it very possible that the rmapbt will be
taller than the bnobt and so we can run out of transaction block
reservation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# edfd9dd5 07-Feb-2016 Christoph Hellwig <hch@lst.de>

xfs: move buffer invalidation to xfs_btree_free_block

... instead of leaving it in the methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# c46ee8ad 07-Feb-2016 Christoph Hellwig <hch@lst.de>

xfs: factor btree block freeing into a helper

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 196328ec 07-Feb-2016 Christoph Hellwig <hch@lst.de>

xfs: handle errors from ->free_blocks in xfs_btree_kill_iroot

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# c5ab131b 03-Jan-2016 Darrick J. Wong <darrick.wong@oracle.com>

libxfs: refactor short btree block verification

Create xfs_btree_sblock_verify() to verify short-format btree blocks
(i.e. the per-AG btrees with 32-bit block pointers) instead of
open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# fef4ded8 11-Oct-2015 Geliang Tang <geliangtang@163.com>

libxfs: fix two comment typos

Just fix two typos in code comments.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# a45086e2 11-Oct-2015 Brian Foster <bfoster@redhat.com>

xfs: validate metadata LSNs against log on v5 superblocks

Since the onset of v5 superblocks, the LSN of the last modification has
been included in a variety of on-disk data structures. This LSN is used
to provide log recovery ordering guarantees (e.g., to ensure an older
log recovery item is not replayed over a newer target data structure).

While this works correctly from the point a filesystem is formatted and
mounted, userspace tools have some problematic behaviors that defeat
this mechanism. For example, xfs_repair historically zeroes out the log
unconditionally (regardless of whether corruption is detected). If this
occurs, the LSN of the filesystem is reset and the log is now in a
problematic state with respect to on-disk metadata structures that might
have a larger LSN. Until either the log catches up to the highest
previously used metadata LSN or each affected data structure is modified
and written out without incident (which resets the metadata LSN), log
recovery is susceptible to filesystem corruption.

This problem is ultimately addressed and repaired in the associated
userspace tools. The kernel is still responsible to detect the problem
and notify the user that something is wrong. Check the superblock LSN at
mount time and fail the mount if it is invalid. From that point on,
trigger verifier failure on any metadata I/O where an invalid LSN is
detected. This results in a filesystem shutdown and guarantees that we
do not log metadata changes with invalid LSNs on disk. Since this is a
known issue with a known recovery path, present a warning to instruct
the user how to recover.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# ce748eaa 28-Jul-2015 Eric Sandeen <sandeen@sandeen.net>

xfs: create new metadata UUID field and incompat flag

This adds a new superblock field, sb_meta_uuid. If set, along with
a new incompat flag, the code will use that field on a V5 filesystem
to compare to metadata UUIDs, which allows us to change the user-
visible UUID at will. Userspace handles the setting and clearing
of the incompat flag as appropriate, as the UUID gets changed; i.e.
setting the user-visible UUID back to the original UUID (as stored in
the new field) will remove the incompatible feature flag.

If the incompat flag is not set, this copies the user-visible UUID into
into the meta_uuid slot in memory when the superblock is read from disk;
the meta_uuid field is not written back to disk in this case.

The remainder of this patch simply switches verifiers, initializers,
etc to use the new sb_meta_uuid field.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 5fb5aeee 23-Feb-2015 Eric Sandeen <sandeen@sandeen.net>

xfs: pass mp to XFS_WANT_CORRUPTED_RETURN

Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any
information about which filesystem hit it. Passing in the mp allows
us to print the filesystem (device) name, which is a pretty critical
piece of information.

Tested by running fsfuzzer 'til I hit some.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# c29aad41 23-Feb-2015 Eric Sandeen <sandeen@sandeen.net>

xfs: pass mp to XFS_WANT_CORRUPTED_GOTO

Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any
information about which filesystem hit it. Passing in the mp allows
us to print the filesystem (device) name, which is a pretty critical
piece of information.

Tested by running fsfuzzer 'til I hit some.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# bb58e618 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: move most of xfs_sb.h to xfs_format.h

More on-disk format consolidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4fb6e8ad 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: merge xfs_ag.h into xfs_format.h

More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# d5cf09ba 29-Jul-2014 Christoph Hellwig <hch@lst.de>

xfs: require 64-bit sector_t

Trying to support tiny disks only and saving a bit memory might have
made sense on an SGI O2 15 years ago, but is pretty pointless today.

Remove the rarely tested codepath that uses various smaller in-memory
types to reduce our test matrix and make the codebase a little bit
smaller and less complicated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 2451337d 24-Jun-2014 Dave Chinner <dchinner@redhat.com>

xfs: global error sign conversion

Convert all the errors the core XFs code to negative error signs
like the rest of the kernel and remove all the sign conversion we
do in the interface layers.

Errors for conversion (and comparison) found via searches like:

$ git grep " E" fs/xfs
$ git grep "return E" fs/xfs
$ git grep " E[A-Z].*;$" fs/xfs

Negation points found via searches like:

$ git grep "= -[a-z,A-Z]" fs/xfs
$ git grep "return -[a-z,A-D,F-Z]" fs/xfs
$ git grep " -[a-z].*;" fs/xfs

[ with some bits I missed from Brian Foster ]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 30f712c9 24-Jun-2014 Dave Chinner <dchinner@redhat.com>

libxfs: move source files

Move all the source files that are shared with userspace into
libxfs/. This is done as one big chunk simpy to get it done
quickly

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>