History log of /linux-master/fs/bcachefs/fsck.c
Revision Date Author Comments
# 09d4c2ac 31-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: reconstruct_inode()

If an inode is missing, but corresponding extents and dirent still
exist, it's well worth recreating it - this does so.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cc053290 31-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Subvolume reconstruction

We can now recreate missing subvolumes from dirents and/or inodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fa14b504 02-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: ratelimit informational fsck errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eab3a3ce 29-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix overlapping extent repair

overlapping extent repair was colliding with extent past end of inode
checks - don't update "extent ends at" until we know we have an extent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8ce1db80 31-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix remove_dirent()

We were missing an iter_traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d2554263 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out recovery_passes.c

We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dcc1c045 26-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix use after free in check_root_trans()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 109ea419 14-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested

We only need to return transaction_restart_nested when we're inside a
context that's handling transaction restarts.

Also, add a missing check_subdir_count() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ff34756 16-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix check_key_has_snapshot() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74406f66 15-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_path() now only needs to walk up to subvolume root

Now that checking subvolume structure is a separate pass, the main
check_directory_connectivity() pass only needs to walk up to a given
inode's subvolume root.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 663db5a5 15-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_check_subvolume_structure()

Now that we've got bch_subvolume.fs_path_parent, it's easy to write
subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b8628a25 08-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_subvolume::fs_path_parent

Record the filesystem path heirarchy for subvolumes in bch_subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 56e23047 09-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Correctly reattach subvolumes

Subvolumes need special handling to reattach - we always reattach them
in the root subvolume's lost+found, and they need a slightly different
kind of dirent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a136177 08-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_path() now prints full inode when reattaching

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 688a7694 08-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Pass inode bkey to check_path()

prep work for improving logging/error messages

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f5d58d0c 08-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix path where dirent -> subvol missing and we don't fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 45b4ed52 21-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Repair subvol dirents that point to non subvols

when repair switches d_type to or from DT_SUBVOL, we need to update the
target accordingly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c60b7f80 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check dirent->d_parent_subvol

Check that d_parent_subvol makes sense - the dirent's snapshot must be
visible in d_parent_subvol (i.e. an ancestor of d_parent_subvol's
snapshot) in order to be visible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f4e68c85 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check inode->bi_parent_subvol against dirent

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ea27001e 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: delete duplicated checks in check_dirent_to_subvol()

these were already checked in check_subvol()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e539ebb8 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: simplify check_dirent_inode_dirent()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0b498a5a 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check bi_parent_subvol in check_inode()

check for inodes with a nonzero bi_parent_subvol field that aren't
actually subvolume roots

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 971a1503 08-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: better log message in lookup_inode_for_snapshot()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0b17618f 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_inode_dirent_inode()

check that if an inode has a backpointer, the dirent it points to points
back to it.

We do this in check_dirent_inode_dirent(), but only for inodes that have
dirents that point to them - we also have to do the check starting from
the inode to catch inodes that don't have dirents that point to them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2b02d09 05-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check subvol <-> inode pointers in check_inode()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d4998c2 05-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: factor out check_inode_backpointer()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 11def188 05-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Factor out check_subvol_dirent()

Going to be adding more code here for checking subvol structure.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 69c8e6ce 01-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move fsck_write_inode() to inode.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 23f25223 25-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_hash_set_snapshot() -> bch2_hash_set_in_snapshot()

Minor renaming for clarity, bit of refactoring.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52f3a72f 27-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix check_inode_deleted_list()

check_inode_deleted_list() returns true if the inode is on the deleted
list; check_inode() was checking the return code incorrectly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d2fda304 24-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: __lookup_dirent() works in snapshot, not subvol

Add a new helper, bch2_hash_lookup_in_snapshot(), for when we're not
operating in a subvolume and already have a snapshot ID, and then use it
in lookup_lostfound() -> __lookup_dirent().

This is a bugfix - lookup_lostfound() doesn't take a subvolume ID, we
were passing a nonsense subvolume ID before, and don't have one to pass
since we may be operating in an interior snapshot node that doesn't have
a subvolume ID.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 074cbcda 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck_err()s don't need to manually check c->sb.version anymore

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c98d132e 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_directory_structure() can now be run online

Now that we have dynamically resizable btree paths,
check_directory_structure() can check one path - inode up to the root -
in a single transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d296e7b1 15-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix reattach_inode() for snapshots

reattach_inode() was broken w.r.t. snapshots - we'd lookup the subvolume
to look up lost+found, but if we're in an interior node snapshot that
didn't make any sense.

Instead, this adds a dirent path for creating in a specific snapshot,
skipping the subvolume; and we also make sure to create lost+found in
the root snapshot, to avoid conflicts with lost+found being created in
overlapping snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6474b706 11-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clean up btree_trans

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07f383c7 03-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_iter -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4eb3877e 17-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck -> bch2_trans_run()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80eab7a7 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 44ddd8ad 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill for_each_btree_key_old_upto()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a860b5a 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()

And for_each_btree_key2_upto -> for_each_btree_key_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# defd9e39 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: darray_for_each() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 038fecc0 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: qstr_eq()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 002c76dc 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_root() can now be run online

check_root() is simple enough to run as one single transaction, so is
trivial to run online.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5028b907 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27b2df98 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f0e297d 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Explicity go RW for fsck

This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c471b65 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: convert bch_fs_flags to x-macro

Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb52d23e 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 359d1bad 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for unlinked inodes not on deleted list

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b58623f 13-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved backpointer messages in fsck

When we have a key to print, we should print it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# db18ef1a 13-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_check_nlinks() for snapshots

When searching the link table for the matching inode, we were searching
for a specific - incorrect - snapshot ID as well, causing us to fail to
find the inode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 103ffe9a 02-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: x-macro-ify inode flags enum

This lets us use bch2_prt_bitflags to print them out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9db2f860 22-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for too-large encoded extents

We don't yet repair (split) them, just check.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b0b5bbf9 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarily

Be a bit more careful about when bch2_delete_dead_snapshots needs to
run: it only needs to run synchronously if we're running fsck, and it
only needs to run at all if we have snapshot nodes to delete or if fsck
has noticed that it needs to run.

Also:
Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS

Kill bch2_delete_dead_snapshots_hook(), move functionality to
bch2_mark_snapshot()

Factor out bch2_check_snapshot_needs_deletion(), to explicitly check
if we need to be running snapshot deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d2a990d1 26-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errors

These errors aren't actual errors, and should never be printed - do this
in the common helpers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a190cbcf 25-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Silence transaction restart error message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 40a53b92 19-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More minor smatch fixes

- fix a few uninitialized return values
- return a proper error code in lookup_lostfound()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aef32bf7 11-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e46c181a 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert more code to bch_err_msg()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c872afa2 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_propagate_key_to_snapshot_leaves()

When we handle a transaction restart in a nested context, we need to
return -BCH_ERR_transaction_restart_nested because we invalidated the
outer context's iterators and locks.

bch2_propagate_key_to_snapshot_leaves() wasn't doing this, this patch
fixes it to use trans_was_restarted().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a111901f 18-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_propagate_key_to_snapshot_leaves()

If fsck finds a key that needs work done, the primary example being an
unlinked inode that needs to be deleted, and the key is in an internal
snapshot node, we have a bit of a conundrum.

The conundrum is that internal snapshot nodes are shared, and we in
general do updates in internal snapshot nodes because there may be
overwrites in some snapshots and not others, and this may affect other
keys referenced by this key (i.e. extents).

For example, we might be seeing an unlinked inode in an internal
snapshot node, but then in one child snapshot the inode might have been
reattached and might not be unlinked. Deleting the inode in the internal
snapshot node would be wrong, because then we'll delete all the extents
that the child snapshot references.

But if an unlinked inode does not have any overwrites in child
snapshots, we're fine: the inode is overwrritten in all child snapshots,
so we can do the deletion at the point of comonality in the snapshot
tree, i.e. the node where we found it.

This patch adds a new helper, bch2_propagate_key_to_snapshot_leaves(),
to handle the case where we need a to update a key that does have
overwrites in child snapshots: we copy the key to leaf snapshot nodes,
and then rewind fsck and process the needed updates there.

With this, fsck can now always correctly handle unlinked inodes found in
internal snapshot nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e877caa 16-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out snapshot.c

subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad52bac2 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Log a message when running an explicit recovery pass

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e2bd0617 20-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix overlapping extent repair

A number of smallish fixes for overlapping extent repair, and (part of)
a new unit test. This fixes all the issues turned up by bhzhu203, in his
filesystem image from running mongodb + snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7904c82c 21-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Move fsck_inode_rm() to inode.c

Prep work for the new deleted inodes btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2a89a3e9 20-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a null ptr deref in check_xattr()

We were attempting to initialize inode hash info when no inodes were
found.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9d8a3c95 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck: delete dead code

Delete the old, now reimplemented overlapping extent check/repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae2e13d7 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_run_explicit_recovery_pass()

This introduces bch2_run_explicit_recovery_pass() and uses it for when
fsck detects that we need to re-run dead snaphots cleanup, and makes
dead snapshot cleanup more like a normal recovery pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20e6d9a8 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix lookup_inode_for_snapshot()

This fixes a use-after-free.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6b20d746 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: need_snapshot_cleanup shouldn't be a fsck error

We currently don't track whether snapshot cleanup still needs to finish
(aside from running a full fsck), so it shouldn't be a fsck error yet -
fsck -n after fsck has succesfully completed shouldn't error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 464ee192 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve key_visible_in_snapshot()

Delete a redundant bch2_snapshot_is_ancestor() check, and convert some
assertions to debug assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a397b8df 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Refactor overlapping extent checks

Make the overlapping extent check/repair code more self contained.

This is prep work for hopefully reducing key_visible_in_snapshot() usage
here as well, and also includes a nice performance optimization to not
check ref_visible2() unless the extents potentially overlap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0076086 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_extent(): don't use key_visible_in_snapshot()

This changes the main part of check_extents(), that checks the extent
against the corresponding inode, to not use key_visible_in_snapshot().

key_visible_in_snapshot() has to iterate over the list of ancestor
overwrites repeatedly calling bch2_snapshot_is_ancestor(), so this is a
significant performance improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 650eb16b 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_extent() refactoring

More prep work for reducing key_visible_in_snapshot() usage - this
rearranges how KEY_TYPE_whitout keys are handled, so that they can be
marked off in inode_warker->inode->seen_this_pos.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a57f4d61 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck: walk_inode() now takes is_whiteout

We only want to synthesize an inode for the current snapshot ID for non
whiteouts - this refactoring lets us call walk_inode() earlier and clean
up some control flow.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0d8f320d 12-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify check_extent()

Minor refactoring/dead code deletion, prep work for reworking
check_extent() to avoid key_visible_in_snapshot().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 43b81a4e 13-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: overlapping_extents_found()

This improves the repair path for overlapping extents - we now verify
that we find in the btree the overlapping extents that the algorithm
detected, and fail the fsck run with a more useful error if it doesn't
match.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f9f52bc4 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck: inode_walker: last_pos, seen_this_pos

Prep work for changing check_extent() to avoid
key_visible_in_snapshot() - this adds the state to track whether an
inode has seen an extent at this pos.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5897505e 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: check_extents(): make sure to check i_sectors for last inode

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8479938d 12-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert snapshot table to RCU array

This switches the generic radix tree for the in-memory table of snapshot
nodes to a simple rcu array. This means we have to add new locking to
deal with reallocations, but is faster than traversing the radix tree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 067d228b 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate recovery passes

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0b9fbce2 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a format string warning

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06dcca51 25-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck: Break walk_inode() up into multiple functions

Some refactoring, prep work for algorithm improvements related to
snapshots.

we need to add a bitmap to the list of inodes for "seen this snapshot";
for this bitmap to correctly be available, we'll need to gather the list
of inodes first, and later look up the inode for a given snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 75da9764 24-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck needs BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE

A few fsck paths weren't using BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 454377d8 24-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve error message for overlapping extents

We now print out the full previous extent we overlapping with, to aid in
debugging and searching through the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bb3c2a9 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New error message helpers

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
- bch_err_fn
- bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e47a390a 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert -ENOENT to private error codes

As with previous conversions, replace -ENOENT uses with more informative
private error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73da30e8 12-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix check_overlapping_extents()

A error check had a flipped conditional - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c59b483 29-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BTREE_ID_snapshot_tree

This adds a new btree which gets us a persistent per-snapshot-tree
identifier.

- BTREE_ID_snapshot_trees
- KEY_TYPE_snapshot_tree
- bch_snapshot now has a field that points to a snapshot_tree

This is going to be used to designate one snapshot ID/subvolume out of a
given tree of snapshots as the "main" subvolume, so that we can do quota
accounting in that subvolume and not the rest.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbda63bb 30-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_make_mut() now calls bch2_trans_update()

It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.

This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(),
eliminating a bit of boilerplate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcb79a51 29-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_iter() helpers

Introduce new helpers for a common pattern:

bch2_trans_iter_init();
bch2_btree_iter_peek_slot();

- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ab158fce 30-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Converting to typed bkeys is now allowed for err, null ptrs

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c8d5b714 25-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make sure hash info gets initialized in fsck

We had some bugs with setting/using first_this_inode in the inode
walker in the dirents/xattr code.

This patch changes to not clear first_this_inode until after
initializing the new hash info.

Also, we fix an error message to not print on transaction restart, and
add a comment to related fsck error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3dc75eb 16-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a null ptr deref in fsck check_extents()

It turns out, in rare situations we need to be passing in a disk
reservation, which will be used internally by the transaction commit
path when needed. Pass one in...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 51fe0332 10-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Suppress transaction restart err message

This isn't a real error, and doesn't need to be printed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c58029ec 07-May-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: Reimplement repair for overlapping extents

Repair now checks if overlapping extents exist in the same snapshot
and calls update_trans_update_extent to do the repair work.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8ffa11a2 19-Jan-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: let __bch2_btree_insert() pass in flags

This patch is prep work for the following patch.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79203111 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Unwritten extents support

- bch2_extent_merge checks unwritten bit
- read path returns 0s for unwritten extents without actually reading
- reflink path skips over unwritten extents
- bch2_bkey_ptrs_invalid() checks for extents with both written and
unwritten extents, and non-normal extents (stripes, btree ptrs) with
unwritten ptrs
- fiemap checks for unwritten extents and returns
FIEMAP_EXTENT_UNWRITTEN

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c72f687a 11-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use for_each_btree_key_upto() more consistently

It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use
peek_upto() and provide an end for the interval we're searching for -
otherwise, when we hit the end of the inode the next inode be in a
different subvolume and not have any keys in the current snapshot, and
we'd iterate over arbitrarily many keys before returning one.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 419fc65f 01-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix hash_check_key()

On hash collision when we have to check for duplicates or incorrect
hash value, we weren't specifying a snapshot ID to iterate with.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 994ba475 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New btree helpers

This introduces some new conveniences, to help cut down on boilerplate:

- bch2_trans_kmalloc_nomemzero() - performance optimiation
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# df6a24f8 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make error messages more uniform

Use __func__ in error messages that refer to function name, and do so
more uniformly - suggested by checkpatch.pl

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5877d887 04-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Re-enable hash_redo_key()

When subvolumes & snapshots were rolled out, hash_redo_key() was
disabled due to some new complications - namely, bch2_hash_set() works
at the subvolume level, and fsck does not run in a defined subvolume,
instead working at the snapshot ID level.

This patch splits out bch2_hash_set_snapshot() from bch2_hash_set(), and
makes one small tweak for fsck:

- Normally, bch2_hash_set() (and other dirent code) needs to know what
subvolume we're in, because dirents that point to other subvolumes
should only be visible in the subvolume they were created in, not
other snapshots. We can't check that in fsck, so we just assume that
all dirents are visible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 12043cf1 18-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fsck: Another transaction restart handling fix

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 42590b53 18-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_delete_range_trans() now returns -BCH_ERR_transaction_restart_nested

The new convention is that functions that handle transaction restarts
within an existing transaction context should return
-BCH_ERR_transaction_restart_nested when they did so, since they
invalidated the outer transaction context.

This also means bch2_btree_delete_range_trans() is changed to only call
bch2_trans_begin() after a transaction restart, not on every loop
iteration.

This is to fix a bug in fsck, in check_inode() when we truncate an inode
with BCH_INODE_I_SIZE_DIRTY set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# efd0d038 17-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Minor transaction restart handling fix

- fsck_inode_rm() wasn't returning BCH_ERR_transaction_restart_nested
- change bch2_trans_verify_not_restarted() to call panic() - we don't
want these errors to be missed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c497df8b 11-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Increment restart count in bch2_trans_begin()

Instead of counting transaction restarts, count when the transaction is
restarted: if bch2_trans_begin() was called when the transaction wasn't
restarted we need to ensure restart_count is still incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c7be3cb5 10-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: "Snapshot deletion did not run correctly" should be a fsck err

This was noticed when a test hit this error and didn't fail, because
fsck wasn't returning that it fixed errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0763c552 22-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fsck: Fix nested transaction handling

This uses the new trans->restart count to make sure we always correctly
return -BCH_ERR_transaction_restart_nested when we restart a nested
transaction - eliminating some other hacks and preparing for new
assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7903e3d2 20-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check_i_sectors()

bch2_count_inode_sectors() uses for_each_btree_key() internally, which
handles lock restarts - the lockrestart_do() in check_i_sectors() is
redundant, and buggy here since the count that
bch2_count_inode_sectors() returns was interpreted as an error by
lockrestart_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1ed0a5d2 19-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert fsck errors to errcode.h

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4bf5eec 18-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 615f867c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improved errcodes

Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.

Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.

Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.

This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eace11a7 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert more fsck code to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a1783320 15-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: for_each_btree_key2()

This introduces two new macros for iterating through the btree, with
transaction restart handling
- for_each_btree_key2()
- for_each_btree_key_commit()

Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.

This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0d06b4ec 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix repair for extent past end of inode

When we find an extent past an inode's i_size, we need to do the
deletion in the inode's snapshot (which will emit a whiteout if
necessary); and we also need to note that we now have an a key at that
position and snapshot, so that we don't go into an infinite loop.

Also, switch to walking inodes in reverse older, oldest snapshot to
newest, so that we emit the fewest whiteouts possible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c7a09cb1 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: When fsck finds redundant snapshot keys, trigger snapshots cleanup

Fsck now checks for keys in different snapshot IDs that are now
redundant due to other snapshots being deleted - it needs to for its own
algorithms to not get confused.

When it detects this it should re-run the post snapshot deletion cleanup
- this patch does that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 35f1a503 14-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve fsck for subvols/snapshots

- Bunch of refactoring, and move some code out of
bch2_snapshots_start() and into bch2_snapshots_check(), for constency
with the rest of fsck

- Interior snapshot nodes no longer point to a subvolume; this is so we
don't end up with dangling subvol references when deleting or require
scanning the full snapshots btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 49124d8a 14-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve snapshots_seen

This makes the snapshots_seen data structure fsck private and improves
it; we now also track the equivalence class for each snapshot id we've
seen, which means we can detect when snapshot deletion hasn't finished
or run correctly (which will otherwise confuse fsck).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4ab35c34 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix subvol/snapshot deleting in recovery

fsck doesn't want to run while we're cleaning up deleted snapshots - if
that work needs to be done, we want it to have finished before fsck
runs, otherwise fsck will get confused when it finds multiple keys in
the same snapshot ID equivalence class (i.e. the mechanism that
snapshot deletion uses for cleaning up redundant keys).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e4085b70 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fsck_inode_rm() shouldn't delete subvols

We should never see an inode marked as unlinked that's a subvolume root
(or a directory) in fsck, but even if we do it's not correct for fsck to
delete the subvolume: subvolumes are owned by dirents, and if we find a
dangling subvolume (not marked as unlinked) we want fsck to reattach it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e68914ca 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename __bch2_trans_do() -> commit_do()

Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d8f31407 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix hash_check_key()

hash_check_key() was incorrectly handling transaction restarts - switch
it to for_each_btree_key_norestart().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0095aa94 16-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve some fsck error messages

We have string names for d_type; use it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 41fc8622 12-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: In fsck, pass BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE when deleting dirents

A user reported an error where we hit an assertion due to deleting a key
in an internal snapshot node, when deleting a dirent that points to a
nonexisting inode.

We try to avoid doing updates to keys for internal snapshot nodes, but
upon inspection of the places where we remove dirents in fsck it appears
BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE is correct for all of them: either
the target dirent doesn't exist, or it's a directory with multiple
dirents pointing to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e492e7b6 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve error logging in fsck.c

This adds error logging to a bunch of functions in fsck.c - in fsck,
reduntant error messages is probably better than not enough.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e296b1f9 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix inode_backpointer_exists()

If the dirent an inode points to doesn't exist, we shouldn't be
returning an error - just 0/false.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 292dea86 06-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fsck: Work around transaction restarts

In check_extents() and check_dirents(), we're working towards only
handling transaction restarts in one place, at the top level - but we're
not there yet. check_i_sectors() and check_subdir_count() handle
transaction restarts locally, which means the iterator for the
dirent/extent is left unlocked (should_be_locked == 0), leading to
asserts popping when we go to do updates.

This patch hacks around this for now, until we can delete the offending
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 91d961ba 29-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: darrays

Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9e343161 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Small fsck fix

The check_dirents pass handles transaction restarts at the toplevel -
check_subdir_count() was incorrectly handling transaction restarts
without returning -EINTR, meaning that the iterator pointing to the
dirent being checked was left invalid.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f0f41a6d 30-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error messages for memory allocation failures

This adds some missing diagnostics from rare but annoying to debug
runtime allocation failure paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3e52c222 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add journal_seq to inode & alloc keys

Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c27314b4 03-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __remove_dirent()

__lookup_inode() doesn't work for what __remove_dirent() wants - it just
wants the first inode at a given inode number, they all have the same
hash info.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 47f80bbf 03-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check_inodes()

We were starting at the wrong btree position, and thus not actually
checking any inodes - oops.

Also, make check_key_has_snapshot() a mustfix fsck error, since later
fsck code assumes that all keys have valid snapshot IDs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 285b181a 28-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve transaction restart handling in fsck code

The fsck code has been handling transaction restarts locally, to avoid
calling fsck_err() multiple times (and asking the user/logging the error
multiple times) on transaction restart.

However, with our improving assertions about iterator validity, this
isn't working anymore - the code wasn't entirely correct, in ways that
are fine for now but are going to matter once we start wanting online
fsck.

This code converts much of the fsck code to handle transaction restarts
in a more rigorously correct way - moving restart handling up to the top
level of check_dirent, check_xattr and others - at the cost of logging
errors multiple times on transaction restart.

Fixing the issues with logging errors multiple times is probably going
to require memoizing calls to fsck_err() - we'll leave that for future
improvements.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2027875b 10-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add BCH_SUBVOLUME_UNLINKED

Snapshot deletion needs to become a multi step process, where we unlink,
then tear down the page cache, then delete the subvolume - the deleting
flag is equivalent to an inode with i_nlink = 0.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6b3d8b89 25-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't run triggers in fix_reflink_p_key()

It seems some users have reflink pointers which span many indirect
extents, from a short window in time when merging of reflink pointers
was allowed.

Now, we're seeing transaction path overflows in fix_reflink_p(), the
code path to clear out the reflink_p fields now used for front/back pad
- but, we don't actually need to be running triggers in that path, which
is an easy partial fix.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0d1b70a 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Must check for errors from bch2_trans_cond_resched()

But we don't need to call it from outside the btree iterator code
anymore, since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4db65027 11-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Subvol dirents are now only visible in parent subvol

This changes the on disk format for dirents that point to subvols so
that they also record the subvolid of the parent subvol, so that we can
filter them out in other subvolumes.

This also updates the dirent code to do that filtering, and in
particular tweaks the rename code - we need to ensure that there's only
ever one dirent (counting multiplicities in different snapshots) that
point to a subvolume.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6e0c886d 20-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check_path() for snapshots

check_path() wasn't checking the snapshot ID when checking for directory
structure loops - so, snapshots would cause us to detect a loop that
wasn't actually a loop.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6d76aefe 14-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for leaking of reflinked extents

When a reflink pointer points to only part of an indirect extent, and
then that indirect extent is fragmented (e.g. by copygc), if the reflink
pointer only points to one of the fragments we leak a reference.

Fix this by storing front/back pad values in reflink pointers - when
inserting reflink pointesr, we initialize them to cover the full range
of the indirect extents we reference.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bfe88863 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New on disk format to fix reflink_p pointers

We had a bug where reflink_p pointers weren't being initialized to 0,
and when we started using the second word, things broke badly.

This patch revs the on disk format version and adds cleanup code to zero
out the second word of reflink_p pointers before we start using it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9a796fdb 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_exit() no longer returns errors

Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 488f9776 18-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check_path() across subvolumes

Checking of directory structure across subvolumes was broken - we need
to look up the snapshot ID of the parent subvolume when crossing subvol
boundaries.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 97996ddf 30-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_subvolume_get()

Factor out a little helper.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ea0531f8 30-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check_inode_update_hardlinks()

We were incorrectly using bch2_inode_write(), which gets the snapshot ID
from the iterator, with a BTREE_ITER_ALL_SNAPSHOTS iterator -
fortunately caught by an assertion in the update path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 42d23732 16-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Snapshot creation, deletion

This is the final patch in the patch series implementing snapshots.
This patch implements two new ioctls that work like creation and
deletion of directories, but fancier.

- BCH_IOCTL_SUBVOLUME_CREATE, for creating new subvolumes and snaphots
- BCH_IOCTL_SUBVOLUME_DESTROY, for deleting subvolumes and snapshots

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 18443cb9 04-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update data move path for snapshots

The data move path operates on existing extents, and not within a
subvolume as the regular IO paths do. It needs to change because it may
cause existing extents to be split, and when splitting an existing
extent in an ancestor snapshot we need to make sure the new split has
the same visibility in child snapshots as the existing extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ef1669ff 19-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update fsck for snapshots

This updates the fsck algorithms to handle snapshots - meaning there
will be multiple versions of the same key (extents, inodes, dirents,
xattrs) in different snapshots, and we have to carefully consider which
keys are visible in which snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6fed42bb 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Plumb through subvolume id

To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 81ed9ce3 19-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Per subvolume lost+found

On existing filesystems, we have a single global lost+found. Introducing
subvolumes means we need to introduce per subvolume lost+found
directories, because inodes are added to lost+found by their inode
number, and inode numbers are now only unique within a subvolume.

This patch adds support to fsck for per subvolume lost+found.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b9e1adf5 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add support for dirents that point to subvolumes

Dirents currently always point to inodes. Subvolumes add a new type of
dirent, with d_type DT_SUBVOL, that instead points to an entry in the
subvolumes btree, and the subvolume has a pointer to the root inode.

This patch adds bch2_dirent_read_target() to get the inode (and
potentially subvolume) a dirent points to, and changes existing code to
use that instead of reading from d_inum directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 14b393ee 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Subvolumes, snapshots

This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1a488e73 27-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_NOUNLOCK

With the recent transaction restart changes, it's no longer needed - all
transaction commits have BTREE_INSERT_NOUNLOCK semantics.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 38200544 21-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't squash return code in check_dirents()

We were squashing BCH_FSCK_ERRORS_NOT_FIXED.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 914f2786 14-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improvements to fsck check_dirents()

The fsck code handles transaction restarts in a very ad hoc way, and not
always correctly. This patch makes some improvements to check_dirents(),
but more work needs to be done to figure out how this kind of code
should be structured.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8c3f6da9 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve iter->should_be_locked

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b89726ab 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill __btree_delete_at()

With trans->updates2 gone, we can now drop this helper and use
bch2_btree_delete_at() instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bc3f8b25 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 82355e28 17-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a memcpy call

Not supposed to pass a null ptr to memcpy (even if the size is 0).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 909004d2 14-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make sure to use BTREE_ITER_PREFETCH in fsck

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc51b041 21-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New check_nlinks algorithm for snapshots

With snapshots, using a radix tree for the table of link counts won't
work anymore because we also need to distinguish between inodes with
different snapshot IDs. Instead, this patch builds up a sorted array of
inodes that have hardlinks that we can binary search on - taking
advantage of the fact that with inode backpointers, the check_nlinks()
pass _only_ needs to concern itself with inodes that have hardlinks now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3b4b48c 24-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a null ptr deref

Fix a few memory safety issues, found by asan in userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58686a25 19-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Lookup/create lost+found lazily

This is prep work for subvolumes - each subvolume will have its own
lost+found.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f02810a1 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an unused var warning in userspace

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f24fab9c 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some small memory leaks

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae8bbb9f 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify fsck remove_dirent()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d3ff7fec 07-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improved check_directory_structure()

Now that we have inode backpointers, we can simplify checking directory
structure: instead of doing a DFS from the filesystem root and then
checking if we found everything, we can iterate over every inode and see
if we can go up until we get to the root.

This patch also has a number of fixes and simplifications for the inode
backpointer checks. Also, it turns out we don't actually need the
BCH_INODE_BACKPTR_UNTRUSTED flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 176cf4bf 09-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix fsck to not use bch2_link_trans()

bch2_link_trans() uses the btree key cache for inode updates, and fsck
isn't supposed to - also, it's not really what we want for reattaching
unreachable inodes anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b906aadd 08-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Redo check_nlink fsck pass

Now that we have inode backpointers the check_nlink pass only is
concerned with files that have hardlinks, and can be simplified.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8a85b20c 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inode backpointers are now required

This lets us simplify fsck quite a bit, which we need for making fsck
snapshot aware.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7ac2c55e 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify hash table checks

Very early on there was a period where we were accidentally generating
dirents with trailing garbage; we've since dropped support for
filesystems that old and the fsck code can be dropped.

Also, this patch switches to a simpler algorithm for checking hash
tables. It's less efficient on hash collision - but with 64 bit keys,
those are very rare.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c16add5 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check inodes at start of fsck

This splits out checking inode nlinks from the rest of the inode checks
and moves most of the inode checks to the start of fsck, so that other
fsck passes can depend on it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a14d58e 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop bch2_fsck_inode_nlink()

We've had BCH_FEATURE_atomic_nlink for quite some time, we can drop this
now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b6d4f474 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move some dirent checks to bch2_dirent_invalid()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ab2a29cc 02-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inode backpointers

This patch adds two new inode fields, bi_dir and bi_dir_offset, that
point back to the inode's dirent.

Since we're only adding fields for a single backpointer, files that have
been hardlinked won't necessarily have valid backpointers: we also add a
new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has
ever had multiple links to it. That's ok, because we only really need
this functionality for directories, which can never have multiple
hardlinks - when we add subvolumes, we'll need a way to enemurate and
print subvolumes, and this will let us reconstruct a path to a subvolume
root given a subvolume root inode.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0ba3b64 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# abcecb49 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck code refactoring

Change fsck code to always put btree iterators - also, make some flow
control improvements to deal with lock restarts better, and refactor
check_extents() to not walk extents twice for counting/checking
i_sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c7bb769c 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79f88eba 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename KEY_TYPE_whiteout -> KEY_TYPE_hash_whiteout

Snapshots are going to need a different whiteout key type. Also, switch
to using BCH_BKEY_TYPES() to define the bkey value accessors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fe38b720 07-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use inode btree key cache in fsck code

We had a cache coherency bug with the btree key cache in the fsck code -
this fixes fsck to be consistent about not using it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2bb748a6 12-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck fixes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eaf79831 09-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for hash_redo_key() in fsck

It's possible we're calling hash_redo_key() because of a duplicate key -
easiest fix for that is to just not use BCH_HASH_SET_MUST_CREATE.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3eb26d01 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_get_iter() no longer returns errors

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 34c1cd6a 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for fsck spuriously finding duplicate extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6584e84a 20-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use bkey cache for inode update in fsck

fsck doesn't know about the btree key cache, and non-cached iterators
aren't cache coherent (yet?)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fe458476 07-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: use a radix tree for inum bitmap in fsck

The change to use the cpu nr for the high bits of new inode numbers
means that inode numbers are very space - we see -ENOMEM during fsck
without this.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a3e72262 05-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New varints

Previous varint implementation used by the inode code was not nearly as
fast as it could have been; partly because it was attempting to encode
integers up to 96 bits (for timestamps) but this meant that encoding and
decoding the length required a table lookup.

Instead, we'll just encode timestamps greater than 64 bits as two
separate varints; this will make decoding/encoding of inodes
significantly faster overall.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a672fb8f 24-Aug-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make sure to go rw if lazy in fsck

The paths where we delete or truncate inodes don't pass commit flags for
BTREE_INSERT_LAZY_RW, so just go rw if necessary in the fsck code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ef846a7 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve assorted error messages

This also consolidates the various checks in bch2_mark_pointer() and
bch2_trans_mark_pointer().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1d60b999 30-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix inodes pass in fsck

It wasn't updated for the patch that switched inodes to using the offset
field of struct bkey.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fb2983 07-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: prevously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_sucessor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f7005e01 25-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve error message in fsck

Seeing the extents that were overlapping is highly useful for figuring
out what went wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0728eed7 21-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a locking bug in fsck

This works around a btree locking issue - we can't be holding read locks
while taking write locks, which currently means we can't have live
iterators holding read locks at commit time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3e464ac 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splittin existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 163e885a 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERS

All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.

Also convert more code to __bch2_trans_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c3ff72c 28-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert some enums to x-macros

Helps for preventing things from getting out of sync.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58e2388f 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_ATOMIC

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b1fd23df 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMIC

BTREE_INSERT_ATOMIC should really be the default mode, and there's not
that much code that doesn't need it - so this is prep work for getting
rid of the flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 184b1dc1 11-Nov-2019 Justin Husted <sigstop@gmail.com>

bcachefs: Update directory timestamps during link

Timestamp updates on the directory during a link operation were cached.
This is inconsistent with other metadata operations such as rename, as
well as being less efficient.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b627c7d8 09-Nov-2019 Justin Husted <sigstop@gmail.com>

bcachefs: Set lost+found mode to 0700

For security and conformance with other filesystems, the lost+found
directory should not be world or group accessible.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6d01598e 03-Nov-2019 Justin Husted <sigstop@gmail.com>

bcachefs: Fix uninitialized field in hash_check_init()

The chain_end field was not initialized before use in
hash_set_chain_start.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96385742 02-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Factor out fs-common.c

This refactoring makes the code easier to understand by separating the
bcachefs btree transactional code from the linux VFS code - but more
importantly, it's also to share code with the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7199432 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill deferred btree updates

Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ef9f95ba 25-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve error handling for for_each_btree_key_continue()

Change it to match for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b43a0f60 25-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Cleanup i_nlink handling

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9c37b632 19-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for extents past eof correctly

bcachefs used to work mostly in terms of PAGE_SIZE, not block size at
the vfs level - but that has since been fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58fbf808 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete duplicate code

Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 619f5bee 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: some improvements to startup messages and options

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94f651e2 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Return errors from for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dcf77129 31-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: minor fsck fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd13057 31-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck locking improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9d455b24 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: make sure to use BTREE_INSERT_LAZY_RW in fsck

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 134915f3 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Go rw lazily

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0564b167 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert bch2_btree_insert_at() usage to bch2_trans_commit()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6d033aa4 09-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't need to walk inodes on clean shutdown

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0cfb963 29-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track nr_inodes with the key marking machinery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 319f9ac3 08-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: revamp to_text methods

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 02f1a96c 03-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename nofsck opt to fsck

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bc230209 28-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: make fsck spew less

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 741daa5b 21-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Dirent repair code

There was a bug for awhile in previous kernels where we weren't
computing dirent name lengths correctly and we weren't zeroing out
padding at the end of dirents (due to struct bch_dirent changing size by
adding __attribute__((aligned)), and not updating other code to use
offsetof).

This patch fixes dirents with junk at the end, by going off of the
dirent's hash.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc3268c1 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill extent_insert_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 581edb63 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: mempoolify btree_trans

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d69f41d6 12-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert raw uses of bch2_btree_iter_link() to new transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88c07f73 14-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only check inode i_nlink during full fsck

Now that all filesystem operatinos that manipulate the filesystem
heirachy and i_nlink are fully atomic, we can add a feature bit to
indicate i_nlink doesn't need to be checked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>