History log of /linux-master/fs/bcachefs/snapshot.c
Revision Date Author Comments
# 01e5f4fc 04-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make snapshot_is_ancestor() safe

Snapshot table accesses generally need to be checking for invalid
snapshot ID now, fix one that was missed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a292be3b 27-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Reconstruct missing snapshot nodes

When the snapshots btree is going, we'll have to delete huge amounts of
data - unless we can reconstruct it by looking at the keys that refer to
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 13c1e583 28-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve -o norecovery; opts.recovery_pass_limit

This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.

Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.

When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.

recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec9cc18f 22-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add checks for invalid snapshot IDs

Previously, we assumed that keys were consistent with the snapshots
btree - but that's not correct as fsck may not have been run or may not
be complete.

This adds checks and error handling when using the in-memory snapshots
table (that mirrors the snapshots btree).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63332394 21-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Move snapshot table size to struct snapshot_table

We need to add bounds checking for snapshot table accesses - it turns
out there are cases where we do need to use the snapshots table before
fsck checks have completed (and indeed, fsck may not have been run).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c31b83a 17-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery

this fixes an assertion pop in
bch2_check_snapshot_trees() ->
check_snapshot_tree() ->
bch2_snapshot_tree_master_subvol() ->
bch2_snapshot_is_ancestor()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c4333eb5 23-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix check_snapshot() memcpy

check_snapshot() copies the bch_snapshot to a temporary to easily handle
older versions that don't have all the fields of the current version,
but it lacked a min() to correctly handle keys newer and larger than the
current version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d32088f2 20-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_snapshot::btime

Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 369acf97 16-Jan-2024 Su Yue <glass.su@suse.com>

bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit

bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut.
It should be freed by kvfree not kfree.
Or umount will triger:

[ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008
[ 406.830676 ] #PF: supervisor read access in kernel mode
[ 406.831643 ] #PF: error_code(0x0000) - not-present page
[ 406.832487 ] PGD 0 P4D 0
[ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI
[ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90
[ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 406.835796 ] RIP: 0010:kfree+0x62/0x140
[ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6
[ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286
[ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4
[ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000
[ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001
[ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80
[ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000
[ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000
[ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0
[ 406.841464 ] Call Trace:
[ 406.841583 ] <TASK>
[ 406.841682 ] ? __die+0x1f/0x70
[ 406.841828 ] ? page_fault_oops+0x159/0x470
[ 406.842014 ] ? fixup_exception+0x22/0x310
[ 406.842198 ] ? exc_page_fault+0x1ed/0x200
[ 406.842382 ] ? asm_exc_page_fault+0x22/0x30
[ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs]
[ 406.842842 ] ? kfree+0x62/0x140
[ 406.842988 ] ? kfree+0x104/0x140
[ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs]
[ 406.843390 ] kobject_put+0xb7/0x170
[ 406.843552 ] deactivate_locked_super+0x2f/0xa0
[ 406.843756 ] cleanup_mnt+0xba/0x150
[ 406.843917 ] task_work_run+0x59/0xa0
[ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0
[ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40
[ 406.844510 ] do_syscall_64+0x4e/0xf0
[ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 406.844907 ] RIP: 0033:0x7f0a2664e4fb

Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 074cbcda 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck_err()s don't need to manually check c->sb.version anymore

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad00bce0 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: mark now takes bkey_s

Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80eab7a7 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# defd9e39 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: darray_for_each() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 249bf593 09-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix snapshot.c assertion for online fsck

c->curr_recovery_pass can go backwards; this adds a non rewinding
version, c->recovery_pass_done.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5028b907 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27b2df98 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f0e297d 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Explicity go RW for fsck

This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c471b65 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: convert bch_fs_flags to x-macro

Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb52d23e 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0a11adfb 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an endianness conversion

cpu_to_le32(), not le32_to_cpu() - fixes a sparse complaint.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e7acdfb 27-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix deleted inodes btree in snapshot deletion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5394fe94 26-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix snapshot skiplists

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b0b5bbf9 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarily

Be a bit more careful about when bch2_delete_dead_snapshots needs to
run: it only needs to run synchronously if we're running fsck, and it
only needs to run at all if we have snapshot nodes to delete or if fsck
has noticed that it needs to run.

Also:
Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS

Kill bch2_delete_dead_snapshots_hook(), move functionality to
bch2_mark_snapshot()

Factor out bch2_check_snapshot_needs_deletion(), to explicitly check
if we need to be running snapshot deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0dd092bf 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix lock ordering with snapshot_create_lock

We must not hold btree locks while taking snapshot_create_lock - this
fixes a lockdep splat.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 37fad949 28-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: snapshot_create_lock

Add a new lock for snapshot creation - this addresses a few races with
logged operations and snapshot deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e2d3999 28-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix snapshot skiplists during snapshot deletion

In snapshot deleion, we have to pick new skiplist nodes for entries that
point to nodes being deleted.

The function that finds a new skiplist node, skipping over entries being
deleted, was incorrect: if n = 0, but the parent node is being deleted,
we also need to skip over that node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d281701b 26-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix looping around bch2_propagate_key_to_snapshot_leaves()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eebe8a84 23-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make sure to initialize equiv when creating new snapshots

Previously, equiv was set in the snapshot deletion path, which is where
it's needed - equiv, for snapshot ID equivalence classes, would ideally
be a private data structure to the snapshot deletion path.

But if a new snapshot is created while snapshot deletion is running,
move_key_to_correct_snapshot() moves a key to snapshot id 0 - oops.

Fixes: https://github.com/koverstreet/bcachefs/issues/593
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d04fdf5c 19-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: snapshots: Use kvfree_rcu_mightsleep()

kvfree_rcu() was renamed - not removed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d67a72bf 15-Sep-2023 Dan Carpenter <dan.carpenter@linaro.org>

bcachefs: snapshot: Add missing assignment in bch2_delete_dead_snapshots()

This code accidentally left out the "ret = " assignment so the errors
from for_each_btree_key2() are not checked.

Fixes: 53534482a250 ("bcachefs: for_each_btree_key2()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bf3766b 12-Sep-2023 Colin Ian King <colin.i.king@gmail.com>

bcachefs: Fix a handful of spelling mistakes in various messages

There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e46c181a 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert more code to bch_err_msg()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c872afa2 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_propagate_key_to_snapshot_leaves()

When we handle a transaction restart in a nested context, we need to
return -BCH_ERR_transaction_restart_nested because we invalidated the
outer context's iterators and locks.

bch2_propagate_key_to_snapshot_leaves() wasn't doing this, this patch
fixes it to use trans_was_restarted().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 097d4cc8 28-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix snapshot_skiplist_good()

We weren't correctly checking snapshot skiplist nodes - we were checking
if they were in the same tree, not if they were an actual ancestor.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a111901f 18-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_propagate_key_to_snapshot_leaves()

If fsck finds a key that needs work done, the primary example being an
unlinked inode that needs to be deleted, and the key is in an internal
snapshot node, we have a bit of a conundrum.

The conundrum is that internal snapshot nodes are shared, and we in
general do updates in internal snapshot nodes because there may be
overwrites in some snapshots and not others, and this may affect other
keys referenced by this key (i.e. extents).

For example, we might be seeing an unlinked inode in an internal
snapshot node, but then in one child snapshot the inode might have been
reattached and might not be unlinked. Deleting the inode in the internal
snapshot node would be wrong, because then we'll delete all the extents
that the child snapshot references.

But if an unlinked inode does not have any overwrites in child
snapshots, we're fine: the inode is overwrritten in all child snapshots,
so we can do the deletion at the point of comonality in the snapshot
tree, i.e. the node where we found it.

This patch adds a new helper, bch2_propagate_key_to_snapshot_leaves(),
to handle the case where we need a to update a key that does have
overwrites in child snapshots: we copy the key to leaf snapshot nodes,
and then rewind fsck and process the needed updates there.

With this, fsck can now always correctly handle unlinked inodes found in
internal snapshot nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f55d6e07 17-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Cleanup redundant snapshot nodes

After deleteing snapshots, we may be left with a snapshot tree where
some nodes only have one child, and we have a linear chain.

Interior snapshot nodes are never used directly (i.e. they never have
subvolumes that point to them), they are only referered to by child
snapshot nodes - hence, they are redundant.

The existing code talks about redundant snapshot nodes as forming and
equivalence class; i.e. nodes for which snapshot_t->equiv is equal. In a
given equivalence class, we only ever need a single key at a given
position - i.e. multiple versions with different snapshot fields are
redundant.

The existing snapshot cleanup code deletes these redundant keys, but not
redundant nodes. It turns out this is buggy, because we assume that
after snapshot deletion finishes we should only have a single key per
equivalence class, but the btree update path doesn't preserve this -
overwriting keys in old snapshots doesn't check for the equivalence
class being equal, and thus we can end up with duplicate keys in the
same equivalence class and fsck complaining about snapshot deletion not
having run correctly.

The equivalence class notion has been leaking out of the core snapshots
code and into too much other code, i.e. fsck, so this patch takes a
different approach: snapshot deletion now moves keys to the node in an
equivalence class being kept (the leafiest node) and then deletes the
redundant nodes in the equivalance class.

Some work has to be done to correctly delete interior snapshot nodes;
snapshot node depth and skiplist fields for descendent nodes have to be
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 66487c54 13-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix is_ancestor bitmap

The is_ancestor bitmap is at optimization for bch2_snapshot_is_ancestor;
once we get sufficiently close to the ancestor ID we're searching for we
test a bitmap.

But initialization of the is_ancestor bitmap was broken; we do it by
using bch2_snapshot_parent(), but we call that on nodes that haven't
been initialized yet with bch2_mark_snapshot().

Fix this by adding a separate loop in bch2_snapshots_read() for
initializing the is_ancestor bitmap, and also add some new debug asserts
for checking this sort of breakage in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fa5bed37 18-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move check_pos_snapshot_overwritten() to snapshot.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e877caa 16-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out snapshot.c

subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>