History log of /linux-master/fs/bcachefs/recovery.c
Revision Date Author Comments
# 79055f50 15-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: make sure to release last journal pin in replay

This fixes a deadlock when journal replay has many keys to insert that
were from fsck, not the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5ab4beb7 08-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't scan for btree nodes when we can reconstruct

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a292be3b 27-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Reconstruct missing snapshot nodes

When the snapshots btree is going, we'll have to delete huge amounts of
data - unless we can reconstruct it by looking at the keys that refer to
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 55936afe 15-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Flag btrees with missing data

We need this to know when we should attempt to reconstruct the snapshots
btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4409b808 11-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Repair pass for scanning for btree nodes

If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.

Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesytem UUID, so we
can do so safely.

This implements the scanning - next patch will rework topology repair to
make use of the found nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2f61f41 14-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bdbf953b 19-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_shoot_down_journal_keys()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27fcec6c 30-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clear recovery_passes_required as they complete without errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 13c1e583 28-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve -o norecovery; opts.recovery_pass_limit

This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.

Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.

When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.

recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0a34c058 30-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure bch_sb_field_ext always exists

This makes bch_sb_field_ext more consistent with the rest of -o
nochanges - we don't want to be varying other codepaths based on -o
nochanges, since it's used for testing in dry run mode; also fixes some
potential null ptr derefs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4fe0eeea 28-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Flush journal immediately after replay if we did early repair

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d2554263 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out recovery_passes.c

We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a5860368 16-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cdce1094 11-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: reconstruct_alloc cleanup

Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.

Also - users no longer have to explicitly select fsck and fix_errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cce3752 25-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: split out ignore_blacklisted, ignore_not_dirty

prep work for replaying the journal backwards

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 69426613 23-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: improve move_gap()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 95ffc7fb 23-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: journal_keys now uses darray helpers

nice bit of code cleanup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 894d0622 23-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename journal_keys.d -> journal_keys.data

This will let us use some darray helpers in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52946d82 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill more -EIO error codes

This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba89083e 08-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix journal replay with unreadable btree roots

When a btree root is unreadable, we still might be able to get some data
back by replaying what's in the journal. Previously though, we got
confused when journal replay would attempt to replay a key for a level
that didn't exist.

This adds bch2_btree_increase_depth(), so that journal replay can handle
this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6fa30fe7 06-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: journal_seq_blacklist_add() now handles entries being added out of order

bch2_journal_seq_blacklist_add() was bugged when the new entry
overlapped with multiple existing entries, and it also assumed new
entries are being added in increasing order.

This is true on any sane filesystem, but when trying to recover from
very badly mangled filesystems we might end up with the journal sequence
number rewinding vs. what the blacklist list knows about - easiest to
just handle that here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2eeccee8 12-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix check_version_upgrade()

When also downgrading, check_version_upgrade() could pick a new version
greater than the latest supported version.

Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e7834a8 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_fs_usage_base

Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72e2c920 05-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Restart recovery passes more reliably

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 15eaaa4c 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Upgrades now specify errors to fix, like downgrades

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d55ddf6e 31-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Online fsck can now fix errors

BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing
fsck errors. Also; set fix_errors to ask by default when fsck is
running.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eff1f728 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Upgrading uses bch_sb.recovery_passes_required

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62719cf3 23-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix nochanges/read_only interaction

nochanges means "we cannot issue writes at all"; it's possible to go
into a pseudo read-write mode where we pin dirty metadata in memory,
which is used for fsck in dry run mode and doing journal replay on a
read only mount, but we do not want to allow an actual read-write mount
in nochanges mode.

But we do always want to allow early read-write, during recovery - this
patch clarifies that.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cea07a7b 17-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: vstruct_for_each() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9fea2274 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# defd9e39 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: darray_for_each() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 249bf593 09-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix snapshot.c assertion for online fsck

c->curr_recovery_pass can go backwards; this adds a non rewinding
version, c->recovery_pass_done.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27b2df98 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f391b2f 06-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_run_online_recovery_passes()

Add a new helper for running online recovery passes - i.e. online fsck.
This is a subset of our normal recovery passes, and does not - for now -
use or follow c->curr_recovery_pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2b41226d 04-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add ability to redirect log output

Upcoming patches are going to add two new ioctls for running fsck in the
kernel, but pretending that we're running our normal userspace fsck.

This patch adds some plumbing for redirecting our normal log messages
away from the dmesg log to a thread_with_file file descriptor - via a
struct log_output, which will be consumed by the fsck f_op's read method.

The new ioctls will allow for running fsck in the kernel against an
offline filesystem (without mounting it), and an online filesystem. For
an offline filesystem we need a way to pass in a pointer to the
log_output, which is done via a new hidden opts.h option.

For online fsck, we can set c->output directly, but only want to
redirect log messages from the thread running fsck - hence the new
c->output_filter method.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f0e297d 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Explicity go RW for fsck

This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c471b65 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: convert bch_fs_flags to x-macro

Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9b34f02c 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill dev_usage->buckets_ec

This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b27d7afb 09-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't flush journal after replay

The flush_all_pins() after journal replay was unecessary, and trying to
completely flush the journal while RW is not a great idea - it's not
guaranteed to terminate if other threads keep adding things to the
jorunal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb52d23e 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 57322430 09-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make journal replay more efficient

Journal replay now first attempts to replay keys in sorted order,
similar to how the btree write buffer flush path works.

Any keys that can not be replayed due to journal deadlock are then left
for later and replayed in journal order, unpinning journal entries as we
go.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bdde9829 09-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Go rw before journal replay

This gets us slightly nicer log messages.

Also, this slightly clarifies synchronization of c->journal_keys; after
we go RW it's in use by multiple threads (so that the btree iterator
code can overlay keys from the journal); so it has to be prepped before
that point.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9a71de67 08-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"

This slightly changes how trans->journal_res works, in preparation for
changing the btree write buffer flush path to use it.

Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal
reservation; trans->journal_res.seq already refers to the journal
sequence number to pin".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fbf92708 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print old version when scanning for old metadata

Also: we should be using bch2_fs_read_write_early()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 30418de0 13-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Flush fsck errors before running twice

It's confusing if we run fsck a second time (in debug mode, to verify
the second run is clean), but errors are still ratelimited from the
first run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 84f16387 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_sb_field_downgrade

Add a new superblock section that contains a list of
{ minor version, recovery passes, errors_to_fix }

that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.

The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b16413c 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_sb.recovery_passes_required

Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.

- recovery_passes_requried: recovery passes that must be run on the
next mount
- errors_silent: errors that will be silently fixed

These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 808c680f 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent identifiers for recovery passes

The next patch will start to refer to recovery passes from the
superblock; naturally, we now need identifiers that don't change, since
the existing enum is in the order in which they are run and is not
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f87bf892 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix setting version_upgrade_complete

If a superblock write hasn't happened (i.e. we never had to go rw), then
c->sb.version will be out of date w.r.t. c->disk_sb.sb->version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4a147af2 09-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix uninitialized var in bch2_journal_replay()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8a443d3e 17-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Proper refcounting for journal_keys

The btree iterator code overlays keys from the journal until journal
replay is finished; since we're now starting copygc/rebalance etc.
before replay is finished, this is multithreaded access and thus needs
refcounting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5a53f851 03-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entry

When we didn't find anything in the journal that we'd like to use, and
we're forced to use whatever we can find - that entry will have been a
JSET_NO_FLUSH entry with a garbage last_seq value, since it's not
normally used.

Initialize it to something sane, for bch2_fs_journal_start().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6dfa10ab 31-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix build errors with gcc 10

gcc 10 seems to complain about array bounds in situations where gcc 11
does not - curious.

This unfortunately requires adding some casts for now; we may
investigate getting rid of our __u64 _data[] VLA in a future patch so
that our start[0] members can be VLAs.

Reported-by: John Stoffel <john@stoffel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fb3f57bb 20-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: rebalance_work

This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.

rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.

A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.

Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.

Scanning is still requrired after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.

Future possible work:
- Propagate options to indirect extents when being changed
- Add other IO path options - nr_replicas, ec, to rebalance_work so
they can be applied in the background when they change
- Add a counter, for bcachefs fs usage output, showing the pending
amount of rebalance work: we'll probably want to do this after the
disk space accounting rewrite (moving it to a new btree)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bbe682c7 21-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure devices are always correctly initialized

We can't mark device superblocks or allocate journal on a device that
isn't online.

That means we may need to do this on every mount, because we may have
formatted a new filesystem and then done the first mount
(bch2_fs_initialize()) in degraded mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b0b5bbf9 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarily

Be a bit more careful about when bch2_delete_dead_snapshots needs to
run: it only needs to run synchronously if we're running fsck, and it
only needs to run at all if we have snapshot nodes to delete or if fsck
has noticed that it needs to run.

Also:
Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS

Kill bch2_delete_dead_snapshots_hook(), move functionality to
bch2_mark_snapshot()

Factor out bch2_check_snapshot_needs_deletion(), to explicitly check
if we need to be running snapshot deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 795413c5 29-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix drop_alloc_keys()

For consistency with the rest of the reconstruct_alloc option, we should
be skipping all alloc keys.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4fc1f402 27-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix another smatch complaint

This should be harmless, but initialize last_seq anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7dcf62c0 26-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make btree root read errors recoverable

The entire btree will be lost, but that is better than the entire
filesystem not being recoverable.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bf3766b 12-Sep-2023 Colin Ian King <colin.i.king@gmail.com>

bcachefs: Fix a handful of spelling mistakes in various messages

There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aaad530a 27-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BTREE_ID_logged_ops

Add a new btree for long running logged operations - i.e. for logging
operations that we can't do within a single btree transaction, so that
they can be resumed if we crash.

Keys in the logged operations btree will represent operations in
progress, with the state of the operation stored in the value.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e877caa 16-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out snapshot.c

subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0a2b00a 11-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix check_version_upgrade()

We were failing to upgrade to the latest compatible version - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 401585fe 05-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_journal_iter.c

Split out a new file from recovery.c for managing the list of keys we
read from the journal: before journal replay finishes the btree iterator
code needs to be able to iterate over and return keys from the journal
as well, so there's a fair bit of code here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a37ad1a3 05-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: sb-clean.c

Pull code for bch_sb_field_clean out into its own file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e81f89b 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix assorted checkpatch nits

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e08e63e4 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_COMPAT_bformat_overflow_done no longer required

Awhile back, we changed bkey_format generation to ensure that the packed
representation could never represent fields larger than the unpacked
representation.

This was to ensure that bkey_packed_successor() always gave a sensible
result, but in the current code bkey_packed_successor() is only used in
a debug assertion - not for anything important.

This kills the requirement that we've gotten rid of those weird bkey
formats, and instead changes the assertion to check if we're dealing
with an old weird bkey format.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0ed4ca14 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure topology repair runs

This fixes should_restart_for_topology_repair() - previously it was
returning false if the btree io path had already seleceted topology
repair to run, even if it hadn't run yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad52bac2 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Log a message when running an explicit recovery pass

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1d1072f 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print out required recovery passes on version upgrade

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b56b787c 02-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: In debug mode, run fsck again after fixing errors

We want to ensure that fsck actually fixed all the errors it found - the
second fsck run should be clean.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 922bc5a0 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make topology repair a normal recovery pass

This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and
explicitly running a specific recovery pass - this is a more general
replacement for how we were running topology repair before.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae2e13d7 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_run_explicit_recovery_pass()

This introduces bch2_run_explicit_recovery_pass() and uses it for when
fsck detects that we need to re-run dead snaphots cleanup, and makes
dead snapshot cleanup more like a normal recovery pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0f8faea 11-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix_errors option is now a proper enum

Before, it was parsed as a bool but internally it was really an enum:
this lets us pass in all the possible values.

But we special case the option parsing: no supplied value is parsed as
FSCK_FIX_yes, to match the previous behaviour.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f26c67f4 25-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Snapshot depth, skiplist fields

This extents KEY_TYPE_snapshot to include some new fields:
- depth, to indicate depth of this particular node from the root
- skip[3], skiplist entries for quickly walking back up to the root

These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n))
instead of O(n) in the snapshot tree depth.

Skiplist nodes are picked at random from the set of ancestor nodes, not
some fixed fraction.

This introduces bcachefs_metadata_version 1.1, snapshot_skiplists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 065bd335 10-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Version table now lists required recovery passes

Now that we've got forward compatibility sorted out, we should be doing
more frequent version upgrades in the future.

To avoid having to run a full fsck for every version upgrade, this
improves the BCH_METADATA_VERSIONS() table to explicitly specify a
bitmask of recovery passes to run when upgrading to or past a given
version.

This means we can also delete PASS_UPGRADE().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6619d846 09-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_sb_maybe_downgrade(), bch2_sb_upgrade()

Add some new helpers, and fix upgrade/downgrade in bch2_fs_initialize().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba8eeae8 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bcachefs_metadata_version_major_minor

This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.

With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 067d228b 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate recovery passes

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78328fec 08-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Stash journal replay params in bch_fs

For the upcoming enumeration of recovery passes, we need all recovery
passes to be called the same way - including journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 10a6ced2 08-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_bucket_gens_read()

This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the
version check there.

This is prep work for enumarating all recovery passes: we need some
cleanup first to make calling all the recovery passes consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3045bb95 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: version_upgrade is now an enum

The version_upgrade parameter is now an enum, not a bool, and it's
persistent in the superblock:
- compatible (default): upgrade to the latest compatible version
- incompatible: upgrade to latest incompatible version
- none

Currently all upgrades are incompatible upgrades, but the next release
will introduce major:minor versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 24964e1c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()

Version upgrades are not atomic operations: when we do a version upgrade
we need to update the superblock before we start using new features, and
then when the upgrade completes we need to update the superblock again.
This adds a new superblock field so we can detect and handle incomplete
version upgrades.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c8b4534d 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete redundant log messages

Now that we have distinct error codes for different memory allocation
failures, the early init log messages are no longer needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3804b55 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_version_to_text()

Add a new helper for printing out metadata versions in a standard
format.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec14fc60 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill JOURNAL_WATERMARK

This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards
specifying watermarks once in the transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bb3c2a9 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New error message helpers

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
- bch_err_fn
- bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e47a390a 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert -ENOENT to private error codes

As with previous conversions, replace -ENOENT uses with more informative
private error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c59b483 29-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BTREE_ID_snapshot_tree

This adds a new btree which gets us a persistent per-snapshot-tree
identifier.

- BTREE_ID_snapshot_trees
- KEY_TYPE_snapshot_tree
- bch_snapshot now has a field that points to a snapshot_tree

This is going to be used to designate one snapshot ID/subvolume out of a
given tree of snapshots as the "main" subvolume, so that we can do quota
accounting in that subvolume and not the rest.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcb79a51 29-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_iter() helpers

Introduce new helpers for a common pattern:

bch2_trans_iter_init();
bch2_btree_iter_peek_slot();

- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27763692 04-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a cond_resched() call to journal_keys_sort()

We're just doing cpu work here and it could take awhile, a
cond_resched() is definitely needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62a03559 31-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rip out code for storing backpointers in alloc keys

We don't store backpointers in alloc keys anymore, since we gained the
btree write buffer.

This patch drops support for backpointers in alloc keys, and revs the on
disk format version so that we know a fsck is required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 349b1d83 22-Mar-2023 Brian Foster <bfoster@redhat.com>

bcachefs: use reservation for log messages during recovery

If we block on journal reservation attempting to log journal
messages during recovery, particularly for the first message(s)
before we start doing actual work, chances are the filesystem ends
up deadlocked.

Allow logged messages to use reserved journal space to mitigate this
problem. In the worst case where no space is available whatsoever,
this at least allows the fs to recognize that the journal is stuck
and fail the mount gracefully.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26559553 15-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a fallback when journal_keys doesn't fit in ram

We may end up in a situation where allocating the buffer for the sorted
journal_keys fails - but it would likely succeed, post compaction where
we drop duplicates.

We've had reports of this allocation failing, so this adds a slowpath to
do the compaction incrementally.

This is only a band-aid fix; we need to look at limiting the number of
keys in the journal based on the amount of system RAM.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 40a18fe2 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add error message for failing to allocate sorted journal keys

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac2ccddc 04-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Drop some anonymous structs, unions

Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenienc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27616a31 18-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify ec stripes heap

Now that we have a separate data structure for tracking open stripes,
the stripes heap can track all existing stripes, which is a nice
simplification.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80c33085 05-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fragmentation LRU

Now that we have much more efficient updates to the LRU btree, this
patch adds a new LRU that indexes buckets by fragmentation.

This means copygc no longer has to scan every bucket to find buckets
that need to be evacuated.

Changes:
- A new field in bch_alloc_v4, fragmentation_lru - this corresponds to
the bucket's position in the fragmentation LRU. We add a new field
for this instead of calculating it as needed because we may make the
fragmentation LRU optional; this field indicates whether a bucket is
on the fragmentation LRU.

Also, zoned devices will introduce variable bucket sizes; explicitly
recording the LRU position will be safer for them.

- A new copygc path for using the fragmentation LRU instead of
scanning every bucket and building up an in-memory heap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 806c8a6a 12-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix failure to read btree roots

If failed to read a btree root - or if we're not using a btree root,
because of the reconstruct_alloc option - make sure we update the
corresponding info for the key/level for the root on disk.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 83f33d68 05-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rework lru btree

This patch changes how the LRU index works:

Instead of using KEY_TYPE_lru where the bucket the lru entry points to
is part of the value, this switches to KEY_TYPE_set and encoding the
bucket we refer to in the low bits of the key.

This means that we no longer have to check for collisions when inserting
LRU entries. We'll be making using of this in the next patch, which adds
a btree write buffer - a pure write buffer for btree updates, where
updates are appended to a simple array and then periodically sorted and
batch inserted.

This is a new on disk format version, and a forced upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5250b74d 25-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bucket_gens btree

To improve mount times, add a btree for just bucket gens, 256 of them
per key: this means we'll have to scan drastically less metadata at
startup.

This adds
- trigger for keeping it in sync with the all btree
- initialization code, for filesystems from previous versions
- new path for reading bucket gens
- new fsck code

And a new on disk format version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8dd69d9f 21-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3

Move bi_size and bi_sectors into the non-varint portion of the inode, so
that the write path can update them without going through the relatively
expensive unpack/pack operations.

Other changes:
- Add a field for the offset of the varint section, so we can add new
non-varint fields without needing a new inode type, like alloc_v3
- Move bi_mode into the flags field, so that the varint section can be
u64 aligned

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 47b323a0 19-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Start snapshots before bch2_gc()

bch2_gc may require snapshots to be started - the repair path when
checking the reflink btree may do updates to the extents btree.

This moves bch2_fs_initialize_subvolumes() and bch2_fs_snapshots_start()
to before bch2_gc() - since we haven't gone RW yet, the updates in
bch2_fs_initialize_subvolumes() are done via the journal replay keys
list, so it's fine to do this before bch2_gc().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8c752bb 17-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New on disk format: Backpointers

This patch adds backpointers: we now have a reverse index from device
and offset on that device (specifically, offset within a bucket) back to
btree nodes and (non cached) data extents.

The first 40 backpointers within a bucket are stored in the alloc key;
after that backpointers spill over to the next backpointers btree. This
is to help avoid performance regressions from additional btree updates
on large streaming workloads.

This patch adds all the code for creating, checking and repairing
backpointers. The next patch in the series is going to use backpointers
for copygc - finally getting rid of the need to scan all extents to do
copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2b542ba 11-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Go RW before check_alloc_info()

It's possible to do btree updates before going RW by adding them to the
list of updates for journal replay to do, but this is limited by what
fits in RAM. This patch switches the second alloc info phase to run
after going RW - btree_gc has already ensured the alloc btree itself is
correct - and tweaks the allocation path to deal with the potential
small inconsistencies.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f5c7466 17-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Start copygc when first going read-write

In the distant past, it wasn't possible to start copygc until after
journal replay had finished. Now, the btree iterator code overlays keys
from the journal, so there's no reason not to start it earlier - and it
solves a rare deadlock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 858536c7 11-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert EROFS errors to private error codes

More error code improvements - this gets us more useful error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5bbe3f2d 14-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Log more messages in the journal

This patch

- Adds a mechanism for queuing up journal entries prior to the journal
being started, which will be used for early journal log messages

- Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now
take format strings. bch2_fs_log_msg() can be used before or after
the journal has been started, and will use the appropriate mechanism.

- Deletes the now obsolete bch2_journal_log_msg()

- And adds more log messages to the recovery path - messages for
journal/filesystem started, journal entries being blacklisted, and
journal replay starting/finishing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 67ace272 22-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a missing bch2_err_str() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ba8a796 14-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Recover from blacklisted journal entries

If it so happens that we crash while dirty, meaning we don't have the
superblock clean section, and we erroneously mark a journal entry we
wrote as blacklisted, we won't be able to recover.

This patch fixes this by adding a fallback: if we've got no superblock
clean section, and no non-ignored journal entries, we try the most
recent ignored journal entry.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4f948723 09-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_journal_keys_peek_upto()

bch2_journal_keys_peek_upto() was comparing against btree_id & level
incorrectly - fix this by using __journal_key_cmp().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0de429a 01-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't error out when just reading the journal

This tweaks the recovery and journal paths so that we don't error out
before we need to: the list_journal command should work, even if we
wouldn't be able to replay successfully.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2d1d56b 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fixes for building in userspace

- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that

- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)

- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), put an #ifdef around bch2_disk_reservation_add()
fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c167f9e5 23-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Journal keys overlay fixes

- In the btree iterator code that overlays keys from the journal, we
were incorrectly specifying level=0 instead of the btree_path's
current level in a few places
- When we didn't do journal replay, we shouldn't free the journal keys:
this fixes cmd_list and cmd_dump, which run in norecovery mode

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ffb876f 12-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill journal_keys->journal_seq_base

This removes an optimization that didn't actually save us any memory,
due to alignment, but did make the code more complicated than it needed
to be. We were also seeing a bug where journal_seq_base wasn't getting
correctly initailized, so hopefully it'll fix that too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ed0a5d2 19-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert fsck errors to errcode.h

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4bf5eec 18-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3ab25c1b 22-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: We can handle missing btree roots for all alloc btrees

We can rebuild alloc info if these btree roots are missing - no need to
bail out and say the filesystem is unrecoverable

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4ab35c34 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix subvol/snapshot deleting in recovery

fsck doesn't want to run while we're cleaning up deleted snapshots - if
that work needs to be done, we want it to have finished before fsck
runs, otherwise fsck will get confused when it finds multiple keys in
the same snapshot ID equivalence class (i.e. the mechanism that
snapshot deletion uses for cleaning up redundant keys).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 57617902 06-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree_and_journal_iter

We had a bug where btree_and_journal_iter would return the same key
twice - after deleting it (perhaps because it was present in both the
btree and the journal?)

This reworks btree_and_journal_iter to track the current position, much
like btree_paths, which makes the logic considerably simpler and more
robust.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f2aa0265 06-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for cmd_list_journal

cmd_list_journal wasn't correctly listing the most recent journal
entries as blacklisted - because in the recovery path when just reading
the journal, we were failing to add those to the blacklist table.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 30525f68 21-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix journal_keys_search() overhead

Previously, on every btree_iter_peek() operation we were searching the
journal keys, doing a full binary search - which was slow.

This patch fixes that by saving our position in the journal keys, so
that we only do a full binary search when moving our position backwards
or a large jump forwards.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 11f5e595 25-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always print when doing journal replay in fsck

This logging improvement helps see when the previous fsck pass has
completed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d8a161ad 14-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: LRU repair tweaks

- Drop old unneeded parameter for whether we're in initial GC - which
was from when btree updates had to be done differently before we
went RW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 099989c1 20-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix journal_iters_fix()

journal_iters_fix() was incorrectly rewinding iterators past keys they
had already returned, leading to those keys being double counted in the
bch2_gc() path - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1cab5a82 21-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Go RW before bch2_check_lrus()

btree updates before going RW are expensive if they're in random order,
since they use the list of keys for journal replay to insert, which is
just a gap buffer.

This patch improves the bucket invalidate path so that if
bch2_check_lrus() hasn't finished it only prints warnings instead of
doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW
before bch2_check_lrus().

Also, the filesystem state bits are reorganized a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 75c8d030 12-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill old rebuild_replicas option

This option was useful when the replicas mechism was new and still being
debugged, but hasn't been used in ages - let's delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c609947b 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for getting stuck in journal replay

In journal replay, we weren't immediately dropping journal pins when we
start doing updates that ewern't from journal replay - leading to
journal reclaim getting stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5650bb46 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Introduce bch2_journal_keys_peek_(upto|slot)()

When many journal replay keys have been overwritten,
bch2_journal_keys_peek() was taking excessively long to scan before it
found a key to return.

Fix this by introducing bch2_journal_keys_peek_upto() which takes a
parameter for the end of the range we want, so that we can terminate the
search much sooner, and replace all uses of bch2_journal_keys_peek()
with peek_upto() or peek_slot().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 502f973d 09-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a few warnings on 32 bit

These showed up when building for mips.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ce6201c4 20-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use a genradix for reading journal entries

Previously, the journal read path used a linked list for storing the
journal entries we read from disk. But there's been a bug that's been
causing journal_flush_delay to incorrectly be set to 0, leading to far
more journal entries than is normal being written out, which then means
filesystems are no longer able to start due to the O(n^2) behaviour of
inserting into/searching that linked list.

Fix this by switching to a radix tree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 95752a02 20-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor journal_keys_sort() to return an error code

When there weren't any keys in the journal there's no need to allocate
the buffer - but doing that causes a spurious -ENOMEM.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 822835ff 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fold bucket_state in to BCH_DATA_TYPES()

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e1effd42 05-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More improvements for alloc info checks

- Move checks for whether the device & bucket are valid from the
.key_invalid method to bch2_check_alloc_key(). This is because
.key_invalid() is called on keys that may no longer exist (post
journal replay), which is a problem when removing/resizing devices.

- We weren't checking the need_discard btree to ensure that every set
bucket has a corresponding alloc key. This refactors the code for
checking the freespace btree, so that it now checks both.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d1d7737f 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Gap buffer for journal keys

Btree updates before we go RW work by inserting into the array of keys
that journal replay will insert - but inserting into a flat array is
O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2).

Fortunately, the updates btree_gc does happens in sequential order,
which means a gap buffer works nicely here - this patch implements a gap
buffer for journal keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5735608c 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill main in-memory bucket array

All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5add07d5 17-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck for need_discard & freespace btrees

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f25d8215 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator threads & freelists

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6b2826c 11-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Freespace, need_discard btrees

This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.

We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 31f63fd1 14-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Introduce a separate journal watermark for copygc

Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.

This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d5d3be7d 10-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_journal_log_msg()

This adds bch2_journal_log_msg(), which just logs a message to the
journal, and uses it to mark startup and when journal replay finishes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78c8fe20 19-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Normal update/commit path now works before going RW

This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 12bf93a4 20-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add .to_text() methods for all superblock sections

This patch improves the superblock .to_text() methods and adds methods
for all types that were missing them. It also improves printbufs by
allowing them to specfiy what units we want to be printing in, and adds
new wrapper methods for unifying our kernel and userspace environments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8ccf4dff 19-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: opts.read_journal_only

Add an option that tells recovery to only read the journal, to be used
by the list_journal command.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 10b93677 19-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete some flag bits that are no longer used

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0f78264a 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Print a better message for mark and sweep pass

Btree gc, aka mark and sweep, checks allocations - so let's just print
that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ec061b21 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_gc no longer uses main in-memory bucket array

This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 45e4cd9e 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: run_one_trigger() now checks journal keys

Previously, when doing updates and running triggers before journal
replay completes, triggers would see the incorrect key for the old key
being overwritten - this patch updates the trigger code to check the
journal keys when necessary, needed for the upcoming allocator rewrite.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9b6e2f1e 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

Revert "bcachefs: Delete some obsolete journal_seq_blacklist code"

This reverts commit f95b61228efd04c9c158123da5827c96e9773b29.

It turns out, we're seeing filesystems in the wild end up with
blacklisted btree node bsets - this should not be happening, and until
we understand why and fix it we need to keep this code around.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 03ea3962 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Log & error message improvements

- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work
in userspace

- We don't need to print the bcachefs: or the filesystem name prefix in
userspace

- Improve a few error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 365f64f3 03-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add verbose log messages for journal read

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fe312f81 03-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use kvmalloc() for array of sorted keys in journal replay

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d8601afc 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify journal replay

With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the
order we have to replay keys from the journal in, and we can also start
up journal reclaim right away - and delete a bunch of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5222a460 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_WITH_JOURNAL

This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.

This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.

Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f28620c1 01-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tweak journal reclaim order

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# cd7c2d3d 01-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make sure BCH_FS_FSCK_DONE gets set

If we're not running fsck we still want to set BCH_FS_FSCK_DONE, so that
bch2_fsck_err() calls are interpreted as bch2_inconsistent_error()
calls().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# dfd41fb9 31-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix race between btree updates & journal replay

Add a flag to indicate whether a journal replay key has been
overwritten, and set/test it with appropriate btree locks held.

This fixes a race between the allocator - invalidating buckets, and
doing btree updates - and journal replay, which before this patch could
clobber the allocator thread's update with an older version of the key
from the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 528b18e6 31-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_journal_entry_to_text()

This adds a _to_text() pretty printer for journal entries - including
every subtype - which will shortly be used by the 'bcachefs
list_journal' subcommand.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5ba2fd11 29-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal replay does't resort main list of keys

The upcoming BTREE_ITER_WITH_JOURNAL patch will require journal keys to
stay in sorted order, so the btree iterator code can overlay them over
btree keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d93cf685 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Run scan_old_btree_nodes after version upgrade

In the recovery path, we scan for old btree nodes if we don't have
certain compat bits set. If we do this, we should be doing it after we
upgraded to the newest on disk format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 04f0f77d 26-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete some obsolete journal_seq_blacklist code

Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written,
we haven't needed the journal seq blacklist mechanism for ignoring
blacklisted btree node writes - we now only need it for ignoring journal
entries that were written after the newest flush journal entry, and then
we only need to keep those blacklist entries around until journal replay
is finished.

That means we can delete the code for scanning btree nodes to GC
journal_seq_blacklist entries.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e75b2d4c 23-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_journal_key_insert() no longer transfers ownership

bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c64740ef 30-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't start allocator threads too early

If the allocator threads start before journal replay has finished
replaying alloc keys, journal replay might overwrite the allocator's
btree updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 77170d0d 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks

Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transational bucket mark paths. Now, we can
simply delete the in-memory bucket marking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 09943313 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rewrite bch2_bucket_alloc_new_fs()

This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that
doesn't need to use the in memory bucket array, part of a larger patch
series to entirely get rid of the in memory bucket array, except for
gc/fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fb0e4808 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write()

This adds a new helper that much like the one we have for inode updates,
that allocates the packed alloc key, packs it and calls
bch2_trans_update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 990d42d1 04-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out struct gc_stripe from struct stripe

We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.

The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.

In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9be1efe9 15-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix error reporting from bch2_journal_flush_seq

- bch2_journal_halt() was unconditionally overwriting j->err_seq, the
sequence number that we failed to write
- journal_write_done was updating seq_ondisk and flushed_seq_ondisk even
for writes that errored, which broke the way bch2_journal_flush_seq_async()
locklessly checked for completions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0a84a066 15-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Also log device name in userspace

Change log messages in userspace to be closer to what they are in kernel
space, and include the device name - it's also useful in userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 697e546f 26-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor journal replay code

This consolidates duplicated code in journal replay - it's only a few
flags that are different for replaying alloc keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3e52c222 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add journal_seq to inode & alloc keys

Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 904823de 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_mark_key() to take a btree_trans *

This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8325cd1e 27-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't do upgrades in nochanges mode

nochanges mode is often used for getting data off of otherwise
nonrecoverable filesystems, which is often because of errors hit during
fsck.

Don't force version upgrade & fsck in nochanges mode, so that it's more
likely to mount.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4db65027 11-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Subvol dirents are now only visible in parent subvol

This changes the on disk format for dirents that point to subvols so
that they also record the subvolid of the parent subvol, so that we can
filter them out in other subvolumes.

This also updates the dirent code to do that filtering, and in
particular tweaks the rename code - we need to ensure that there's only
ever one dirent (counting multiplicities in different snapshots) that
point to a subvolume.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bfe88863 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New on disk format to fix reflink_p pointers

We had a bug where reflink_p pointers weren't being initialized to 0,
and when we started using the second word, things broke badly.

This patch revs the on disk format version and adds cleanup code to zero
out the second word of reflink_p pointers before we start using it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0476fa94 27-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rev the on disk format version for snapshots

This will cause the compat code to be run that creates entries in the
subvolumes and snapshots btrees.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 42d23732 16-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Snapshot creation, deletion

This is the final patch in the patch series implementing snapshots.
This patch implements two new ioctls that work like creation and
deletion of directories, but fancier.

- BCH_IOCTL_SUBVOLUME_CREATE, for creating new subvolumes and snaphots
- BCH_IOCTL_SUBVOLUME_DESTROY, for deleting subvolumes and snapshots

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6fed42bb 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Plumb through subvolume id

To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 14b393ee 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Subvolumes, snapshots

This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78cf784e 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Further reduce iter->trans usage

This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8dd6ed94 23-Jul-2021 Brett Holman <bholman.devel@gmail.com>

bcachefs: add progress stats to sysfs

This adds progress stats to sysfs for copygc, rebalance, recovery, and the
cmd_job ioctls.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 877da05f 30-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Zero out mem_ptr field in btree ptr keys from journal replay

This fixes a bad ptr deref on recovery from unclean shutdown in
bch2_btree_node_get_noiter().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f1833ca 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update btree ptrs after every write

This closes a significant hole (and last known hole) in our ability to
verify metadata. Previously, since btree nodes are log structured, we
couldn't detect lost btree writes that weren't the first write to a
given node. Additionally, this seems to have lead to some significant
metadata corruption on multi device filesystems with metadata
replication: since a write may have made it to one device and not
another, if we read that btree node back from the replica that did have
that write and started appending after that point, the other replica
would have a gap in the bset entries and reading from that replica
wouldn't find the rest of the bsets.

But, since updates to interior btree nodes are now journalled, we can
close this hole by updating pointers to btree nodes after every write
with the currently written number of sectors, without negatively
affecting performance. This means we will always detect lost or corrupt
metadata - it also means that our btree is now a curious hybrid of COW
and non COW btrees, with all the benefits of both (excluding
complexity).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8c3f6da9 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve iter->should_be_locked

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 90d22a66 10-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix overflow in journal_replay_entry_early

If filesystem on disk was used by a version with a larger BCH_DATA_NR
thas the currently running version, we don't want this to cause a buffer
overrun.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c0ebe3e4 23-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted endianness fixes

Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3a402c8d 07-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some refcounting bugs

We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac1019d3 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clean up bch2_btree_and_journal_walk()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aae15aaf 24-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New and improved topology repair code

This splits out btree topology repair into a separate pass, and makes
some improvements:
- When we have to pick which of two overlapping nodes to drop keys
from, we use the btree node header sequence number to preserve the
newer node

- the gc code has been changed so that it doesn't bail out if we're
continuing/ignoring on fsck error - this way the dump tool can skip
running the repair pass but still walk all reachable metadata

- add a new superblock flag indicating when a filesystem is known to
have btree topology issues, and the topology repair pass should be
run

- changing the start/end of a node might mean keys in that node have to
be deleted: this patch handles that better by splitting it out into a
separate function and running it explicitly in the topology repair
code, previously those keys were only being dropped when the btree
node was read in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4932e07e 24-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix key cache assertion

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d62ab355 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_trans_mark_dev_sb()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 423300e8 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BCH_BEATURE_atomic_nlink is obsolete

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8a85b20c 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inode backpointers are now required

This lets us simplify fsck quite a bit, which we need for making fsck
snapshot aware.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a14d58e 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop bch2_fsck_inode_nlink()

We've had BCH_FEATURE_atomic_nlink for quite some time, we can drop this
now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4cf91b02 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73590619 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't unconditially version_upgrade in initialize

This is mkfs's job. Also, clean up the handling of feature bits some.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c7bb769c 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7d6f07ed 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix compat code for superblock

The bkey compat code wasn't being run for btree roots in the superblock
clean section - this patch fixes it to use the journal entry validate
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2785955 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE()

bcachefs has been aggressively migrating filesystems and btree nodes to
the new format for quite some time - this shouldn't affect anyone
anymore, and lets us delete a _lot_ of code. Also, it frees up
KEY_TYPE_discard for a new whiteout key type for snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41e37786 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Bring back metadata only gc

This is useful for the filesystem dump debugging tool - when we're
hitting bugs we want to skip as much of the recovery process as
possible, and the dump tool only needs to know where metadata lives.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19dd3172 04-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for compat feature bits

This is to generate strings for them, so that we can print them out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e01dacf7 20-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bkey format generation for 32 bit fields

Having a packed format that can represent a field larger than the
unpacked type breaks bkey_packed_successor() assertions - we need to fix this to start using the snapshot filed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a4805d66 22-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Scan for old btree nodes if necessary on mount

We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dab9ef0d 23-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error message for some allocation failures

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8042b5b7 10-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Extents may now cross btree node boundaries

When snapshots arrive, we won't necessarily be able to arbitrarily split
existis - when we need to split an existing extent, we'll have to check
if the extent was overwritten in child snapshots and if so emit a
whiteout for the split in the child snapshot.

Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5d428c7c 03-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Run fsck if BCH_FEATURE_alloc_v2 isn't set

We're using BCH_FEATURE_alloc_v2 to also gate journalling updates to dev
usage - we don't have the code for reconstructing this from buckets
anymore, so we need to run fsck if it's not set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 180fb49d 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal updates to dev usage

This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2abe5420 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist 64 bit io clocks

Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd perodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestaps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0b73c1c 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add (partial) support for fixing btree topology

When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.

Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b593ee1 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add support for doing btree updates prior to journal replay

Some errors may need to be fixed in order for GC to successfully run -
walk and mark all metadata. But we can't start the allocators and do
normal btree updates until after GC has completed, and allocation
information is known to be consistent, so we need a different method of
doing btree updates.

Fortunately, we already have code for walking the btree while overlaying
keys from the journal to be replayed. This patch adds an update path
that adds keys to the list of keys to be replayed by journal replay, and
also fixes up iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 079663d8 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill metadata only gc

This was useful before we had transactional updates to interior btree
nodes - but now, it's just extra unneeded complexity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac958006 14-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Factor out bch2_ec_stripes_heap_start()

This fixes a bug where mark and sweep gc incorrectly was clearing out
the stripes heap and causing assertions to fire later - simpler to just
create the stripes heap after gc has finished.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# edfbba58 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk()

bch2_btree_and_journal_walk() walks the btree overlaying keys from the
journal; it was introduced so that we could read in the alloc btree
prior to journal replay being done, when journalling of updates to
interior btree nodes was introduced.

But it didn't have btree node prefetching, which introduced a severe
regression with mount times, particularly on spinning rust. This patch
implements btree node prefetching for the btree + journal walk,
hopefully fixing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4291a331 08-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write() should be writing for all devices

Alloc info isn't stored on a particular device, it makes no sense to
only be writing it out for rw members - this was causing fsck to not fix
alloc info errors, oops.

Also, make sure we write out alloc info in other repair paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 719fe7fb 10-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update transactional triggers interface to pass old & new keys

This is needed to fix a bug where we're overflowing iterators within a
btree transaction, because we're updating the stripes btree (to update
block counts) and the stripes btree trigger is unnecessarily updating
the alloc btree - it doesn't need to update the alloc btree when the
pointers within a stripe aren't changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# adbcada4 14-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't require flush/fua on every journal write

This patch adds a flag to journal entries which, if set, indicates that
they weren't done as flush/fua writes.

- non flush/fua journal writes don't update last_seq (i.e. they don't
free up space in the journal), thus the journal free space
calculations now check whether nonflush journal writes are currently
allowed (i.e. are we low on free space, or would doing a flush write
free up a lot of space in the journal)

- write_delay_ms, the user configurable option for when open journal
entries are automatically written, is now interpreted as the max
delay between flush journal writes (default 1 second).

- bch2_journal_flush_seq_async is changed to ensure a flush write >=
the requested sequence number has happened

- journal read/replay must now ignore, and blacklist, any journal
entries newer than the most recent flush entry in the journal. Also,
the way the read_entire_journal option is handled has been improved;
struct journal_replay now has an entry, 'ignore', for entries that
were read but should not be used.

- assorted refactoring and improvements related to journal read in
journal_io.c and recovery.c

Previously, we'd have to issue a flush/fua write every time we
accumulated a full journal entry - typically the bucket size. Now we
need to issue them much less frequently: when an fsync is requested, or
it's been more than write_delay_ms since the last flush, or when we need
to free up space in the journal. This is a significant performance
improvement on many write heavy workloads.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ebb84d09 13-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Increase journal pipelining

This patch increases the maximum journal buffers in flight from 2 to 4 -
this will be particularly helpful when in the future we stop requiring
flush+fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3eb26d01 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_get_iter() no longer returns errors

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e9f3b88 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use BTREE_ITER_PREFETCH in journal+btree iter

Introducing the journal+btree iter introduced a regression where we
stopped using BTREE_ITER_PREFETCH - this is a performance regression on
rotating disks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5731cf01 29-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix journal reclaim spinning in recovery

We can't run journal reclaim until we've finished replaying updates to
interior btree nodes - the check for this was in the wrong place though,
leading to journal reclaim spinning before it was allowed to proceed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6d758368 13-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a btree transaction iter overflow

extent_replay_key dates from before putting iterators was required -
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a3e72262 05-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New varints

Previous varint implementation used by the inode code was not nearly as
fast as it could have been; partly because it was attempting to encode
integers up to 96 bits (for timestamps) but this meant that encoding and
decoding the length required a table lookup.

Instead, we'll just encode timestamps greater than 64 bits as two
separate varints; this will make decoding/encoding of inodes
significantly faster overall.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33114c2d 24-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop alloc keys from journal when -o reconstruct_alloc

This fixes a bug where we'd pop an assertion due to replaying a key for
an interior btree node when that node no longer exists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8d6b6222 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improvements to writing alloc info

Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9f115ce9 04-Aug-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a bug with the journal_seq_blacklist mechanism

Previously, we would start doing btree updates before writing the first
journal entry; if this was after an unclean shutdown, this could cause
those btree updates to not be blacklisted.

Also, move some code to headers for userspace debug tools.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f621e152 20-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an option for rebuilding the replicas section

There is a bug where we cnan end up clearing the data_has field in the
superblock members section, which causes us to skip reading the journal
and thus journal replay fails. This option tells the recovery path to
not trust those fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 697e45b2 06-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_TRIGGER_NOOVERWRITES

This is prep work for reworking the triggers machinery - we have
triggers that need to know both the old and the new key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5d20ba48 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use cached iterators for alloc btree

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7fffc85b 13-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an internal option for reading entire journal

To be used the debug tool that dumps the contents of the journal.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61fc3c96 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Set filesystem features earlier in fs init path

Before we were setting features after allocating btree nodes, which
meant we were using the old btree pointer format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 00b8ccf7 25-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Interior btree updates are now fully transactional

We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2930396 24-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix reading of alloc info after unclean shutdown

When updates to interior nodes started being journalled, that meant that
after an unclean shutdown, until journal replay is done we can't walk
the btree without overlaying the updates from the journal.

The initial btree gc was changed to walk the btree overlaying keys from
the journal - but bch2_alloc_read() and bch2_stripes_read() were missed.
Major whoops...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b58a181d 30-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix iterating of journal keys within a btree node

Extent btrees no longer have weird special behaviour for min_key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5a655f06 28-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Read journal when keep_journal on

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f1d786a0 25-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an option for keeping journal entries after startup

This will be used by the userspace debug tools.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2f194e16 25-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an assertion when nothing to replay

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f44a6a71 15-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Replay interior node keys

This slightly modifies the journal replay code so that it can replay
updates to interior nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e62d65f2 15-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_commit() path can now insert to interior nodes

This will be needed for the upcoming patches to journal updates to
interior btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5d548743 16-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clear BCH_FEATURE_extents_above_btree_updates on clean shutdown

This is needed so that users can roll back to before "d9bb516b2d
bcachefs: Move extent overwrite handling out of core btree code", which
it appears may still be buggy.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3e464ac 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splittin existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3186c80f 05-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Skip 0 size deleted extents in journal replay

These are created by the new extent update path, but not used yet by the
recovery code and they break the existing recovery code, so we can just
skip them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f6d0368e 09-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Traverse iterator in journal replay

This fixes a bug where we end up spinning in journal replay - in theory
this shouldn't be necessary though, transaction reset should be
re-traversing all iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27beb810 07-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix another iterator leak

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b807a0c8 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BCH_SB_FEATURES_ALL

BCH_FEATURE_btree_ptr_v2 wasn't getting set on new filesystems, oops

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 548b3d20 07-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_ptr_v2

Add a new btree ptr type which contains the sequence number (random 64
bit cookie, actually) for that btree node - this lets us verify that
when we read in a btree node it really is the btree node we wanted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c4a5cd5 27-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_and_journal_iter

Introduce a new iterator that iterates over keys in the btree with keys
from the journal overlaid on top. This factors out what the erasure
coding init code was doing manually.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcd6f3e0 26-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use KEY_TYPE_deleted whitouts for extents

Previously, partial overwrites of existing extents were handled
implicitly by the btree code; when reading in a btree node, we'd do a
mergesort of the different bsets and detect and fix partially
overlapping extents during that mergesort.

That approach won't work with snapshots: this changes extents to work
like regular keys as far as the btree code is concerned, where a 0 size
KEY_TYPE_deleted whiteout will completely overwrite an existing extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c3ff72c 28-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert some enums to x-macros

Helps for preventing things from getting out of sync.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e731d466 26-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't export __bch2_fs_read_write

BTREE_INSERT_LAZY_RW was added for this since this code was written; use
it instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b3bbe2c 24-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't reexecute triggers when retrying transaction commit

This was causing a bug with transaction iterators overflowing; now, if
triggers have to be reexecuted we always return -EINTR and retry from
the start of the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58e2388f 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_ATOMIC

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b1fd23df 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMIC

BTREE_INSERT_ATOMIC should really be the default mode, and there's not
that much code that doesn't need it - so this is prep work for getting
rid of the flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba239c95 29-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_check_set_feature()

New helper function for setting incompatible feature bits

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4de77495 16-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reorganize extents.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4be1a412 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inline data extents

This implements extents that have their data inline, in the value,
instead of the bkey value being pointers to the data - and the read and
write paths are updated to read from these new extent types and write
them out, when the write size is small enough.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 085ab693 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework of cut_front & cut_back

This changes bch2_cut_front and bch2_cut_back so that they're able to
shorten the size of the value, and it also changes the extent update
path to update the accounting in the btree node when this happens.

When the size of the value is shortened, they zero out the space that's
no longer used, so it's interpreted as noops (as implemented in the last
patch).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b627c7d8 09-Nov-2019 Justin Husted <sigstop@gmail.com>

bcachefs: Set lost+found mode to 0700

For security and conformance with other filesystems, the lost+found
directory should not be world or group accessible.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ff929515 28-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Trust btree alloc info at runtime

This lets us avoid a cache miss in the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 05240ba6 11-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix creation of lost+found

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a40d97a7 07-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix incorrect use of bch2_extent_atomic_end()

bch2_extent_atomic_end counts the number of iterators requried for
marking overwrites - but journal replay never marks overwrites, so that
part was incorrect. And counting iterators for the key being inserted
should be unnecessary because we did that prior to the key being
inserted before it was first journalled.

This should fix an iterator overflow bug - the iterators for walking
overwrites were totally unneeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 137b0ed9 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_extent_atomic_end() now traverses iter

This fixes a bug in io.c bch2_write_index_default() - it was missing the
traverse call, but bch2_extent_atomic_end returns an error now and can
just call it itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96385742 02-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Factor out fs-common.c

This refactoring makes the code easier to understand by separating the
bcachefs btree transactional code from the linux VFS code - but more
importantly, it's also to share code with the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7199432 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill deferred btree updates

Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f9c55193 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop trans arg to bch2_extent_atomic_end()

Just for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89b05118 06-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Flush fsck errors when looping in btree gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad7e137e 28-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch reconstruct_alloc to a mount option

Right now this is the only way of repairing bucket gens in the future

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06f6c3ec 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reflink pointers also have to be remarked if split in journal replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76426098 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c7f3b7a 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_extent_trim_atomic() for reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cbe5cfe 09-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework calling convention for marking overwrites

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e222d206 12-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix ec_stripes_read()

Change it to not mark keys that will be overwritten by keys in the
journal - this fixes a bug where we pop an assertion in
bucket_set_stripe() because of a stale pointer - because the stripe that
has the stale pointer has been deleted.

This code could be factored out and used elsewhere, at some point.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ded276b 24-Jun-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix array overrun with unknown btree roots

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f707e3d8 18-Jun-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix kasan splat

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e738539 24-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve key marking interface

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3838be78 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use a fixed size buffer for fs_usage_deltas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 619f5bee 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: some improvements to startup messages and options

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 932aa837 11-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_mark_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5e82a9a1 10-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Write out fs usage consistently

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6dd04f8 15-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark overwrites from journal replay in initial gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0734356 11-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Deduplicate keys in the journal before replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 644d180b 11-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal replay refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 478259b7 04-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: delete duplicated code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1dd7f9d9 04-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rewrite journal_seq_blacklist machinery

Now, we store blacklisted journal sequence numbers in the superblock,
not the journal: this helps to greatly simplify the code, and more
importantly it's now implemented in a way that doesn't require all btree
nodes to be visited before starting the journal - instead, we
unconditionally blacklist the next 4 journal sequence numbers after an
unclean shutdown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1d58243 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add ability to run gc on metadata only

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b512638 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_fs_recovery()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0bc166ff 28-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track whether filesystem has errors in superblock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 03e183cb 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Verify fs hasn't been modified before going rw

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 134915f3 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Go rw lazily

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6122ab63 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More debug params for testing of recovery paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 05235e99 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Run gc if failed to read alloc btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 082f0801 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix error handling in bch2_fs_recovery()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1633e492 28-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: improved flush_held_btree_writes()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 768ac639 14-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mechanism for blocking the journal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f7e76361 10-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: no need to run gc when initializing new fs

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1df42b57 06-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: don't do initial gc if have alloc info feature

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3577df5f 09-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: serialize persistent_reserved

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e0745e2 24-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: initialize fs usage summary in recovery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 42b72e0b 24-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: journal_replay_early()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2c5af169 24-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: reserve space in journal for fs usage entries

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94cd106f 09-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: delete a debug printk

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61c8d7c8 25-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist stripe blocks_used

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73e6ab95 01-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch replicas to mark_lock

Prep work for upcoming disk accounting changes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0cfb963 29-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track nr_inodes with the key marking machinery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dfe9bfb3 24-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Stripes now properly subject to gc

gc now verifies the contents of the stripes radix tree, important for
persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4e65431c 23-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

Revert "bcachefs: start erasure coding after journal replay"

This reverts commit 36f389604294dfc953e6f5624ceb683818d32f28.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 129550c4 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: start erasure coding after journal replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd575ddf 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# af9d3bc2 30-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: stripe support for replicas tracking

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b335bae 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted fixes for running on very small devices

It's now possible to create and use a filesystem on a 512k device with
4k buckets (though at that size we still waste almost half to internal
reserves)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72644db1 03-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an assertion when rebuilding replicas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 02f1a96c 03-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename nofsck opt to fsck

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc3268c1 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill extent_insert_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88c07f73 14-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only check inode i_nlink during full fsck

Now that all filesystem operatinos that manipulate the filesystem
heirachy and i_nlink are fully atomic, we can add a feature bit to
indicate i_nlink doesn't need to be checked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>