History log of /linux-master/fs/bcachefs/btree_gc.c
Revision Date Author Comments
# 719aec84 17-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix leak in bch2_gc_write_reflink_key

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27c15ed2 12-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_member.btree_allocated_bitmap

This adds a small (64 bit) per-device bitmap that tracks ranges that
have btree nodes, for accelerating btree node scan if it is ever needed.

- New helpers, bch2_dev_btree_bitmap_marked() and
bch2_dev_bitmap_mark(), for checking and updating the bitmap

- Interior btree update path updates the bitmaps when required

- The check_allocations pass has a new fsck_err check,
btree_bitmap_not_marked

- New on disk format version, mi_btree_bitmap, which indicates the new
bitmap is present

- Upgrade table lists the required recovery pass and expected fsck error

- Btree node scan uses the bitmap to skip ranges if we're on the new
version
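
As a rough illustration of the idea, a 64 bit bitmap where each bit covers
one fixed-size range of the device could look like the sketch below. The
struct, the sector_shift field and the dev_bitmap_*() names are assumptions
made for this example; they are not the helpers named in this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state, not the real bch_member. */
struct dev_bitmap {
	uint64_t bits;		/* one bit per range of the device */
	unsigned sector_shift;	/* each bit covers 1 << sector_shift sectors (assumed < 64) */
};

/* Bit index covering a sector; anything past the last range maps to bit 63. */
static unsigned sector_to_bit(const struct dev_bitmap *d, uint64_t sector)
{
	uint64_t bit = sector >> d->sector_shift;

	return bit < 63 ? (unsigned) bit : 63;
}

/* Record that [start, start + sectors) contains btree nodes. */
static void dev_bitmap_mark(struct dev_bitmap *d, uint64_t start, uint64_t sectors)
{
	if (!sectors)
		return;

	unsigned first = sector_to_bit(d, start);
	unsigned last = sector_to_bit(d, start + sectors - 1);

	for (unsigned b = first; b <= last; b++)
		d->bits |= UINT64_C(1) << b;
}

/* True only if every range touched by [start, start + sectors) is marked. */
static bool dev_bitmap_marked(const struct dev_bitmap *d, uint64_t start, uint64_t sectors)
{
	if (!sectors)
		return true;

	unsigned first = sector_to_bit(d, start);
	unsigned last = sector_to_bit(d, start + sectors - 1);

	for (unsigned b = first; b <= last; b++)
		if (!(d->bits & (UINT64_C(1) << b)))
			return false;
	return true;
}
```

A scan can then skip any range whose bit is clear, which is where the
speedup comes from in this sketch.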

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5ab4beb7 08-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't scan for btree nodes when we can reconstruct

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 359571c3 08-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix check_topology() when using node scan

shoot down journal keys _before_ populating journal keys with pointers
to scanned nodes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 43f5ea46 16-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Topology repair now uses nodes found by scanning to fill holes

With the new btree node scan code, we can now recover from corrupt btree
roots - simply create a new fake root at depth 1, and then insert all
the leaves we found.

If the root wasn't corrupt but there's corruption elsewhere in the
btree, we can fill in holes as needed with the newest version of a given
node(s) from the scan; we also check if a given btree node is older than
what we found from the scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b268aa4e 10-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't skip fake btree roots in fsck

When a btree root is unreadable, we might still have keys from the
journal to walk and mark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d2554263 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out recovery_passes.c

We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 47d2080e 25-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_bkey_ptr_data_type()

Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6f5869ff 26-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix use after free in bch2_check_fix_ptrs()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79032b07 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved topology repair checks

Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.
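
The span check can be sketched roughly as follows, assuming keys are plain
integers and each node covers an inclusive [min, max] range. The key_range
type and children_span_parent() are hypothetical names for this example,
not the code in bch2_btree_node_check_topology().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified node range: inclusive on both ends, integer keys (an assumption). */
struct key_range {
	uint64_t min, max;
};

/*
 * Returns true if the children, in order, tile the parent's range exactly:
 * the first child starts at parent.min, each child starts right after the
 * previous one ends, and the last child ends at parent.max.
 */
static bool children_span_parent(struct key_range parent,
				 const struct key_range *child, size_t nr)
{
	if (!nr)
		return false;

	uint64_t expected_min = parent.min;

	for (size_t i = 0; i < nr; i++) {
		/* A different start means a hole or an overlap */
		if (child[i].min != expected_min)
			return false;
		/* Child must be a valid range inside the parent */
		if (child[i].max < child[i].min || child[i].max > parent.max)
			return false;

		if (i + 1 < nr) {
			/* More children follow, but the parent range is used up */
			if (child[i].max == parent.max)
				return false;
			expected_min = child[i].max + 1;
		}
	}

	return child[nr - 1].max == parent.max;
}
```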

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remain BUG_ON()s, because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 37bb9c95 16-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix locking in bch2_alloc_write_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cdce1094 11-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: reconstruct_alloc cleanup

Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.

Also - users no longer have to explicitly select fsck and fix_errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b6fc661f 10-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix order of gc_done passes

gc_stripes_done() and gc_reflink_done() may do alloc btree updates (i.e.
when deleting an indirect extent) - we need bucket gens to be fixed by
then.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06ebc483 10-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix deletion of indirect extents in btree_gc

we need to run the normal extent update path on deletion -
bch2_bkey_make_mut() is incorrect when key type is changing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52946d82 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill more -EIO error codes

This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb6fc943 01-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill kvpmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f43b013 22-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree node prefetching in check_topology

btree_and_journal_iter is old code that we want to get rid of, but we're
not ready to yet.

lack of btree node prefetching is, it turns out, a real performance
issue for fsck on spinning rust, so - add it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc634d8e 22-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_and_journal_iter.trans

we now always have a btree_trans when using a btree_and_journal_iter;
prep work for adding prefetching to btree_and_journal_iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b3eba6a4 10-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix degraded mode fsck

We don't know where the superblock and journal live on offline devices;
that means if a device is offline fsck can't check those buckets.

Previously, fsck would incorrectly clear bucket data types for those
buckets on offline devices; now we just use the previous state.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e7834a8 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_fs_usage_base

Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e58f963c 06-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: helpers for printing data types

We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0431c5f 31-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Combine .trans_trigger, .atomic_trigger

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 089e3113 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill BTREE_TRIGGER_NOATOMIC

dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad00bce0 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: mark now takes bkey_s

Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 717296c3 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_mark now takes bkey_s

Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0beebd92 21-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_for_each_ptr() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9fea2274 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80eab7a7 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5028b907 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27b2df98 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c471b65 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: convert bch_fs_flags to x-macro

Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9b34f02c 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill dev_usage->buckets_ec

This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 086a52f7 09-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1

Prep work for introducing bch_replicas_entry_v2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb52d23e 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 463086d9 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert gc_alloc_start() to for_each_btree_key2()

This eliminates some SRCU warnings: for_each_btree_key2() runs every
loop iteration in a distinct transaction context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repaired,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 253ba178 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix ca->oldest_gen allocation

The ca->oldest_gen array needs to be the same size as the bucket_gens
array; ca->mi.nbuckets is updated with only state_lock held, not
gc_lock, so bch2_gc_gens() could race with device resize and allocate
too small of an oldest_gens array.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
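
A bounds-checked lookup of this kind might look like the sketch below. The
name table is a made-up subset and btree_id_str() is a hypothetical
stand-in; the fallback string is also an assumption.

```c
#include <stddef.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Illustrative subset of names only; the real table is not reproduced here. */
static const char * const btree_names[] = {
	"extents",
	"inodes",
	"dirents",
};

/* Never index the table directly: IDs from a newer version may be out of range. */
static const char *btree_id_str(unsigned id)
{
	return id < ARRAY_SIZE(btree_names) ? btree_names[id] : "(unknown)";
}
```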

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b5e85d4d 12-Sep-2023 Yang Li <yang.lee@linux.alibaba.com>

bcachefs: Remove unneeded semicolon

./fs/bcachefs/btree_gc.c:1249:2-3: Unneeded semicolon
./fs/bcachefs/btree_gc.c:1521:2-3: Unneeded semicolon
./fs/bcachefs/btree_gc.c:1575:2-3: Unneeded semicolon
./fs/bcachefs/counters.c:46:2-3: Unneeded semicolon

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e46c181a 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert more code to bch_err_msg()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 401585fe 05-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_journal_iter.c

Split out a new file from recovery.c for managing the list of keys we
read from the journal: before journal replay finishes the btree iterator
code needs to be able to iterate over and return keys from the journal
as well, so there's a fair bit of code here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e81f89b 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix assorted checkpatch nits

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0ed4ca14 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure topology repair runs

This fixes should_restart_for_topology_repair() - previously it was
returning false if the btree io path had already selected topology
repair to run, even if it hadn't run yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 922bc5a0 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make topology repair a normal recovery pass

This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and
explicitly running a specific recovery pass - this is a more general
replacement for how we were running topology repair before.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0f8faea 11-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix_errors option is now a proper enum

Before, it was parsed as a bool but internally it was really an enum:
this lets us pass in all the possible values.

But we special case the option parsing: no supplied value is parsed as
FSCK_FIX_yes, to match the previous behaviour.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 067d228b 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate recovery passes

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.
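
One way to picture the layout is a fixed array for known IDs plus a growable
array indexed by (id - number of known IDs). This is only a sketch under
that assumption; NR_KNOWN, struct fs, struct root and both helpers are
hypothetical names, not the helpers added by this patch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_KNOWN 4	/* number of btree IDs this version knows about (made up) */

struct root {
	unsigned level;
};

struct fs {
	struct root known[NR_KNOWN];	/* fixed size array for known btrees */
	struct root *extra;		/* dynamic array for unknown btrees */
	size_t nr_extra;
};

/* Look up a root without allocating; NULL if an unknown ID was never seen. */
static struct root *root_find(struct fs *c, unsigned id)
{
	if (id < NR_KNOWN)
		return &c->known[id];

	size_t idx = id - NR_KNOWN;

	return idx < c->nr_extra ? &c->extra[idx] : NULL;
}

/* Look up a root, growing the extra array (zero filled) when needed. */
static struct root *root_get_or_add(struct fs *c, unsigned id)
{
	if (id < NR_KNOWN)
		return &c->known[id];

	size_t idx = id - NR_KNOWN;

	if (idx >= c->nr_extra) {
		struct root *n = realloc(c->extra, (idx + 1) * sizeof(*n));

		if (!n)
			return NULL;
		memset(n + c->nr_extra, 0, (idx + 1 - c->nr_extra) * sizeof(*n));
		c->extra = n;
		c->nr_extra = idx + 1;
	}

	return &c->extra[idx];
}
```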

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0fb3355d 26-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_bkey_make_mut()

bch2_bkey_make_mut() now takes the bkey_s_c by reference and points it
at the new, mutable key.

This helps in some fsck paths that may have multiple repair operations
on the same key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bb3c2a9 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New error message helpers

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
- bch_err_fn
- bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbda63bb 30-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_make_mut() now calls bch2_trans_update()

It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.

This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(),
eliminating a bit of boilerplate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6b52bcde 26-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Always run topology error when CONFIG_BCACHEFS_DEBUG=y

Improved test coverage.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 330970c2 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make reconstruct_alloc quieter

We shouldn't be printing out fsck errors for expected errors - this
helps make test logs more readable, and makes it easier to see what the
actual failure was.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d57c9add 03-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve error message for stripe block sector counts wrong

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9d32097f 03-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More stripe create cleanup/fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91065976 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Mark stripe buckets with correct data type

Currently, we don't use bucket data type for tracking whether buckets
are part of a stripe; parity buckets are BCH_DATA_parity, but data
buckets in a stripe are BCH_DATA_user. There's a separate counter,
buckets_ec, outside the BCH_DATA_TYPES system for tracking number of
buckets on a device that are part of a stripe.

The trouble with this approach is that it's too coarse grained, and we
need better information on fragmentation for debugging copygc.

With this patch, data buckets in a stripe are now tracked as
BCH_DATA_stripe buckets.

This doesn't yet differentiate between erasure coded and non-erasure
coded data in a stripe bucket, nor do we yet track empty data buckets in
stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2611a041 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_mark_key() now takes btree_id & level

btree & level are passed to trans_mark - for backpointers -
bch2_mark_key() should take them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7546c78d 18-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix ec repair code check

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19a614d2 30-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Better inlining for bch2_alloc_to_v4_mut

This separates out the slowpath into a separate function, and inlines
bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 858536c7 11-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert EROFS errors to private error codes

More error code improvements - this gets us more useful error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 994ba475 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New btree helpers

This introduces some new conveniences, to help cut down on boilerplate:

- bch2_trans_kmalloc_nomemzero() - performance optimization
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 14d7d61f 13-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix btree_gc when multiple passes required

We weren't resetting filesystem & device usage when restarting gc, which
was spotted when free bucket counters overflowed - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f659376 12-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Suppress -EROFS messages when shutting down

This isn't actually an error condition, this just indicates a normal
shutdown - no reason for these to be in the log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().
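
For illustration, boolean helpers of this shape can be sketched as below,
assuming a position is ordered by inode, then offset, then snapshot. The
struct pos type and pos_*() names are stand-ins for this example and are
not the actual bpos_*() implementations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified position; compared field by field in declaration order. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

static inline bool pos_eq(struct pos l, struct pos r)
{
	return l.inode == r.inode &&
	       l.offset == r.offset &&
	       l.snapshot == r.snapshot;
}

static inline bool pos_lt(struct pos l, struct pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode;
	if (l.offset != r.offset)
		return l.offset < r.offset;
	return l.snapshot < r.snapshot;
}

/* The remaining comparisons all follow from pos_lt() */
static inline bool pos_le(struct pos l, struct pos r) { return !pos_lt(r, l); }
static inline bool pos_gt(struct pos l, struct pos r) { return pos_lt(r, l); }
static inline bool pos_ge(struct pos l, struct pos r) { return !pos_lt(l, r); }
```

Callers that only need a yes/no answer can then write pos_lt(a, b) instead
of comparing a three-way cmp() result against zero.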

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2da671dc 09-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use btree_type_has_ptrs() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ca7d8fca 21-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New locking functions

In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.

This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ed0a5d2 19-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert fsck errors to errcode.h

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4bf5eec 18-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 326568f1 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_gc_done() to for_each_btree_key2()

This converts bch2_gc_stripes_done() and bch2_gc_reflink_done() to the
new for_each_btree_key_commit() macro.

The new for_each_btree_key2() and for_each_btree_key_commit() macros
handle transaction retries, allowing us to avoid nested transactions -
which we want to avoid since they're tricky to do completely correctly
and upcoming assertions are going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a1783320 15-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: for_each_btree_key2()

This introduces two new macros for iterating through the btree, with
transaction restart handling
- for_each_btree_key2()
- for_each_btree_key_commit()

Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.
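
The restart behaviour can be sketched with an ordinary loop, assuming keys
are integers and the per-key work is a callback. The function and type names
below are hypothetical and this is not the macro itself; only the "-EINTR
means retry the same key" rule comes from the description above.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*key_fn)(int key, void *ctx);

/*
 * Run fn once per key. A return of -EINTR restarts the current iteration
 * at the same key; any other nonzero return aborts the walk.
 */
static int for_each_key_with_restart(const int *keys, size_t nr,
				     key_fn fn, void *ctx)
{
	size_t i = 0;

	while (i < nr) {
		/* A real implementation would begin a fresh transaction here */
		int ret = fn(keys[i], ctx);

		if (ret == -EINTR)
			continue;	/* retry the same key */
		if (ret)
			return ret;
		i++;
	}

	return 0;
}

/* Example callback: asks for a restart a few times on key 2, then sums keys. */
struct flaky_ctx {
	int restarts_left;
	int visited;
	int sum;
};

static int flaky_sum(int key, void *p)
{
	struct flaky_ctx *c = p;

	if (key == 2 && c->restarts_left > 0) {
		c->restarts_left--;
		return -EINTR;
	}

	c->visited++;
	c->sum += key;
	return 0;
}
```

Note that the callback must not leave partial side effects behind before
returning -EINTR, since it will be called again for the same key.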

This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e68914ca 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename __bch2_trans_do() -> commit_do()

Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 80b3bf33 11-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Silence some fsck errors when reconstructing alloc info

There's no need to print fsck errors for errors that are expected, and
the user has already opted to repair.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1534ebb7 11-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Put some repair messages behind opts->verbose

These messages log the updates we're doing in bch2_check_fix_ptrs(),
which is useful when debugging but not usually needed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7a47d099 22-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always descend to leaf nodes in btree_gc

If a btree node is unreadable, it's the topology repair that fixes that
and it's kicked off by btree_gc, so btree_gc needs to touch every node
and verify that they can be read.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2817d453 22-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix assertion in topology repair

If we were at the end of the node, when breaking out of the loop we'd
pop the assertion on line 446 when cur wasn't NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f93726e 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c0960603 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Shutdown path improvements

We're seeing occasional firings of the assertion in the key cache
shutdown code that nr_dirty == 0, which means we must sometimes be doing
transaction commits after we've gone read only.

Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if
it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree
writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the
shutdown loop: btree write completion does a transaction update, to
update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7003589d 10-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure buckets have io_time[READ] set

It's an error if a bucket is in state BCH_DATA_cached but not on the LRU
btree - i.e io_time[READ] == 0 - so, make sure it's set before adding
it.

Also, make some of the LRU code a bit clearer and more direct.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 822835ff 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fold bucket_state in to BCH_DATA_TYPES()

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
counted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 48620e51 07-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Topology repair fixes

- We were failing to start topology repair, because we hadn't set the
superblock flag indicating it needed to run
- set_node_min() forgot to update the btree node's key
- bch2_gc_alloc_reset() didn't reset data type, leading to inserting an
invalid key that was empty but had nonzero data type

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 66d90823 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill struct bucket_mark

This switches struct bucket to using a lock, instead of cmpxchg. And now
that the protected members no longer need to fit into a u64, we can
expand the sector counts to 32 bits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5735608c 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill main in-memory bucket array

All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f25d8215 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator threads & freelists

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d48a7f8 31-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v4

This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.

Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0a3a2cc 28-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal seq now incremented at entry open, not close

This patch changes journal_entry_open() to initialize the new journal
entry, not __journal_entry_close().

This also means that journal_cur_seq() refers to the sequence number of
the last journal entry when we don't have an open journal entry, not the
next one.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs to dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78c8fe20 19-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Normal update/commit path now works before going RW

This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c929f230 12-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Stale ptr cleanup is now done by gc_gens

Before we had dedicated gc code for bucket->oldest_gen this was
btree_gc's responsibility, but now that we have that we can rip it out,
simplifying the already overcomplicated btree_gc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# aa8982c3 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix reflink repair code

The reflink repair code was incorrectly inserting a nonzero deleted key
via journal replay - this is due to bch2_journal_key_insert() being
somewhat hacky, and so this fix is also hacky for now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c45c8667 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_gc_gens() no longer uses bucket array

Like the previous patches, this converts bch2_gc_gens() to use the alloc
btree directly, and private arrays of generation numbers for its own
recalculation of oldest_gen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ec061b21 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_gc no longer uses main in-memory bucket array

This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8f11548e 01-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve path for when btree_gc needs another pass

btree_gc sometimes needs another pass when it corrects bucket generation
numbers or data types - when it finds multiple pointers of different
data types to the same bucket, it may want to keep the second one it
found.

When this happens, we now clear out bucket sector counts _without_
resetting the bucket generation/data types that we already found,
instead of resetting them to what we have in the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4e08446d 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_check_fix_ptrs()

The repair for btree_ptrs was saying one thing and doing another -
fortunately, that code can just be deleted.

Also, when we update a btree node pointer, we also have to update the node
in memory, if it exists in the btree node cache - this fixes
bch2_check_fix_ptrs() to do that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5222a460 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_WITH_JOURNAL

This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.

This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.

Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2a84de33 01-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Log what we're doing when repairing

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 13f914ec 26-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_ec_mem_alloc()

bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 47ac34ec 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Separate out gc_bucket()

Since the main in memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e75b2d4c 23-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_journal_key_insert() no longer transfers ownership

bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 77170d0d 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks

Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transactional bucket mark paths. Now, we can
simply delete the in-memory bucket marking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 991ba021 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add more time_stats

This adds more latency/event measurements and breaks some apart into
more events. Journal writes are broken apart into flush writes and
noflush writes, btree compactions are broken out from btree splits,
btree mergers are added, as well as btree_interior_updates - foreground
and total.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 990d42d1 04-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out struct gc_stripe from struct stripe

We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.

The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.

In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b547d005 29-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding fixes

When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 181fe42a 28-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle replica marking fsck errors locally

This simplifies the code quite a bit and eliminates an inconsistency - a
given bkey doesn't necessarily translate to a single replicas entry for
disk space accounting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 58e1ea4b 28-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Push c->mark_lock usage down to where it is needed

This changes the bch2_mark_key() and related paths to take mark lock
where it is needed, instead of taking it in the upper transaction commit
path - by pushing down locking we'll be able to handle fsck errors
locally instead of requiring a separate check in the btree_gc code for
replicas being marked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7468c4ef 21-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix BCH_FS_ERROR flag handling

We were setting BCH_FS_ERROR on startup if the superblock was marked as
containing errors, which is not what we wanted - BCH_FS_ERROR indicates
whether errors have been found, so that after a successful fsck we're
able to clear the error bit in the superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e5464a37 20-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a bit of missing repair code

This adds repair code to drop very stale pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2debb1b8 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_TRIGGER_INSERT now only means insert

This allows triggers to distinguish between a key entering the btree -
i.e. being called from the trans commit path - vs. being called on a key
that already exists, i.e. by GC.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 904823de 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_mark_key() to take a btree_trans *

This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 961b2d62 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted ec fixes

- The backpointer that ec_stripe_update_ptrs() uses now needs to include
the snapshot ID, which means we have to change where we add the
backpointer to after getting the snapshot ID for the new extents

- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()

- improve error message in bch2_mark_stripe()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f3cf0999 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_node_rewrite() now returns transaction restarts

We have been getting away from handling transaction restarts locally -
convert bch2_btree_node_rewrite() to the newer style.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0d1b70a 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Must check for errors from bch2_trans_cond_resched()

But we don't need to call it from outside the btree iterator code
anymore, since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d355c6f4 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: for_each_btree_node() now returns errors directly

This changes for_each_btree_node() to work like for_each_btree_key(),
and to that end bch2_btree_iter_peek_node() and next_node() also return
error ptrs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# dfc276df 18-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve reflink repair code

When a reflink pointer points to an indirect extent that doesn't exist,
we need to replace it with a KEY_TYPE_error key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b9a7d8ac 13-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix implementation of KEY_TYPE_error

When force-removing a device, we were silently dropping extents that we
no longer had pointers for - we should have been switching them to
KEY_TYPE_error, so that reads for data that was lost return errors.

This patch adds the logic for switching a key to KEY_TYPE_error to
bch2_bkey_drop_ptr(), and improves the logic somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e59a4d78 30-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a spurious fsck error

We were getting spurious "multiple types of data in same bucket" errors
in fsck, because the check was running for (cached) stale pointers -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f4ccfe07 21-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix unhandled transaction restart in bch2_gc_btree_gens()

This fixes https://github.com/koverstreet/bcachefs/issues/305

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e5af273f 25-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans->restarted

Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e3a67bdb 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Regularize argument passing of btree_trans

btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 618b1c0e 05-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out SPOS_MAX

Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d976a84e 22-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't loop into topology repair

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 08061519 21-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't ratelimit certain fsck errors

It's unhelpful if we see "Halting mark and sweep to start topology
repair" but we don't see the error that triggered it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 297d8934 10-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Extensive triggers cleanups

- We no longer mark subsets of extents, they're marked like regular
keys now - which means we can drop the offset & sectors arguments
to trigger functions
- Drop other arguments that are no longer needed anymore in various
places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
don't have an old key to mark

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4351d3ec 07-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More topology repair code

This improves the handling of overlapping btree nodes; now, we handle
the case where one btree node completely overwrites another.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bc3f8b25 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 890b74f0 23-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck for reflink refcounts

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e1036ce5 14-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Repair code for multiple types of data in same bucket

bch2_check_fix_ptrs() is awkward, we need to find a way to improve it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a402c8d 07-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some refcounting bugs

We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ceda1b9a 25-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Evict btree nodes we're deleting

There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch evicts nodes
that we're deleting too, which nicely solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aae15aaf 24-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New and improved topology repair code

This splits out btree topology repair into a separate pass, and makes
some improvements:
- When we have to pick which of two overlapping nodes to drop keys
from, we use the btree node header sequence number to preserve the
newer node

- the gc code has been changed so that it doesn't bail out if we're
continuing/ignoring on fsck error - this way the dump tool can skip
running the repair pass but still walk all reachable metadata

- add a new superblock flag indicating when a filesystem is known to
have btree topology issues, and the topology repair pass should be
run

- changing the start/end of a node might mean keys in that node have to
be deleted: this patch handles that better by splitting it out into a
separate function and running it explicitly in the topology repair
code; previously those keys were only being dropped when the btree
node was read in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c8441be 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix repair leading to replicas not marked

bch2_check_fix_ptrs() was being called after checking if the replicas
set was marked - but repair could change which replicas set needed to be
marked. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dac1525d 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc shouldn't care about owned_by_allocator

The owned_by_allocator field is a purely in memory thing, even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1b9374ad 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_gc_done() error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d44a6e35 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop old style btree node coalescing

We have foreground btree node merging now, and any future btree node
merging improvements are going to be based off of that code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e949fbbb 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure bucket gen gc completes

We don't want it to block, if it can't allocate it should just continue
instead of possibly deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac516d0e 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add the status of bucket gen gc to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d7f35163 09-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix BTREE_ITER_NOT_EXTENTS

bch2_btree_iter_peek() wasn't properly checking for
BTREE_ITER_IS_EXTENTS when updating iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0e96452e 09-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_gc_btree_gens()

Since we're using a NOT_EXTENTS iterator, we shouldn't be setting the
iter pos to the start of the extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e264b2f6 31-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bch2_btree_update_start()

bch2_btree_update_start() is now responsible for taking gc_lock and
upgrading the iterator to lock parent nodes - greatly simplifying error
handling and all of the callers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4cf91b02 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3bf57160 26-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix packed bkey format calculation for new btree roots

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0390ea8a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop bkey noops

Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.

But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility code
for old style extent btree nodes is gone, so we can completely drop this
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0ba3b64 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7e6dbac9 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey ops->debugcheck method

This code used to be used for running some assertions on alloc info at
runtime, but it long predates fsck and hasn't been good for much in
ages - we can delete it now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b3b66e30 12-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Have fsck check for stripe pointers matching stripe

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f020bfcd 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_bpos_to_text() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 006d69aa 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't drop ptrs to btree nodes

If a ptr gen doesn't match the bucket gen, the bucket likely doesn't
contain the data we want - but it's still possible the data we want
might have been overwritten, and for btree node pointers we can verify
whether or not the node is the one we wanted with the node's sequence
number, so it's better to keep the pointer and try reading from it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d065472c 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a use-after-free in bch2_gc_mark_key()

bch2_check_fix_ptrs() can update/reallocate k

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41e37786 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Bring back metadata only gc

This is useful for the filesystem dump debugging tool - when we're
hitting bugs we want to skip as much of the recovery process as
possible, and the dump tool only needs to know where metadata lives.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19dd3172 04-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for compat feature bits

This is to generate strings for them, so that we can print them out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33a391a2 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some (spurious) warnings about uninitialized vars

These are only complained about when building in userspace, for some
reason.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a4805d66 22-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Scan for old btree nodes if necessary on mount

We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dab9ef0d 23-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error message for some allocation failures

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0507962f 17-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop invalid stripe ptrs in fsck

More repair code, now that we can repair extents during initial gc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 180fb49d 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal updates to dev usage

This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2abe5420 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist 64 bit io clocks

Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd periodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestamps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5fc70d3a 27-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Repair bad data pointers

Now that we can repair metadata during GC, we can handle bad pointers
that would trigger errors being marked, when they need to just be
dropped.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0b73c1c 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add (partial) support for fixing btree topology

When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.

Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.
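
The invariant described above (child nodes exactly spanning the parent's
range) can be illustrated with a minimal sketch. This is not the code from
this patch: struct node_range and children_span_parent() are hypothetical
names, and keys are simplified to plain integers where the successor of a
key is key + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node range: keys are plain integers here. */
struct node_range {
	uint64_t min_key;	/* inclusive */
	uint64_t max_key;	/* inclusive */
};

/*
 * Returns true if the children, in order, exactly span the parent's range:
 * the first child starts at the parent's min_key, each later child starts
 * at the successor of the previous child's max_key, and the last child
 * ends at the parent's max_key.
 */
static bool children_span_parent(const struct node_range *parent,
				 const struct node_range *children,
				 size_t nr)
{
	uint64_t expected_min = parent->min_key;

	assert(parent->min_key <= parent->max_key);

	for (size_t i = 0; i < nr; i++) {
		/* A gap or overlap shows up as an unexpected min_key. */
		if (children[i].min_key != expected_min ||
		    children[i].max_key < children[i].min_key)
			return false;

		/* Reaching (or passing) the parent's end must happen on the last child. */
		if (children[i].max_key >= parent->max_key)
			return i + 1 == nr &&
				children[i].max_key == parent->max_key;

		/* Safe: max_key < parent->max_key, so this cannot overflow. */
		expected_min = children[i].max_key + 1;
	}

	/* Ran out of children before covering the parent's range. */
	return false;
}
```

A repair pass would presumably act on the first child for which this check
fails; what the repair does in each case is outside the scope of the sketch.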

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b593ee1 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add support for doing btree updates prior to journal replay

Some errors may need to be fixed in order for GC to successfully run -
walk and mark all metadata. But we can't start the allocators and do
normal btree updates until after GC has completed, and allocation
information is known to be consistent, so we need a different method of
doing btree updates.

Fortunately, we already have code for walking the btree while overlaying
keys from the journal to be replayed. This patch adds an update path
that adds keys to the list of keys to be replayed by journal replay, and
also fixes up iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a66f7989 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor checking of btree topology

Still a lot of work to be done here: we can't yet repair btree topology
issues, but this patch refactors things so that we have better access to
what we need in the topology checks. Next up will be figuring out a way
to do btree updates during gc, before journal replay is done.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 079663d8 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill metadata only gc

This was useful before we had transactional updates to interior btree
nodes - but now, it's just extra unneeded complexity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e53151b 17-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill stripe->dirty

This makes bch2_stripes_write() work more like bch2_alloc_write().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a39c74be 17-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix gc updating stripes info

The primary stripes radix tree can be sparse, which was causing an
assertion to pop because the one used for gc isn't. Fix this by changing
the algorithm to copy between the two radix trees.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac958006 14-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Factor out bch2_ec_stripes_heap_start()

This fixes a bug where mark and sweep gc was incorrectly clearing out
the stripes heap and causing assertions to fire later - simpler to just
create the stripes heap after gc has finished.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4291a331 08-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write() should be writing for all devices

Alloc info isn't stored on a particular device, it makes no sense to
only be writing it out for rw members - this was causing fsck to not fix
alloc info errors, oops.

Also, make sure we write out alloc info in other repair paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3187aa8d 21-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.

- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.

- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f299d573 13-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor filesystem usage accounting

Various filesystem usage counters are kept in percpu counters, with one
set per in flight journal buffer. Right now all the code that deals with
it assumes that there are only two buffers/sets of counters, but the
number of journal bufs is getting increased to 4 in the next patch - so
refactor that code to not assume a constant.
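
As a rough illustration of code that does not hard-code the number of
counter sets, consider the sketch below. Every name in it
(NR_JOURNAL_BUFS, NR_COUNTERS, struct usage_sets, usage_set_for_seq(),
usage_read()) is hypothetical, and selecting a set by journal sequence
number modulo the buffer count is an assumption made for the example, not
something this commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_JOURNAL_BUFS	4	/* hypothetical: number of in flight journal buffers */
#define NR_COUNTERS	3	/* hypothetical: number of usage counters per set */

/* One set of counters per in flight journal buffer. */
struct usage_sets {
	uint64_t sets[NR_JOURNAL_BUFS][NR_COUNTERS];
};

/* Pick the counter set for a journal sequence number (assumed mapping). */
static uint64_t *usage_set_for_seq(struct usage_sets *u, uint64_t seq)
{
	return u->sets[seq % NR_JOURNAL_BUFS];
}

/* Read one counter by summing over every set, however many there are. */
static uint64_t usage_read(const struct usage_sets *u, unsigned counter)
{
	uint64_t sum = 0;

	assert(counter < NR_COUNTERS);

	for (size_t i = 0; i < NR_JOURNAL_BUFS; i++)
		sum += u->sets[i][counter];
	return sum;
}
```

Because the loops run to NR_JOURNAL_BUFS, changing the constant from 2 to 4
would need no other edits in this sketch.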

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b7a9bbfc 19-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move journal reclaim to a kthread

This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 29364f34 02-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop sysfs interface to debug parameters

It's not used much anymore; the module parameter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8d6b6222 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improvements to writing alloc info

Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c47c50f8 13-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix gc of stale ptr gens

Awhile back, gcing of stale pointers was split out from full
mark-and-sweep gc - but, the bit to actually drop those stale pointers
wasn't implemented. Whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74ed7e56 21-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't let copygc buckets be stolen by other threads

And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 988e98cf 10-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor replicas code

Awhile back the mechanism for garbage collecting unused replicas entries
was significantly improved, but some cleanup was missed - this patch
does that now.

This is also prep work for a patch to account for erasure coded parity
blocks separately - we need to consolidate the logic for
checking/marking the various replicas entries from one bkey into a
single function.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89fd25be 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for data types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba6dd1dd 06-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve stripe triggers/heap code

Soon we'll be able to modify existing stripes - replacing empty blocks
with new blocks and new p/q blocks. This patch updates the trigger code
to handle pointers changing in an existing stripe; also, it
significantly improves how the stripes heap works, which means we can
get rid of the stripe creation/deletion lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b9c3d139 17-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock in the RO path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 451570a5 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Implement a new gc that only recalcs oldest gen

Full mark and sweep gc doesn't (yet?) work with the new btree key cache
code, but it also blocks updates to interior btree nodes for the
duration and isn't really necessary in practice; we aren't currently
attempting to repair errors in allocation info at runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ada1606 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Turn c->state_lock into an rwsem

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 00b8ccf7 25-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Interior btree updates are now fully transactional

We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aafcf9bc 24-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Better error messages on bucket sector count overflows

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e1a31c4 28-Apr-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add some printks for error paths

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f9dda47 05-Apr-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock on starting an interior btree update

Not legal to block on a journal prereservation with btree locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d06c1a0c 29-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check btree topology at startup

When initial btree gc was changed to overlay journal keys as it walks
the btree, it also stopped checking btree topology.

Previously, checking btree topology was a fairly complicated affair -
but it's much easier now that btree_ptr_v2 has min_key in the pointer.

This rewrites the old range_checks code and uses it in both runtime and
initial gc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fb2983 07-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: previously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_successor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e62d65f2 15-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_commit() path can now insert to interior nodes

This will be needed for the upcoming patches to journal updates to
interior btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3e464ac 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splitting existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a9bc0a51 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for bad key version number

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad44bdc3 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey noops

For upcoming inline data extents, we're going to need to be able to
shorten the value of existing bkeys in the btree - and to make that work
we're going to need to be able to pad out the space the value previously
took up with something.

This patch changes the various code that iterates over bkeys to handle
k->u64s == 0 as meaning "skip the next 8 bytes".
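
The skipping rule can be sketched as follows. This is an illustration
only, not the iteration code from this patch: the flat array of 64 bit
words, the convention that the low byte of a key's first word holds its
size in u64s, and the name count_keys() are all assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a buffer of packed keys and count them. In this hypothetical
 * layout, the low byte of a key's first word is its total size in
 * 64 bit words (u64s). A size of 0 is a noop: skip one word (8 bytes).
 */
static size_t count_keys(const uint64_t *words, size_t nr_words)
{
	size_t pos = 0, nr_keys = 0;

	assert(words || !nr_words);

	while (pos < nr_words) {
		size_t u64s = words[pos] & 0xff;

		if (!u64s) {
			/* Noop padding: skip the next 8 bytes. */
			pos++;
			continue;
		}

		/* Stop rather than run past the end on a truncated key. */
		if (u64s > nr_words - pos)
			break;

		nr_keys++;
		pos += u64s;
	}

	return nr_keys;
}
```

With this rule, shrinking a value in place only requires zeroing the
words it no longer uses; iterators then step over them one word at a time.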

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ea3532cb 11-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a subtle race in the btree split path

We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f7c0fcdd 08-Oct-2019 Justin Husted <sigstop@gmail.com>

bcachefs: Fix uninitialized data in bch2_gc_btree()

Running the filesystem under valgrind exposed a path where the max_stale
variable in bch2_gc_btree() might not be initialized before use in a
rare case when there are no btree nodes in a transaction.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0741d378 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't allocate memory under mark_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89b05118 06-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Flush fsck errors when looping in btree gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06ab329c 29-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve pointer marking checks and error messages

Importantly, we don't want to use bch2_fs_inconsistent_on() for errors
that fsck can repair, because that will just put us in RO mode and
prevent fsck from actually fixing stuff. Probably want to get rid of it
in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6671a708 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_alloc_write()

Major simplification - gets rid of the need for marking buckets as
dirty, instead we write buckets if the in memory mark is different from
what's in the btree.
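
The compare-then-write idea can be sketched like this. It is a simplified
model, not bch2_alloc_write() itself: struct bucket_mark here has
hypothetical fields, and the "btree" is just a second array standing in
for the persistent copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bucket mark; the real structure has different fields. */
struct bucket_mark {
	uint8_t		gen;
	uint8_t		data_type;
	uint16_t	dirty_sectors;
	uint16_t	cached_sectors;
};

static bool bucket_mark_equal(const struct bucket_mark *a,
			      const struct bucket_mark *b)
{
	return a->gen == b->gen &&
		a->data_type == b->data_type &&
		a->dirty_sectors == b->dirty_sectors &&
		a->cached_sectors == b->cached_sectors;
}

/*
 * Write out only the buckets whose in memory mark differs from the
 * persistent copy; no separate dirty flag is needed. Returns the number
 * of buckets written.
 */
static size_t write_changed_buckets(const struct bucket_mark *in_memory,
				    struct bucket_mark *in_btree,
				    size_t nr)
{
	size_t written = 0;

	assert(!nr || (in_memory && in_btree));

	for (size_t i = 0; i < nr; i++) {
		if (bucket_mark_equal(&in_memory[i], &in_btree[i]))
			continue;

		in_btree[i] = in_memory[i];	/* stand-in for a btree update */
		written++;
	}

	return written;
}
```

Running the function a second time with no intervening changes writes
nothing, which is the property that makes a dirty flag unnecessary.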

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cbe5cfe 09-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework calling convention for marking overwrites

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e738539 24-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve key marking interface

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c43a6ef9 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_bkey_cached_common

This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5e82a9a1 10-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Write out fs usage consistently

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94f651e2 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Return errors from for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6dd04f8 15-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark overwrites from journal replay in initial gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0734356 11-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Deduplicate keys in the journal before replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4881fdb7 04-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: initial gc no longer needs to touch every node

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1d58243 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add ability to run gc on metadata only

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cccf4e6d 28-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert gc errors to fsck errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36e916e1 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Caller now responsible for calling mark_key for gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a0e06db 24-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted preemption fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ccaa61c9 28-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix initial gc

Buckets weren't being marked as dirty

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 05b3d5ac 28-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: simplify gc locking a bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b2b9d11 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix error handling in gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6122ab63 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More debug params for testing of recovery paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 28062d32 20-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix gc handling of bucket gens

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ecf37a4a 14-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fs_usage_u64s()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 768ac639 14-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mechanism for blocking the journal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8777210b 12-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: refactor key marking code a bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fbc5a4 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc lock no longer needed for disk reservations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76f4c7b0 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix oldest_gen handling

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1df42b57 06-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: don't do initial gc if have alloc info feature

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 42b72e0b 24-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: journal_replay_early()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61c8d7c8 25-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist stripe blocks_used

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7ef2a73a 21-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check for if extent update is allocating

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 23f80d2b 17-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Factor out acc_u64s()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06b7345c 01-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Include summarized counts in fs_usage

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9166b41d 25-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: s/usage_lock/mark_lock

better describes what it's for, and we're going to call a new lock
usage_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8eb7f3ee 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: move dirty into bucket_mark

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 90541a74 21-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add new alloc fields

prep work for persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0cfb963 29-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track nr_inodes with the key marking machinery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d034c09b 27-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: return errors correctly from gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eeb83e25 22-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Hold usage_lock over mark_key and fs_usage_apply

Fixes an inconsistency at the end of gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dfe9bfb3 24-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Stripes now properly subject to gc

gc now verifies the contents of the stripes radix tree, important for
persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad7ae8d6 23-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree locking fix, refactoring

Hit an assertion, probably spurious, indicating an iterator was unlocked
when it shouldn't have been (spurious because it wasn't locked at all
when the caller called btree_insert_at()).

Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ca53b55 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc now operates on second set of bucket marks

This means we can now use gc to verify the allocation information -
important for testing persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd575ddf 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91f8b567 12-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More btree gc refactorings

more prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1d25849c 07-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Centralize marking of replicas in btree update path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 47799326 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: more key marking refactoring

prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 103e2127 30-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: replicas: prep work for stripes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2252aa27 21-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree gc refactoring

prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ef337c54 06-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocation code refactoring

bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.

This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.

Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 216c9fac 11-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Pass around bset_tree less

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 271a3d3a 21-Jul-2016 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: lift ordering restriction on 0 size extents

This lifts the restriction that 0 size extents must not overlap with
other extents, which means we can now sort extents and non extents the
same way, and will let us simplify a bunch of other stuff as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b650fd1 24-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Account for internal fragmentation better

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7c7a309 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_mark_key() now takes bch_data_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2be7c8b 22-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill bucket mark sector count saturation

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 97446a24 20-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix device add

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>