History log of /linux-master/fs/bcachefs/alloc_background.c
Revision Date Author Comments
# 7ee88737 02-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for bad needs_discard before doing discard

In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.

If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ed94062 17-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_fatal_error()

error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ba6f48f 17-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()

Nested transaction restart handling is typically best avoided; when the
inner context handles a transaction restart it invalidates the outer
transaction context, so we need to make sure to return a
transaction_restart_nested error.

This code wasn't doing that, and hit the assertion in
for_each_btree_key() that checks for that via trans->restart_count.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cdce1094 11-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: reconstruct_alloc cleanup

Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.

Also - users no longer have to explicitly select fsck and fix_errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a393f331 15-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out discard fastpath

Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.

Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.

We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache means those writes only cost bus bandwidth.

So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e9d0558 15-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trigger_alloc() handles state changes better

bch2_trigger_alloc() kicks off certain tasks on bucket state changes;
e.g. triggering the bucket discard worker and the invalidate worker.

We've observed the discard worker running too often - most runs it
doesn't do any work, according to the tracepoint - so clearly, we're
kicking it off too often.

This adds an explicit statechange() macro to make these checks more
precise.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
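
The idea behind an explicit state-change check can be sketched as follows. This is a simplified illustration, not the macro from the commit: the bucket states, struct alloc_key, is_need_discard() and state_entered() are all hypothetical names, and a function pointer stands in for the macro's expression argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bucket states for illustration only. */
enum bucket_state {
	BUCKET_free,
	BUCKET_need_discard,
	BUCKET_dirty,
	BUCKET_cached,
};

struct alloc_key {
	enum bucket_state state;
};

typedef bool (*state_pred)(const struct alloc_key *);

/* Predicate: the bucket is waiting to be discarded. */
static bool is_need_discard(const struct alloc_key *a)
{
	return a->state == BUCKET_need_discard;
}

/*
 * True only when the update moves the bucket INTO the state described by
 * pred: the predicate was false for the old key and is true for the new one.
 */
static bool state_entered(const struct alloc_key *old_a,
			  const struct alloc_key *new_a,
			  state_pred pred)
{
	assert(old_a != NULL && new_a != NULL);
	return !pred(old_a) && pred(new_a);
}
```

With a check of this shape, a worker is only kicked when a bucket newly enters the state of interest, not on every update to a bucket that is already in it.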


# 096386a5 22-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: discard path uses unlock_long()

Some (bad) devices can have really terrible discard latency; we don't
want them blocking memory reclaim and causing warnings.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a6548c8b 15-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Avoid flushing the journal in the discard path

When issuing discards, we may need to flush the journal if there are too
many buckets that can't be discarded until a journal flush.

But the heuristic was bad; we should be comparing the number of buckets
that need flushes against the number of free buckets, not the number
of buckets we saw.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
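
A minimal sketch of the corrected comparison, assuming a "more than half of the free buckets" threshold. The commit message does not state the threshold, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the discard path should ask for a journal flush.
 *
 * need_journal_commit: buckets that cannot be discarded until a flush
 * free_buckets:        buckets currently available for allocation
 *
 * The "more than half" threshold is an assumption made for illustration.
 * Written as a division so the comparison cannot overflow.
 */
static bool discard_should_flush_journal(uint64_t need_journal_commit,
					 uint64_t free_buckets)
{
	return need_journal_commit > free_buckets / 2;
}
```

Comparing against free buckets ties the decision to allocation pressure; comparing against the number of buckets seen would trigger flushes even when plenty of free space remains.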


# e58f963c 06-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: helpers for printing data types

We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
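
A bounds-checked name lookup of this kind might look like the sketch below. The enum values, the table and the data_type_str() name are placeholders chosen for illustration, not the helpers added by the commit.

```c
#include <assert.h>
#include <string.h>

/* Illustrative subset of data types; the names are placeholders. */
enum data_type {
	DATA_free,
	DATA_sb,
	DATA_journal,
	DATA_btree,
	DATA_user,
	DATA_NR,
};

static const char * const data_type_names[DATA_NR] = {
	"free", "sb", "journal", "btree", "user",
};

/*
 * Return a printable name for a data type read from disk. A value written
 * by a newer version may be >= DATA_NR, so never index the table directly.
 */
static const char *data_type_str(unsigned type)
{
	return type < DATA_NR ? data_type_names[type] : "(invalid data type)";
}
```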


# 38c23fb8 07-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BTREE_TRIGGER_ATOMIC

Add a new flag to be explicit about when we're running atomic triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9d5dba2b 06-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: drop to_text code for obsolete bps in alloc keys

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 074cbcda 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fsck_err()s don't need to manually check c->sb.version anymore

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 153d1c63 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: unify alloc trigger

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6820ac2c 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move bch2_mark_alloc() to alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 717296c3 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_mark now takes bkey_s

Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07f383c7 03-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_iter -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41b84fb4 17-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device_rcu() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9fea2274 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80eab7a7 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a860b5a 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()

And for_each_btree_key2_upto -> for_each_btree_key_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5028b907 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27b2df98 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f0e297d 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Explicitly go RW for fsck

This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74644030 27-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: count_event()

Small helper for event counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb13f471 02-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()

More accurate naming.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f59547e 25-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Refactor bch2_check_alloc_to_lru_ref()

This code was somewhat convoluted - because originally bch2_lru_set()
could modify the LRU index if there was a collision.

That's no longer the case, so the "create LRU entry" path has no reason
to update the alloc key, so we can separate the handling of the two fsck
errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dafff7e5 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bucket sector count helpers

This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 25f64e99 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()

bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb52d23e 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7d9ae04e 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix locking when checking freespace btree

On transaction restart, we weren't re-validating the hole we saw.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7cb2a789 03-Nov-2023 Brian Foster <bfoster@redhat.com>

bcachefs: use swab40 for bch_backpointer.bucket_offset bitfield

The bucket_offset field of bch_backpointer is a 40-bit bitfield, but the
bch2_backpointer_swab() helper uses swab32. This leads to inconsistency
when an on-disk fs is accessed from an opposite endian machine.

As it turns out, we already have an internal swab40() helper that is
used from the bch_alloc_v4 swab callback. Lift it into the backpointers
header file and use it consistently in both places.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
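
For illustration, a 40-bit byte swap can be written as below. This is a standalone sketch using stdint types, not necessarily identical to the in-tree helper: it reverses the five low bytes of the value, which a 32-bit swap cannot do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reverse the byte order of a 40-bit (5-byte) value held in the low bits
 * of a 64-bit integer. Bits above bit 39 are discarded.
 */
static inline uint64_t swab40(uint64_t x)
{
	return ((x & 0x00000000ffULL) << 32) |	/* byte 0 -> byte 4 */
	       ((x & 0x000000ff00ULL) << 16) |	/* byte 1 -> byte 3 */
	       ((x & 0x0000ff0000ULL))       |	/* byte 2 stays put */
	       ((x & 0x00ff000000ULL) >> 16) |	/* byte 3 -> byte 1 */
	       ((x & 0xff00000000ULL) >> 32);	/* byte 4 -> byte 0 */
}
```

Applying the swap twice returns the original 40-bit value, which is the property needed when a filesystem moves back and forth between machines of opposite endianness.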


# 0996c72a 03-Nov-2023 Brian Foster <bfoster@redhat.com>

bcachefs: byte order swap bch_alloc_v4.fragmentation_lru field

A simple test to populate a filesystem on one CPU architecture and
fsck on an arch of the opposite byte order produces errors related
to the fragmentation LRU. This occurs because the 64-bit
fragmentation_lru field is not byte-order swapped when reads detect
that the on-disk/bset key values were written in opposite byte-order
of the current CPU.

Update the bch2_alloc_v4 swab callback to handle fragmentation_lru
as is done for other multi-byte fields. This doesn't affect existing
filesystems when accessed by CPUs of the same endianness because the
->swab() callback is only called when the bset flags indicate an
endianness mismatch between the CPU and on-disk data.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f7056b7 30-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure copygc does not spin

If copygc does no work - finds no fragmented buckets - wait for a bit of
IO to happen.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repaired,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 69d1f052 28-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Correctly initialize new buckets on device resize

bch2_dev_resize() was never updated for the allocator rewrite with
persistent freelists, and it wasn't noticed because the tests weren't
running fsck - oops.

Fix this by running bch2_dev_freespace_init() for the new buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f7b9713 24-Sep-2023 Hunter Shaffer <huntershaffer182456@gmail.com>

bcachefs: New superblock section members_v2

members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing. Instead, members_v2_get is used to find a member's exact
location within the array and returns a copy of that member.
Alternatively, member_v2_get_mut retrieves a mutable pointer to a member.

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
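
The access pattern described above can be sketched roughly as follows. The struct layouts and the members_get() name are hypothetical; the point is only that entries are located by a stored per-entry byte size, and that an entry written by an older version is copied into a zero-filled struct of the current size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical current-version member; new_field was added later. */
struct member {
	uint64_t nbuckets;
	uint64_t new_field;
};

/* Hypothetical section header: entry size is stored, not compiled in. */
struct members_v2 {
	uint16_t		member_bytes;	/* on-disk size of one entry */
	uint16_t		nr;		/* number of entries */
	const unsigned char	*entries;	/* packed entries */
};

/* Return a copy of entry i, zero-extended to the current struct size. */
static struct member members_get(const struct members_v2 *m, unsigned i)
{
	struct member ret;
	size_t n = m->member_bytes < sizeof(ret) ? m->member_bytes : sizeof(ret);

	assert(i < m->nr);
	memset(&ret, 0, sizeof(ret));
	/* Stride by the stored entry size, not by sizeof(struct member). */
	memcpy(&ret, m->entries + (size_t)i * m->member_bytes, n);
	return ret;
}
```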


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0940863f 12-Sep-2023 Nathan Chancellor <nathan@kernel.org>

bcachefs: Fix -Wformat in bch2_bucket_gens_invalid()

When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_bucket_gens_invalid() due to use of an incorrect format specifier:

fs/bcachefs/alloc_background.c:530:10: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat]
529 | prt_printf(err, "bad val size (%lu != %zu)",
| ~~~
| %zu
530 | bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens));
| ^~~~~~~~~~~~~~~~~~~
fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when using %lu but on 32-bit architectures, size_t is 'unsigned
int'. Use '%zu', the format specifier for 'size_t', to eliminate the
warning.

Fixes: 4be0d766a7e9 ("bcachefs: bucket_gens btree")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
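
The portable form can be shown with standard snprintf(). The function name below is made up for the example, and the kernel's prt_printf() is not used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a size mismatch message. Both arguments are size_t, so %zu is
 * correct on 32-bit and 64-bit targets alike; %lu only happens to work
 * where size_t is 'unsigned long'.
 */
static int format_val_size_error(char *buf, size_t len,
				 size_t got, size_t expected)
{
	assert(buf != NULL);
	return snprintf(buf, len, "bad val size (%zu != %zu)", got, expected);
}
```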


# 14f63ff3 12-Sep-2023 Nathan Chancellor <nathan@kernel.org>

bcachefs: Fix -Wformat in bch2_alloc_v4_invalid()

When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_alloc_v4_invalid() due to use of an incorrect format specifier:

fs/bcachefs/alloc_background.c:246:30: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
245 | prt_printf(err, "bad val size (%u > %lu)",
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| %u
246 | alloc_v4_u64s(a.v), bkey_val_u64s(k.k));
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
fs/bcachefs/bkey.h:58:27: note: expanded from macro 'bkey_val_u64s'
58 | #define bkey_val_u64s(_k) ((_k)->u64s - BKEY_U64s)
| ^
fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~

This expression is of type 'size_t'. On 64-bit architectures, size_t is
'unsigned long', so there is no warning when using %lu but on 32-bit
architectures, size_t is 'unsigned int'. Use '%zu', the format specifier
for 'size_t' to eliminate the warning.

Fixes: 11be8e8db283 ("bcachefs: New on disk format: Backpointers")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aef32bf7 11-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e46c181a 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert more code to bch_err_msg()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cba37d81 24-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill stripe check in bch2_alloc_v4_invalid()

Since we set bucket data type to BCH_DATA_stripe based on the data
pointer, not just the stripe pointer, it doesn't make sense to check for
no stripe in the .key_invalid method - this is a situation that
shouldn't happen, but our other fsck/repair code handles it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 71aba590 22-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Always check alloc data type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bf5a261c 01-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted fixes for clang

clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 067d228b 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate recovery passes

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
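
A rough sketch of the structure described above, assuming an x-macro list and a bitmask of passes to run. The three pass names, struct fs, and the stub pass bodies are illustrative only; just the field name curr_recovery_pass comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative pass list; the real list is longer and its order matters. */
#define RECOVERY_PASSES()		\
	x(alloc_read)			\
	x(check_alloc_info)		\
	x(check_lrus)

enum recovery_pass {
#define x(n)	RECOVERY_PASS_##n,
	RECOVERY_PASSES()
#undef x
	RECOVERY_PASS_NR
};

struct fs {
	unsigned curr_recovery_pass;	/* index of the pass being run */
	unsigned passes_to_run;		/* bitmask: which passes are wanted */
	unsigned passes_done;		/* bitmask: which passes actually ran */
};

/* Stub pass bodies: each one only records that it ran. */
#define x(n)								\
static int pass_##n(struct fs *c)					\
{									\
	c->passes_done |= 1U << RECOVERY_PASS_##n;			\
	return 0;							\
}
RECOVERY_PASSES()
#undef x

typedef int (*recovery_pass_fn)(struct fs *);

static const recovery_pass_fn recovery_pass_fns[RECOVERY_PASS_NR] = {
#define x(n)	[RECOVERY_PASS_##n] = pass_##n,
	RECOVERY_PASSES()
#undef x
};

/* Run the wanted passes in list order; stop at the first error. */
static int run_recovery_passes(struct fs *c)
{
	assert(c != NULL);

	for (c->curr_recovery_pass = 0;
	     c->curr_recovery_pass < RECOVERY_PASS_NR;
	     c->curr_recovery_pass++) {
		unsigned pass = c->curr_recovery_pass;
		int ret;

		if (!(c->passes_to_run & (1U << pass)))
			continue;

		ret = recovery_pass_fns[pass](c);
		if (ret)
			return ret;
	}
	return 0;
}
```

Because the "should this pass run" decision is a single mask test, a version upgrade could in principle request specific passes by setting bits instead of forcing a full fsck.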


# 10a6ced2 08-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_bucket_gens_read()

This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the
version check there.

This is prep work for enumerating all recovery passes: we need some
cleanup first to make calling all the recovery passes consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 24964e1c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE()

Version upgrades are not atomic operations: when we do a version upgrade
we need to update the superblock before we start using new features, and
then when the upgrade completes we need to update the superblock again.
This adds a new superblock field so we can detect and handle incomplete
version upgrades.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8726dc93 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Change check for invalid key types

As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.

This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f33c58fc 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill BTREE_INSERT_USE_RESERVE

Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 298ac24e 26-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Reduce stack frame size of bch2_check_alloc_info()

Excessive inlining may (on some versions of gcc?) cause excessive stack
usage; this turns off some inlining in bch2_check_alloc_info.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bb3c2a9 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New error message helpers

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
- bch_err_fn
- bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
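
The general shape of such a helper, as a userspace sketch: fmt_err_fn() and example_read_step() are hypothetical names, and the standard strerror() stands in for bch2_err_str(), which is specific to bcachefs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<function>(): error <description>" for a negative errno value.
 * Being a macro, __func__ expands to the name of the calling function.
 */
#define fmt_err_fn(buf, len, err)					\
	snprintf(buf, len, "%s(): error %s", __func__, strerror(-(err)))

/* Example caller that fails and reports the failure. */
static int example_read_step(char *buf, size_t len)
{
	int ret = -ENOMEM;

	assert(buf != NULL);
	fmt_err_fn(buf, len, ret);
	return ret;
}
```

The caller no longer writes its own per-site error string, so the message always names the function that failed and describes the error code that was actually returned.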


# e96f5a61 18-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_check_discard_freespace_key()

We weren't correctly checking the freespace btree - it's an extents
btree, which means we need to iterate over each bucket in a freespace
extent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa62a20 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: alloc_v4_u64s() fix

With the recent bkey_ops.min_val_size addition, bkey values are
automatically extended to the size of the current version.

The check in bch2_alloc_v4_invalid() needs to be updated to take this
into account.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbda63bb 30-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_make_mut() now calls bch2_trans_update()

It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.

This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(),
eliminating a bit of boilerplate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 34dfa5db 27-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_mut() improvements

- bch2_bkey_get_mut() now handles types increasing in size, allocating
a buffer for the type's current size when necessary
- bch2_bkey_make_mut_typed()
- bch2_bkey_get_mut() now initializes the iterator, like
bch2_bkey_get_iter()

Also, refactor so that most of the code is in functions - now macros are
only used for wrappers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcb79a51 29-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_iter() helpers

Introduce new helpers for a common pattern:

bch2_trans_iter_init();
bch2_btree_iter_peek_slot();

- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
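
The zero-extending copy described for bch2_bkey_get_val_typed() can be illustrated in isolation. The helper below is a hypothetical standalone sketch operating on plain byte buffers, not the bcachefs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a value of src_bytes into a destination of dst_bytes. If the source
 * is smaller (written by an older version of the type), the remainder of
 * the destination is zeroed; if it is larger, the copy is truncated.
 */
static void copy_val_zero_extend(void *dst, size_t dst_bytes,
				 const void *src, size_t src_bytes)
{
	size_t n = src_bytes < dst_bytes ? src_bytes : dst_bytes;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	memset((unsigned char *)dst + n, 0, dst_bytes - n);
}
```

Fields added in later versions therefore read as zero when the on-disk value predates them.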


# 251babb5 18-Apr-2023 Brian Foster <bfoster@redhat.com>

bcachefs: fix NULL bch_dev deref when checking bucket_gens keys

fsck removes bucket_gens keys for devices that do not exist in the
volume (i.e., if the device was removed). In 'fsck -n' mode, the
associated fsck_err_on() wrapper returns false to skip the key
removal. This proceeds on to the rest of the function, which
eventually segfaults on a NULL bch_dev because the device does not
exist.

Update bch2_check_bucket_gens_key() to skip out of the rest of the
function when the associated device does not exist, regardless of
running fsck in check or repair mode.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 615fccad 16-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a slab-out-of-bounds

In __bch2_alloc_to_v4_mut(), we overrun the buffer we allocate if the
alloc key had backpointers stored in it (which we no longer support).

Fix this with a max() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62a03559 31-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rip out code for storing backpointers in alloc keys

We don't store backpointers in alloc keys anymore, since we gained the
btree write buffer.

This patch drops support for backpointers in alloc keys, and revs the on
disk format version so that we know a fsck is required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1546cf97 28-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_get_key_or_hole()

This fixes an off by one error, due to confusing closed vs. half open
intervals.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
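
The two conventions differ by exactly one element, which is where this class of bug comes from. A generic illustration follows; the function names are hypothetical and this is not the code that was fixed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of buckets in the closed interval [first, last]. */
static uint64_t closed_interval_len(uint64_t first, uint64_t last)
{
	assert(first <= last);
	return last - first + 1;
}

/* Number of buckets in the half-open interval [start, end). */
static uint64_t half_open_interval_len(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start;
}
```

Buckets 4 through 7 are [4, 7] as a closed interval and [4, 8) as a half-open one; passing the closed end to code that expects a half-open end silently drops the last bucket.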


# e9b9e475 22-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_dev_freespace_init() Print out status every 10 seconds

It appears freespace init can still take a while, and we've had a report
or two of it getting stuck - let's have it print out where it's at every
10 seconds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8bff9875 23-Mar-2023 Brian Foster <bfoster@redhat.com>

bcachefs: use dedicated workqueue for tasks holding write refs

A workqueue resource deadlock has been observed when running fsck
on a filesystem with a full/stuck journal. fsck is not currently
able to repair the fs due to fairly rapid emergency shutdown, but
rather than exit gracefully the fsck process hangs during the
shutdown sequence. Fortunately this is easily recoverable from
userspace, but the root cause involves code shared between the
kernel and userspace and so should be addressed.

The deadlock scenario involves the main task in the bch2_fs_stop()
-> bch2_fs_read_only() path waiting on write references to drain
with the fs state lock held. A bch2_read_only_work() workqueue task
is scheduled on the system_long_wq, blocked on the state lock.
Finally, various other write ref holding workqueue tasks are
scheduled to run on the same workqueue and must complete in order to
release references that the initial task is waiting on.

To avoid this problem, we can split the dependent workqueue tasks
across different workqueues. It's a bit of a waste to create a
dedicated wq for the read-only worker, but there are several tasks
throughout the fs that follow the pattern of acquiring a write
reference and then scheduling to the system wq. Use a local wq
for such tasks to break the subtle dependency between these and the
read-only worker.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b40901b0 13-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New erasure coding shutdown path

This implements a new shutdown path for erasure coding, which is needed
for the upcoming BCH_WRITE_WAIT_FOR_EC write path.

The process is:
- Cancel new stripes being built up
- Close out/cancel open buckets on write points or the partial list
that are for stripes
- Shutdown rebalance/copygc
- Then wait for in flight new stripes to finish

With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill
up before they complete; the new ec shutdown path is needed for shutting
down copygc/rebalance without deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 46e14854 11-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix next_bucket()

This fixes an infinite loop in bch2_get_key_or_real_bucket_hole().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39a1ea12 24-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Single open_bucket_partial list

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 84ddb8b9 17-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't invalidate open buckets

Like bch2_trans_mark_bucket(), we shouldn't be incrementing a bucket gen
while it's still open - erasure coding was hitting this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80c33085 05-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fragmentation LRU

Now that we have much more efficient updates to the LRU btree, this
patch adds a new LRU that indexes buckets by fragmentation.

This means copygc no longer has to scan every bucket to find buckets
that need to be evacuated.

Changes:
- A new field in bch_alloc_v4, fragmentation_lru - this corresponds to
the bucket's position in the fragmentation LRU. We add a new field
for this instead of calculating it as needed because we may make the
fragmentation LRU optional; this field indicates whether a bucket is
on the fragmentation LRU.

Also, zoned devices will introduce variable bucket sizes; explicitly
recording the LRU position will be safer for them.

- A new copygc path for using the fragmentation LRU instead of
scanning every bucket and building up an in-memory heap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1b30ed5f 06-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use btree write buffer for LRU btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8ffa11a2 19-Jan-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: let __bch2_btree_insert() pass in flags

This patch is prep work for the following patch.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 629a21b6 03-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve invalidate_one_bucket() error messages

Make sure to check for lru entries that point to buckets that don't
exist as well as buckets in the wrong state, and improve the error
message we print out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbe17f18 20-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BKEY_INVALID_FROM_JOURNAL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# facafdcb 20-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Change bkey_invalid() rw param to flags

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 83f33d68 05-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rework lru btree

This patch changes how the LRU index works:

Instead of using KEY_TYPE_lru where the bucket the lru entry points to
is part of the value, this switches to KEY_TYPE_set and encoding the
bucket we refer to in the low bits of the key.

This means that we no longer have to check for collisions when inserting
LRU entries. We'll be making use of this in the next patch, which adds
a btree write buffer - a pure write buffer for btree updates, where
updates are appended to a simple array and then periodically sorted and
batch inserted.

This is a new on disk format version, and a forced upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5250b74d 25-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bucket_gens btree

To improve mount times, add a btree for just bucket gens, 256 of them
per key: this means we'll have to scan drastically less metadata at
startup.

This adds
- trigger for keeping it in sync with the alloc btree
- initialization code, for filesystems from previous versions
- new path for reading bucket gens
- new fsck code

And a new on disk format version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
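
With 256 generations packed into each key, finding the generation for a bucket is a shift and a mask. The sketch below assumes that layout; the constant and function names are illustrative, not the in-tree ones.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_GENS_BITS	8
#define BUCKET_GENS_NR		(1U << BUCKET_GENS_BITS)	/* 256 per key */
#define BUCKET_GENS_MASK	(BUCKET_GENS_NR - 1)

struct gens_pos {
	uint64_t key_offset;	/* which bucket_gens key holds the bucket */
	unsigned idx;		/* index of the bucket's gen within that key */
};

/* Map a bucket number to its bucket_gens key and slot (assumed layout). */
static struct gens_pos bucket_to_gens_pos(uint64_t bucket)
{
	struct gens_pos p = {
		.key_offset	= bucket >> BUCKET_GENS_BITS,
		.idx		= (unsigned)(bucket & BUCKET_GENS_MASK),
	};
	return p;
}
```

Under this layout a device with one million buckets needs roughly four thousand bucket_gens keys, compared with up to one alloc key per bucket.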


# d23124c7 30-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_check_alloc_info()

This factors out a new helper from bch2_dev_freespace_init(),
bch2_get_key_or_hole(), and uses it in bch2_check_alloc_info(): we're
now able to process holes in the alloc btree as ranges, instead of one
bucket at a time.

This will improve fsck performance on new filesystems, or filesystems
where not every bucket has been used yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cc65f565 26-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_dev_freespace_init()

This makes bch2_dev_freespace_init() much faster: instead of processing
every bucket on the device one at a time, we handle ranges of missing
keys all at once: the freespace btree is an extents style btree, so we
only have to insert one freespace key for every range of missing keys
in the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
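
The range handling described in the two commits above can be sketched like this. The sketch assumes a sorted array of bucket numbers that already have alloc keys and treats every hole between them as one free extent; all names are hypothetical, and the real code works on btree iterators rather than arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range of buckets [start, end) with no alloc key. */
struct sketch_extent {
	uint64_t start, end;
};

/*
 * used[] must be sorted ascending, without duplicates, and every entry must
 * be < nbuckets. Writes at most out_max extents and returns how many.
 */
size_t sketch_free_ranges(const uint64_t *used, size_t nr_used,
			  uint64_t nbuckets,
			  struct sketch_extent *out, size_t out_max)
{
	size_t nr = 0;
	uint64_t pos = 0;

	/* The final iteration (i == nr_used) handles the hole up to nbuckets. */
	for (size_t i = 0; i <= nr_used; i++) {
		uint64_t next = i < nr_used ? used[i] : nbuckets;

		if (next > pos && nr < out_max)
			out[nr++] = (struct sketch_extent) { pos, next };
		pos = next + 1;
	}
	return nr;
}
```

On a mostly empty device this emits a handful of extents rather than one entry per bucket, which matches the speedup the commit messages describe.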


# a8c752bb 17-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New on disk format: Backpointers

This patch adds backpointers: we now have a reverse index from device
and offset on that device (specifically, offset within a bucket) back to
btree nodes and (non cached) data extents.

The first 40 backpointers within a bucket are stored in the alloc key;
after that backpointers spill over to the new backpointers btree. This
is to help avoid performance regressions from additional btree updates
on large streaming workloads.

This patch adds all the code for creating, checking and repairing
backpointers. The next patch in the series is going to use backpointers
for copygc - finally getting rid of the need to scan all extents to do
copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2b542ba 11-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Go RW before check_alloc_info()

It's possible to do btree updates before going RW by adding them to the
list of updates for journal replay to do, but this is limited by what
fits in RAM. This patch switches the second alloc info phase to run
after going RW - btree_gc has already ensured the alloc btree itself is
correct - and tweaks the allocation path to deal with the potential
small inconsistencies.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d94189ad 08-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Debug mode for c->writes references

This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.

This will make it easier to debug shutdown hangs due to refcount leaks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f52dd1ae 19-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch_alloc_to_text()

We weren't guarding against the alloc key having an invalid data type.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19a614d2 30-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Better inlining for bch2_alloc_to_v4_mut

This separates out the slowpath into a separate function, and inlines
bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 31381636 23-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_relock_notrace()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 994ba475 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New btree helpers

This introduces some new conveniences, to help cut down on boilerplate:

- bch2_trans_kmalloc_nomemzero() - performance optimization
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78c0b75c 19-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More errcode cleanup

We shouldn't be overloading standard error codes now that we have
provisions for bcachefs-specific errorcodes: this patch converts super.c
and super-io.c to per error site errcodes, with a bit of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
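
As a rough illustration of the comparison helpers listed in the commit above, the sketch below defines boolean predicates over a stand-in position type. The struct layout and the sketch_* names are assumptions for the example; the real struct bpos and its helpers may differ in field order and implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a btree position, compared field by field. */
struct sketch_pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

bool sketch_pos_eq(struct sketch_pos l, struct sketch_pos r)
{
	return l.inode == r.inode &&
	       l.offset == r.offset &&
	       l.snapshot == r.snapshot;
}

bool sketch_pos_lt(struct sketch_pos l, struct sketch_pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode;
	if (l.offset != r.offset)
		return l.offset < r.offset;
	return l.snapshot < r.snapshot;
}

/* The remaining predicates follow from lt. */
bool sketch_pos_le(struct sketch_pos l, struct sketch_pos r) { return !sketch_pos_lt(r, l); }
bool sketch_pos_gt(struct sketch_pos l, struct sketch_pos r) { return sketch_pos_lt(r, l); }
bool sketch_pos_ge(struct sketch_pos l, struct sketch_pos r) { return !sketch_pos_lt(l, r); }
```

A caller that only needs "is equal" or "is less than" no longer pays for a full three-way comparison followed by a test against zero, which is one plausible source of the code size improvement mentioned above.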


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4bf5eec 18-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 615f867c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improved errcodes

Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.

Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.

Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.

This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
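
The x-macro scheme described in the commit above can be sketched as below. The error names, the numeric base and the sketch_* functions are invented for the example (and use positive values for simplicity); only the ideas of a single master list, a string table generated from it, and a class-aware match helper come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* x(class, name): class 0 marks a top-level error. */
#define SKETCH_ERRCODES()						\
	x(0,			SKETCH_ERR_restart)			\
	x(SKETCH_ERR_restart,	SKETCH_ERR_restart_relock)		\
	x(SKETCH_ERR_restart,	SKETCH_ERR_restart_nested)		\
	x(0,			SKETCH_ERR_no_space)

enum sketch_errcode {
	SKETCH_ERR_START = 2048,
#define x(class, err) err,
	SKETCH_ERRCODES()
#undef x
	SKETCH_ERR_MAX
};

/* Name table, generated from the same list. */
static const char * const sketch_err_strs[] = {
#define x(class, err) [err - SKETCH_ERR_START - 1] = #err,
	SKETCH_ERRCODES()
#undef x
};

/* Parent class of each error, generated from the same list. */
static const int sketch_err_class[] = {
#define x(class, err) [err - SKETCH_ERR_START - 1] = class,
	SKETCH_ERRCODES()
#undef x
};

const char *sketch_err_str(int err)
{
	if (err <= SKETCH_ERR_START || err >= SKETCH_ERR_MAX)
		return "(unknown)";
	return sketch_err_strs[err - SKETCH_ERR_START - 1];
}

/* True if err equals class or is a (transitive) sub-error of it. */
bool sketch_err_matches(int err, int class)
{
	while (err > SKETCH_ERR_START && err < SKETCH_ERR_MAX) {
		if (err == class)
			return true;
		err = sketch_err_class[err - SKETCH_ERR_START - 1];
	}
	return false;
}
```

Callers would then test sketch_err_matches(ret, SKETCH_ERR_restart) instead of ret == SKETCH_ERR_restart, so adding a new, more specific restart code does not require touching them.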


# 445d184a 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert alloc code to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d04801a0 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_do_invalidates_work() to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ca91f40f 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_dev_freespace_init() to for_each_btree_key_commit()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4910a950 16-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_do_discards_work() to for_each_btree_key2()

The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1783320 15-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: for_each_btree_key2()

This introduces two new macros for iterating through the btree, with
transaction restart handling
- for_each_btree_key2()
- for_each_btree_key_commit()

Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.

This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
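
The restart behaviour described in the commit above ("restarted, at the same key") can be sketched with a plain loop. This is not the real macro: SKETCH_RESTART, the callback type and the array standing in for the btree are all assumptions for the example.

```c
#include <stddef.h>

#define SKETCH_RESTART	1	/* hypothetical "transaction restart" code */

typedef int (*sketch_key_fn)(int key, void *ctx);

/*
 * Call fn on every key. A SKETCH_RESTART return retries the same key;
 * any other nonzero return aborts the walk and is passed to the caller.
 */
int sketch_for_each_key(const int *keys, size_t nr, sketch_key_fn fn, void *ctx)
{
	size_t i = 0;

	while (i < nr) {
		int ret = fn(keys[i], ctx);

		if (ret == SKETCH_RESTART)
			continue;	/* fresh attempt, same key */
		if (ret)
			return ret;
		i++;
	}
	return 0;
}

/* Example callback: sums keys, but asks for one restart before each key. */
struct sketch_sum_ctx {
	int sum;
	int restarts;
	int fail_next;
};

int sketch_sum_key(int key, void *p)
{
	struct sketch_sum_ctx *c = p;

	if (c->fail_next) {
		c->fail_next = 0;
		c->restarts++;
		return SKETCH_RESTART;
	}
	c->sum += key;
	c->fail_next = 1;
	return 0;
}
```

Because the retry lives in the iteration helper, the callback never needs its own nested retry loop, which is the property the conversion commits above rely on.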


# e68914ca 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename __bch2_trans_do() -> commit_do()

Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 80b3bf33 11-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Silence some fsck errors when reconstructing alloc info

There's no need to print fsck errors for errors that are expected, and
the user has already opted to repair.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 47ab0c5f 26-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_check_alloc_key()

bch2_check_alloc_key() was failing to check buckets that didn't have
alloc keys yet (because they'd never been used) - they still need to be
added to the freespace btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e34da43e 19-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bch2_check_alloc_info

- In check_alloc_key(), previously we were re-initializing iterators
for the need_discard and freespace btrees for every alloc key we
checked. But this was causing us to redo lookups into the journal
keys every time, since those lookups are cached in struct btree_iter.
This initializes the iterators in bch2_check_alloc_info and passes
them into check_alloc_key().

- Make the looping more consistent/efficient in bch2_check_alloc_info()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 22add2ec 26-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info()

This runs before we go rw for journal replay, but after we're allowed to
go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW,
though.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 38585367 20-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Bucket invalidate path improvements

- invalidate_one_bucket() now returns 1 when we don't have any buckets
on this device to invalidate, ensuring we don't spin
- the tracepoint invocation is moved to after the transaction commit,
and we now include the number of cached sectors in the tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1c6ff394 20-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix refcount leak in bch2_do_invalidates()

If we fail to queue the work item because it's already in process, we
need to drop the ref we just took.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
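
The leak fixed by the commit above follows a common pattern: a reference is taken on behalf of a work item, but queueing can fail when the item is already pending. A minimal sketch, with a hypothetical struct and a stand-in for queue_work() that returns false when the work is already queued:

```c
#include <stdbool.h>

/* Hypothetical filesystem state: a write refcount and one work item. */
struct sketch_fs {
	int	write_refs;
	bool	work_pending;
};

/* Stand-in for queue_work(): false if the work item was already queued. */
bool sketch_queue_work(struct sketch_fs *c)
{
	if (c->work_pending)
		return false;
	c->work_pending = true;
	return true;
}

void sketch_do_invalidates(struct sketch_fs *c)
{
	c->write_refs++;		/* ref held on behalf of the work item */

	if (!sketch_queue_work(c))
		c->write_refs--;	/* already queued: drop the ref we just took */
}

/* The worker drops the single ref owned by the queued work item. */
void sketch_invalidates_work_done(struct sketch_fs *c)
{
	c->work_pending = false;
	c->write_refs--;
}
```

Without the decrement on the failure path, every redundant call would leave one reference behind, and shutdown would wait forever for the count to reach zero.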


# a3d7afa5 18-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always use percpu_ref_tryget_live() on c->writes

If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6f44a994 13-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a persistent counter for bucket discards

Like the previous patch for bucket invalidates, add another counter for
a core allocator path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 440c15cc 13-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a persistent counter for bucket invalidation

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# df8c2ccb 10-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix freespace initialization

bch2_dev_freespace_init() was using __bch2_trans_do() incorrectly, and
calling bch2_bucket_do_index() with a stale alloc key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1cab5a82 21-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Go RW before bch2_check_lrus()

btree updates before going RW are expensive if they're in random order,
since they use the list of keys for journal replay to insert, which is
just a gap buffer.

This patch improves the bucket invalidate path so that if
bch2_check_lrus() hasn't finished it only prints warnings instead of
doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW
before bch2_check_lrus().

Also, the filesystem state bits are reorganized a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1f93726e 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e1b8f5f5 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Plumb btree_id & level to trans_mark

For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0b090326 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bch2_lru_delete() error messages

When we detect a filesystem inconsistency, we should include the
relevant keys in the error message. This patch adds a parameter to pass
the key with the lru entry to bch2_lru_delete(), so that it can be
printed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9b93596c 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve error message when alloc key doesn't match lru entry

Error messages should always print out the full key when available -
this gives us a starting point when looking through the journal to debug
what went wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7003589d 10-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure buckets have io_time[READ] set

It's an error if a bucket is in state BCH_DATA_cached but not on the LRU
btree - i.e. io_time[READ] == 0 - so, make sure it's set before adding
it.

Also, make some of the LRU code a bit clearer and more direct.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 84befe8e 10-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_trans_inconsistent_on() in more places

This gets us better error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a9c0a4cb 09-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Minor device removal fixes

- We weren't clearing the LRU btree
- bch2_alloc_read() runs before bch2_check_alloc_key() deletes alloc
keys for devices/buckets that don't exist, so it needs to check for
that
- bch2_check_lrus() needs to check that buckets exist
- improve some error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# aae29082 09-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_delete_extent_at()

New helper, for deleting extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 822835ff 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fold bucket_state in to BCH_DATA_TYPES()

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
counted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 62491956 07-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move alloc assertion to .key_invalid()

.key_invalid is a better place for this assertion.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 11c7d3e8 06-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for read_time == 0 in bch2_alloc_v4_invalid()

We've been seeing this error in fsck and we weren't able to track down
where it came from - but now that .key_invalid methods take a rw
argument, we can safely check for this.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 275c8426 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add rw to .key_invalid()

This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
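
A validation method that is stricter on write than on read, as the commit above describes, might look roughly like this. The struct, enum values and function name are hypothetical; the read_time check is borrowed from the later commit in this log that adds it to bch2_alloc_v4_invalid().

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_rw { SKETCH_READ, SKETCH_WRITE };

enum sketch_data_type {
	SKETCH_DATA_free,
	SKETCH_DATA_user,
	SKETCH_DATA_cached,
	SKETCH_DATA_NR
};

/* Hypothetical, heavily reduced alloc key. */
struct sketch_alloc {
	uint8_t		data_type;
	uint64_t	read_time;
};

/* Returns NULL if the key is acceptable, else a description of the problem. */
const char *sketch_alloc_invalid(const struct sketch_alloc *a, enum sketch_rw rw)
{
	/* Always checked: a key failing this is unusable in either direction. */
	if (a->data_type >= SKETCH_DATA_NR)
		return "invalid data type";

	/*
	 * Only checked on write: rejecting a freshly created key is safe,
	 * while deleting an existing on-disk key for this reason is not.
	 */
	if (rw == SKETCH_WRITE &&
	    a->data_type == SKETCH_DATA_cached &&
	    !a->read_time)
		return "cached bucket with read_time == 0";

	return NULL;
}
```

The same key can therefore pass on read and fail on write, which lets new bugs be caught at the point where the bad key is produced.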


# e1effd42 05-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More improvements for alloc info checks

- Move checks for whether the device & bucket are valid from the
.key_invalid method to bch2_check_alloc_key(). This is because
.key_invalid() is called on keys that may no longer exist (post
journal replay), which is a problem when removing/resizing devices.

- We weren't checking the need_discard btree to ensure that every set
bucket has a corresponding alloc key. This refactors the code for
checking the freespace btree, so that it now checks both.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f0ac7df2 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert .key_invalid methods to printbufs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5735608c 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill main in-memory bucket array

All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5add07d5 17-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck for need_discard & freespace btrees

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# caece7fe 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New bucket invalidate path

In the old allocator code, preparing an existing empty bucket was part
of the same code path that invalidated buckets containing cached data.
In the new allocator code this is no longer the case: the main allocator
path finds empty buckets (via the new freespace btree), and can't
allocate buckets that contain cached data.

We now need a separate code path to invalidate buckets containing cached
data when we're low on empty buckets, which this patch implements. When
the number of free buckets decreases that triggers the new invalidate
path to run, which uses the LRU btree to pick cached data buckets to
invalidate until we're above our watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
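
The control flow of the invalidate path described above can be sketched as a loop that consumes the oldest LRU entries until the free count reaches the watermark. Everything here is a simplification under stated assumptions: the LRU btree is modelled as an array already sorted oldest first, an invalidated bucket is counted as free immediately (the real path goes through further states), and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state for the sketch. */
struct sketch_cache_dev {
	uint64_t	nr_free;	/* buckets currently free */
	uint64_t	watermark;	/* target number of free buckets */
	const uint64_t	*lru;		/* cached buckets, oldest first */
	size_t		nr_lru;
	size_t		lru_pos;	/* next LRU entry to invalidate */
};

/*
 * Invalidate cached buckets until we are at the watermark or the LRU is
 * exhausted. Returns the number of buckets invalidated; stopping when the
 * LRU is empty is what keeps the loop from spinning.
 */
size_t sketch_do_invalidates(struct sketch_cache_dev *ca)
{
	size_t nr = 0;

	while (ca->nr_free < ca->watermark && ca->lru_pos < ca->nr_lru) {
		ca->lru_pos++;		/* drop the oldest cached bucket */
		ca->nr_free++;
		nr++;
	}
	return nr;
}
```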


# 59cc38b8 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New discard implementation

In the old allocator code, buckets would be discarded just prior to
being used - this made sense in bcache where we were discarding buckets
just after invalidating the cached data they contain, but in a
filesystem where we typically have more free space we want to be
discarding buckets when they become empty.

This patch implements the new behaviour - it checks the need_discard
btree for buckets awaiting discards, and then clears the appropriate
bit in the alloc btree, which moves the buckets to the freespace btree.

Additionally, discards are now enabled by default.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f25d8215 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator threads & freelists

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6b2826c 11-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Freespace, need_discard btrees

This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.

We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d48a7f8 31-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v4

This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.

Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 31f63fd1 14-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Introduce a separate journal watermark for copygc

Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.

This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e154711 13-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: x-macroize alloc_reserve enum

This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3117db99 21-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't issue discards when in nochanges mode

When the nochanges option is selected, we're supposed to never issue
writes. Unfortunately, it seems discards were missed when implementing
this, leading to some painful filesystem corruption.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ec061b21 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_gc no longer uses main in-memory bucket array

This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 12ce5b7d 11-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree key cache coherency

- Updates to non key cache iterators will now be transparently
redirected to the key cache for cached btrees.

- Except when creating new keys: then the update goes to underlying
btree

For iterating over a cached btree to work, we need to ensure that if
a key exists in the key cache, it also exists in the btree - otherwise
the iterator code will skip past it and not check the key cache.

Otherwise, for consistency, all updates should go to the same place -
the key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0678cbe2 10-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ignore cached data when calculating fragmentation

Previously, bucket fragmentation was considered to be bucket size -
total amount of live data, both dirty and cached.

This meant that if a bucket was full but only a small amount of data in
it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the
dirty data out of the bucket and the allocator wouldn't be able to
invalidate and drop the cached data.

This changes fragmentation to exclude cached data, so that copygc will
evacuate these buckets and copygc/the allocator will always be able to
make forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
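
The before and after definitions of fragmentation given in the commit above can be written out directly. This follows the wording of the commit message only; the function names are hypothetical and the actual calculation in the source may be expressed differently.

```c
#include <stdint.h>

/* Old definition: empty space after counting both dirty and cached data. */
uint32_t sketch_fragmentation_old(uint32_t bucket_size,
				  uint32_t dirty_sectors,
				  uint32_t cached_sectors)
{
	return bucket_size - (dirty_sectors + cached_sectors);
}

/* New definition: cached data is ignored, since it can simply be dropped. */
uint32_t sketch_fragmentation_new(uint32_t bucket_size,
				  uint32_t dirty_sectors)
{
	return bucket_size - dirty_sectors;
}
```

For a 512-sector bucket holding 8 dirty and 504 cached sectors, the old definition reports no fragmentation at all, so copygc would never pick the bucket; the new one reports 504 sectors, making it an obvious candidate for evacuation.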


# 3763cb95 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use in-memory bucket array for alloc updates

More prep work for getting rid of the in-memory bucket array: now that
we have BTREE_ITER_WITH_JOURNAL, the allocator code can do btree lookups
before journal replay is finished, and there's no longer any need for it
to get allocation information from the in-memory bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1f5f52bd 23-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator short-circuit invalidate

The allocator thread invalidates buckets (increments their generation
number) prior to discarding them and putting them on freelists. We've
had a short circuit path for some time to only update the in-memory
bucket mark when doing the invalidate if we're not invalidating cached
data, but that short-circuit path hasn't really been needed for quite
some time (likely since the btree key cache code was added).

We're deleting it now as part of deleting/converting code that uses the
in memory bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 21aec962 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New data structure for buckets waiting on journal commit

Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.

This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.

We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
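
For readers unfamiliar with cuckoo hashing, here is a small self-contained sketch of the idea: each bucket number has two candidate slots, lookups check only those two, and inserts evict an occupant when both are taken. The table size, hash constants, kick limit and all names are assumptions for the example, and the fixed-size table with undo on failure stands in for whatever resizing the real code does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS	4			/* 16 slots */
#define SKETCH_SLOTS	(1U << SKETCH_BITS)
#define SKETCH_KICKS	8			/* evictions tried before giving up */

struct sketch_entry {
	uint64_t	bucket;
	uint64_t	journal_seq;
	bool		used;
};

struct sketch_table {
	struct sketch_entry e[SKETCH_SLOTS];
};

/* Two multiplicative hashes; the constants are arbitrary odd numbers. */
size_t sketch_hash(uint64_t bucket, unsigned which)
{
	static const uint64_t mul[2] = {
		UINT64_C(0x9E3779B97F4A7C15), UINT64_C(0xC2B2AE3D27D4EB4F),
	};

	return (size_t) ((bucket * mul[which]) >> (64 - SKETCH_BITS));
}

/* An entry can only ever live in one of its two candidate slots. */
struct sketch_entry *sketch_find(struct sketch_table *t, uint64_t bucket)
{
	for (unsigned w = 0; w < 2; w++) {
		struct sketch_entry *e = &t->e[sketch_hash(bucket, w)];

		if (e->used && e->bucket == bucket)
			return e;
	}
	return NULL;
}

bool sketch_insert(struct sketch_table *t, uint64_t bucket, uint64_t seq)
{
	struct sketch_table saved = *t;
	struct sketch_entry new = { bucket, seq, true };
	struct sketch_entry *e = sketch_find(t, bucket);

	if (e) {
		e->journal_seq = seq;
		return true;
	}

	for (unsigned kick = 0; kick < SKETCH_KICKS; kick++) {
		for (unsigned w = 0; w < 2; w++) {
			e = &t->e[sketch_hash(new.bucket, w)];
			if (!e->used) {
				*e = new;
				return true;
			}
		}
		/* Both slots taken: evict one occupant, re-place it next round. */
		e = &t->e[sketch_hash(new.bucket, kick & 1)];
		struct sketch_entry tmp = *e;
		*e = new;
		new = tmp;
	}

	*t = saved;	/* undo; a real table would grow and retry */
	return false;
}

/* A bucket must wait while its last update is newer than what was flushed. */
bool sketch_needs_journal_commit(struct sketch_table *t, uint64_t bucket,
				 uint64_t flushed_seq)
{
	struct sketch_entry *e = sketch_find(t, bucket);

	return e && e->journal_seq > flushed_seq;
}
```

The attraction for this use case is that lookups are bounded at two probes, which matters if the check sits on the allocation path.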


# d8601afc 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify journal replay

With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the
order we have to replay keys from the journal in, and we can also start
up journal reclaim right away - and delete a bunch of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5222a460 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_WITH_JOURNAL

This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.

This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.

Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36f035e9 26-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix allocator + journal interaction

The allocator needs to wait until the last update touching a bucket has
been committed before writing to it again. However, the code was checking
against the last dirty journal sequence number, not the last flushed
journal sequence number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a7860877 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New in-memory array for bucket gens

The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# abe19d45 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor open_bucket code

Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c64740ef 30-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't start allocator threads too early

If the allocator threads start before journal replay has finished
replaying alloc keys, journal replay might overwrite the allocator's
btree updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 09943313 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rewrite bch2_bucket_alloc_new_fs()

This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that
doesn't need to use the in memory bucket array, part of a larger patch
series to entirely get rid of the in memory bucket array, except for
gc/fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
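
A bump allocator in the sense of the commit above simply hands out the next bucket index and never frees. The sketch below assumes a hypothetical device struct with a first usable bucket and a cursor; the real function also has to avoid buckets that are already in use (for example by the superblock or journal), which is left out here.

```c
#include <stdint.h>

/* Hypothetical device description for the sketch. */
struct sketch_new_dev {
	uint64_t first_bucket;		/* first bucket usable for data */
	uint64_t nbuckets;		/* total number of buckets */
	uint64_t new_fs_bucket_idx;	/* bump cursor */
};

/* Returns the next unused bucket, or -1 once the device is exhausted. */
int64_t sketch_bucket_alloc_new_fs(struct sketch_new_dev *ca)
{
	if (ca->new_fs_bucket_idx < ca->first_bucket)
		ca->new_fs_bucket_idx = ca->first_bucket;

	if (ca->new_fs_bucket_idx >= ca->nbuckets)
		return -1;

	return (int64_t) ca->new_fs_bucket_idx++;
}
```

The only state is a single cursor, which is why such an allocator has no need for an in-memory array describing every bucket.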


# 7243498d 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill non-lru cache replacement policies

Prep work for persistent LRUs and getting rid of the in memory bucket
array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 20572300 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve alloc_mem_to_key()

This moves some common code into alloc_mem_to_key(), which translates
from the in-memory format for a bucket to the btree key format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fb0e4808 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write()

This adds a new helper, much like the one we have for inode updates,
that allocates the packed alloc key, packs it and calls
bch2_trans_update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b547d005 29-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding fixes

When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3e52c222 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add journal_seq to inode & alloc keys

Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 904823de 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_mark_key() to take a btree_trans *

This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0d1b70a 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Must check for errors from bch2_trans_cond_resched()

But we don't need to call it from outside the btree iterator code
anymore, since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 69294246 01-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix allocator shutdown error message

We return 1 to indicate kthread_should_stop() returned true - we
shouldn't be printing an error.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b3e9bd6 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8d344587 13-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add safe versions of varint encode/decode

This adds safe versions of bch2_varint_(encode|decode) that don't read
or write past the end of the buffer, or varint being encoded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
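
As a rough illustration of what "safe" means here, the sketch below bounds-checks every byte it reads or writes. It is only a sketch under stated assumptions: it uses a LEB128-style encoding as a stand-in, not the bcachefs on-disk varint format, and the names varint_encode_safe() and varint_decode_safe() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only: LEB128-style varints,
 * NOT the encoding bcachefs uses on disk.
 */

/* Encode v into [out, end); returns bytes written, or -1 if it does not fit. */
static int varint_encode_safe(uint8_t *out, const uint8_t *end, uint64_t v)
{
	uint8_t *p = out;

	do {
		if (p >= end)
			return -1;	/* would write past the end of the buffer */

		uint8_t byte = v & 0x7f;

		v >>= 7;
		*p++ = byte | (v ? 0x80 : 0);	/* high bit: more bytes follow */
	} while (v);

	return (int)(p - out);
}

/* Decode from [in, end); returns bytes consumed, or -1 on truncated input. */
static int varint_decode_safe(const uint8_t *in, const uint8_t *end, uint64_t *v)
{
	const uint8_t *p = in;
	uint64_t ret = 0;
	unsigned shift = 0;

	for (;;) {
		if (p >= end || shift > 63)
			return -1;	/* would read past the buffer, or too long */

		uint8_t byte = *p++;

		ret |= (uint64_t)(byte & 0x7f) << shift;
		if (!(byte & 0x80))
			break;
		shift += 7;
	}

	*v = ret;
	return (int)(p - in);
}
```

Callers pass the end of the buffer explicitly and must check for -1, so a truncated or corrupt key is reported as an error rather than causing an out-of-bounds access.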


# 2e655e6d 12-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add open_buckets to sysfs

This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bc3f8b25 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 01254036 31-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for allocator thread shutdown

We were missing a kthread_should_stop() check in the loop in
bch2_invalidate_buckets(), very occasionally leading to us getting stuck
while shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3a402c8d 07-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some refcounting bugs

We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac1019d3 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clean up bch2_btree_and_journal_walk()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89baec78 17-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocator refactoring

This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 04903131 18-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle errors in bch2_trans_mark_update()

It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6ad060b0 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocator thread doesn't need gc_lock anymore

Even with runtime gc (which currently isn't supported), runtime gc no
longer clears/recalculates the main set of bucket marks - it allocates
and calculates another set, updating the primary at the end.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dac1525d 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc shouldn't care about owned_by_allocator

The owned_by_allocator field is a purely in memory thing, even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d62ab355 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_trans_mark_dev_sb()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b1bd955b 07-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't wait for ALLOC_SCAN_BATCH buckets in allocator

It used to be necessary for the allocator thread to batch up
invalidating buckets when possible - but since we added the btree key
cache that hasn't been a concern, and now it's causing the allocator
thread to livelock when the filesystem is nearly full.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73590619 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't unconditionally version_upgrade in initialize

This is mkfs's job. Also, clean up the handling of feature bits some.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2436cb9f 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for more enums

This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
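
For readers unfamiliar with the pattern, a minimal x-macro sketch follows. The list DEMO_DATA_TYPES() and every name derived from it are hypothetical, not the actual bcachefs definitions. The point is that the enum and its string table are generated from one list, so they cannot drift apart.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list, for illustration only: one x() entry per enum value. */
#define DEMO_DATA_TYPES()	\
	x(free,		0)	\
	x(sb,		1)	\
	x(btree,	2)

/* First expansion: the enum itself. */
enum demo_data_type {
#define x(name, nr) DEMO_DATA_##name = nr,
	DEMO_DATA_TYPES()
#undef x
	DEMO_DATA_NR
};

/* Second expansion: a NULL-terminated string table indexed by enum value. */
static const char * const demo_data_types[] = {
#define x(name, nr) [nr] = #name,
	DEMO_DATA_TYPES()
#undef x
	NULL
};
```

Adding a new value then means adding a single x() line; both the enum and the string table pick it up.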


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bae895a5 18-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add allocator thread state to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 51c66fed 17-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rip out copygc pd controller

We have a separate mechanism for ratelimiting copygc now - the pd
controller has only been causing problems.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb66fc5f 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix copygc threshold

Awhile back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onto a
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1b057787 04-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a cond_resched() to the allocator thread

This is just a band-aid fix for now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 59a74051 05-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Create allocator threads when allocating filesystem

We're seeing failures to mount because of a failure to start the
allocator threads, which currently happens fairly late in the mount
process, after walking all metadata, and kthread_create() fails if
something has tried to kill the mount process, which is probably not
what we want.

This patch avoids this issue by creating, but not starting, the
allocator threads when we preallocate all of our other in memory data
structures.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dab9ef0d 23-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error message for some allocation failures

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 180fb49d 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal updates to dev usage

This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2abe5420 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist 64 bit io clocks

Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd periodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestamps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
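
The wraparound problem can be seen in a small sketch. The helpers demo_age16() and demo_age64() are hypothetical and only model "ticks since last IO"; they are not the bcachefs io clock code. With a 16 bit timestamp, any age of 65536 ticks or more aliases to a smaller one unless the counters are rescaled first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not bcachefs code: age of a bucket in clock ticks. */

/* A 16 bit timestamp can only express ages below 65536 ticks. */
static uint16_t demo_age16(uint16_t now, uint16_t last_io)
{
	return (uint16_t)(now - last_io);
}

/* A 64 bit timestamp does not wrap in practice, so no rescaling is needed. */
static uint64_t demo_age64(uint64_t now, uint64_t last_io)
{
	return now - last_io;
}
```

For a bucket last touched 70000 ticks ago, demo_age64() reports 70000, while demo_age16() on the truncated timestamps reports 70000 - 65536 = 4464, making an old bucket look recently used.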


# 7f4e1d5d 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v2

This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.

Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4529ae09 25-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an assertion

If we're invalidating a bucket that has cached data in it, data_type
won't be 0 - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bfcf840d 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark superblocks transactionally

More work towards getting rid of the in memory struct bucket: this patch
adds code for marking superblock and journal buckets via the btree, and
uses it in the device add and journal resize paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9afc6652 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_invalidate_bucket()

This patch is working towards eventually getting rid of the in memory
struct bucket, and relying only on the btree representation.

Since bch2_invalidate_bucket() was only used for incrementing gens, not
invalidating cached data, no other counters were being changed as a side
effect - meaning it's safe for the allocator code to increment the
bucket gen directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72eab8da 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor dev usage

This is to make it more amenable for serialization.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4291a331 08-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write() should be writing for all devices

Alloc info isn't stored on a particular device, it makes no sense to
only be writing it out for rw members - this was causing fsck to not fix
alloc info errors, oops.

Also, make sure we write out alloc info in other repair paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3187aa8d 21-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_USE_RESERVE in a lot of places
where it no longer makes sense.

- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_USE_RESERVE just
because we're holding open_buckets pinned anymore.

- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f30dd860 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't write bucket IO time lazily

With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.

Future work: we really need to expand the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b7a9bbfc 19-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move journal reclaim to a kthread

This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39283c71 19-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for bad stripe pointers

The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.

But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.

Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.

Also - consolidate the code that checks pointer validity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
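
The gen relationship described above can be sketched as follows. This is an illustration under assumptions: generation numbers are modelled as 8 bit counters compared with wraparound, and the names demo_gen_cmp(), demo_ptr_classify() and the enum are hypothetical rather than the actual bcachefs helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 8 bit generation numbers that are allowed to wrap. */

/* <0 if a is older than b, 0 if equal, >0 if a is newer than b. */
static int demo_gen_cmp(uint8_t a, uint8_t b)
{
	return (int8_t)(uint8_t)(a - b);
}

enum demo_ptr_state {
	DEMO_PTR_OK,		/* pointer gen matches the bucket */
	DEMO_PTR_STALE,		/* bucket was reused after the pointer was made */
	DEMO_PTR_FROM_FUTURE,	/* pointer gen newer than the bucket: the bug above */
};

static enum demo_ptr_state demo_ptr_classify(uint8_t ptr_gen, uint8_t bucket_gen)
{
	int cmp = demo_gen_cmp(ptr_gen, bucket_gen);

	if (cmp < 0)
		return DEMO_PTR_STALE;
	if (cmp > 0)
		return DEMO_PTR_FROM_FUTURE;
	return DEMO_PTR_OK;
}
```

A stale pointer is a normal condition (the bucket was invalidated and reused), whereas a pointer newer than its bucket indicates the alloc key was never written out, which is the case the trigger in this patch is meant to prevent.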


# 28998019 17-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start/stop io clock hands in read/write paths

This fixes a bug where the clock hands in the journal and superblock
didn't match, because we were still incrementing the read clock hand
while read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8d6b6222 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improvements to writing alloc info

Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f3721e12 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Perf improvements for bch_alloc_read()

On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f9adbb7d 12-Aug-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a cond_resched() to bch2_alloc_write()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74ed7e56 21-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't let copygc buckets be stolen by other threads

And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d080aa5 22-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete unused arguments

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e6d11615 11-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make copygc thread global

Per device copygc threads don't move data to different devices and they
make fragmentation worse - they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89fd25be 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for data types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eff508b4 17-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a kthread_should_stop() check to allocator thread

Turns out it's possible during shutdown for the allocator to get stuck
spinning on bch2_invalidate_buckets() without hitting any of the other
checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7dd1ebfa 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Increase size of btree node reserve

Also tweak the allocator to be more aggressive about keeping it full.
The recent changes to make updates to interior nodes transactional (and
thus generate updates to the alloc btree) all put more stress on the
btree node reserves.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5d20ba48 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use cached iterators for alloc btree

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 255adc51 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always increment bucket gen on bucket reuse

Not doing so confuses copygc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a27443bc 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill old allocator startup code

It's not needed anymore since we can now write to buckets before
updating the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 039fc4c5 28-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fixes for going RO

Now that interior btree updates are fully transactional, we don't need
to write out alloc info in a loop. However, interior btree updates do
put more things in the journal, so we still need a loop in the RO
sequence.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# baeed3c3 28-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't require alloc btree to be updated before buckets are used

This is to break a circular dependency in the shutdown path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 00b8ccf7 25-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Interior btree updates are now fully transactional

We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2930396 24-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix reading of alloc info after unclean shutdown

When updates to interior nodes started being journalled, that meant that
after an unclean shutdown, until journal replay is done we can't walk
the btree without overlaying the updates from the journal.

The initial btree gc was changed to walk the btree overlaying keys from
the journal - but bch2_alloc_read() and bch2_stripes_read() were missed.
Major whoops...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a9310ab0 11-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fixes for startup on very full filesystems

- Always pass BTREE_INSERT_USE_RESERVE when writing alloc btree keys
- Don't strand buckets on the copygc freelist until after recovery is
done and we're starting copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c4a5cd5 27-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_and_journal_iter

Introduce a new iterator that iterates over keys in the btree with keys
from the journal overlaid on top. This factors out what the erasure
coding init code was doing manually.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58e2388f 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_ATOMIC

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3728b500 11-Oct-2019 Justin Husted <sigstop@gmail.com>

bcachefs: Initialize padding space after alloc bkey

Packed bkeys are padded up to 64 bit alignment, but the alloc bkey type
was not clearing the pad bytes after the last data byte. This left the
key possibly containing some random garbage at the end.

This problem was found using valgrind.

This patch also changes a path with the inode bkey to clear in the same
way.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
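
A minimal sketch of the kind of fix described, assuming a value that occupies `bytes` bytes in a buffer sized in whole 64 bit words. The helper name demo_pad_to_u64s() is hypothetical and is not the function changed by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: round a packed value of `bytes` bytes up to whole
 * 64 bit words and zero the pad bytes after the last data byte.
 * The caller must provide a buffer of at least the rounded-up size.
 * Returns the size in u64s.
 */
static size_t demo_pad_to_u64s(uint8_t *buf, size_t bytes)
{
	size_t u64s = (bytes + 7) / 8;

	/* Clear the tail so no uninitialized memory reaches the key. */
	memset(buf + bytes, 0, u64s * 8 - bytes);
	return u64s;
}
```

Without the memset, the tail bytes keep whatever was previously in memory, which is the sort of uninitialized data valgrind reports.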


# ae93a628 12-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix flushing held btree writes when there's a fs error

Previously, we'd go into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7199432 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill deferred btree updates

Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4d13e818 18-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Avoid deadlocking on the allocator

The allocator needs to make sure there's buckets available on the
RESERVE_NONE freelist if at all possible - otherwise foreground IO will
get stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6671a708 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_alloc_write()

Major simplification - gets rid of the need for marking buckets as
dirty, instead we write buckets if the in memory mark is different from
what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 67163cde 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Trust in memory bucket mark

This fixes a bug in the journal replay -> extent_replay_key ->
split_compressed path, when we do an update that changes alloc info but
the alloc info in the btree isn't up to date yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cbe5cfe 09-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework calling convention for marking overwrites

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e738539 24-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve key marking interface

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6fb076e6 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix spurious inconsistency in recovery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 460651ee 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Various improvements to bch2_alloc_write()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 932aa837 11-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_mark_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c43a6ef9 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_bkey_cached_common

This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94f651e2 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Return errors from for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f80b4e64 16-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix hang while shutting down

If the allocator thread exited before bch2_dev_allocator_stop() was
called (because of an error), bch2_dev_allocator_quiesce() could hang.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 53beb841 16-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: lockdep fix when going rw from bch2_alloc_write()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0734356 11-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Deduplicate keys in the journal before replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ea2b1e1 12-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: cmp_int()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0e0bda1 06-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Pass flags arg to bch2_alloc_write()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1d58243 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add ability to run gc on metadata only

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a0e06db 24-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted preemption fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 134915f3 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Go rw lazily

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0564b167 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert bch2_btree_insert_at() usage to bch2_trans_commit()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 18c9883e 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix bch2_invalidate_one_bucket2() during journal replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61f321fc 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make deferred inode updates a mount option

Journal reclaim may still need performance tuning

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e5d6c59 19-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use journal preres for deferred btree updates

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fcbf3e50 01-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocator startup fixes/refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1633e492 28-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: improved flush_held_btree_writes()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 86a225c4 20-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix a deadlock on startup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8fe826f9 13-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bucket invalidation to key marking path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8c96cfcc 13-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix more locking bugs

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fbc5a4 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc lock no longer needed for disk reservations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76f4c7b0 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix oldest_gen handling

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 053dbb37 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a locking bug

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 736affa8 08-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix for unmount hang

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b935a8a6 09-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a bug when shutting down before allocator started

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 430735cd 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist alloc info on clean shutdown

- Does not persist alloc info for stripes yet
- Also does not yet include filesystem block/sector counts from
struct fs_usage
- Not made use of just yet

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5e5d9bdb 22-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix fifo overflow in allocator startup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0cc3def 13-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9166b41d 25-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: s/usage_lock/mark_lock

better describes what it's for, and we're going to call a new lock
usage_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8eb7f3ee 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: move dirty into bucket_mark

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 90541a74 21-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add new alloc fields

prep work for persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

This lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e8897337 22-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allow for new alloc fields

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ca53b55 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc now operates on second set of bucket marks

This means we can now use gc to verify the allocation information -
important for testing persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61274e9d 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd575ddf 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 319f9ac3 08-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: revamp to_text methods

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b335bae 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted fixes for running on very small devices

It's now possible to create and use a filesystem on a 512k device with
4k buckets (though at that size we still waste almost half to internal
reserves)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b092dadd 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Scale down number of writepoints when low on space

this means we don't have to reserve space for them when calculating
filesystem capacity

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 198d6700 21-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add functionality for heaps to update backpointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ef337c54 06-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocation code refactoring

bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.

This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.

Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.
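
The restructuring described above (start from an empty list, then pull
buckets over from the write point only once we decide to use them) can be
sketched roughly as follows. This is an illustrative sketch, not the actual
bcachefs code: struct open_bucket, struct bucket_list, MAX_OPEN,
dev_in_target() and take_from_write_point() are hypothetical, simplified
stand-ins, and the target is assumed to be a plain bitmask of device
indices below 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPEN 8	/* hypothetical cap on buckets per list */

/* Hypothetical stand-in: only tracks which device a bucket lives on. */
struct open_bucket {
	unsigned dev;
};

struct bucket_list {
	struct open_bucket *v[MAX_OPEN];
	size_t nr;
};

/* Assumption: a target is a bitmask of allowed device indices (< 32). */
static bool dev_in_target(unsigned dev, unsigned target_mask)
{
	return (target_mask & (1U << dev)) != 0;
}

/*
 * Move buckets from the write point's list into 'picked' (the list for
 * the new allocation, empty on the first call) while they are on a
 * device in the target and we still want more. Buckets we do not take
 * stay on the write point, in their original order.
 *
 * Returns the number of buckets now in 'picked'.
 */
static size_t take_from_write_point(struct bucket_list *wp,
				    struct bucket_list *picked,
				    unsigned target_mask, size_t want)
{
	size_t i, kept = 0;

	assert(wp->nr <= MAX_OPEN);

	for (i = 0; i < wp->nr; i++) {
		struct open_bucket *ob = wp->v[i];

		if (picked->nr < want && picked->nr < MAX_OPEN &&
		    dev_in_target(ob->dev, target_mask))
			picked->v[picked->nr++] = ob;	/* decided to use it */
		else
			wp->v[kept++] = ob;		/* leave on write point */
	}

	wp->nr = kept;
	return picked->nr;
}
```

A caller would compare the returned count against the number of buckets it
wants and only then fall back to allocating from other devices, so that
success or failure is judged on the final list rather than on any single
device's allocation attempt.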

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>