History log of /linux-master/fs/bcachefs/buckets.c
Revision Date Author Comments
# 47d2080e 25-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_bkey_ptr_data_type()

Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ed94062 17-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_fatal_error()

error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 506b1876 08-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_bit_mod -> bch2_btree_bit_mod_buffered

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb6fc943 01-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill kvpmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b14ce35 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_account_disk_usage_change()

The disk space accounting rewrite is splitting out accounting for each
replicas set - those are moving to btree keys, instead of percpu
counters.

This breaks bch2_trans_fs_usage_apply() up, splitting out the part we
will still need.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e7834a8 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_fs_usage_base

Split out base filesystem usage into its own type; prep work for
breaking up bch2_trans_fs_usage_apply().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e58f963c 06-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: helpers for printing data types

We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
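
The bounds-checking idea in the commit above can be sketched as follows. This is a minimal illustration, not the actual bcachefs helper: the names example_data_type_names, EXAMPLE_NR_DATA_TYPES and example_data_type_str are hypothetical, and the name table is an assumed subset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name table; the real list of data types lives in bcachefs. */
static const char * const example_data_type_names[] = {
	"free", "sb", "journal", "btree", "user", "cached", "parity", "stripe",
};

#define EXAMPLE_NR_DATA_TYPES \
	(sizeof(example_data_type_names) / sizeof(example_data_type_names[0]))

/* Return a printable name, falling back to a fixed string when out of range. */
const char *example_data_type_str(unsigned type)
{
	return type < EXAMPLE_NR_DATA_TYPES
		? example_data_type_names[type]
		: "(invalid data type)";
}

/* Usage: a value written by a newer version still prints safely. */
void example_data_type_demo(void)
{
	assert(!strcmp(example_data_type_str(4), "user"));
	assert(!strcmp(example_data_type_str(200), "(invalid data type)"));
}
```

Indexing the table directly would read past its end for any type this version does not know about; the range check turns that into a harmless placeholder string.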


# f5d4481c 28-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()

This is useful for btree ptrs as well, when we're just updating
sectors_written.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4f9ec59f 28-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: unify extent trigger

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5a82ec3f 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trigger_stripe_ptr()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f34c21b 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trigger_pointer()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f4f78779 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move stripe triggers to ec.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6820ac2c 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move bch2_mark_alloc() to alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6cacd0c4 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: unify reservation trigger

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 282e7c37 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill mem_trigger_run_overwrite_then_insert()

now that type signatures are unified, redundant

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad00bce0 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: mark now takes bkey_s

Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 717296c3 27-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_mark now takes bkey_s

Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41b84fb4 17-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device_rcu() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9fea2274 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0d963a63 03-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Move reflink_p triggers into reflink.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9e243d3c 25-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill journal_seq/gc args to bch2_dev_usage_update_m()

This is only used by gc (fsck).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9b34f02c 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill dev_usage->buckets_ec

This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
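
Why the counter is redundant can be shown with a small sketch. The enum and struct below are hypothetical stand-ins, not the real bch_dev_usage layout; they only assume a per-data-type bucket count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few data types and the per-device usage. */
enum example_data_type {
	EXAMPLE_DATA_user,
	EXAMPLE_DATA_parity,
	EXAMPLE_DATA_stripe,
	EXAMPLE_DATA_NR,
};

struct example_dev_usage {
	uint64_t buckets[EXAMPLE_DATA_NR];	/* bucket count per data type */
};

/* Derive the erasure-coding bucket count instead of storing it separately. */
uint64_t example_buckets_ec(const struct example_dev_usage *u)
{
	return u->buckets[EXAMPLE_DATA_stripe] +
	       u->buckets[EXAMPLE_DATA_parity];
}

void example_buckets_ec_demo(void)
{
	struct example_dev_usage u = { .buckets = { 10, 2, 5 } };

	assert(example_buckets_ec(&u) == 7);
}
```

A derived value cannot drift out of sync with the per-type counts, which is the usual argument for dropping a stored duplicate.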


# ed0cd515 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_dev_usage_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dafff7e5 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bucket sector count helpers

This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3b05b8e0 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify check_bucket_ref()

We only need the sector count being modified.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 25f64e99 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()

bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 086a52f7 09-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1

Prep work for introducing bch_replicas_entry_v2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 03013bb0 25-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bucket data type for stripe buckets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 202a7c29 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't stop copygc thread on device resize

copygc no longer has to scan the buckets, so it's no longer a problem if
the number of buckets is changing while it's running.

This also fixes a bug where we forgot to restart copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# df94cb2e 27-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an integer overflow

Fixes:

bcachefs (e7fdc10e-54a3-49d9-bd0c-390370889d84): disk usage increased 4294967296 more than 2823707312 sectors reserved)
transaction updates for __bchfs_fallocate journal seq 467859
update: btree=extents cached=0 bch2_trans_update+0x4e8/0x540
old u64s 5 type deleted 536925940:3559337304:4294967283 len 0 ver 0
new u64s 6 type reservation 536925940:3559337304:4294967283 len 3559337304 ver 0: generation 0 replicas 2
update: btree=inodes cached=1 bch2_extent_update_i_size_sectors+0x305/0x3b0
old u64s 19 type inode_v3 0:536925940:4294967283 len 0 ver 0: mode 100600 flags 15300000 journal_seq 467859 bi_size 0 bi_sectors 0 bi_version 0 bi_atime 40905301656446 bi_ctime 40905301656446 bi_mtime 40905301656446 bi_otime 40905301656446 bi_uid 0 bi_gid 0 bi_nlink 0 bi_generation 0 bi_dev 0 bi_data_checksum 0 bi_compression 0 bi_project 0 bi_background_compression 0 bi_data_replicas 0 bi_promote_target 0 bi_foreground_target 0 bi_background_target 0 bi_erasure_code 0 bi_fields_set 0 bi_dir 1879048193 bi_dir_offset 3384856038735393365 bi_subvol 0 bi_parent_subvol 0 bi_nocow 0
new u64s 19 type inode_v3 0:536925940:4294967283 len 0 ver 0: mode 100600 flags 15300000 journal_seq 467859 bi_size 0 bi_sectors 3559337304 bi_version 0 bi_atime 40905301656446 bi_ctime 40905301656446 bi_mtime 40905301656446 bi_otime 40905301656446 bi_uid 0 bi_gid 0 bi_nlink 0 bi_generation 0 bi_dev 0 bi_data_checksum 0 bi_compression 0 bi_project 0 bi_background_compression 0 bi_data_replicas 0 bi_promote_target 0 bi_foreground_target 0 bi_background_target 0 bi_erasure_code 0 bi_fields_set 0 bi_dir 1879048193 bi_dir_offset 3384856038735393365 bi_subvol 0 bi_parent_subvol 0 bi_nocow 0

Kernel panic - not syncing: bcachefs (e7fdc10e-54a3-49d9-bd0c-390370889d84): panic after error
CPU: 4 PID: 5154 Comm: rsync Not tainted 6.5.9-gateway-gca1614174cc0-dirty #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4, BIOS P4.20 08/02/2021
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x90
panic+0x105/0x300
? console_unlock+0xf1/0x130
? bch2_printbuf_exit+0x16/0x30
? srso_return_thunk+0x5/0x10
bch2_inconsistent_error+0x6f/0x80
bch2_trans_fs_usage_apply+0x279/0x3d0
__bch2_trans_commit+0x112a/0x1df0
? bch2_extent_update+0x13a/0x1d0
bch2_extent_update+0x13a/0x1d0
bch2_extent_fallocate+0x58e/0x740
bch2_fallocate_dispatch+0xb7c/0x1030
? do_filp_open+0xa0/0x140
vfs_fallocate+0x18e/0x1d0
__x64_sys_fallocate+0x46/0x70
do_syscall_64+0x48/0xa0
? exit_to_user_mode_prepare+0x4d/0xa0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fc85d91bbb3
Code: 64 89 02 b8 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 80 3d 31 da 0d 00 00 49 89 ca 74 14 b8 1d 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5d c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 10

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
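
The numbers in the log above are consistent with a 32-bit product wrapping: the reservation has len 3559337304 and replicas 2, and the reported difference is exactly 2^32. The sketch below reproduces only that arithmetic. The function names are hypothetical, and it is an assumption that the bug had this shape; the fix itself is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Product truncated to 32 bits: wraps modulo 2^32. */
uint32_t example_reserved_32(uint32_t len, uint32_t replicas)
{
	return (uint32_t)((uint64_t)len * replicas);
}

/* Product kept in 64 bits: no wrap for these inputs. */
uint64_t example_reserved_64(uint32_t len, uint32_t replicas)
{
	return (uint64_t)len * replicas;
}

void example_reservation_demo(void)
{
	uint32_t len = 3559337304u;	/* "len" from the reservation key */

	/* Matches "2823707312 sectors reserved" in the log. */
	assert(example_reserved_32(len, 2) == 2823707312u);
	assert(example_reserved_64(len, 2) == UINT64_C(7118674608));
	/* Matches "increased 4294967296 more than" in the log. */
	assert(example_reserved_64(len, 2) - example_reserved_32(len, 2) ==
	       UINT64_C(4294967296));
}
```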


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repaired,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fb3f57bb 20-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: rebalance_work

This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.

rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.

A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.

Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.

Scanning is still required after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.

Future possible work:
- Propagate options to indirect extents when being changed
- Add other IO path options - nr_replicas, ec, to rebalance_work so
they can be applied in the background when they change
- Add a counter, for bcachefs fs usage output, showing the pending
amount of rebalance work: we'll probably want to do this after the
disk space accounting rewrite (moving it to a new btree)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 523f33ef 22-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: All triggers are BTREE_TRIGGER_WANTS_OLD_AND_NEW

Upcoming rebalance_work btree will require extent triggers to be
BTREE_TRIGGER_WANTS_OLD_AND_NEW - so to reduce potential confusion,
let's just make all triggers BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bbe682c7 21-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure devices are always correctly initialized

We can't mark device superblocks or allocate journal on a device that
isn't online.

That means we may need to do this on every mount, because we may have
formatted a new filesystem and then done the first mount
(bch2_fs_initialize()) in degraded mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88d39fd5 06-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Switch to unsafe_memcpy() in a few places

The new fortify checking doesn't work for us in all places; this
switches to unsafe_memcpy() where appropriate to silence a few
warnings/errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bbeaa2 27-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bucket_lock() is now a sleepable lock

fsck_err() may sleep - it takes a mutex and may allocate memory, so
bucket_lock() needs to be a sleepable lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a55fc65e 19-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an overflow check

When bucket sector counts were changed from u16s to u32s, a few things
were missed. This fixes an overflow check, and a truncation that
prevented the overflow check from firing.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
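
A generic sketch of how a truncating intermediate can keep an overflow check from firing is given below. It is not the code that was fixed: the function names are hypothetical, and the 16-bit intermediate is only assumed from the u16 to u32 history mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Broken: the sum is truncated to 16 bits before it is checked. */
bool example_overflow_check_truncated(uint32_t cur, int64_t delta)
{
	uint16_t sum = (uint16_t)((int64_t)cur + delta);

	/* A 16-bit value can never exceed UINT32_MAX, so this never fires. */
	return (int64_t)sum > (int64_t)UINT32_MAX;
}

/* Checked in a type wide enough to hold the out-of-range result. */
bool example_overflow_check_wide(uint32_t cur, int64_t delta)
{
	int64_t sum = (int64_t)cur + delta;

	return sum < 0 || sum > (int64_t)UINT32_MAX;
}

void example_overflow_check_demo(void)
{
	assert(!example_overflow_check_truncated(UINT32_MAX, 1));
	assert(example_overflow_check_wide(UINT32_MAX, 1));
	assert(!example_overflow_check_wide(100, 28));
}
```

The check has to run on the untruncated value; once the sum has been narrowed, the information the check needs is already gone.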


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7cb0e699 12-Sep-2023 Colin Ian King <colin.i.king@gmail.com>

bcachefs: remove redundant initialization of pointer d

The pointer d is being initialized with a value that is never read,
it is being re-assigned later on when it is used in a for-loop.
The initialization is redundant and can be removed.

Cleans up clang-scan build warning:
fs/bcachefs/buckets.c:1303:25: warning: Value stored to 'd' during its
initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aef32bf7 11-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: __bch2_btree_insert() -> bch2_btree_insert_trans()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e81f89b 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix assorted checkpatch nits

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4dc5bb9a 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move inode triggers to inode.c

bit of reorg

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 813e0cec 15-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Upgrade path fixes

Some minor fixes to not print errors that are actually due to a version
upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a63b32f 23-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_mark_pointer() refactoring

bch2_bucket_backpointer_mod() doesn't need to update the alloc key, we
can exit the alloc iter earlier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bb3c2a9 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New error message helpers

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
- bch_err_fn
- bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 21da6101 28-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: replicas_deltas_realloc() uses allocate_dropping_locks()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19c304be 28-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: GFP_NOIO -> GFP_NOFS

GFP_NOIO dates from the bcache days, when we operated under the block
layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to
GFP_NOFS.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e47a390a 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert -ENOENT to private error codes

As with previous conversions, replace -ENOENT uses with more informative
private error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 962210b2 22-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a buffer overrun in bch2_fs_usage_read()

We were copying the size of a struct bch_fs_usage_online to a struct
bch_fs_usage, which is 8 bytes smaller.

This adds some new helpers so we can do this correctly, and get rid of
some magic +1s too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
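
The class of bug described above, sizing a copy by the larger of two related structs, can be sketched like this. The struct layouts and names are hypothetical simplifications, chosen only so that the wrapper is 8 bytes larger than the struct it embeds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the two usage structs. */
struct example_fs_usage {
	uint64_t hidden;
	uint64_t data;
};

struct example_fs_usage_online {
	uint64_t online_reserved;	/* the extra 8 bytes */
	struct example_fs_usage u;
};

/* Copy exactly one example_fs_usage; sizeof(*src) here would overrun *dst. */
void example_fs_usage_read(struct example_fs_usage *dst,
			   const struct example_fs_usage_online *src)
{
	memcpy(dst, &src->u, sizeof(*dst));
}

void example_fs_usage_demo(void)
{
	struct example_fs_usage_online src = {
		.online_reserved = 1,
		.u = { .hidden = 2, .data = 3 },
	};
	struct example_fs_usage dst;

	example_fs_usage_read(&dst, &src);
	assert(dst.hidden == 2 && dst.data == 3);
	assert(sizeof(src) == sizeof(dst) + 8);
}
```

Sizing the copy from the destination type, or from a helper that computes the size once, avoids repeating the arithmetic at each call site.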


# f12a798a 30-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_mut() now calls bch2_trans_update()

It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.

This patch folds the bch2_trans_update() call into bch2_bkey_get_mut(),
eliminating a bit of boilerplate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 34dfa5db 27-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_mut() improvements

- bch2_bkey_get_mut() now handles types increasing in size, allocating
a buffer for the type's current size when necessary
- bch2_bkey_make_mut_typed()
- bch2_bkey_get_mut() now initializes the iterator, like
bch2_bkey_get_iter()

Also, refactor so that most of the code is in functions - now macros are
only used for wrappers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62a03559 31-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rip out code for storing backpointers in alloc keys

We don't store backpointers in alloc keys anymore, since we gained the
btree write buffer.

This patch drops support for backpointers in alloc keys, and revs the on
disk format version so that we know a fsck is required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2640faeb 06-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Journal resize fixes

- Fix a sleeping-in-atomic bug due to calling
bch2_journal_buckets_to_sb() under the journal lock.
- Additionally, now we mark buckets as journal buckets before adding
them to the journal in memory and the superblock. This ensures that
if we crash part way through we'll never be writing to journal
buckets that aren't marked correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91065976 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Mark stripe buckets with correct data type

Currently, we don't use bucket data type for tracking whether buckets
are part of a stripe; parity buckets are BCH_DATA_parity, but data
buckets in a stripe are BCH_DATA_user. There's a separate counter,
buckets_ec, outside the BCH_DATA_TYPES system for tracking number of
buckets on a device that are part of a stripe.

The trouble with this approach is that it's too coarse grained, and we
need better information on fragmentation for debugging copygc.

With this patch, data buckets in a stripe are now tracked as
BCH_DATA_stripe buckets.

This doesn't yet differentiate between erasure coded and non-erasure
coded data in a stripe bucket, nor do we yet track empty data buckets in
stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2611a041 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_mark_key() now takes btree_id & level

btree & level are passed to trans_mark - for backpointers -
bch2_mark_key() should take them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27616a31 18-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify ec stripes heap

Now that we have a separate data structure for tracking open stripes,
the stripes heap can track all existing stripes, which is a nice
simplification.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 627a2312 18-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Switch ec_stripes_heap_lock to a mutex

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 32770815 11-Feb-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: Don't run triggers when repairing in __bch2_mark_reflink_p()

Triggers currently trip up on the faulty reflink we're trying to repair;
disabling them lets us fix the broken reflink and continue.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8ffa11a2 19-Jan-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: let __bch2_btree_insert() pass in flags

This patch is prep work for the following patch.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c1f59ef6 11-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More info on check_bucket_ref() error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8dd69d9f 21-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3

Move bi_size and bi_sectors into the non-varint portion of the inode, so
that the write path can update them without going through the relatively
expensive unpack/pack operations.

Other changes:
- Add a field for the offset of the varint section, so we can add new
non-varint fields without needing a new inode type, like alloc_v3
- Move bi_mode into the flags field, so that the varint section can be
u64 aligned

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8c752bb 17-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New on disk format: Backpointers

This patch adds backpointers: we now have a reverse index from device
and offset on that device (specifically, offset within a bucket) back to
btree nodes and (non cached) data extents.

The first 40 backpointers within a bucket are stored in the alloc key;
after that backpointers spill over to the next backpointers btree. This
is to help avoid performance regressions from additional btree updates
on large streaming workloads.

This patch adds all the code for creating, checking and repairing
backpointers. The next patch in the series is going to use backpointers
for copygc - finally getting rid of the need to scan all extents to do
copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 920e69bc 03-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Btree write buffer

This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.

This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.

This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.

Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.

- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.

- Transaction commit path changes:
The update to the btree write buffer both mutates global state, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.

If there isn't room we flush the write buffer in the transaction
commit error path and retry.

- A new persistent option, for specifying the number of entries in the
write buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
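
The "flat fixed size array, flush and retry when full" behaviour described above can be sketched in a few lines. This is a toy model under stated assumptions: every name is hypothetical, the capacity is tiny, and the flush only resets the array where a real implementation would apply the buffered keys to the btree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_WB_SIZE 4	/* hypothetical capacity */

struct example_wb_key {
	uint64_t pos;
	uint64_t val;
};

struct example_write_buffer {
	struct example_wb_key keys[EXAMPLE_WB_SIZE];
	size_t nr;		/* entries currently buffered */
	size_t flushes;		/* number of flushes so far */
};

/* Append without touching the btree; fails with -1 when there is no room. */
int example_wb_insert(struct example_write_buffer *wb, struct example_wb_key k)
{
	if (wb->nr == EXAMPLE_WB_SIZE)
		return -1;
	wb->keys[wb->nr++] = k;
	return 0;
}

/* Stand-in for a flush: a real one would write the keys out first. */
void example_wb_flush(struct example_write_buffer *wb)
{
	wb->nr = 0;
	wb->flushes++;
}

/* Error path: on a full buffer, flush and retry once. */
int example_wb_insert_retry(struct example_write_buffer *wb,
			    struct example_wb_key k)
{
	if (!example_wb_insert(wb, k))
		return 0;
	example_wb_flush(wb);
	return example_wb_insert(wb, k);
}

void example_wb_demo(void)
{
	struct example_write_buffer wb = { .nr = 0, .flushes = 0 };

	for (uint64_t i = 0; i < 5; i++)
		assert(!example_wb_insert_retry(&wb, (struct example_wb_key){ i, i }));
	assert(wb.flushes == 1 && wb.nr == 1);
}
```

Because an insert never reads existing state, it stays cheap until the array fills, which matches the commit's point that this only pays off when reads are infrequent.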


# 19a614d2 30-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Better inlining for bch2_alloc_to_v4_mut

This separates out the slowpath into a separate function, and inlines
bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7c909f65 20-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix repair path in bch2_mark_reflink_p()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad5d3d82 25-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill fs_usage_apply_warn()

We now have bch2_trans_inconsistent() which generically does the same
thing - dumps pending btree transaction updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cc9c0db 28-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix some memcpy() warnings

With CONFIG_FORTIFY_SOURCE, the compiler attempts to warn about memcpys
that extend past struct field boundaries. This results in some spurious
warnings where we use embedded variable length structs; this patch
switches to unsafe_memcpy() to fix the warnings.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 994ba475 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New btree helpers

This introduces some new conveniences, to help cut down on boilerplate:

- bch2_trans_kmalloc_nomemzero() - performance optimization
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8852501f 24-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve fs_usage_apply_warn() message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# df6a24f8 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make error messages more uniform

Use __func__ in error messages that refer to function name, and do so
more uniformly - suggested by checkpatch.pl

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ed80c569 21-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Optimize bch2_dev_usage_read()

- add bch2_dev_usage_read_fast(), which doesn't return by value -
bch_dev_usage is big enough that we don't want the silent memcpy
- tweak the allocation path to only call bch2_dev_usage_read() once per
bucket allocated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b3243cb 11-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix cached data accounting

Negating without casting to a signed integer means the value wasn't
getting sign extended properly - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
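
The sign-extension pitfall mentioned above can be reproduced in isolation. The sketch assumes a 32-bit unsigned sector count widened into a 64-bit signed delta; the function names are hypothetical and the actual types in the fixed code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Negation happens in unsigned 32-bit arithmetic, then zero-extends. */
int64_t example_negate_unsigned(uint32_t sectors)
{
	return -sectors;
}

/* Cast first, so the negation is done on a signed 64-bit value. */
int64_t example_negate_signed(uint32_t sectors)
{
	return -(int64_t)sectors;
}

void example_negate_demo(void)
{
	/* With a 32-bit int, -8u is 2^32 - 8, a large positive number. */
	assert(example_negate_unsigned(8) == INT64_C(4294967288));
	assert(example_negate_signed(8) == -8);
}
```

An accounting delta computed the first way adds almost 2^32 sectors instead of subtracting a handful, which fits the description of the value not being sign extended properly.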


# 6c22eb70 08-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix "multiple types of data in same bucket" with ec

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 098ef98d 18-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add private error codes for ENOSPC

Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dadecd02 14-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_run()

This adds a new helper, bch2_trans_run(), that runs a function with a
btree_transaction context but without handling transaction restarts.
We're adding checks for nested transaction restart handling: when an
inner transaction handles a transaction restart it will still have to
return it to the outer transaction, or else assertions will be popped in
the outer transaction.

But some places don't need restart handling at the outer scope, so this
helper does what they need.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f501ad2b 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_mark_alloc(): Do wakeups after updating usage

We have an obvious wake up race if we do the wakeup _before_ updating
the counters the thing doing the waiting is reading.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e68914ca 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename __bch2_trans_do() -> commit_do()

Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f93726e 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e1b8f5f5 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Plumb btree_id & level to trans_mark

For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 75c8d030 12-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill old rebuild_replicas option

This option was useful when the replicas mechanism was new and still being
debugged, but hasn't been used in ages - let's delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 822835ff 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fold bucket_state in to BCH_DATA_TYPES()

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
counted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e1effd42 05-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More improvements for alloc info checks

- Move checks for whether the device & bucket are valid from the
.key_invalid method to bch2_check_alloc_key(). This is because
.key_invalid() is called on keys that may no longer exist (post
journal replay), which is a problem when removing/resizing devices.

- We weren't checking the need_discard btree to ensure that every set
bucket has a corresponding alloc key. This refactors the code for
checking the freespace btree, so that it now checks both.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c6b6d416 02-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc mark fn fixes, cleanups

mark_stripe_bucket() was busted; it was using @new uninitialized.

Also, clean up all the gc mark functions, and convert them to the same
style.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 75f02de4 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use crc_is_compressed()

Trivial cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 66d90823 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill struct bucket_mark

This switches struct bucket to using a lock, instead of cmpxchg. And now
that the protected members no longer need to fit into a u64, we can
expand the sector counts to 32 bits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5735608c 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill main in-memory bucket array

All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f43f99c 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_dev_usage_update() no longer depends on bucket_mark

This is one of the last steps in getting rid of the main in-memory
bucket array.

This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead
of bucket_mark, and for the places where we are in fact working with
bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that
takes bucket_mark and converts to bkey_alloc_unpacked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# caece7fe 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New bucket invalidate path

In the old allocator code, preparing an existing empty bucket was part
of the same code path that invalidated buckets containing cached data.
In the new allocator code this is no longer the case: the main allocator
path finds empty buckets (via the new freespace btree), and can't
allocate buckets that contain cached data.

We now need a separate code path to invalidate buckets containing cached
data when we're low on empty buckets, which this patch implements. When
the number of free buckets decreases that triggers the new invalidate
path to run, which uses the LRU btree to pick cached data buckets to
invalidate until we're above our watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 59cc38b8 10-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New discard implementation

In the old allocator code, buckets would be discarded just prior to
being used - this made sense in bcache where we were discarding buckets
just after invalidating the cached data they contain, but in a
filesystem where we typically have more free space we want to be
discarding buckets when they become empty.

This patch implements the new behaviour - it checks the need_discard
btree for buckets awaiting discards, and then clears the appropriate
bit in the alloc btree, which moves the buckets to the freespace btree.

Additionally, discards are now enabled by default.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f25d8215 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator threads & freelists

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6b2826c 11-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Freespace, need_discard btrees

This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.

We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d48a7f8 31-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v4

This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.

Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e154711 13-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: x-macroize alloc_reserve enum

This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
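
As a rough illustration of the x-macro technique this commit refers to, the sketch below generates both an enum and a matching array of strings from a single list. The list contents and every `example_`/`EXAMPLE_` name are hypothetical; the real alloc_reserve members are not shown in this log.

```c
#include <stddef.h>

/* Hypothetical list of enum members: the single source of truth. */
#define EXAMPLE_RESERVES()	\
	x(btree)		\
	x(movinggc)		\
	x(none)

/* First expansion: each x(name) becomes an enumerator. */
enum example_reserve {
#define x(name)	EXAMPLE_RESERVE_##name,
	EXAMPLE_RESERVES()
#undef x
	EXAMPLE_RESERVE_NR,
};

/* Second expansion: each x(name) becomes its own name as a string. */
static const char * const example_reserve_strs[] = {
#define x(name)	#name,
	EXAMPLE_RESERVES()
#undef x
	NULL
};

/* Bounds-checked lookup, so an out-of-range value never indexes the array. */
static const char *example_reserve_str(unsigned r)
{
	return r < EXAMPLE_RESERVE_NR ? example_reserve_strs[r] : "(unknown)";
}
```

Because both expansions come from the same list, adding a member in one place keeps the enum and the string table in sync.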


# 78668fe0 30-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move deletion of refcount=0 indirect extents to their triggers

For backpointers, we need to switch the order triggers are run in: we
need to run triggers for deletions/overwrites before triggers for
inserts.

To avoid breaking the reflink triggers, this patch moves deleting of
indirect extents with refcount=0 to their triggers, instead of doing it
when we update those keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 880e2275 12-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move trigger fns to bkey_ops

This replaces the switch statements in bch2_mark_key(),
bch2_trans_mark_key() with new bkey methods - prep work for the next
patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2158fe46 02-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_inconsistent()

Add a new error macro that also dumps transaction updates in addition to
doing an emergency shutdown - when a transaction update discovers or is
causing a fs inconsistency, it's helpful to see what updates it was
doing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs to dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3598c56e 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Consolidate trigger code a bit

Upcoming patches are doing more work on the triggers code, this patch
just moves code around.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ae94c78f 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_mark_key() now takes a bkey_i *

We're now coming up with triggers that modify the update being done. A
bkey_s_c is const - bkey_i is the correct type to be using here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0551285 19-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve reflink repair code

When a reflink pointer points to a missing indirect extent, we replace
it with an error key. Instead of replacing the entire reflink pointer
with an error key, this patch replaces only the missing range with an
error key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 78c8fe20 19-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Normal update/commit path now works before going RW

This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2232fa39 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only allocate buckets_nouse when requested

It's only needed by the migrate tool - this patch adds an option to
enable allocating it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2e63e180 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Stash a copy of key being overwritten in btree_insert_entry

We currently need to call bch2_btree_path_peek_slot() multiple times in
the transaction commit path - and some of those need to be updated to
also check the keys from journal replay, too. Let's consolidate this and
stash the key being overwritten in btree_insert_entry.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 80bf2f34 06-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix freeing in bch2_dev_buckets_resize()

We were double-freeing old_buckets and not freeing old_buckets_gens:
also, the code was supposed to free buckets, not old_buckets;
old_buckets is only needed because we have to use rcu_assign_pointer()
instead of swap(), and won't be set if we hit the error path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0678cbe2 10-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ignore cached data when calculating fragmentation

Previously, bucket fragmentation was considered to be bucket size -
total amount of live data, both dirty and cached.

This meant that if a bucket was full but only a small amount of data in
it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the
dirty data out of the bucket and the allocator wouldn't be able to
invalidate and drop the cached data.

This changes fragmentation to exclude cached data, so that copygc will
evacuate these buckets and copygc/the allocator will always be able to
make forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
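
The difference between the two definitions of fragmentation described above can be sketched as follows. This is only a simplified model: the struct and function names are made up for illustration and are not the bcachefs code.

```c
#include <stdint.h>

/* Hypothetical per-bucket counters, all in sectors. */
struct example_bucket {
	uint32_t size;
	uint32_t dirty_sectors;
	uint32_t cached_sectors;
};

/* Old definition: bucket size minus all live data, dirty and cached. */
static uint32_t example_frag_old(const struct example_bucket *b)
{
	uint32_t live = b->dirty_sectors + b->cached_sectors;

	return live < b->size ? b->size - live : 0;
}

/* New definition: cached data is ignored, only dirty data counts as live. */
static uint32_t example_frag_new(const struct example_bucket *b)
{
	return b->dirty_sectors < b->size ? b->size - b->dirty_sectors : 0;
}
```

For a 1024-sector bucket holding 16 dirty and 1008 cached sectors, the old definition reports 0 fragmented sectors (so copygc sees nothing to gain), while the new one reports 1008.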


# 3763cb95 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use in-memory bucket array for alloc updates

More prep work for getting rid of the in-memory bucket array: now that
we have BTREE_ITER_WITH_JOURNAL, the allocator code can do btree lookups
before journal replay is finished, and there's no longer any need for it
to get allocation information from the in-memory bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 21aec962 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New data structure for buckets waiting on journal commit

Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.

This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.

We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
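
For readers unfamiliar with cuckoo hashing, here is a minimal generic sketch of the technique: two tables, two hash functions, and bounded eviction on insert. It is not the bcachefs implementation; the table size, hash constants, and all names are assumptions chosen for illustration, and it stores bare nonzero keys rather than bucket/journal-sequence pairs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TABLE_SIZE	16	/* slots per table */
#define EXAMPLE_MAX_KICKS	32	/* eviction bound before giving up */

struct example_cuckoo {
	/* 0 marks an empty slot, so stored keys must be nonzero. */
	uint64_t slot[2][EXAMPLE_TABLE_SIZE];
};

/* Two multiplicative hashes; the top 4 bits select one of 16 slots. */
static size_t example_hash(int which, uint64_t key)
{
	static const uint64_t mult[2] = {
		0x9E3779B97F4A7C15ULL, 0xC2B2AE3D27D4EB4FULL,
	};

	return (size_t)((key * mult[which]) >> 60);
}

/* A key can only live in one of two slots, so lookup is two probes. */
static bool example_lookup(const struct example_cuckoo *t, uint64_t key)
{
	return key &&
	       (t->slot[0][example_hash(0, key)] == key ||
		t->slot[1][example_hash(1, key)] == key);
}

/*
 * Insert by placing the key in its slot and, if that slot was occupied,
 * moving the evicted key to its slot in the other table, and so on.
 * Returns false after too many evictions; in that case the key evicted
 * last has been dropped, and a real implementation would resize or rehash.
 */
static bool example_insert(struct example_cuckoo *t, uint64_t key)
{
	int which = 0;

	if (!key)
		return false;
	if (example_lookup(t, key))
		return true;

	for (int i = 0; i < EXAMPLE_MAX_KICKS; i++) {
		uint64_t *s = &t->slot[which][example_hash(which, key)];
		uint64_t evicted = *s;

		*s = key;
		if (!evicted)
			return true;
		key = evicted;
		which = !which;
	}
	return false;
}
```

The appeal for this kind of use is that lookups touch at most two slots, which keeps the check for whether a bucket is still waiting on a journal commit cheap.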


# f443fa66 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Also print out in-memory gen on stale dirty pointer

We're trying to track down a bug that shows itself as newly-created
extents having stale dirty pointers - possibly due to the in memory gen
and the btree gen being inconsistent. This patch changes the error
message to also print out the in memory bucket gen when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f0f41a6d 30-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error messages for memory allocation failures

This adds some missing diagnostics from rare but annoying to debug
runtime allocation failure paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e3ad2937 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize bucket reuse

If the btree updates pointing to a bucket were never flushed by the
journal before the bucket became empty again, we can reuse the bucket
without a journal flush.

This tweaks the tracking of journal sequence numbers in alloc keys to
implement this optimization: now, we only update the journal sequence
number in alloc keys on transitions to and from empty. When a bucket
becomes empty, we check if we can tell the journal not to flush entries
starting from when the bucket was used.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 13f914ec 26-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_ec_mem_alloc()

bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 36f035e9 26-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix allocator + journal interaction

The allocator needs to wait until the last update touching a bucket has
been committed before writing to it again. However, the code was checking
against the last dirty journal sequence number, not the last flushed
journal sequence number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a7860877 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New in-memory array for bucket gens

The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 47ac34ec 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Separate out gc_bucket()

Since the main in memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e75b2d4c 23-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_journal_key_insert() no longer transfers ownership

bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 77170d0d 24-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks

Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transactional bucket mark paths. Now, we can
simply delete the in-memory bucket marking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8244f320 14-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Option improvements

This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).

Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 20572300 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve alloc_mem_to_key()

This moves some common code into alloc_mem_to_key(), which translates
from the in-memory format for a bucket to the btree key format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fb0e4808 10-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write()

This adds a new helper, much like the one we have for inode updates,
that allocates the packed alloc key, packs it and calls
bch2_trans_update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 990d42d1 04-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out struct gc_stripe from struct stripe

We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.

The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.

In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 94a3e1a6 04-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_update() is now __must_check

With snapshots, bch2_trans_update() has to check if we need a whiteout,
which can cause a transaction restart, so this is important now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b547d005 29-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding fixes

When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 181fe42a 28-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle replica marking fsck errors locally

This simplifies the code quite a bit and eliminates an inconsistency - a
given bkey doesn't necessarily translate to a single replicas entry for
disk space accounting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 58e1ea4b 28-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Push c->mark_lock usage down to where it is needed

This changes the bch2_mark_key() and related paths to take mark lock
where it is needed, instead of taking it in the upper transaction commit
path - by pushing down locking we'll be able to handle fsck errors
locally instead of requiring a separate check in the btree_gc code for
replicas being marked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 502cfb35 28-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_replicas_delta_list_marked()

This changes bch2_trans_fs_usage_apply() to handle failure (replicas
entry missing) by reverting the changes it made - meaning we can make
the main transaction commit path a bit slimmer, and perhaps also
simplify some locking in upcoming patches.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f0c3f88b 26-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Run insert triggers before overwrite triggers

Currently, btree triggers are run in natural key order, which presents a
problem for fallocate in INSERT_RANGE mode: since we're moving existing
extents to higher offsets, the trigger for deleting the old extent runs
before the trigger that adds the new extent, potentially leading to
indirect extents being deleted that shouldn't be when the delete causes
the refcount to hit 0.

This changes the order we run triggers so that for a given btree, we run
all insert triggers before overwrite triggers, nicely sidestepping this
issue.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
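
The ordering problem described above can be modelled with a toy refcount. The sketch below is a deliberately simplified assumption of how the two triggers interact when one extent is moved; the types and function names are hypothetical and do not mirror the real trigger code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an indirect (reflinked) extent. */
struct example_indirect {
	unsigned	refcount;
	bool		deleted;
};

/* Trigger for a newly inserted pointer: take a reference if still present. */
static void example_trigger_insert(struct example_indirect *e)
{
	if (!e->deleted)
		e->refcount++;
}

/* Trigger for an overwritten pointer: drop a reference, delete at zero. */
static void example_trigger_overwrite(struct example_indirect *e)
{
	if (e->refcount && !--e->refcount)
		e->deleted = true;
}

/*
 * Model moving a single pointer to a new offset: one overwrite of the
 * old key plus one insert of the new key. Returns true if the indirect
 * extent survives with its original refcount of 1.
 */
static bool example_move_extent(bool inserts_first)
{
	struct example_indirect e = { .refcount = 1, .deleted = false };

	if (inserts_first) {
		example_trigger_insert(&e);
		example_trigger_overwrite(&e);
	} else {
		example_trigger_overwrite(&e);
		example_trigger_insert(&e);
	}
	return !e.deleted && e.refcount == 1;
}
```

With inserts first the refcount goes 1, 2, 1 and the extent survives; with the overwrite first it hits 0 and the extent is deleted before the new reference can be taken.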


# c714614b 15-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Disk space accounting fix on brand-new fs

The filesystem initialization path first marks superblock and journal
buckets non transactionally, since the btree isn't functional yet. That
path was updating the per-journal-buf percpu counters via
bch2_dev_usage_update(), and updating the wrong set of counters so those
updates didn't get written out until journal entry 4.

The relevant code is going to get significantly rewritten in the future
as we transition away from the in memory bucket array, so this just
hacks around it for now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 076c783c 05-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix upgrade path for reflink_p fix

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3e52c222 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add journal_seq to inode & alloc keys

Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2debb1b8 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_TRIGGER_INSERT now only means insert

This allows triggers to distinguish between a key entering the btree -
i.e. being called from the trans commit path - vs. being called on a key
that already exists, i.e. by GC.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 904823de 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_mark_key() to take a btree_trans *

This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 961b2d62 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted ec fixes

- The backpointer that ec_stripe_update_ptrs() uses now needs to include
the snapshot ID, which means we have to change where we add the
backpointer to after getting the snapshot ID for the new extents

- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()

- improve error message in bch2_mark_stripe()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 37f72492 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_mark_update()

When the old or new key doesn't exist, we should still pass in a deleted
key with the correct pos. This fixes a bug in the ec code, when
bch2_mark_stripe() was looking up the wrong in-memory stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f3b1e193 26-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve error messages in trans_mark_reflink_p()

We should always print out the key we were marking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 396a887d 21-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix fsck path for reflink pointers

The way __bch2_mark_reflink_p returns errors was clashing with returning
the number of sectors processed - we weren't returning FSCK_ERR_EXIT
correctly.

Fix this by only using the return code for errors, which actually ends
up simplifying the overall logic.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6d76aefe 14-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for leaking of reflinked extents

When a reflink pointer points to only part of an indirect extent, and
then that indirect extent is fragmented (e.g. by copygc), if the reflink
pointer only points to one of the fragments we leak a reference.

Fix this by storing front/back pad values in reflink pointers - when
inserting reflink pointers, we initialize them to cover the full range
of the indirect extents we reference.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# dfc276df 18-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve reflink repair code

When a reflink pointer points to an indirect extent that doesn't exist,
we need to replace it with a KEY_TYPE_error key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 14b393ee 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Subvolumes, snapshots

This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6fba6b83 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Prefer using btree_insert_entry to btree_iter

This moves some data dependencies forward, to improve pipelining.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fd0bd123 17-Aug-2021 Brett Holman <bholman.devel@gmail.com>

bcachefs: Fix 32 bit build failures

This fix replaces multiple 64 bit divisions with do_div() equivalents.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62df3d44 17-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Disk space accounting fix

DIV_ROUND_UP() wasn't doing what we wanted when passing it negative
numbers - fix it by just not passing it negative numbers anymore.

Also, no need to do the scaling by compression ratio for incompressible
data.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
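
To see why negative inputs are a problem, recall that the usual round-up-division macro adds `d - 1` before dividing, and C integer division truncates toward zero. The sketch below defines the macro locally in the same form as the kernel's and shows one possible way to avoid negative inputs, by rounding the magnitude and reapplying the sign. `example_scale_sectors()` is a hypothetical helper, not the function changed by this commit.

```c
#include <stdint.h>

/* Same form as the kernel macro; only meaningful for non-negative n. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Scale a possibly negative sector delta by num/den, rounding the
 * magnitude up. The macro is only ever given non-negative values.
 */
static int64_t example_scale_sectors(int64_t sectors, int64_t num, int64_t den)
{
	int64_t sign = sectors < 0 ? -1 : 1;
	int64_t mag  = sectors < 0 ? -sectors : sectors;

	return sign * DIV_ROUND_UP(mag * num, den);
}
```

For example, `DIV_ROUND_UP(5, 4)` is 2, but `DIV_ROUND_UP(-5, 4)` evaluates to `(-2) / 4`, which is 0 rather than -2, so a positive delta and the matching negative delta would not cancel out.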


# 297d8934 10-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Extensive triggers cleanups

- We no longer mark subsets of extents, they're marked like regular
keys now - which means we can drop the offset & sectors arguments
to trigger functions
- Drop other arguments that are no longer needed in various
places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
don't have an old key to mark

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8c3f6da9 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve iter->should_be_locked

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8ee529e9 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make sure bch2_trans_mark_update uses correct iter flags

Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 290448ed 10-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't underflow c->sectors_available

This rarely used error path should've been checking for underflow -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 953ee28a 10-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_btree_iter_peek_cached()

It's now been rolled into bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c1949baa 07-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify reflink trigger

Now that we only mark entire extents, we can ditch the
"reflink_p_frag_references" code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8e6bbc41 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent_handle_overwrites() to bch2_trans_update()

This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.

Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 224ec3e6 08-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't mark superblocks past end of usable space

bcachefs-tools recently started putting a backup superblock at the end
of the device. This causes a problem if the bucket size doesn't divide
the device size - but we can fix it by just skipping marking that part.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
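
A small sketch of the arithmetic behind this fix, under the assumption that usable space ends at the last whole bucket and that marking is simply clamped to that boundary. The function names are hypothetical and all quantities are in sectors.

```c
#include <stdint.h>

/* End of the space covered by whole buckets. */
static uint64_t example_usable_end(uint64_t dev_sectors, uint64_t bucket_sectors)
{
	return dev_sectors / bucket_sectors * bucket_sectors;
}

/* Number of sectors of [start, end) that lie inside usable space. */
static uint64_t example_sectors_to_mark(uint64_t start, uint64_t end,
					uint64_t usable_end)
{
	if (end > usable_end)
		end = usable_end;
	return start < end ? end - start : 0;
}
```

With a 1000-sector device and 128-sector buckets, usable space ends at sector 896, so a backup superblock occupying sectors 992 to 1000 has nothing to mark, while a range that straddles the boundary is only marked up to it.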


# bc3f8b25 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 890b74f0 23-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck for reflink refcounts

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9eba7c8d 27-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reflink refcount fix

__bch2_trans_mark_reflink_p wasn't always correctly returning the number
of sectors processed - the new logic is a bit more straightforward
overall too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7e94eeff 31-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Inline fastpath of bch2_disk_reservation_add()

The fastpath now doesn't even disable preemption - instead we use a (non
locked) cmpxchg.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ed343411 18-May-2021 Dan Robertson <dan@dlrobertson.com>

bcachefs: statfs reports incorrect avail blocks

The current implementation of bch_statfs does not scale the number of
available blocks provided in f_bavail by the reserve factor. This causes
an allocation of a file of this size to fail.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bbfcb451 16-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_extent_can_insert() call

It was being skipped when hole punching, leading to problems when
splitting compressed extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2cd05634 16-May-2021 Brett Holman <bpholman5@gmail.com>

bcachefs: made changes to support clang, fixed a couple bugs

fs/bcachefs/bset.c edited prefetch macro to add clang support
fs/bcachefs/btree_iter.c bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use
fs/bcachefs/io.c bugfix: eliminated undefined behavior (negative bitshift)
fs/bcachefs/buckets.c bugfix: invert sign to handle 64bit abs()

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d125615a 14-May-2021 Dan Robertson <dan@dlrobertson.com>

bcachefs: properly initialize used values

- Ensure the second key value in bch_hash_info is initialized to zero
if the info type is of type BCH_STR_HASH_SIPHASH.

- Initialize the possibly returned value in bch2_inode_create. Assuming
bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value
of ret could be returned to the user as an error pointer.

- Fix compiler warning in initialization of bkey_s_c_stripe

fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization
of subobject [-Wmissing-braces]
struct bkey_s_c_stripe new_s = { NULL };
^~~~

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 933532b8 03-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix reflink trigger

The trigger for reflink pointers wasn't always incrementing/decrementing
the refcounts correctly - this patch fixes that logic.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a402c8d 07-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some refcounting bugs

We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d99af4f1 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eb365fbc 21-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't BUG() in update_replicas

Apparently, we have a bug where in mark and sweep while accounting for a
key, a replicas entry isn't found. Change the code to print out the key
we couldn't mark and halt instead of a BUG_ON().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 04903131 18-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle errors in bch2_trans_mark_update()

It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dac1525d 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc shouldn't care about owned_by_allocator

The owned_by_allocator field is a purely in memory thing, even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d62ab355 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_trans_mark_dev_sb()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 319c1305 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix heap overrun in bch2_fs_usage_read() XXX squash

oops

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ecc14209 04-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an uninitialized variable

Fortunately it was just used in an error message

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 35d5aff2 03-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_fs_usage_scratch_get()

This is an important cleanup, eliminating an unnecessary copy in the
transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9c2e6242 03-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix livelock calling bch2_mark_bkey_replicas()

The bug was that we were trying to find a replicas entry that wasn't
sorted - but, we can also simplify the code by not using
bch2_mark_bkey_replicas and instead ensuring the list of replicas
entries exists directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b753d4b3 03-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix this_cpu_ptr() usage

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65bcd657 28-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

buckets.c fixups XXX squash

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e9895f0a 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assert that iterators aren't being double freed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b3b66e30 12-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Have fsck check for stripe pointers matching stripe

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9620c3ec 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mempool for the replicas delta list

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8042b5b7 10-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Extents may now cross btree node boundaries

When snapshots arrive, we won't necessarily be able to arbitrarily split
existing extents - when we need to split an existing extent, we'll have to
check if the extent was overwritten in child snapshots and if so emit a
whiteout for the split in the child snapshot.

Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 180fb49d 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal updates to dev usage

This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f4e1d5d 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v2

This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.

Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bfcf840d 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark superblocks transactionally

More work towards getting rid of the in memory struct bucket: this patch
adds code for marking superblock and journal buckets via the btree, and
uses it in the device add and journal resize paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9afc6652 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_invalidate_bucket()

This patch is working towards eventually getting rid of the in memory
struct bucket, and relying only on the btree representation.

Since bch2_invalidate_bucket() was only used for incrementing gens, not
invalidating cached data, no other counters were being changed as a side
effect - meaning it's safe for the allocator code to increment the
bucket gen directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72eab8da 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor dev usage

This is to make it more amenable for serialization.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ef220cb 17-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix double counting of stripe block counts by GC

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd9f3dfe 17-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix integer overflow in bch2_disk_reservation_get()

The sectors argument shouldn't have been a u32 - it can be up to U32_MAX
(i.e. fallocate creating persistent reservations), and if replication is
enabled we'll overflow when we calculate the real number of sectors to
reserve. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
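
The overflow described above is easy to reproduce in isolation. The sketch below is not the bcachefs function; reserve_sectors_u32() and reserve_sectors_u64() are hypothetical stand-ins that only show what happens when the sector count is multiplied by the replica count in 32 bits versus 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: the product is truncated to 32 bits and wraps. */
static uint64_t reserve_sectors_u32(uint32_t sectors, unsigned nr_replicas)
{
	uint32_t total = (uint32_t)(sectors * nr_replicas);

	return total;
}

/* Fixed shape: widen the sector count before multiplying. */
static uint64_t reserve_sectors_u64(uint64_t sectors, unsigned nr_replicas)
{
	assert(nr_replicas > 0);
	return sectors * nr_replicas;
}
```

For sectors = U32_MAX and two replicas, the 32-bit version yields 4294967294 instead of the 8589934590 sectors that actually need to be reserved.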


# 2a3731e3 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding fixes & refactoring

- Originally bch_extent_stripe_ptr didn't contain the block index,
instead we'd have to search through the stripe pointers to figure out
which pointer matched. When the block field was added to
bch_extent_stripe_ptr, not all of the code was updated to use it.
This patch fixes that, and we also now verify that field where it
makes sense.

- The ec_stripe_buf_init/exit() functions have been improved, and are
now used by the bch2_ec_read_extent() (recovery read) path.

- get_stripe_key() is now used by bch2_ec_read_extent().

- We now have a getter and setter for checksums within a stripe, like
we had previously for block sector counts, and ec_generate_checksums
and ec_validate_checksums are now quite a bit smaller and cleaner.

ec.c still needs a lot of work, but this patch is slowly moving things
in the right direction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3187aa8d 21-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_USE_RESERVE in a lot of places
where it no longer makes sense.

- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_USE_RESERVE just
because we're holding open_buckets pinned anymore.

- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 719fe7fb 10-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update transactional triggers interface to pass old & new keys

This is needed to fix a bug where we're overflowing iterators within a
btree transaction, because we're updating the stripes btree (to update
block counts) and the stripes btree trigger is unnecessarily updating
the alloc btree - it doesn't need to update the alloc btree when the
pointers within a stripe aren't changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f299d573 13-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor filesystem usage accounting

Various filesystem usage counters are kept in percpu counters, with one
set per in flight journal buffer. Right now all the code that deals with
it assumes that there's only two buffers/sets of counters, but the
number of journal bufs is getting increased to 4 in the next patch - so
refactor that code to not assume a constant.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
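
One way to avoid hard-coding "two sets of counters" is to index the sets by journal sequence number modulo a configurable power of two. The sketch below assumes that scheme; the struct and function names are hypothetical, the macro names are illustrative, and plain integers stand in for the percpu counters.

```c
#include <assert.h>
#include <stdint.h>

#define JOURNAL_BUF_BITS	2
#define JOURNAL_BUF_NR		(1U << JOURNAL_BUF_BITS)
#define JOURNAL_BUF_MASK	(JOURNAL_BUF_NR - 1)

struct fs_usage_sketch {
	int64_t base;				/* already folded in */
	int64_t pending[JOURNAL_BUF_NR];	/* one per in-flight journal buffer */
};

/* Charge a delta to the counter set of the journal buffer it belongs to. */
static void usage_add(struct fs_usage_sketch *u, uint64_t journal_seq,
		      int64_t sectors)
{
	/* Masking only works if the number of buffers is a power of two. */
	assert((JOURNAL_BUF_NR & JOURNAL_BUF_MASK) == 0);
	u->pending[journal_seq & JOURNAL_BUF_MASK] += sectors;
}

/* Readers sum the base and every pending set, however many there are. */
static int64_t usage_read(const struct fs_usage_sketch *u)
{
	int64_t ret = u->base;

	for (unsigned i = 0; i < JOURNAL_BUF_NR; i++)
		ret += u->pending[i];
	return ret;
}

/* Once a journal buffer has been written, fold its set into the base. */
static void usage_fold(struct fs_usage_sketch *u, uint64_t journal_seq)
{
	unsigned idx = journal_seq & JOURNAL_BUF_MASK;

	u->base += u->pending[idx];
	u->pending[idx] = 0;
}
```

Changing the number of journal buffers then only requires changing JOURNAL_BUF_BITS; none of the loops or index calculations assume a particular count.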


# 3eb26d01 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_get_iter() no longer returns errors

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 101d4713 13-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a 64 bit divide

this fixes builds on 32 bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 801a3de6 24-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Indirect inline data extents

When inline data extents were added, reflink was forgotten about - we
need indirect inline data extents for reflink + inline data to work
correctly.

This patch adds them, and a new feature bit that's flipped when they're
used.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b088c1d 23-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_mark_stripe()

There's no reason not to always recalculate these fields

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b88e971e 22-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't drop replicas when copygcing ec data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# af4d05c4 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Account for stripe parity sectors separately

Instead of trying to charge EC parity to the data within the stripe
(which is subject to rounding errors), let's charge it to the stripe
itself. It should also make -ENOSPC issues easier to deal with if we
charge for parity blocks up front, and means we can also make more fine
grained accounting available to the user.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
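
The rounding problem can be illustrated with a small calculation. The sketch below assumes parity was charged to each extent as its sector count scaled by nr_parity / nr_data and rounded up; that formula and all function names are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-extent charge: sectors * nr_parity / nr_data, rounded up. */
static uint64_t parity_share(uint64_t sectors, unsigned nr_data,
			     unsigned nr_parity)
{
	assert(nr_data > 0);
	return (sectors * nr_parity + nr_data - 1) / nr_data;
}

/* Old approach: sum the rounded share of every extent in the stripe. */
static uint64_t parity_charged_to_data(const uint64_t *extent_sectors,
				       size_t nr, unsigned nr_data,
				       unsigned nr_parity)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += parity_share(extent_sectors[i], nr_data, nr_parity);
	return total;
}

/* New approach: charge the parity blocks to the stripe itself, up front. */
static uint64_t parity_charged_to_stripe(uint64_t block_sectors,
					 unsigned nr_parity)
{
	return block_sectors * nr_parity;
}
```

For a stripe with 4 data blocks and 2 parity blocks of 32 sectors each, the stripe holds exactly 64 parity sectors. One 128-sector extent is also charged 64, but 128 one-sector extents are charged 128 in total, because each share rounds up to a full sector.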


# 39283c71 19-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for bad stripe pointers

The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.

But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.

Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.

Also - consolidate the code that checks pointer validity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
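
A sketch of the kind of gen check this fix is about, assuming 8-bit generation numbers that wrap around. The helper names and the enum are illustrative rather than copied from the source, and the cast to int8_t relies on the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wraparound-aware comparison of 8-bit generation numbers: negative if a is
 * older than b, zero if equal, positive if a is newer than b.
 */
static int gen_cmp(uint8_t a, uint8_t b)
{
	return (int8_t)(a - b);
}

enum ptr_state { PTR_OK, PTR_STALE, PTR_FROM_FUTURE };

/* A pointer is only valid if its gen matches the bucket's current gen. */
static enum ptr_state check_ptr_gen(uint8_t bucket_gen, uint8_t ptr_gen)
{
	int c = gen_cmp(ptr_gen, bucket_gen);

	if (c > 0)
		return PTR_FROM_FUTURE;	/* the bad stripe pointer case */
	if (c < 0)
		return PTR_STALE;
	return PTR_OK;
}

/* Reusing a bucket bumps its gen, which invalidates every older pointer. */
static uint8_t bucket_reuse(uint8_t bucket_gen)
{
	uint8_t new_gen = (uint8_t)(bucket_gen + 1);

	assert(gen_cmp(new_gen, bucket_gen) > 0);
	return new_gen;
}
```

If the bucket's gen increment is deferred but a stripe already records the incremented gen, check_ptr_gen() reports PTR_FROM_FUTURE until the alloc key is written out, which is the inconsistency the new trigger closes.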


# f3721e12 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Perf improvements for bch_alloc_read()

On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ee38f62 12-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix off-by-one error in ptr gen check

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d080aa5 22-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete unused arguments

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e6d11615 11-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make copygc thread global

Per device copygc threads don't move data to different devices and they
make fragmentation worse - they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89fd25be 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for data types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba6dd1dd 06-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve stripe triggers/heap code

Soon we'll be able to modify existing stripes - replacing empty blocks
with new blocks and new p/q blocks. This patch updates the trigger code
to handle pointers changing in an existing stripe; also, it
significantly improves how the stripes heap works, which means we can
get rid of the stripe creation/deletion lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e63534a2 06-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework triggers interface

The trigger for stripe keys is shortly going to need both the old and
the new key passed to the trigger - this patch does that rework.

For now, this just changes the in memory triggers, and this doesn't
change how extent triggers work.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 697e45b2 06-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_TRIGGER_NOOVERWRITES

This is prep work for reworking the triggers machinery - we have
triggers that need to know both the old and the new key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 64f2a880 28-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_extent_can_insert() not being called

It's supposed to check whether we're splitting a compressed extent and
if so get a bigger disk reservation - hence this fixes a "disk usage
increased by x without a reservation" bug.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 649a9b68 18-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track sectors of erasure coded data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b9c3d139 17-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock in the RO path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1d186789 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: delete a slightly faulty assertion

state lock isn't held at startup

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5d20ba48 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use cached iterators for alloc btree

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ca88e5a 07-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree key cache

This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.

Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.

These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ada1606 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Turn c->state_lock into an rwsem

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 255adc51 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always increment bucket gen on bucket reuse

Not doing so confuses copygc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ef846a7 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve assorted error messages

This also consolidates the various checks in bch2_mark_pointer() and
bch2_trans_mark_pointer().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# baeed3c3 28-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't require alloc btree to be updated before buckets are used

This is to break a circular dependency in the shutdown path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 00b8ccf7 25-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Interior btree updates are now fully transactional

We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aafcf9bc 24-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Better error messages on bucket sector count overflows

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19f24758 16-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use peek_filter() unnecessarily

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3e464ac 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splitting existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e70ce56 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More btree iter invariants

Ensure that iter->pos always lies between the start and end of iter->k
(the last key returned). Also, bch2_btree_iter_set_pos() now invalidates
the key that peek() or next() returned.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 38f0664a 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix error message on bucket sector count overflow

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 548b3d20 07-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_ptr_v2

Add a new btree ptr type which contains the sequence number (random 64
bit cookie, actually) for that btree node - this lets us verify that
when we read in a btree node it really is the btree node we wanted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 24326cd1 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Sort & deduplicate updates in bch2_trans_update()

Previously, when doing multiple updates in the same transaction commit
that overwrote each other, we relied on doing the updates in the same
order as the bch2_trans_update() calls in order to get the correct
result. But that wasn't correct for triggers; bch2_trans_mark_update()
when marking overwrites would do the wrong thing because it hadn't seen
the update that was being overwritten.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
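
The effect of sorting and deduplicating can be sketched as below. This is an illustration under stated assumptions, not the bch2_trans_update() implementation: it sorts a finished list with qsort() rather than keeping the list sorted as updates are queued, and struct update and its fields are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct update {
	unsigned btree_id;
	uint64_t pos;
	unsigned seq;	/* order in which the update was queued */
	int	 value;
};

/* Order by btree, then position, then queue order (oldest first). */
static int update_cmp(const void *_l, const void *_r)
{
	const struct update *l = _l, *r = _r;

	if (l->btree_id != r->btree_id)
		return l->btree_id < r->btree_id ? -1 : 1;
	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	return (l->seq > r->seq) - (l->seq < r->seq);
}

/*
 * Sort in place and keep only the most recently queued update for each
 * (btree_id, pos) key. Returns the new number of updates.
 */
static size_t sort_and_dedup(struct update *u, size_t nr)
{
	size_t out = 0;

	qsort(u, nr, sizeof(*u), update_cmp);

	for (size_t i = 0; i < nr; i++) {
		if (out &&
		    u[out - 1].btree_id == u[i].btree_id &&
		    u[out - 1].pos == u[i].pos)
			u[out - 1] = u[i];	/* newer update replaces older */
		else
			u[out++] = u[i];
	}

	assert(out <= nr);
	return out;
}
```

After this pass, triggers only ever see one update per key, so marking an overwrite no longer depends on the order in which the updates were originally queued.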


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 54e86b58 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make btree_insert_entry more private to update path

This should be private to btree_update_leaf.c, and we might end up
removing it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ef496cd2 26-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't BUG_ON() sector count overflow

Return an error instead (still work in progress...)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b7ba66c8 28-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inline more of bch2_trans_commit hot path

The main optimization here is that if we let
bch2_replicas_delta_list_apply() fail, we can completely skip calling
bch2_bkey_replicas_marked_locked().

And assorted other small optimizations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ff929515 28-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Trust btree alloc info at runtime

This lets us avoid a cache miss in the write path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 77d63522 19-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make replicas_delta_list smaller

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 43de7376 07-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix erasure coding disk space accounting

Disk space accounting for erasure coding + compression was completely
broken - we need to calculate the parity sectors delta the same way we
calculate disk_sectors, by calculating the old and new usage and
subtracting to get the difference.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 37954a27 08-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Limit pointers to being in only one stripe

This makes the disk accounting code saner, and it's not clear why we'd
ever want the same data to be in multiple stripes simultaneously.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 332c6e53 08-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_mark_extent()

If an extent only contained cached or erasure coded pointers, there
won't be any devices in the normal dirty replicas list or an entry to
update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 64bc0011 26-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework btree iterator lifetimes

The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.

This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7199432 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill deferred btree updates

Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63095894 22-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improved bch2_fcollapse()

Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36e9d698 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Do updates in order they were queued up in

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4430ea70 05-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_NOMARK_INSERT

Was dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78854fca 29-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix BTREE_INSERT_NOMARK_OVERWRITES

bch2_mark_update() was correct, but bch2_trans_mark_update() wasn't
respecting BTREE_INSERT_NOMARK_OVERWRITES - key marking/triggers really
need to be cleaned up.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06ab329c 29-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve pointer marking checks and error messages

Importantly, we don't want to use bch2_fs_inconsistent_on() for errors
that fsck can repair, because that will just put us in RO mode and
prevent fsck from actually fixing stuff. Probably want to get rid of it
in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9940a791 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix error message on bucket overflow

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# df5d4dae 22-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fixes for replicas tracking

The continue statement in bch2_trans_mark_extent() was wrong - by
bailing out early, we'd be constructing the wrong replicas list to
update. Also, the assertion in update_replicas() was wrong - due to
rounding with compressed extents, it is possible for sectors to be 0
sometimes.

Also, change extent_to_replicas() in replicas.c to match the replicas
list we construct in buckets.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6671a708 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_alloc_write()

Major simplification - gets rid of the need for marking buckets as
dirty, instead we write buckets if the in memory mark is different from
what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 67163cde 27-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Trust in memory bucket mark

This fixes a bug in the journal replay -> extent_replay_key ->
split_compressed path, when we do an update that changes alloc info but
the alloc info in the btree isn't up to date yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41fcd621 21-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix faulty assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76426098 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c7f3b7a 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_extent_trim_atomic() for reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cbe5cfe 09-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework calling convention for marking overwrites

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3d3a9d9 06-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_get_key() now works correctly for extents

More prep work for reflink: for extents, we're not looking for an exact
match on pos, but rather that the pos is within the range of the key the
iterator points to.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0c04f5eb 15-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't overflow trans with iters from triggers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8d591d5d 12-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert some assertions to fsck errors

Actual repair code will come later, but this is a start

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91052b9d 24-Jun-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor trans_(get|update)_key

these are still pretty ugly...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88767d65 24-Jun-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update path now handles triggers that generate more triggers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e738539 24-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve key marking interface

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4ee202e2 21-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: better BTREE_INSERT_NO_CLEAR_REPLICAS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3838be78 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use a fixed size buffer for fs_usage_deltas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6fb076e6 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix spurious inconsistency in recovery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7cfac5f5 08-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for the stripes mark path and gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 460651ee 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Various improvements to bch2_alloc_write()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 932aa837 11-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_mark_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5e82a9a1 10-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Write out fs usage consistently

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fca1223c 03-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Avoid write lock on mark_lock

mark_lock is a frequently taken lock, and there's also potential for
deadlocks since currently bch2_clear_page_bits, which is called from
memory reclaim, has to take it to drop disk reservations.

The disk reservation get path takes it when it recalculates the number
of sectors known to be available, but it's not really needed for
consistency. We just want to make sure we only have one thread updating
the sectors_available count, which we can do with a dedicated mutex.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94f651e2 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Return errors from for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 201a4d4c 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix triggers for stripes btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6dd04f8 15-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark overwrites from journal replay in initial gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1d58243 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add ability to run gc on metadata only

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36e916e1 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Caller now responsible for calling mark_key for gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a0e06db 24-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted preemption fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4d8100da 15-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocate fs_usage in do_btree_insert_at()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0dc17247 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill struct btree_insert

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 59928c12 07-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't BUG_ON() on bucket sector count overflow

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ecf37a4a 14-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fs_usage_u64s()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 768ac639 14-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mechanism for blocking the journal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8fe826f9 13-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bucket invalidation to key marking path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73c27c60 14-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fixes for cached data accounting

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8777210b 12-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: refactor key marking code a bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ecc6171 12-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix double counting when gc is running

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fbc5a4 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc lock no longer needed for disk reservations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76f4c7b0 11-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix oldest_gen handling

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3577df5f 09-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: serialize persistent_reserved

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e0745e2 24-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: initialize fs usage summary in recovery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4c97e04a 06-Feb-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: percpu utility code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bdba6c29 24-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix inode counting

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61c8d7c8 25-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist stripe blocks_used

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 430735cd 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist alloc info on clean shutdown

- Does not persist alloc info for stripes yet
- Also does not yet include filesystem block/sector counts from
struct fs_usage
- Not made use of just yet

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7ef2a73a 21-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check for if extent update is allocating

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0cc3def 13-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 23f80d2b 17-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Factor out acc_u64s()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06b7345c 01-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Include summarized counts in fs_usage

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5663a415 27-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: refactor bch_fs_usage

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 641ab736 06-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: improve/clarify ptr_disk_sectors()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9166b41d 25-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: s/usage_lock/mark_lock

better describes what it's for, and we're going to call a new lock
usage_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8eb7f3ee 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: move dirty into bucket_mark

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0cfb963 29-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track nr_inodes with the key marking machinery

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eeb83e25 22-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Hold usage_lock over mark_key and fs_usage_apply

Fixes an inconsistency at the end of gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dfe9bfb3 24-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Stripes now properly subject to gc

gc now verifies the contents of the stripes radix tree, important for
persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ca53b55 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: gc now operates on second set of bucket marks

This means we can now use gc to verify the allocation information -
important for testing persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61274e9d 18-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd575ddf 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b35b1925 05-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move key marking out of extents.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4628529f 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Disk usage in compressed sectors, not uncompressed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b335bae 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted fixes for running on very small devices

It's now possible to create and use a filesystem on a 512k device with
4k buckets (though at that size we still waste almost half to internal
reserves)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b092dadd 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Scale down number of writepoints when low on space

this means we don't have to reserve space for them when calculating
filesystem capacity

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 47799326 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: more key marking refactoring

prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1742237b 27-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: extent_for_each_ptr_decode()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6eac2c2e 24-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change how replicated data is accounted

Due to compression, the different replicas of a replicated extent don't
necessarily have to take up the same amount of space - so replicated
data sector counts shouldn't be stored divided by the number of
replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b650fd1 24-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Account for internal fragmentation better

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 09f3297a 24-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill s_alloc, use bch_data_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7c7a309 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_mark_key() now takes bch_data_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3142e7ef 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix nbuckets usage on device resize

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b29e197a 22-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Invalidate buckets when writing to alloc btree

Prep work for persistent alloc information. Refactoring also lets us
make free_inc much smaller, which means a lot fewer buckets stranded on
freelists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2be7c8b 22-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill bucket mark sector count saturation

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c6923995 21-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch

journal_buf_switch is called from the foreground when getting a journal
reservation and thus is somewhat latency sensitive;
bch2_bucket_seq_cleanup has to run infrequently but is a bit expensive
when it does run.

Call it from the journal write path instead, and punt the journal write
to workqueue context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>