History log of /linux-master/fs/bcachefs/movinggc.c
Revision Date Author Comments
# 3ed94062 17-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_fatal_error()

error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9fea2274 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_member_device() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80eab7a7 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a860b5a 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()

And for_each_btree_key2_upto -> for_each_btree_key_upto

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# defd9e39 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: darray_for_each() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a79e1b6d 27-Nov-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: copygc shouldn't try moving buckets on error

Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ec3758a 27-Nov-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: copygc should wakeup on shutdown if disabled

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74529338 26-Nov-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: remove dead bch2_evacuate_bucket()

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb13f471 02-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()

More accurate naming.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 183bcc89 02-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clean up btree write buffer write ref handling

__bch2_btree_write_buffer_flush() now assumes a write ref is already
held (as called by the transaction commit path); and the wrappers
bch2_write_buffer_flush() and flush_sync() take an explicit write ref.

This means internally the write buffer code can always use
BTREE_INSERT_NOCHECK_RW, instead of, as in the previous code, passing
flags around and hoping the NOCHECK_RW flag was always carried through
correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dafff7e5 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bucket sector count helpers

This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 415e5107 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Extra kthread_should_stop() calls for copygc

This fixes a bug where going read-only was taking longer than it should
have, due to copygc forgetting to check kthread_should_stop().

Additionally: fix a missing is_kthread check in bch2_move_ratelimit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0e91d3a6 01-Nov-2023 Brian Foster <bfoster@redhat.com>

bcachefs: fix odebug warn and lockdep splat due to on-stack rhashtable

Guenter Roeck reports a lockdep splat and DEBUG_OBJECTS_WORK related
warning when bch2_copygc_thread() initializes its rhashtable. The
lockdep splat relates to a warning print caused by the fact that the
rhashtable exists on the stack but is not annotated as such. This is
something that could be addressed by INIT_WORK_ONSTACK(), but
rhashtable doesn't expose that control, and it probably isn't worth the
churn for just one user. Instead, dynamically allocate the
buckets_in_flight structure and avoid the splat that way.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f82755e4 30-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Data move path now uses bch2_trans_unlock_long()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f7056b7 30-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure copygc does not spin

If copygc does no work - finds no fragmented buckets - wait for a bit of
IO to happen.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96a363a7 23-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move: move_stats refactoring

data_progress_list is gone - it was redundant with moving_context_list

The upcoming rebalance rewrite is going to have it using two different
move_stats objects with the same moving_context, depending on whether
it's scanning or using the rebalance_work btree - this patch plumbs
stats around a bit differently so that will work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63316903 20-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: moving_context now owns a btree_trans

btree_trans and moving_context are used together, and having the
moving_context own the transaction object reduces some plumbing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e82f5f40 12-Sep-2023 Nathan Chancellor <nathan@kernel.org>

bcachefs: Fix -Wcompare-distinct-pointer-types in bch2_copygc_get_buckets()

When building bcachefs for 32-bit ARM, there is a warning when using
max() to compare an expression involving 'size_t' with an 'unsigned
long' literal:

fs/bcachefs/movinggc.c:159:21: error: comparison of distinct pointer types ('typeof (16UL) *' (aka 'unsigned long *') and 'typeof (buckets_in_flight->nr / 4) *' (aka 'unsigned int *')) [-Werror,-Wcompare-distinct-pointer-types]
159 | size_t nr_to_get = max(16UL, buckets_in_flight->nr / 4);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:76:19: note: expanded from macro 'max'
76 | #define max(x, y) __careful_cmp(x, y, >)
| ^~~~~~~~~~~~~~~~~~~~~~
include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp'
38 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~~~~~~~
include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp'
28 | (__typecheck(x, y) && __no_side_effects(x, y))
| ^~~~~~~~~~~~~~~~~
include/linux/minmax.h:22:28: note: expanded from macro '__typecheck'
22 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~
1 error generated.

On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when comparing these two expressions. Use max_t(size_t, ...) for
this situation, eliminating the warning.
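
To illustrate why max_t() resolves this, here is a minimal userspace sketch. sketch_max_t() is a hypothetical, simplified stand-in for the kernel macro (not the definition in include/linux/minmax.h): it converts both operands to the named type before comparing, so the distinct-pointer-type check never comes into play. It assumes GCC or Clang, since it uses statement expressions as the kernel does.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the kernel's max_t(): both
 * operands are converted to the named type first, so they can never
 * have distinct types. Uses GNU C statement expressions.
 */
#define sketch_max_t(type, x, y) ({	\
	type _x = (x);			\
	type _y = (y);			\
	_x > _y ? _x : _y;		\
})

/*
 * Mirrors the fixed line: fetch at least 16 buckets, or a quarter of
 * the buckets already in flight if that is larger. On 32-bit ARM,
 * 16UL is unsigned long while nr_in_flight / 4 is size_t (unsigned int).
 */
static size_t nr_to_get(size_t nr_in_flight)
{
	return sketch_max_t(size_t, 16UL, nr_in_flight / 4);
}
```

On 64-bit builds both operands are already unsigned long, which is why the plain max() only failed on 32-bit targets.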

Fixes: dd49018737d4 ("bcachefs: Rhashtable based buckets_in_flight for copygc")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1809b8cb 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Break up io.c

More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e46c181a 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert more code to bch_err_msg()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f6e6f42b 04-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for bch2_copygc() spuriously returning -EEXIST

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f33c58fc 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill BTREE_INSERT_USE_RESERVE

Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec14fc60 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill JOURNAL_WATERMARK

This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards
specifying watermarks once in the transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0ce4e0e7 26-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a missing rhashtable_destroy() call

Fixes https://lore.kernel.org/linux-bcachefs/784c3e6a-75bd-e6ca-535a-43b3e1daf643@kernel.dk/T/#mbf7caf005f960018eba23b58795d06c06c947411

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e53a961c 24-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename enum alloc_reserve -> bch_watermark

This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e47a390a 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert -ENOENT to private error codes

As with previous conversions, replace -ENOENT uses with more informative
private error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcb79a51 29-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_get_iter() helpers

Introduce new helpers for a common pattern:

bch2_trans_iter_init();
bch2_btree_iter_peek_slot();

- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a
(typically stack allocated) variable; it handles the case where the
value in the btree is smaller than the current version of the type,
zeroing out the remainder.
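
The zero-extension behaviour described for bch2_bkey_get_val_typed() can be sketched as follows. copy_val_zero_extend() and the example_val_* structs are hypothetical and only illustrate the idea; this is not the bcachefs implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: copy a value of on-disk size src_bytes
 * into a struct of size dst_bytes. Keys written by older versions may
 * be shorter than the current struct; the tail is zeroed so newer
 * fields read as 0. Longer values are truncated to the struct size.
 */
static void copy_val_zero_extend(void *dst, size_t dst_bytes,
				 const void *src, size_t src_bytes)
{
	size_t n = src_bytes < dst_bytes ? src_bytes : dst_bytes;

	memcpy(dst, src, n);
	memset((char *) dst + n, 0, dst_bytes - n);
}

/* Example versions of a value type: v2 appended a field. */
struct example_val_v1 { unsigned int a; };
struct example_val_v2 { unsigned int a; unsigned int b; };
```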

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 958c347b 29-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Mark bch2_copygc() noinline

This works around a "stack frame too large" error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1af5227c 21-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_verify_bucket_evacuated()

With backpointers, it's now impossible for bch2_evacuate_bucket() to be
completely reliable: it can race with an extent being partially
overwritten or split, which needs a new write buffer flush for the
backpointer to be seen.

This shouldn't be a real issue in practice; the previous patch added a
new tracepoint so we'll be able to see more easily if it is.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 32de2ea0 11-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rhashtable based buckets_in_flight for copygc

Previously, copygc used a fifo for tracking buckets in flight - this had
the disadvantage of being fixed size, since we pass references to
elements into the move code.

This restructures it to be a hash table and linked list, since with
erasure coding we need to be able to pipeline across an arbitrary number
of buckets.
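
A rough userspace sketch of the hash table plus linked list idea: lookups by (device, bucket) let copygc skip buckets that are already being evacuated, and the list keeps submission order. All names here are hypothetical, and a fixed number of chained hash slots stands in for the kernel's resizable rhashtable.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_HASH_SLOTS 64	/* fixed slot count; chains grow without bound */

struct in_flight {
	uint32_t dev;
	uint64_t bucket;
	struct in_flight *hash_next;	/* chain within one hash slot */
	struct in_flight *list_next;	/* FIFO order of submission */
};

struct buckets_in_flight_sketch {
	struct in_flight *slots[NR_HASH_SLOTS];
	struct in_flight *head, *tail;	/* oldest entry at head */
	size_t nr;
};

static size_t slot_of(uint32_t dev, uint64_t bucket)
{
	return (size_t) ((bucket * 31 + dev) % NR_HASH_SLOTS);
}

static bool in_flight_contains(struct buckets_in_flight_sketch *b,
			       uint32_t dev, uint64_t bucket)
{
	for (struct in_flight *i = b->slots[slot_of(dev, bucket)]; i; i = i->hash_next)
		if (i->dev == dev && i->bucket == bucket)
			return true;
	return false;
}

/* Returns -EEXIST if the bucket is already being evacuated. */
static int in_flight_add(struct buckets_in_flight_sketch *b,
			 uint32_t dev, uint64_t bucket)
{
	if (in_flight_contains(b, dev, bucket))
		return -EEXIST;

	struct in_flight *i = calloc(1, sizeof(*i));
	if (!i)
		return -ENOMEM;

	size_t s = slot_of(dev, bucket);
	i->dev = dev;
	i->bucket = bucket;
	i->hash_next = b->slots[s];
	b->slots[s] = i;

	if (b->tail)
		b->tail->list_next = i;
	else
		b->head = i;
	b->tail = i;
	b->nr++;
	return 0;
}

/* Drop the oldest entry, e.g. once all of its moves have completed. */
static bool in_flight_pop_oldest(struct buckets_in_flight_sketch *b)
{
	struct in_flight *i = b->head;
	if (!i)
		return false;

	struct in_flight **p = &b->slots[slot_of(i->dev, i->bucket)];
	while (*p != i)
		p = &(*p)->hash_next;
	*p = i->hash_next;

	b->head = i->list_next;
	if (!b->head)
		b->tail = NULL;
	b->nr--;
	free(i);
	return true;
}
```

Unlike a fixed-size fifo, entries are individually allocated, so references to them stay valid and the number of buckets in flight is not capped.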

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0fb11e08 17-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved copygc wait debugging

This just adds a line for how long copygc has been waiting to sysfs
copygc_wait, helpful for debugging why copygc isn't running.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c639c29c 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an assert in copygc thread shutdown path

We're not supposed to have nested (locked) btree_trans on the stack:
this means copygc shutdown needs to exit our btree_trans before exiting
the move_ctxt, which calls bch2_write().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d004446 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bucket_is_movable() -> BTREE_ITER_CACHED

BTREE_ITER_CACHED should really be the default for cached btrees - this
is an easy mistake to make.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8fcdf814 27-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved copygc pipelining

This improves copygc pipelining across multiple buckets: we now track
each in-flight bucket we're evacuating, with separate moving_contexts.

This means that whereas previously we had to wait for outstanding moves
to complete to ensure we didn't try to evacuate the same bucket twice,
we can now just check buckets we want to evacuate against the pending
list.

This also means we can run the verify_bucket_evacuated() check without
killing pipelining, so it can now always be enabled, not just on
debug builds.

This is going to be important for the upcoming erasure coding work,
where moving IOs that are being erasure coded will now skip the initial
replication step; instead the IOs will wait on the stripe to complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91065976 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Mark stripe buckets with correct data type

Currently, we don't use bucket data type for tracking whether buckets
are part of a stripe; parity buckets are BCH_DATA_parity, but data
buckets in a stripe are BCH_DATA_user. There's a separate counter,
buckets_ec, outside the BCH_DATA_TYPES system for tracking number of
buckets on a device that are part of a stripe.

The trouble with this approach is that it's too coarse grained, and we
need better information on fragmentation for debugging copygc.

With this patch, data buckets in a stripe are now tracked as
BCH_DATA_stripe buckets.

This doesn't yet differentiate between erasure coded and non-erasure
coded data in a stripe bucket, nor do we yet track empty data buckets in
stripes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c85d7796 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_copygc_wait_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80c33085 05-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fragmentation LRU

Now that we have much more efficient updates to the LRU btree, this
patch adds a new LRU that indexes buckets by fragmentation.

This means copygc no longer has to scan every bucket to find buckets
that need to be evacuated.

Changes:
- A new field in bch_alloc_v4, fragmentation_lru - this corresponds to
the bucket's position in the fragmentation LRU. We add a new field
for this instead of calculating it as needed because we may make the
fragmentation LRU optional; this field indicates whether a bucket is
on the fragmentation LRU.

Also, zoned devices will introduce variable bucket sizes; explicitly
recording the LRU position will be safer for them.

- A new copygc path for using the fragmentation LRU instead of
scanning every bucket and building up an in-memory heap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e3f913e 17-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Copygc now uses backpointers

Previously, copygc needed to walk the entire extents & reflink btrees to
find extents that needed to be moved.

Now that we have backpointers, this patch implements
bch2_evacuate_bucket() in the move code, which copygc now uses for
evacuating mostly empty buckets.

Also, thanks to the new backpointers code, copygc can now move btree
nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19a614d2 30-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Better inlining for bch2_alloc_to_v4_mut

This separates out the slowpath into a separate function, and inlines
bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 858536c7 11-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert EROFS errors to private error codes

More error code improvements - this gets us more useful error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f659376 12-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Suppress -EROFS messages when shutting down

This isn't actually an error condition, this just indicates a normal
shutdown - no reason for these to be in the log.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2d1d56b 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fixes for building in userspace

- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that

- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)

- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), put an #ifdef around bch2_disk_reservation_add()
fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d4bf5eec 18-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0337cc7e 20-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: move.c refactoring

- add bch2_moving_ctxt_(init|exit)
- split out __bch2_evacuate_bucket(), which takes an existing
moving_ctxt, this will be used for improving copygc performance by
pipelining across multiple buckets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c91996c5 15-Jun-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: data jobs, including rebalance, wait for copygc.

move_ratelimit() now has a bool that specifies whether we want to
wait for copygc to finish.

When copygc is running, we're probably low on free buckets; instead
of consuming the remaining buckets, we want to wait for copygc to
finish.

This should help with performance and runaway bucket fragmentation.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f5c5d20 13-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Redo data_update interface

This patch significantly cleans up and simplifies the data_update
interface. Instead of only being able to specify a single pointer by
device to rewrite, we're now able to specify any or all of the pointers
in the original extent to be rewritten, as a bitmask.

data_cmd is no more: the various pred functions now just return true if
the extent should be moved/updated. All the data_update path does is
rewrite existing replicas, or add new ones.

This fixes a bug with background compression on replicated
filesystems, where rebalance -> data_update would drop the wrong old
replica, and keep trying to recompress an extent pointer, each time
failing to drop the right replica. Oops.

Now, the data update path doesn't look at the io options to decide which
pointers to keep and which to drop - it only goes off of the
data_update_options passed to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 54feff0a 17-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve "copygc requested to run" error message

This improves the "copygc requested to run but no buckets found" message
to show the device that requires copygc to be run on - we'll definitely
need to improve this more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 822835ff 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fold bucket_state in to BCH_DATA_TYPES()

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
counted when checking the current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# afb6f7f6 04-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Silence spurious copygc err when shutting down

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f25d8215 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator threads & freelists

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d48a7f8 31-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v4

This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.

Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 31f63fd1 14-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Introduce a separate journal watermark for copygc

Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.

This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e154711 13-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: x-macroize alloc_reserve enum

This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d73e0d2c 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Copygc no longer uses bucket array

This converts the copygc code to use the alloc btree directly to find
buckets that need to be evacuated instead of the in-memory bucket array,
which is finally going away soon.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0678cbe2 10-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ignore cached data when calculating fragmentation

Previously, bucket fragmentation was considered to be bucket size minus
the total amount of live data, both dirty and cached.

This meant that if a bucket was full but only a small amount of the data
in it was dirty (the rest cached), we'd get stuck: copygc wouldn't move
the dirty data out of the bucket, and the allocator wouldn't be able to
invalidate and drop the cached data.

This changes fragmentation to exclude cached data, so that copygc will
evacuate these buckets and copygc/the allocator will always be able to
make forward progress.
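
The difference between the two definitions, sketched with hypothetical per-bucket sector counts (these are not the actual bcachefs alloc key fields, and the real calculation may differ in detail):

```c
#include <stdint.h>

/* Hypothetical per-bucket counts, in sectors; assumes dirty + cached <= bucket_size. */
struct bucket_sectors_sketch {
	uint32_t bucket_size;
	uint32_t dirty;
	uint32_t cached;
};

/* Old definition: both dirty and cached data count as live. */
static uint32_t fragmentation_old(const struct bucket_sectors_sketch *b)
{
	return b->bucket_size - (b->dirty + b->cached);
}

/* New definition: cached data is ignored; only dirty data counts as live. */
static uint32_t fragmentation_new(const struct bucket_sectors_sketch *b)
{
	return b->bucket_size - b->dirty;
}
```

For a full bucket holding mostly cached data, the old definition reports zero fragmentation, so copygc never selects it; the new one reports it as highly fragmented.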

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# acc3e09b 06-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename data_op_data_progress -> data_jobs

Mild refactoring.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 200472e9 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an error message for copygc spinning

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 47b15c57 04-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix copygc sectors_to_move calculation

With erasure coding, copygc's count of sectors to move was off, which
matters for the debug statement it prints out when it's not able to move
all the data it tried to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8dd6ed94 23-Jul-2021 Brett Holman <bholman.devel@gmail.com>

bcachefs: add progress stats to sysfs

This adds progress stats to sysfs for copygc, rebalance, recovery, and the
cmd_job ioctls.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 28624ba4 18-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Be sure to check ptr->dev in copygc pred function

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 19d2819d 25-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a tracepoint for copygc waiting

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c4d4b2f0 25-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a cond_resched call to the copygc main loop

We seem to have a bug where the copygc thread ends up spinning and
making the system unusable - this will at least prevent it from locking
up the machine, and it's a good thing to have anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4b44223 27-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change copygc wait amount to be min of per device waits

We're seeing a filesystem get stuck when all devices but one have no
more reclaimable buckets, because the copygc wait amount is currently
filesystem wide.

This patch should fix that, possibly at the expense of running too
much when only one or a few devices are full and the rebalance thread
needs to move data around.
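
A minimal sketch of taking the minimum of per-device waits, assuming each device can report how many sectors of IO may happen before it needs copygc. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device input: sectors of IO allowed before copygc must run. */
struct dev_wait_sketch {
	uint64_t wait_sectors;
};

/*
 * The overall wait is the minimum over devices, so any device that is
 * close to running out of reclaimable buckets wakes copygc, even if
 * other devices still have plenty. Returns UINT64_MAX with no devices.
 */
static uint64_t copygc_wait_amount(const struct dev_wait_sketch *devs, size_t nr)
{
	uint64_t wait = UINT64_MAX;

	for (size_t i = 0; i < nr; i++)
		if (devs[i].wait_sectors < wait)
			wait = devs[i].wait_sectors;
	return wait;
}
```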

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f09517fc 20-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock on journal reclaim

Flushing the btree key cache needs to use allocation reserves - journal
reclaim depends on flushing the btree key cache for making forward
progress, and the allocator and copygc depend on journal reclaim making
forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bae895a5 18-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add allocator thread state to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 51c66fed 17-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rip out copygc pd controller

We have a separate mechanism for ratelimiting copygc now - the pd
controller has only been causing problems.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5bbe4bf9 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add copygc wait to sysfs

Currently debugging an issue with copygc not running when it's supposed
to, and this is an obvious first step.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb66fc5f 13-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix copygc threshold

A while back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onto a
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1889ad5a 14-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add code to scan for/rewrite old btree nodes

This adds a new data job type to scan for btree nodes in the old extent
format, and rewrite them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dab9ef0d 23-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error message for some allocation failures

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2abe5420 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist 64 bit io clocks

Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd periodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestamps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f4e1d5d 22-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_alloc_v2

This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.

Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72eab8da 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor dev usage

This is to make it more amenable for serialization.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3187aa8d 21-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.

- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.

- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b206df6e 03-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some spurious gcc warnings

These only come up when building in userspace, for some reason.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b7a9bbfc 19-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move journal reclaim to a kthread

This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b88e971e 22-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't drop replicas when copygcing ec data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 142cbdff 12-Aug-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change copygc to consider bucket fragmentation

When devices have different sized buckets this is more correct.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74ed7e56 21-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't let copygc buckets be stolen by other threads

And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d080aa5 22-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete unused arguments

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8f3b41ab 11-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't restrict copygc writes to the same device

This no longer makes any sense, since copygc is now one thread per
filesystem, not per device, with a single write point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e6d11615 11-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make copygc thread global

Per device copygc threads don't move data to different devices, and they
make fragmentation worse, so they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89fd25be 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for data types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 784d8d17 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve warning for copygc failing to move data

This will help narrow down which code is at fault when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 309c54c3 20-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Redo copygc throttling

The code that checked the current free space and waited if it was too
big was causing issues - btree node allocations do not increment the
write IO clock (perhaps they should); but more broadly the check
wouldn't run copygc at all until the device was mostly full, at which
point it might have to do a bunch of work.

This redoes that logic so that copygc starts to run earlier, smoothly
running more and more often as the device becomes closer to full.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bd7e82ee 20-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill ca->freelist_lock

All uses were supposed to be switched over to c->freelist_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 99aaf570 25-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor various code to not be extent specific

With reflink, various code now has to handle both KEY_TYPE_extent
and KEY_TYPE_reflink_v, so convert it to be generic across all keys
with pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5884fddf 24-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix starting copygc when already started

We can sometimes call bch2_dev_read_write() when the device is already
RW (in error paths).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f80b4e64 16-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix hang while shutting down

If the allocator thread exited before bch2_dev_allocator_stop() was
called (because of an error), bch2_dev_allocator_quiesce() could hang.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ea2b1e1 12-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: cmp_int()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac7f0d77 03-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: ratelimit copygc warning

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

This lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on-disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 198d6700 21-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add functionality for heaps to update backpointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a9bec520 01-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Better calculation of copygc threshold

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>