History log of /linux-master/fs/bcachefs/move.c
Revision Date Author Comments
# d7e77f53 16-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: opts->compression can now also be applied in the background

The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec4edd7b 16-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Prep work for variable size btree node buffers

bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 189c176c 15-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve move_extent tracepoint

Also print out the data_opts, so that we can see what specifically is
being done to an extent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fa3185af 15-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Re-add move_extent_write tracepoint

It appears this was accidentally deleted at some point - also, do a bit
of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e58f963c 06-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: helpers for printing data types

We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0beebd92 21-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_for_each_ptr() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80eab7a7 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: for_each_btree_key() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# defd9e39 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: darray_for_each() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cf904c8d 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5028b907 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 27b2df98 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0c069781 25-Nov-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: rebalance should wakeup on shutdown if disabled

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74529338 26-Nov-2023 Daniel Hill <daniel@gluo.nz>

bcachefs: remove dead bch2_evacuate_bucket()

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62286a08 27-Nov-2023 Gustavo A. R. Silva <gustavoars@kernel.org>

bcachefs: Replace zero-length arrays with flexible-array members

Fake flexible arrays (zero-length and one-element arrays) are
deprecated, and should be replaced by flexible-array members.

So, replace zero-length arrays with flexible-array members
in multiple structures.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74644030 27-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: count_event()

Small helper for event counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb13f471 02-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()

More accurate naming.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dafff7e5 23-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bucket sector count helpers

This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba11c7d6 20-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_DATA_OP_drop_extra_replicas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c843a67 20-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert bch2_move_btree() to bbpos

Minor cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 01e95645 20-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: x-macro-ify bch_data_ops enum

This will let us add an enum -> string table for a to_text() fn.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 415e5107 28-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Extra kthread_should_stop() calls for copygc

This fixes a bug where going read-only was taking longer than it should
have due to copygc forgetting to check kthread_should_stop()

Additionally: fix a missing is_kthread check in bch2_move_ratelimit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1b1bd0fd 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: -EROFS doesn't count as move_extent_start_fail

The automated tests check if we've hit too many slowpath/error path
events and fail the test - if we're just shutting down, that naturally
shouldn't count.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae4d612c 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trace_move_extent_start_fail() now includes errcode

Renamed from trace_move_extent_alloc_mem_fail, because there are other
reasons we colud fail (disk space allocation failure).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7d9f8468 24-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Data update path won't accidentaly grow replicas

Previously, there was a bug where if an extent had greater durability
than required (because we needed to move a durability=1 pointer and
ended up putting it on a durability 2 device), we would submit a write
for replicas=2 - the durability of the pointer being rewritten - instead
of the number of replicas required to bring it back up to the
data_replicas option.

This, plus the allocation path sometimes allocating on a greater
durability device than requested, meant that extents could continue
having more and more replicas added as they were being rewritten.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 261af2f1 22-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops

This adds move_ctxt_wait_event_timeout(), which can sleep for a timeout
while also issueing pending moves as reads complete.

Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50e029c6 20-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_moving_ctxt_flush_all()

Introduce a new helper to flush all move IOs, and use it in a few places
where we should have been.

The new helper also drops btree locks before waiting on outstanding move
writes, avoiding potential deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f82755e4 30-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Data move path now uses bch2_trans_unlock_long()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96a363a7 23-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move: move_stats refactoring

data_progress_list is gone - it was redundant with moving_context_list

The upcoming rebalance rewrite is going to have it using two different
move_stats objects with the same moving_context, depending on whether
it's scanning or using the rebalance_work btree - this patch plumbs
stats around a bit differently so that will work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d5eade93 23-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move: convert to bbpos

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63316903 20-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: moving_context now owns a btree_trans

btree_trans and moving_context are used together, and having the
moving_context owns the transaction object reduces some plumbing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0bfe3b0 20-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: move.c exports, refactoring

Prep work for the new rebalance code - we need a few helpers exported.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 84809057 21-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve io option handling in data move path

The data move path now correctly picks IO options when inodes in
different snapshots have different options applied.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 40a53b92 19-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More minor smatch fixes

- fix a few uninitialized return values
- return a proper error code in lookup_lostfound()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1809b8cb 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Break up io.c

More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9d2a7bd8 23-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_moving_ctxt_to_text()

Print more information out about moving contexts - fold in the output of
the redundant bch2_data_jobs_to_text(), and also include information
relevant to whether move_data() should be blocked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bb3c2a9 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New error message helpers

Add two new helpers for printing error messages with __func__ and
bch2_err_str():
- bch_err_fn
- bch_err_msg

Also kill the old error strings in the recovery path, which were causing
us to incorrectly report memory allocation failures - they're not needed
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e47a390a 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert -ENOENT to private error codes

As with previous conversions, replace -ENOENT uses with more informative
private error codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a49bd8c0 14-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete an incorrect bch2_trans_unlock()

These deletes a bch2_trans_unlock() call from __bch2_move_data(). It was
redundant; bch2_move_extent() has the correct unlock call, and it was
buggy because when move_extent calls bch2_extent_drop_ptrs() we don't
want the transaction to be unlocked yet - this fixes a btree_iter.c
assertion.

Fixes https://github.com/koverstreet/bcachefs/issues/511.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbda63bb 30-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_bkey_make_mut() now calls bch2_trans_update()

It's safe to call bch2_trans_update with a k/v pair where the value
hasn't been filled out, as long as the key part has been and the value
is filled out by transaction commit time.

This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(),
eliminating a bit of boilerplate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1af5227c 21-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_verify_bucket_evacuated()

With backpointers, it's now impossible for bch2_evacuate_bucket() to be
completely reliable: it can race with an extent being partially
overwritten or split, which needs a new write buffer flush for the
backpointer to be seen.

This shouldn't be a real issue in practice; the previous patch added a
new tracepoint so we'll be able to see more easily if it is.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5a21764d 20-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve move path tracepoints

Move path tracepoints now include the key being moved. Also, add new
tracepoints for the start of move_extent, and evacuate_bucket.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 62a03559 31-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rip out code for storing backpointers in alloc keys

We don't store backpointers in alloc keys anymore, since we gained the
btree write buffer.

This patch drops support for backpointers in alloc keys, and revs the on
disk format version so that we know a fsck is required.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bdefe9c 29-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent()

This adds a flags param to bch2_backpointer_get_key() so that we can
pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the
extent immediately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ffc76edb 27-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_verify_bucket_evacuated()

We were going into an infinite loop when printing out backpointers, due
to never incrementing bp_offset - whoops.

Also limit the number of backpointers we print to 10; this is debug code
and we only need to print a sample, not all of them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d59ca7e8 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: verify_bucket_evacuated() -> set_btree_iter_dontneed()

This should help with excessive 'would deadlock' transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e36e572 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an unhandled transaction restart error

This is a bit awkward: we're passing around a btree_trans, but we're not
in a context where transaction restarts are handled - we should try to
come up with a better way to denote situations like this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b40901b0 13-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New erasure coding shutdown path

This implements a new shutdown path for erasure coding, which is needed
for the upcoming BCH_WRITE_WAIT_FOR_EC write path.

The process is:
- Cancel new stripes being built up
- Close out/cancel open buckets on write points or the partial list
that are for stripes
- Shutdown rebalance/copygc
- Then wait for in flight new stripes to finish

With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill
up before they complete; the new ec shutdown path is needed for shutting
down copygc/rebalance without deadlocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b9fa375b 11-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_fs_moving_ctxts_to_text()

This also adds bch2_write_op_to_text(): now we can see outstand moves,
useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC
and likely for other things in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f5d3fb4 10-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: evacuate_bucket() no longer moves cached ptrs

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5bf9db01 10-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: evacuate_bucket() no longer calls verify_bucket_evacuated()

The copygc code itself now calls this when all moves from a given bucket
are complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8fcdf814 27-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved copygc pipelining

This improves copygc pipelining across multiple buckets: we now track
each in flight bucket we're evacuating, with separate moving_contexts.

This means that whereas previously we had to wait for outstanding moves
to complete to ensure we didn't try to evacuate the same bucket twice,
we can now just check buckets we want to evacuate against the pending
list.

This also mean we can run the verify_bucket_evacuated() check without
killing pipelining - meaning it can now always be enabled, not just on
debug builds.

This is going to be important for the upcoming erasure coding work,
where moving IOs that are being erasure coded will now skip the initial
replication step; instead the IOs will wait on the stripe to complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2f528663 04-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: moving_context->stats is allowed to be NULL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e9b70146 22-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't call bch2_trans_update() unlocked

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80c33085 05-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fragmentation LRU

Now that we have much more efficient updates to the LRU btree, this
patch adds a new LRU that indexes buckets by fragmentation.

This means copygc no longer has to scan every bucket to find buckets
that need to be evacuated.

Changes:
- A new field in bch_alloc_v4, fragmentation_lru - this corresponds to
the bucket's position in the fragmentation LRU. We add a new field
for this instead of calculating it as needed because we may make the
fragmentation LRU optional; this field indicates whether a bucket is
on the fragmentation LRU.

Also, zoned devices will introduce variable bucket sizes; explicitly
recording the LRU position will be safer for them.

- A new copygc path for using the fragmentation LRU instead of
scanning every bucket and building up an in-memory heap.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 429dd427 12-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix verify_bucket_evacuated()

This fixes an incorrectly handled transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c782c583 08-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add max nr of IOs in flight to the move path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7ffb6a7e 02-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix deadlock on nocow locks in data move path

The recent nocow locking rework introduced a deadlock in the data move
path: the new nocow locking scheme uses a hash table with a fixed size
array for chaining, meaning on hash collision we may have to wait for
other locks to be released before we can lock a bucket.

And since the data move path needs to submit writes from the same thread
that's taking nocow locks and submitting reads, this introduces a
deadlock.

This shouldn't happen often in practice, but since the data move path
can keep large numbers of IOs in flight simultaneously, it's something
we have to handle.

This patch makes move_ctxt_wait_event() available to
bch2_data_update_init() and uses it when appropriate, which is our
normal solution to this kind of thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8b3a677 02-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Nocow support

This adds support for nocow mode, where we do writes in-place when
possible. Patch components:

- New boolean filesystem and inode option, nocow: note that when nocow
is enabled, data checksumming and compression are implicitly disabled

- To prevent in-place writes from racing with data moves
(data_update.c) or bucket reuse (i.e. a bucket being reused and
re-allocated while a nocow write is in flight, we have a new locking
mechanism.

Buckets can be locked for either data update or data move, using a
fixed size hash table of two_state_shared locks. We don't have any
chaining, meaning updates and moves to different buckets that hash to
the same lock will wait unnecessarily - we'll want to watch for this
becoming an issue.

- The allocator path also needs to check for in-place writes in flight
to a given bucket before giving it out: thus we add another counter
to bucket_alloc_state so we can track this.

- Fsync now may need to issue cache flushes to block devices instead of
flushing the journal. We add a device bitmask to bch_inode_info,
ei_devs_need_flush, which tracks devices that need to have flushes
issued - note that this will lead to unnecessary flushes when other
codepaths have already issued flushes, we may want to replace this with
a sequence number.

- New nocow write path: look up extents, and if they're writable write
to them - otherwise fall back to the normal COW write path.

XXX: switch to sequence numbers instead of bitmask for devs needing
journal flush

XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to
run in process context - see if we can improve this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4dcd1cae 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Data update support for unwritten extents

The data update path requires special support for unwritten extents - we
still need to be able to move them, but there's no need to read or write
anything.

This patch adds a new error code to tell bch2_move_extent() that we're
short circuiting the read, and adds bch2_update_unwritten_extent() to
create a reservation then call __bch2_data_update_index_update().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 53b1c6f4 14-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use key cache during fsck

The btree key cache mainly helps with lock contention, at the cost of
additional memory overhead. During some fsck passes the memory overhead
really matters, but fsck is single threaded so lock contention is an
issue - so skipping the key cache during fsck will help with
performance.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e3f913e 17-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Copygc now uses backpointers

Previously, copygc needed to walk the entire extents & reflink btrees to
find extents that needed to be moved.

Now that we have backpointers, this patch implements
bch2_evacuate_bucket() in the move code, which copygc now uses for
evacuating mostly empty buckets.

Also, thanks to the new backpointers code, copygc can now move btree
nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d94189ad 08-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Debug mode for c->writes references

This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.

This will make it easier to debug shutdown hangs due to refcount leaks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 01ad6737 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_inode_opts_get()

This improves io_opts() and makes it a non-inline function - it's big
enough that it probably shouldn't be.

Also, bch_io_opts no longer needs fields for whether options are
defined, so we can slim it down a bit.

We'd like to stop passing around the full bch_io_opts, but that'll be
tricky because of bch2_rebalance_add_key().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 858536c7 11-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert EROFS errors to private error codes

More error code improvements - this gets us more useful error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 994ba475 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New btree helpers

This introduces some new conveniences, to help cut down on boilerplate:

- bch2_trans_kmalloc_nomemzero() - performance optimiation
- bch2_bkey_make_mut()
- bch2_bkey_get_mut()
- bch2_bkey_get_mut_typed()
- bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2d1d56b 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fixes for building in userspace

- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that

- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)

- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), put an #ifdef around bch2_disk_reservation_add()
fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1be88797 09-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Handle dropping pointers in data_update path

Cached pointers are generally dropped, not moved: this led to an
assertion firing in the data update path when there were no new replicas
being written.

This path adds a data_options field for pointers to be dropped, and
tweaks move_extent() to check if we're only dropping pointers, not
writing new ones, before kicking off a data update operation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4bf5eec 18-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7c0732b8 29-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix move path when move_stats == NULL

This isn't done very often, but it is legitimate

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4081ace3 20-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Get ref on c->writes in move.c

There's no point reading an extent in order to move it if the write is
going to fail because we're shutting down. This patch changes the move
path so that moving_io now owns a ref on c->writes - as a bonus,
rebalance and copygc will now notice that we're shutting down and exit
quicker.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0337cc7e 20-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: move.c refactoring

- add bch2_moving_ctxt_(init|exit)
- split out __bch2_evacutae_bucket() which takes an existing
moving_ctxt, this will be used for improving copygc performance by
pipelining across multiple buckets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c91996c5 15-Jun-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: data jobs, including rebalance wait for copygc.

move_ratelimit() now has a bool that specifies whether we want to
wait for copygc to finish.

When copygc is running, we're probably low on free buckets instead
of consuming the remaining buckets, we want to wait for copygc to
finish.

This should help with performance, and run away bucket fragmentation.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f5c5d20 13-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Redo data_update interface

This patch significantly cleans up and simplifies the data_update
interface. Instead of only being able to specify a single pointer by
device to rewrite, we're now able to specify any or all of the pointers
in the original extent to be rewrited, as a bitmask.

data_cmd is no more: the various pred functions now just return true if
the extent should be moved/updated. All the data_update path does is
rewrite existing replicas, or add new ones.

This fixes a bug where with background compression on replicated
filesystems, where rebalance -> data_update would incorrectly drop the
wrong old replica, and keep trying to recompress an extent pointer and
each time failing to drop the right replica. Oops.

Now, the data update path doesn't look at the io options to decide which
pointers to keep and which to drop - it only goes off of the
data_update_options passed to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c501fef6 13-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Pull out data_update.c

This is the start of reorganizing the data IO paths. The plan is to also
break apart io.c into data_read.c and data_write.c, and migrate_write
will be renamed to the data_update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7bb61e8c 19-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make IO in flight by copygc/rebalance configurable

This adds a new option, move_bytes_in_flight, for configuring the amount
of IO in flight by copygc/rebalance - users with many devices in their
filesystem will want to increase this.

In the future we should be smarter about this, but this is an easy
improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 104c6974 15-Mar-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: Add persistent counters

This adds a new superblock field for persisting counters
and adds a sysfs interface in counters/ exposing these counters.

The superblock field is ignored by older versions letting us avoid
an on disk version bump.

Each sysfs file outputs a counter that tracks since filesystem
creation and a counter for the current mount session.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f93726e 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c0960603 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Shutdown path improvements

We're seeing occasional firings of the assertion in the key cache
shutdown code that nr_dirty == 0, which means we must sometimes be doing
transaction commits after we've gone read only.

Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if
it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree
writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the
shutdown loop: btree write completion does a transaction update, to
update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d905f67e 15-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Copygc allocations shouldn't be nowait

We don't actually want copygc allocations to be nowait - an allocation
for copygc might fail and then later succeed due to a bucket needing to
wait on journal commit, or to be discarded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3e154711 13-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: x-macroize alloc_reserve enum

This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91d961ba 29-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: darrays

Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f61816d0 21-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a use after free

In move_read_endio, we were checking if the next pending write has its
read completed - but this can turn after a use after free (and we were
accessing the list without a lock), so instead just better to just
unconditionally do the wakeup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# eb331fe5 14-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for stale dirty pointer before reads

Since we retry reads when we discover we read from a pointer that went
stale, if a dirty pointer is erroniously stale it would cause us to loop
retrying that read forever - unless we check before issuing the read,
while the btree is still locked, when we know that a dirty pointer
should never be stale.

This patch adds that check, along with printing some helpful debug info.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 52eef42c 15-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix locking in data move path

We need to ensure we don't have any btree locks held when calling
do_pending_writes() - besides issuing IOs, upcoming allocator changes
will have allocations doing btree lookups directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8ede9910 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle transaction restarts in __bch2_move_data()

We weren't checking for -EINTR in the main loop in __bch2_move_data -
this code predates modern transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bf0fdb4d 09-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't erasure code cached ptrs

It doesn't make much sense to be erasure coding cached pointers, we
should be erasure coding one of the dirty pointers in an extent. This
patch makes sure we're passing BCH_WRITE_CACHED when we expect the new
pointer to be a cached pointer, and tweaks the write path to not
allocate from a stripe when BCH_WRITE_CACHED is set - and fixes an
assertion we were hitting in the ec path where when adding the stripe to
an extent and deleting the other pointers the pointer to the stripe
didn't exist (because dropping all dirty pointers from an extent turns
it into a KEY_TYPE_error key).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 47b15c57 04-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix copygc sectors_to_move calculation

With erasure coding, copygc's count of sectors to move was off, which
matters for the debug statement it prints out when it's not able to move
all the data it tried to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3e52c222 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add journal_seq to inode & alloc keys

Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 961b2d62 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted ec fixes

- The backpointer that ec_stripe_update_ptrs() uses now needs to include
the snapshot ID, which means we have to change where we add the
backpointer to after getting the snapshot ID for the new extents

- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()

- improve error message in bch2_mark_stripe()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 23af498c 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure we flush btree updates in evacuate path

This fixes a possible race where we fail to remove a device because of
btree nodes still on it, that are being deleted by in flight btree
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f3cf0999 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_node_rewrite() now returns transaction restarts

We have been getting away from handling transaction restarts locally -
convert bch2_btree_node_rewrite() to the newer style.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0d1b70a 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Must check for errors from bch2_trans_cond_resched()

But we don't need to call it from outside the btree iterator code
anymore, since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b71717da 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle transaction restarts in bch2_blacklist_entries_gc()

It shouldn't be necessary when we're only using a single iterator and
not doing updates, but that's harder to debug at the moment.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9a796fdb 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_exit() no longer returns errors

Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d355c6f4 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: for_each_btree_node() now returns errors directly

This changes for_each_btree_node() to work like for_each_btree_key(),
and to that end bch2_btree_iter_peek_node() and next_node() also return
error ptrs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b9a7d8ac 13-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix implementation of KEY_TYPE_error

When force-removing a device, we were silently dropping extents that we
no longer had pointers for - we should have been switching them to
KEY_TYPE_error, so that reads for data that was lost return errors.

This patch adds the logic for switching a key to KEY_TYPE_error to
bch2_bkey_drop_ptr(), and improves the logic somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e8bde78a 12-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix rereplicate_pred()

It was switching off of the key type incorrectly - this code must've
been quite old, and not rereplicating anything that wasn't a
btree_ptr_v1 or a plain old extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4b09ef12 07-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_move_btree()

bch2_trans_begin() is now required for transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 18443cb9 04-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update data move path for snapshots

The data move path operates on existing extents, and not within a
subvolume as the regular IO paths do. It needs to change because it may
cause existing extents to be split, and when splitting an existing
extent in an ancestor snapshot we need to make sure the new split has
the same visibility in child snapshots as the existing extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6fed42bb 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Plumb through subvolume id

To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8f54337d 03-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix initialization of bch_write_op.nonce

If an extent ends up with a replica that is encrypted an a replica that
isn't encrypted (due the user changing options), and then
copygc/rebalance moves one of the replicas by reading from the
unencrypted replica, we had a bug where we wouldn't correctly initialize
op->nonce - for each crc field in an extent, crc.offset + crc.nonce must
be equal.

This patch fixes that by moving op.nonce initialization to
bch2_migrate_write_init.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5f8077cc 29-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_ITER_SET_POS_AFTER_COMMIT

BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically
advance extent btree iterators on sucessful commit.

But with the upcomnig btree_path patch it's getting more awkward to
support, and it adds overhead to core data structures that's only used
in a few places, and can be easily done by the caller instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8dd6ed94 23-Jul-2021 Brett Holman <bholman.devel@gmail.com>

bcachefs: add progress stats to sysfs

This adds progress stats to sysfs for copygc, rebalance, recovery, and the
cmd_job ioctls.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 700c25b3 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_trans_begin() more consistently

Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8b3e9bd6 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e3a67bdb 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Regularize argument passing of btree_trans

btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 618b1c0e 05-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out SPOS_MAX

Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bc3f8b25 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c0ebe3e4 23-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted endianness fixes

Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f311f21 29-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use bch_write_op->cl for delivering completions

We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.

Two reasons:
- op->end_io is more efficient, due to fewer atomic ops, this completes
the conversion that was originally only done for the direct IO path.
- We'll be restructing the write path to use a different mechanism for
punting to process context, refactoring to not use op->cl will make
that easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# af171183 28-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch_write_op.index_update_fn

This deletes bch_write_op.index_update_fn: indirect function calls have
gotten considerably more expensive post spectre/meltdown, and we only
have two different index_update_fns - this patch adds a flag to specify
which one to use (normal vs. data move path).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 443d2760 23-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a null ptr deref

bch2_btree_iter_peek() won't always return a key - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7b7278bb 20-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix two btree iterator leaks

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e95d7edf 18-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Preallocate trans mem in bch2_migrate_index_update()

This will help avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 883d9701 16-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use bch2_inode_find_by_inum() in move.c

Since move.c isn't aware of what subvolume we're in, we can't use the
standard inode lookup code - fortunately, we're just using it for
reading IO options.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0ba3b64 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5ff75ccb 14-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix read retry path for indirect extents

In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19dd3172 04-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for compat feature bits

This is to generate strings for them, so that we can print them out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e01dacf7 20-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bkey format generation for 32 bit fields

Having a packed format that can represent a field larger than the
unpacked type breaks bkey_packed_successor() assertions - we need to fix this to start using the snapshot filed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a4805d66 22-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Scan for old btree nodes if necessary on mount

We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1889ad5a 14-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add code to scan for/rewite old btree nodes

This adds a new data job type to scan for btree nodes in the old extent
format, and rewrite them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 35a067b4 14-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change when we allow overwrites

Originally, we'd check for -ENOSPC when getting a disk reservation
whenever the new extent took up more space on disk than the old extent.

Erasure coding screwed this up, because with erasure coding writes are
initially replicated, and then in the background the extra replicas are
dropped when the stripe is created. This means that with erasure coding
enabled, writes will always take up more space on disk than the data
they're overwriting - but, according to posix, overwrites aren't
supposed to return ENOSPC.

So, in this patch we fudge things: if the new extent has more replicas
than the _effective_ replicas of the old extent, or if the old extent is
compressed and the new one isn't, we check for ENOSPC when getting the
disk reservation - otherwise, we don't.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3187aa8d 21-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.

- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.

- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f0e70018 20-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix iterator overflow in move path

The move path was calling bch2_bucket_io_time_reset() for cached
pointers (which it shouldn't have been), and then not calling
bch2_trans_reset() when it got -EINTR (indicating transaction restart).
Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f30dd860 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't write bucket IO time lazily

With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.

Future work: we really need to expend the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b88e971e 22-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't drop replicas when copygcing ec data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6ea873d1 15-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix copygc of compressed data

The check for when we need to get a disk reservation was wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 922ae9f4 10-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Copy ptr->cached when migrating data

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74ed7e56 21-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't let copygc buckets be stolen by other threads

And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8f3b41ab 11-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't restrict copygc writes to the same device

This no longer makes any sense, since copygc is now one thread per
filesystem, not per device, with a single write point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89fd25be 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for data types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 784d8d17 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve warning for copygc failing to move data

This will help narrow down which code is at fault when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 00b8ccf7 25-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Interior btree updates are now fully transactional

We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2c480a71 24-Apr-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle -EINTR bch2_migrate_index_update()

peek_slot() shouldn't return -EINTR when there's only a single live
iterator, but that's tricky to guarantee - we seem to be returning
-EINTR when we shouldn't, but it's easy enough to handle in the caller.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7b46a3d 09-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't log errors that are expected during shutdown

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ab05de4c 23-Feb-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track incompressible data

This fixes the background_compression option: wihout some way of marking
data as incompressible, rebalance will keep rewriting incompressible
data over and over.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c3ff72c 28-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert some enums to x-macros

Helps for preventing things from getting out of sync.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58e2388f 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_ATOMIC

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5934a0ca 20-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey_on_stack_reassemble()

Small helper function.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4de77495 16-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reorganize extents.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 085ab693 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework of cut_front & cut_back

This changes bch2_cut_front and bch2_cut_back so that they're able to
shorten the size of the value, and it also changes the extent update
path to update the accounting in the btree node when this happens.

When the size of the value is shortened, they zero out the space that's
no longer used, so it's interpreted as noops (as implemented in the last
patch).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 35189e09 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey_on_stack

This implements code for storing small bkeys on the stack and allocating
out of a mempool if they're too big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bd09d268 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only look up inode io opts in extents btree

We currently don't have a way to propagate inode io opts to indirect
extents. This is a problem...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7199432 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill deferred btree updates

Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8d84260e 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: data move path should not be trying to move reflink_p keys

This was spotted when the move_extent() path tried to allocate a bio for
a reflink_p extent, but adding pages to the bio failed because we
overflowed bi_max_vecs. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b50dd792 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a null ptr deref

rbio->c wasn't being initialized in the move path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76426098 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 99aaf570 25-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor various code to not be extent specific

With reflink, various code now has to handle both KEY_TYPE_extent
or KEY_TYPE_reflink_v - so, convert it to be generic across all keys
with pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06ed8558 08-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add offset_into_extent param to bch2_read_extent()

With reflink, we'll no longer be able to calculate the offset of the
data we want into the extent we're reading from from the extent pos and
the iter pos - we'll have to pass it in separately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae0ff7b8 30-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Redo replicas gc mechanism

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 932aa837 11-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_mark_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94f651e2 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Return errors from for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0564b167 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert bch2_btree_insert_at() usage to bch2_trans_commit()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7ef2a73a 21-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check for if extent update is allocating

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# db636adb 04-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Compression fixes

regressions from switching disk space accounting to be in compressed
sectors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1d25849c 07-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Centralize marking of replicas in btree update path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4628529f 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Disk usage in compressed sectors, not uncompressed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 103e2127 30-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: replicas: prep work for stripes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 71c9e0ba 27-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_extent_ptr_decoded_append()

This new helper for the move path avoids creating a new CRC entry when
we already have one that matches the pointer being added.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a2753581 30-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_extent_drop_ptrs()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1742237b 27-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: extent_for_each_ptr_decode()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f43cc5be 30-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix failure to suspend

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c2fcff59 25-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix suspend when moving data faster than ratelimit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc3268c1 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill extent_insert_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>