History log of /linux-master/fs/bcachefs/btree_iter.c
Revision Date Author Comments
# 79032b07 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved topology repair checks

Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36f9ef10 24-Mar-2024 Hongbo Li <lihongbo22@huawei.com>

bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc

The old code doesn't consider the mem alloced from mempool when call
krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to
free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX"
is inaccurate when trans->mem was allocated by krealloc function.
Instead, we use used_mempool stuff to record the situation, and realloc
or free the trans->mem in elegant way.

Also, after krealloc failed in __bch2_trans_kmalloc, the old data
should be copied to the new buffer when alloc from mempool_alloc.

Fixes: 31403dca5bb1 ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5ca8ff15 12-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use kvzalloc() when dynamically allocating btree paths

THis silences a mm/page_alloc.c warning about allocating more than a
page with GFP_NOFAIL - and there's no reason for this to not have a
vmalloc fallback anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 83bd5985 09-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3254c1b0 09-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Save key_cache_path in peek_slot()

When bch2_btree_iter_peek_slot() clones the iterator to search for the
next key, and then discovers that the key from the cloned iterator is
the key we want to return - we also want to save the
iter->key_cache_path as well, for the update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52946d82 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill more -EIO error codes

This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc634d8e 22-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_and_journal_iter.trans

we now always have a btree_trans when using a btree_and_journal_iter;
prep work for adding prefetching to btree_and_journal_iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fadc6067 24-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Set path->uptodate when no node at level

We were failing to set path->uptodate when reaching the end of a btree
node iterator, causing the new prefetch code for backpointers gc to go
into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba89083e 08-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix journal replay with unreadable btree roots

When a btree root is unreadable, we still might be able to get some data
back by replaying what's in the journal. Previously though, we got
confused when journal replay would attempt to replay a key for a level
that didn't exist.

This adds bch2_btree_increase_depth(), so that journal replay can handle
this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 204f4514 24-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree

If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
keyspace where no keys are visible in the current snapshot, we have a
problem - we'll scan for a very long time before scanning terminates.

Awhile back, this was fixed for most cases with peek_upto() (and
assertions that enforce that it's being used).

But the fix missed the fact that the inodes btree is different - every
key offset is in a different snapshot tree, not just the inode field.

Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b97de453 15-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve trace_trans_restart_relock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 49a5192c 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add an option to control btree node prefetching

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89056f24 23-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: track transaction durations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 83322e8c 23-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_trans always has stats

reserve slot 0 for unknown (when we overflow), to avoid some branches

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c558c577 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_trans_peek_slot_updates

refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag
and making it always-on

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 359e89ad 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_trans_peek_prev_updates

bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eb686359 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_trans_peek_updates

refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag
and making it always-on

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0c99e17d 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: growable btree_paths

XXX: we're allocating memory with btree locks held - bad

We need to plumb through an error path so we can do
allocate_dropping_locks() - but we're merging this now because it fixes
a transaction path overflow caused by indirect extent fragmentation, and
the resize path is rare.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2c3b0fc3 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans->nr_paths

Start to plumb through dynamically growable btree_paths; this patch
replaces most BTREE_ITER_MAX references with trans->nr_paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5cc6daf7 12-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans->updates will also be resizable

the reflink triggers are also bumping up against the maximum number of
paths in a transaction - and generating proportional numbers of updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 31403dca 11-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS

- Some tweaks to greatly reduce locking overhead for the list of btree
transactions, so that it can always be enabled: leave btree_trans
objects on the list when they're on the percpu single item freelist,
and only check for duplicates in the same process when
CONFIG_BCACHEFS_DEBUG is enabled

- don't zero out the full btree_trans() unless we allocated it from
the mempool

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fea153a8 12-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: rcu protect trans->paths

Upcoming patches are going to be changing trans->paths to a
reallocatable buffer. We need to guard against use after free when it's
used by other threads; this introduces RCU protection to those paths and
changes them to check for trans->paths == NULL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6474b706 11-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clean up btree_trans

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 398c9834 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill btree_path.idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 542e6396 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_peek_prev() no longer uses path->idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 566eabd3 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_path_get() no longer uses path->idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b0b67378 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_for_each_path_with_node() no longer uses path->idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ccb7b08f 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_for_each_path() no longer uses path->idx

path->idx is now a code smell: we should be using path_idx_t, since it's
stable across btree path reallocation.

This is also a bit faster, using the same loop counter vs. fetching
path->idx from each path we iterate over.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 311e446a 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_path_to_text() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f75ba4e 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: struct trans_for_each_path_inorder_iter

reducing our usage of path->idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f9821a7 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_insert_entry -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07f383c7 03-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_iter -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 788cc25d 08-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_path_alloc() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96ed47d1 08-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_path_traverse() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f6363aca 08-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4617d946 08-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_path_set_pos() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74e600c1 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs; bch2_path_put() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 255ebbbf 08-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_path_get() -> btree_path_idx_t

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5ce8b92d 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: minor bch2_btree_path_set_pos() optimization

bpos_eq() is cheaper than bpos_cmp()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0bc64d7e 17-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill __bch2_btree_iter_peek_upto_and_restart()

dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79904fa2 11-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_srcu_lock() should be static

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 559e6c23 16-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_for_each_update() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e06af207 15-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix userspace build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 67997234 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill btree_trans->wb_updates

the btree write buffer path now creates a journal entry directly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f3360005 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_node_add no longer uses trans_for_each_path()

In the future we'll be making trans->paths resizable and potentially
having _many_ more paths (for fsck); we need to start fixing algorithms
that walk each path in a transaction where possible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 24de63da 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve trans->extra_journal_entries

Instead of using a darray, we now allocate journal entries for the
transaction commit path with our normal bump allocator - with an inlined
fastpath, and using btree_transaction_stats to remember how much to
initially allocate so as to avoid transaction restarts.

This is prep work for converting write buffer updates to use this
mechanism.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a83b6c89 10-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill btree_path->(alloc_seq|downgrade_seq)

These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f8fd5871 07-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: reserve path idx 0 for sentinal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6e92d155 03-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Refactor trans->paths_allocated to be standard bitmap

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a564c9fa 02-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Include btree_trans in more tracepoints

This gives us more context information - e.g. which codepath is invoking
btree node reads.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33981244 26-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve trace_trans_restart_would_deadlock

In the CI, we're seeing tests failing due to excessive would_deadlock
transaction restarts - the tracepoint now includes the lock cycle that
occured.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e153a0d7 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve trace_trans_restart_too_many_iters()

We now include the list of paths in use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c471b65 26-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: convert bch_fs_flags to x-macro

Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad9c7992 17-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill btree_iter->journal_pos

For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to
avoid the binary search overhead.

Previously we stashed the pos of the last key returned from the journal,
in order to force the lookup to be redone when rewinding.

Now bch2_journal_keys_peek_upto() handles rewinding itself when
necessary - so we can slim down btree_iter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e56978c8 12-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill BTREE_ITER_ALL_LEVELS

As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be
racy with concurrent interior node updates - and perhaps it is fixable,
but it's tricky and unnecessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd5bd162 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix redundant variable initialization

path->level was being read, but never used.

Reported-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fa014953 29-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix extents iteration + snapshots interaction

peek_upto() checks against the end position and bails out before
FILTER_SNAPSHOTS checks; this is because if we end up at a different
inode number than the original search key none of the keys we see might
be visibile in the current snapshot - we might be looking at inode in a
completely different subvolume.

But this is broken, because when we're iterating over extents we're
checking against the extent start position to decide when to bail out,
and the extent start position isn't monotonically increasing until after
we've run FILTER_SNAPSHOTS.

Fix this by adding a simple inode number check where the old bailout
check was, and moving the main check to the correct position.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>


# 50a8a732 14-Dec-2023 Thomas Bertschinger <tahbertschinger@gmail.com>

bcachefs: fix invalid memory access in bch2_fs_alloc() error path

When bch2_fs_alloc() gets an error before calling
bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid
memory access because btree_trans_list is uninitialized.

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Fixes: 6bd68ec266ad ("bcachefs: Heap allocate btree_trans")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8a443d3e 17-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Proper refcounting for journal_keys

The btree iterator code overlays keys from the journal until journal
replay is finished; since we're now starting copygc/rebalance etc.
before replay is finished, this is multithreaded access and thus needs
refcounting.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 006ccc30 04-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill journal pre-reservations

This deletes the complicated and somewhat expensive journal
pre-reservation machinery in favor of just using journal watermarks:
when the journal is more than half full, we run journal reclaim more
aggressively, and when the journal is more than 3/4s full we only allow
journal reclaim to get new journal reservations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9fcdd23b 04-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage

BTREE_INSERT_NOJOURNAL is primarily used for a performance optimization
related to inode updates and fsync - document it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f82755e4 30-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Data move path now uses bch2_trans_unlock_long()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c4accde4 29-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure srcu lock is not held too long

The SRCU read lock that btree_trans takes exists to make it safe for
bch2_trans_relock() to deref pointers to btree nodes/key cache items we
don't have locked, but as a side effect it blocks reclaim from freeing
those items.

Thus, it's important to not hold it for too long: we need to
differentiate between bch2_trans_unlock() calls that will be only for a
short duration, and ones that will be for an unbounded duration.

This introduces bch2_trans_unlock_long(), to be used mainly by the data
move paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# be9e782d 27-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't downgrade locks on transaction restart

We should only be downgrading locks on success - otherwise, our
transaction restarts won't be getting the correct locks and we'll
livelock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bf3766b 12-Sep-2023 Colin Ian King <colin.i.king@gmail.com>

bcachefs: Fix a handful of spelling mistakes in various messages

There are several spelling mistakes in error messages. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b7fbdcd 09-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix silent enum conversion error

This changes mark_btree_node_locked() to take an enum
btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED
is -1 which may cause problems converting back and forth to
six_lock_type if short enums are in use.

With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type
enum.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8e877caa 16-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out snapshot.c

subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ff5b741c 13-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Zero btree_paths on allocation

This fixes a bug in the cycle detector, bch2_check_for_deadlock() - we
have to make sure the node pointers in the btree paths array are set to
something not-garbage before another thread may see them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 401585fe 05-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_journal_iter.c

Split out a new file from recovery.c for managing the list of keys we
read from the journal: before journal replay finishes the btree iterator
code needs to be able to iterate over and return keys from the journal
as well, so there's a fair bit of code here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e81f89b 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix assorted checkpatch nits

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bf5a261c 01-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted fixes for clang

clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a5b696ee 19-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: seqmutex; fix a lockdep splat

We can't be holding btree_trans_lock while copying to user space, which
might incur a page fault. To fix this, convert it to a seqmutex so we
can unlock/relock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 49c7cd9d 30-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More drop_locks_do() conversions

Using drop_locks_do() ensures that every unlock() is paired with a
relock(), with proper error checking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78367aaa 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_kmalloc no longer allocates memory with btree locks held

When allocating memory, gfp flags should generally be

- GFP_NOWAIT|__GFP_NOWARN if btree locks are held
- GFP_NOFS if in the IO path or otherwise holding resources needed for
IO submission
- GFP_KERNEL otherwise

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b5fd7566 28-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: drop_locks_do()

Add a new helper for the common pattern of:
- trans_unlock()
- do something
- trans_relock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f154c3eb 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans_for_each_path_safe()

bch2_btree_trans_to_text() is used on btree_trans objects that are owned
by different threads - when printing out deadlock cycles - so we need a
safe version of trans_for_each_path(), else we race with seeing a
btree_path that was just allocated and not fully initialized:

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 32913f49 16-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Seq now only incremented on unlock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1fb4fe63 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Kill six_lock_state union

As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.

On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.

More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.

The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.

On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.

Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f375d6ca 16-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't call local_clock() twice in trans_begin()

local_clock() is not as cheap as we'd like it to be, alas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 11f11737 20-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Call bch2_path_put_nokeep() before bch2_path_put()

bch2_path_put_nokeep() is sketchy, and we should consider removing it:
it unconditionally frees btree_paths once their ref hits 0.

The assumption is that we only use it for paths that have never been
visible outside the btree core btree code; i.e. higher level code will
never be making assumptions about locking based on these paths.

However, there's subtle brokenness with this approach:

- If we call bch2_path_put(), then bch2_path_put_nokeep(),
bch2_path_put() may free the first path on the assumption that we we
have another path keeping a node locked - but then
bch2_path_put_nokeep() just unconditionally frees it.

The same bug may arise if we're calling bch2_path_put() and
bch2_path_put_nokeep() on the same (refcounted) path, or two adjacent
paths that point to the same btree node.

This patch hacks around one of these bugs by calling
bch2_path_put_nokeep() first in bch2_trans_iter_exit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 511b629a 06-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_peek_node_and_restart()

Minor refactoring for the Rust interface.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1306f87d 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Plumb btree_trans through btree cache code

Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f2ea655 27-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_peek_and_restart_outlined()

Needed for interfacing with Rust - bindgen can't handle inline
functions, alas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5e2d8be8 18-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip

These are two different things - this improves our debug assert
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6623c0fc 17-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add an assertion for using multiple btree_trans

A thread should never be using more than one btree_trans - doing so is
an invitation for deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 920e69bc 03-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Btree write buffer

This adds a new method of doing btree updates - a straight write buffer,
implemented as a flat fixed size array.

This is only useful when we don't need to read from the btree in order
to do the update, and when reading is infrequent - perfect for the LRU
btree.

This will make LRU btree updates fast enough that we'll be able to use
it for persistently indexing buckets by fragmentation, which will be a
massive boost to copygc performance.

Changes:
- A new btree_insert_type enum, for btree_insert_entries. Specifies
btree, btree key cache, or btree write buffer.

- bch2_trans_update_buffered(): updates via the btree write buffer
don't need a btree path, so we need a new update path.

- Transaction commit path changes:
The update to the btree write buffer both mutates global, and can
fail if there isn't currently room. Therefore we do all write buffer
updates in the transaction all at once, and also if it fails we have
to revert filesystem usage counter changes.

If there isn't room we flush the write buffer in the transaction
commit error path and retry.

- A new persistent option, for specifying the number of entries in the
write buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 60b55388 09-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: trans->notrace_relock_fail

When we unlock in order to submit IO, the next relock event is likely to
fail if submit_bio() blocked - we shouldn't those events in our _fail
stats, since those are expected events and shouldn't cause test
failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 434b1c75 09-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Switch a BUG_ON() to a panic()

This assert is popping - rarely - in the CI, this will help us track it
down from the logs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 992fa4e6 08-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix btree_path_alloc()

We need to call bch2_trans_update_max_paths() before marking the new
path as allocated, since we're not initializing it yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c72f687a 11-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use for_each_btree_key_upto() more consistently

It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use
peek_upto() and provide an end for the interval we're searching for -
otherwise, when we hit the end of the inode the next inode be in a
different subvolume and not have any keys in the current snapshot, and
we'd iterate over arbitrarily many keys before returning one.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94c69faf 04-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use six_lock_ip()

This uses the new _ip() interface to six locks and hooks it up to
btree_path->ip_allocated, when available.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 12344c7c 01-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_in_restart_error()

This replaces various BUG_ON() assertions with panics that tell us where
the restart was done and the restart type.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 464b4155 05-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_trans_reset_updates()

This should have been resetting trans->fs_usage_deltas as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4e3d1899 04-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Inline bch2_btree_path_traverse() fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ee2c6ea7 08-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_iter->ip_allocated

In debug mode, we now track where btree iterators and paths are
initialized/allocated - helpful in tracking down btree path overflows.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6c36318c 07-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: key cache: Don't hold btree locks while using GFP_RECLAIM

This is something we need to do more widely: instead of bothering with
GFP_NOIO/GFP_NOFS, if we need to allocate memory while holding locks:

- first attempt the allocation with GFP_NOWAIT
- if that fails, drop btree locks with bch2_trans_unlock(), then
retry with GFP_KERNEL.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c82ed304 07-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_btree_path_traverse_all()

We need to take a ref on a path while we're traversing it: this fixes a
bug with paths getting reused while being traversed, in the key cache
fill code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ee94c413 06-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete a faulty assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9d7f2a41 04-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_trans_to_text(): print blocked time

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e242b92a 15-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for long running btree transactions & key cache

While a btree transaction is running, we hold a SRCU read lock on the
btree key cache that prevents btree key cache keys from being freed -
this is so that relock() operations won't access freed memory.

The downside of this is that long running btree transactions prevent
memory from being freed from the key cache. This adds a check in
bch2_trans_begin() - if the transaction has been running longer than 1
second, drop and retake the SRCU read lock and zero out pointers to
unlock key cache paths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 08f78031 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_revalidate_updates_in_node()

When we started stashing the key being overwritten in
btree_insert_entry, this introduced a typical iterator invalidation
problem, triggered by btree node splits or resorts.

Previously, dealt with this by unconditionally re-validating those
stashed pointers in the transaction commit path. This patch gets rid of
that by doing it only when needed, in bch2_trans_node_add() or
bch2_trans_node_reinit_iter().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 321bdc73 25-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_min(), bkey_max()

Parallel to bpos_min(), bpos_max() - trivial refactoring.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ef073286 09-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a missing bch2_btree_path_traverse() call

bch2_btree_iter_peek_upto() in snapshots mode may need to keep a
btree_path for the insert position, not just the position of the key
we're returning. The code was incorrectly assuming this would be in the
same btree node - we were missing a bch2_btree_path_traverse() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c792e1b 01-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a btree iter assertion pop

This fixes a (harmless) broken invariant in __bch2_btree_path_set_pos():
iterators to interior nodes should point to the first non whiteout.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1617d56d 22-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Key cache now works for snapshots btrees

This switches btree_key_cache_fill() to use a btree iterator, not a
btree path, so that it can search for keys in previous snapshots.

We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid
recursion back into the key cache.

This will allow us to re-enable the key cache for inodes in the next
patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 087e53c2 20-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Bring back BTREE_ITER_CACHED_NOFILL

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dcced069 20-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill __btree_trans_peek_key_cache()

There was no reason for this to be a separate helper - we always want
the relock call that btree_trans_peek_key_cache() did.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52bf51b9 20-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix __btree_trans_peek_key_cache()

We were returning a pointer to a variable on the stack - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c96f108b 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Optimize bch2_trans_iter_init()

When flags & btree_id are constants, we can constant fold the entire
calculation of the actual iterator flags - and the whole thing becomes
small enough to inline.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c9ee99ad 22-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Move some asserts behind CONFIG_BCACHEFS_DEBUG

Convert some non-critical asserts in long-stable code to debug asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f35e086 15-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix return code from btree_path_traverse_one()

trans->restarted is a positive error code, not the usual negative

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b2d1d56b 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fixes for building in userspace

- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that

- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)

- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), put an #ifdef around bch2_disk_reservation_add()
fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 984dc67e 01-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve __bch2_btree_path_make_mut()

btree_path_copy() doesn't need to call
bch2_btree_path_check_sort_fast() - the newly allocated path will always
be in the correct position, post copy; also delete some redundant
branches from __bch2_btree_path_make_mut().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0cc455b3 01-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Inlining improvements

- Don't call into bch2_encrypt_bio() when we're not encrypting
- Pull slowpath out of trans_lock_write()
- Make sure bc2h_trans_journal_res_get() gets inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c167f9e5 23-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Journal keys overlay fixes

- In the btree iterator code that overlays keys from the journal, we
were incorrectly specifying level=0 instead of the btree_path's
current level in a few places
- When we didn't do journal replay, we shouldn't free the journal keys:
this fixes cmd_list and cmd_dump, which run in norecovery mode

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f69368c 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an out-of-bounds shift

roundup_pow_of_two() is undefined for 0 - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c81f5836 21-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't touch c->flags in bch2_trans_iter_init()

This moves the JOURNAL_REPLAY_DONE flag check from
bch2_trans_iter_init() to bch2_trans_init(), where we stash a copy in
btree_trans - gaining us a small performance improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 307e3c13 17-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Optimize bch2_trans_init()

Now we store the transaction's fn idx in a local variable, instead of
redoing the lookup every time we call bch2_trans_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 29aa78f1 17-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out __btree_path_up_until_good_node()

This breaks up btree_path_up_until_good_node() so that only the fastpath
gets inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d7e4e513 14-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Switch to local_clock() for fastpath time source

local_clock() isn't always completely accurate - e.g. on machines with
TSC drift - but ktime_get_ns() overhead is too high, unfortunately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dccedaaa 14-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix btree node prefetchig

We were forgetting to count down the number of nodes to prefetch, firing
off _way_ more than intended - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f42238b5 12-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a rare path in bch2_btree_path_peek_slot()

In the drop_alloc tests, we may end up calling
bch2_btree_iter_peek_slot() on an interior level that doesn't exist.
Previously, this would hit the path->uptodate assertion in
bch2_btree_path_peek_slot(); this path first checks a NULL btree node,
which is how we know we're at the end of the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7dcbdbd8 11-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_path_put_nokeep()

The btree iterator code may allocate extra btree paths, temporarily,
that do not refer to keys being returned: we don't need to wait until
transaction restart to drop these, when they're not referenced they
should be deleted right away.

This fixes a transaction path overflow bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 969576ec 09-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_peek() now works with interior nodes

Needed by the next patch, which will be iterating over keys in nodes at
level 1.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 29cea6f4 27-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_btree_path_up_until_good_node()

There was a rare bug when path->locks_want was nonzero, but not
BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c298fd7d 26-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs; Mark __bch2_trans_iter_init as inline

This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3f3bc66e 26-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Optimize btree_path_alloc()

- move slowpath code to a separate function, btree_path_overflow()
- no need to use hweight64
- copy nr_max_paths from btree_transaction_stats to btree_trans,
avoiding a data dependency in the fast path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 14d8f26a 26-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Inline bch2_trans_kmalloc() fast path

Small performance optimization.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8f35428 25-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_print_string_as_lines()

This adds a helper for printing a large buffer one line at a time, to
avoid the 1k printk limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e9174370 25-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_node_relock_notrace()

Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events, they don't indicate that
we're in a slowpath.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# afbc7194 01-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_btree_trans_to_text()

This is just a formatting/readability improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0d7009d7 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete old deadlock avoidance code

This deletes our old lock ordering based deadlock avoidance code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 96d994b3 22-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print deadlock cycle in debugfs

In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33bd5d06 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Deadlock cycle detector

We've outgrown our own deadlock avoidance strategy.

The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.

Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.

That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
deadlock would have actually occured. Lock ordering restarts have
become our primary cause of transaction restarts, on some workloads
totally 20% of actual transaction commits.

- To avoid deadlock or livelock, we'd often have to take intent locks
when we only wanted a read lock: with the lock ordering approach, it
is actually illegal to hold _any_ read lock while blocking on an intent
lock, and this has been causing us unnecessary lock contention.

- It was getting fragile - the various lock ordering rules are not
trivial, and we'd been seeing occasional livelock issues related to
this machinery.

So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.

When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.

If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we can not allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.

Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 845cffed 19-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a debug assert

Chasing down a strange locking bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 57ce8274 18-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make an assertion more informative

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e4215d0f 16-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: All held locks must be in a btree path

With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.

This changes the btree_path to allocate some btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 38474c26 02-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Avoid using btree_node_lock_nopath()

With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.

All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4e6defd1 31-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_bkey_cached_common->cached

Add a type descriptor to btree_bkey_cached_common - there's no reason
not to since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c240c3a9 22-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print lock counts in debugs btree_transactions

Improve our debugfs output, to help in debugging deadlocks: this shows,
for every btree node we print, the current number of readers/intent
locks/write locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 14599cce 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch btree locking code to struct btree_bkey_cached_common

This is just some type safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 616928c3 22-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Track maximum transaction memory

This patch
- tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats
- makes it available in debugfs
- switches bch2_trans_init() to using that for the amount of memory to
preallocate, instead of the parameter passed in

This drastically reduces transaction restarts, and means we no longer
need to track this in the source code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e27f656 21-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill nodes_intent_locked

Previously, we used two different bit arrays for tracking held btree
node locks. This patch switches to an array of two bit integers, which
will let us track, in a future patch, when we hold a write lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4263e56 21-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Better use of locking helpers

Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# cd5afabe 19-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_locking.c

Start to centralize some of the locking code in a new file; more locking
code will be moving here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# efd0d038 17-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Minor transaction restart handling fix

- fsck_inode_rm() wasn't returning BCH_ERR_transaction_restart_nested
- change bch2_trans_verify_not_restarted() to call panic() - we don't
want these errors to be missed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 23dfb3a2 17-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_peek_slot() error path

iter->k needs to be consistent with iter->pos - required for
bch2_btree_iter_(rewind|advance) to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8192f8a5 16-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Another should_be_locked fixup

When returning a key from the key cache, in BTREE_ITER_WITH_KEY_CACHE
mode, we don't want to set should_be_locked on iter->path; we're not
returning a key from that path, so we donn't need to, and also since we
traversed the key cache iterator before setting should_be_locked on that
path it might be unlocked (if we unlocked, bch2_trans_relock() won't
have relocked it).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 223b560e 15-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path_down() optimization

We should be calling btree_node_mem_ptr_set() before path_level_init(),
since we already touched the key that btree_node_mem_ptr_set() will
modify and path_level_init() will be doing the lookup in the child btree
node we're recursing to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c497df8b 11-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Increment restart count in bch2_trans_begin()

Instead of counting transaction restarts, count when the transaction is
restarted: if bch2_trans_begin() was called when the transaction wasn't
restarted we need to ensure restart_count is still incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5c0bb66a 11-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track the maximum btree_paths ever allocated by each transaction

We need a way to check if the machinery for handling btree_paths with in
a transaction is behaving reasonably, as it often has not been - we've
had bugs with transaction path overflows caused by duplicate paths and
plenty of other things.

This patch tracks, per transaction fn, the most btree paths ever
allocated by that transaction and makes it available in debugfs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4aba7d45 11-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename lock_held_stats -> btree_transaction_stats

Going to be adding more things to this in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a300261a 10-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix duplicate paths left by bch2_path_put()

bch2_path_put() is supposed to drop paths that aren't needed on
transaction restart, or to hold locks that we're supposed to keep until
transaction commit: but it was failing to free paths in some cases that
it should have, leading to transaction path overflows with lots of
duplicate paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6fae65c1 10-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE)

These were used more prior to getting rid of the in-memory bucket arrays
- they don't serve much purpose anymore, and deleting them lets us write
better assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f96568c 09-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa3ae3ca 09-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: six_lock_counts() is now in six.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 315c9ba6 10-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fd211bc7 10-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't set should_be_locked on paths that aren't locked

It doesn't make any sense to set should_be_locked on btree_paths that
aren't locked, and is often a bug - this patch adds assertions and fixes
some of those bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 49e401fa 07-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

- use strlcpy(), not strncpy()
- add tracepoints for btree_path alloc and free
- give the tracepoint for key cache upgrade fail a proper name
- add a tracepoint for btree_node_upgrade_fail

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 86b74451 05-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_trans_to_text()

bch2_btree_trans_to_text() is used to print btree_transactions owned by
other threads; thus, it needs to be particularly careful. This fixes a
null ptr deref caused by racing with the owning thread changing
path->l[].b.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 01eed771 21-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tighten up btree_path assertions

Currently seeing a very rare and difficult to explain btree_path
inconsistency - this patch adds assertions to the only place that seems
to be missing them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a0cb8d78 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inject transaction restarts in debug mode

In CONFIG_BCACHEFS_DEBUG mode, we'll now randomly issue transaction
restarts - with a decaying probability based on the number of restarts
we've already had, to ensure that transactions eventually make forward
progress. This should help shake out some bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e941ae7d 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a counter for btree_trans restarts

This will help us improve nested transactions - we need to add
assertions that whenever an inner transaction handles a restart, it
still returns -EINTR to the outer transaction.

This also adds nested_lockrestart_do() and nested_commit_do() which use
the new counters to correctly return -EINTR when the transaction was
restarted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c807ca95 14-Jul-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: added lock held time stats

We now record the length of time btree locks are held and expose this in debugfs.

Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8bfe14e8 14-Jul-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: lock time stats prep work.

We need the caller name and a place to store our results, btree_trans provides this.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 43de721a 13-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Unlock in bch2_trans_begin() if we've held locks more than 10us

We try to ensure we never hold btree locks for too long - bcachefs tries
to be soft realtime. This adds a check when restarting a transaction,
where a transaction restart is cheap - if we've been holding locks for
too long, drop and retake them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e28307a1 05-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Silence unimportant tracepoints

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 50b13bee 17-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve an error message

When inserting a key type that's not valid for a given btree, we should
print out which btree we were inserting into.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0fbf71f8 29-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_reset_updates()

Factor out a new helper.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 30525f68 21-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix journal_keys_search() overhead

Previously, on every btree_iter_peek() operation we were searching the
journal keys, doing a full binary search - which was slow.

This patch fixes that by saving our position in the journal keys, so
that we only do a full binary search when moving our position backwards
or a large jump forwards.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ee4d17d0 25-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Put btree_trans_verify_sorted() behind debug_check_iterators

This is pretty expensive, and we've tested sufficiently with it now that
it doesn't need to be on by default.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0babf2a 12-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_peek_all_levels()

This adds bch2_btree_iter_peek_all_levels(), which returns keys from
every level of the btree - interior nodes included - in monotonically
increasing order, soon to be used by the backpointers check & repair
code.

- BTREE_ITER_ALL_LEVELS can now be passed to for_each_btree_key() to
iterate thusly, much like BTREE_ITER_SLOTS

- The existing algorithm in bch2_btree_iter_advance() doesn't work with
peek_all_levels(): we have to defer the actual advancing until the
next time we call peek, where we have the btree path traversed and
uptodate. So, we add an advanced bit to btree_iter; when
BTREE_ITER_ALL_LEVELS is set bch2_btree_iter_advanced() just marks
the iterator as advanced.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c4bce586 14-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path_set_level_(up|down)

This adds two new helpers to btree_iter.c for changing the level of a
path up or down - to be used by the new
bch2_btree_iter_peek_all_levels().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2ae4573e 14-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_peek_slot() now works on interior nodes

The new backpointers code will be using bch2_btree_iter_peek_slot() on
interior nodes - this patch updates peek_slot() to make that work.

- Pass the correct level to bch2_journal_keys_peek_slot()
- We should only set BTREE_ITER_CACHED or BTREE_ITER_WITH_KEY_CACHE
when using bch2_trans_iter_init(), not bch2_trans_node_iter_init()
- Update assertions

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5650bb46 11-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Introduce bch2_journal_keys_peek_(upto|slot)()

When many journal replay keys have been overwritten,
bch2_journal_keys_peek() was taking excessively long to scan before it
found a key to return.

Fix this by introducing bch2_journal_keys_peek_upto() which takes a
parameter for the end of the range we want, so that we can terminate the
search much sooner, and replace all uses of bch2_journal_keys_peek()
with peek_upto() or peek_slot().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2a6870ad 29-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use darray for extra_journal_entries

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d8648425 30-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path_make_mut() clears should_be_locked

This fixes a bug where __bch2_btree_node_update_key() wasn't clearing
should_be_locked, leading to bch2_btree_path_traverse() always failing -
all callers of btree_path_make_mut() want should_be_locked cleared, so
do it there.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7071878b 30-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a missing btree_path_set_dirty() calls

bch2_btree_iter_next_node() was mucking with other btree_path state
without setting path->update to be consistent with the fact that the
path is very much no longer uptodate - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8570d775 11-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_updates_to_text()

This turns bch2_dump_trans_updates() into a to_text() method - this way
it can be used by debug tracing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2158fe46 02-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_inconsistent()

Add a new error macro that also dumps transaction updates in addition to
doing an emergency shutdown - when a transaction update discovers or is
causing a fs inconsistency, it's helpful to see what updates it was
doing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 85d8cf16 10-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_peek_upto()

In BTREE_ITER_FILTER_SNAPHOTS mode, we skip over keys in unrelated
snapshots. When we hit the end of an inode, if the next inode(s) are in
a different subvolume, we could potentially have to skip past many keys
before finding a key we can return to the caller, so they can terminate
the iteration.

This adds a peek_upto() variant to solve this problem, to be used when
we know the range we're searching within.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d4d24a65 10-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delay setting path->should_be_locked

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61a66469 06-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix lock ordering under traverse_all()

traverse_all() traverses btree paths in sorted order, so it should never
see transaction restarts due to lock ordering violations. But some code
in __bch2_btree_path_upgrade(), while necessary when not running under
traverse_all(), was causing some confusing lock ordering violations -
disabling this code under traverse_all() will let us put in some more
assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a897ef68 07-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix error handling in traverse_all()

In btree_path_traverse_all() we were failing to check for -EIO in the
retry loop, and after btree node read error we'd go into an infinite
loop.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e1f7fa06 05-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't keep around btree_paths unnecessarily

When bch2_trans_begin() is called and there hasn't been a transaction
restart, we presume that we're now doing something new - iterating over
different keys, and we now shouldn't keep aruond paths related to the
previous transaction, excepting the subvolumes btree.

This should fix some of our "transaction path overflow" bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a0a07c59 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree path sorting

In btree_update_interior.c, we were changing a path's level directly -
which affects path sort order - without re-sorting paths, leading to
assertions when bch2_path_get() verified paths were sorted correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eb7bd15f 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve debug assertion

We're hitting a strange bug with transaction paths not being sorted
correctly - this dumps transaction paths in the order we thought was
sorted, which will hopefully shed some light as to what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 25a77231 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always clear should_be_locked in bch2_trans_begin()

bch2_trans_begin() invalidates all iterators, until they're revalidated
by calling peek() or traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8f9ad91a 17-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix failure to allocate btree node in cache

The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 33aa419d 16-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __btree_path_traverse_all

The loop that traverses paths in traverse_all() needs to be a little bit
tricky, because traversing a path can cause other paths to be added (or
perhaps removed) at about the same position.

The old logic was buggy, replace it with simpler logic.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7abda8c1 15-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __bch2_btree_node_lock

__bch2_btree_node_lock() was implementing the wrong lock ordering for
cached vs. non cached paths - this fixes it to match the btree path sort
order as defined by __btree_path_cmp(), and also simplifies the code
some.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c7ce2732 15-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Also show when blocked on write locks

This consolidates some of the btree node lock path, so that when we're
blocked taking a write lock on a node it shows up in
bch2_btree_trans_to_text(), along with intent and read locks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2ce8fbd9 13-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_bkey_debugcheck

The old .debugcheck methods are no more and this just calls the .invalid
method, which doesn't add much since we already check that when doing
btree updates and when reading metadata in.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 12ce5b7d 11-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree key cache coherency

- Updates to non key cache iterators will now be transparently
redirected to the key cache for cached btrees.

- Except when creating new keys: then the update goes to underlying
btree

For for iterating over a cached btree to work, we need to ensure that if
a key exists in the key cache, it also exists in the btree - otherwise
the iterator code will skip past it and not check the key cache.

Otherwise, for consistency, all updates should go to the same place -
the key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f7b6ca23 06-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_WITH_KEY_CACHE

This is the start of cache coherency with the btree key cache - this
adds a btree iterator flag that causes lookups to also check the key
cache when we're iterating over the btree (not iterating over the key
cache).

Note that we could still race with another thread creating at item in
the key cache and updating it, since we aren't holding the key cache
locked if it wasn't found. The next patch for the update path will
address this by causing the transaction to restart if the key cache is
found to be dirty.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 45e4cd9e 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: run_one_trigger() now checks journal keys

Previously, when doing updates and running triggers before journal
replay completes, triggers would see the incorrect key for the old key
being overwritten - this patch updates the trigger code to check the
journal keys when necessary, needed for the upcoming allocator rewrite.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2e63e180 24-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Stash a copy of key being overwritten in btree_insert_entry

We currently need to call bch2_btree_path_peek_slot() multiple times in
the transaction commit path - and some of those need to be updated to
also check the keys from journal replay, too. Let's consolidate this and
stash the key being overwritten in btree_insert_entry.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ce91abd6 06-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_path_set_pos()

bch2_btree_path_set_pos() is now available outside of btree_iter.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7c8f6f98 12-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_id_cached()

Add a new helper that returns true if the given btree ID uses the btree
key cache. This enables some new cleanups, since the helper can check
the options for whether caching is enabled on a given btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3763cb95 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use in-memory bucket array for alloc updates

More prep work for getting rid of the in-memory bucket array: now that
we have BTREE_ITER_WITH_JOURNAL, the allocator code can do ntree lookups
before journal replay is finished, and there's no longer any need for it
to get allocation information from the in-memory bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1f2d9192 08-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: iter->update_path

With BTREE_ITER_FILTER_SNAPSHOTS, we have to distinguish between the
path where the key was found, and the path for inserting into the
current snapshot. This adds a new field to struct btree_iter for saving
a path for the current snapshot, and plumbs it through
bch2_trans_update().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1e82d35 08-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_btree_iter()

This splits bch2_btree_iter() up into two functions: an inner function
that handles BTREE_ITER_WITH_JOURNAL, BTREE_ITER_WITH_UPDATES, and
iterating acrcoss leaf nodes, and an outer one that implements
BTREE_ITER_FILTER_SNAPHSOTS.

This is prep work for remember a btree_path at our update position in
BTREE_ITER_FILTER_SNAPSHOTS mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bc82d08b 08-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensures that there's a tracepoint for every
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 669f87a5 03-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch to __func__for recording where btree_trans was initialized

Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8e432d98 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

fixup! bcachefs: Factor out __bch2_btree_iter_set_pos()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5222a460 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_WITH_JOURNAL

This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.

This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.

Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ffa7d262 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use BTREE_ITER_NOPRESERVE in bch2_btree_iter_verify_ret()

This fixes a transaction path overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f3e1f444 21-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_NOPRESERVE

This adds a flag to not mark the initial btree_path as preserve, for
paths that we expect to be cheap to reconstitute if necessary - this
solves a btree_path overflow caused by need_whiteout_for_snapshot().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 99fafb04 20-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some shutdown path bugs

This fixes some bugs when we hit an error very early in the filesystem
startup path, before most things have been initialized.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b84d42c3 16-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out CONFIG_BCACHEFS_DEBUG_TRANSACTIONS

This puts the btree_transactions sysfs/debugfs file behind a separate
config option - it's highly useful, but not cheap enough to enable
permenantly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 32b26e8c 05-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_assert_pos_locked()

This adds a new assertion to be used by bch2_inode_update_after_write(),
which updates the VFS inode based on the update to the btree inode we
just did - we require that the btree inode still be locked when we do
that update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9a74f63c 07-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: path->should_be_locked fixes

- We should only be clearing should_be_locked in btree_path_set_pos() -
it's the responsiblity of the btree_path code, not the btree_iter
code.

- bch2_path_put() needs to pay attention to path->should_be_locked, to
ensure we don't drop locks we're supposed to be keeping.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f527afea 02-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix upgrade_readers()

The bch2_btree_path_upgrade() call was failing and tripping an assert -
path->level + 1 is in this case not necessarily exactly what we want,
fix it by upgrading exactly the locks we want.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d7407292 30-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix faulty assertion

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 904823de 29-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_mark_key() to take a btree_trans *

This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6caf0578 28-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_advance()

Was popping an assertion on !BTREE_ITER_ALL_SNAPSHOTS iters when getting
to the end.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 979735df 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_next_node()

We were modifying state, then return -EINTR, causing us to skip nodes -
ouch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b0d1b70a 24-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Must check for errors from bch2_trans_cond_resched()

But we don't need to call it from outside the btree iterator code
anymore, since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e5fa91d7 20-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix restart handling in for_each_btree_key()

Code that uses for_each_btree_key often wants transaction restarts to be
handled locally and not returned. Originally, we wouldn't return
transaction restarts if there was a single iterator in the transaction -
the reasoning being if there weren't other iterators being invalidated,
and the current iterator was being advanced/retraversed, there weren't
any locks or iterators we were required to preserve.

But with the btree_path conversion that approach doesn't work anymore -
even when we're using for_each_btree_key() with a single iterator there
will still be two paths in the transaction, since we now always preserve
the path at the pos the iterator was initialized at - the reason being
that on restart we often restart from the same place.

And it turns out there's now a lot of for_each_btree_key() uses that _do
not_ want transaction restarts handled locally, and should be returning
them.

This patch splits out for_each_btree_key_norestart() and
for_each_btree_key_continue_norestart(), and converts existing users as
appropriate. for_each_btree_key(), for_each_btree_key_continue(), and
for_each_btree_node() now handle transaction restarts themselves by
calling bch2_trans_begin() when necessary - and the old hack to not
return transaction restarts when there's a single path in the
transaction has been deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9a796fdb 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_exit() no longer returns errors

Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d355c6f4 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: for_each_btree_node() now returns errors directly

This changes for_each_btree_node() to work like for_each_btree_key(),
and to that end bch2_btree_iter_peek_node() and next_node() also return
error ptrs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 60816d9b 14-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bch2_dump_trans_paths_updates()

Also print the key beyng overwritten for each update.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d697b9ab 07-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More btree iterator fixes

- check for getting to the end of the btree in bch2_path_verify_locks
and __btree_path_traverse_all(), this fixes an infinite loop in
__btree_path_traverse_all().
- relax requirement in bch2_btree_node_upgrade() that we must want an
intent lock, this fixes bugs with paths that point to interior nodes
(nonzero level).
- bch2_btree_node_update_key(): fix it to upgrade the path to an intent
lock, if necessary

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 502027a8 07-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure btree_path consistent with node iterators

Btree node iterators want the interior btree_path to point to the same
pos as the returned btree node - this fixes a regression from the
introduction of btree_path, where rewriting/updating keys of btree nodes
(e.g. in bch2_dev_metadata_drop()) via btree node iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a861c722 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require snapshot id to be set

Now that all the existing code has been converted for snapshots, this
patch changes the code for initializing a btree iterator to require a
snapshot to be specified, and also change bkey_invalid() to allow for
non U32_MAX snapshot IDs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c075ff70 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_FILTER_SNAPSHOTS

For snapshots, we need to implement btree lookups that return the first
key that's an ancestor of the snapshot ID the lookup is being done in -
and filter out keys in unrelated snapshots. This patch adds the btree
iterator flag BTREE_ITER_FILTER_SNAPSHOTS which does that filtering.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3074bc0f 15-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

Revert "bcachefs: Add more assertions for locking btree iterators out of order"

Figured out the bug we were chasing, and it had nothing to do with
locking btree iterators/paths out of order.

This reverts commit ff08733dd298c969aec7c7828095458f73fd5374.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8ee0134e 07-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Consolidate intent lock code in btree_path_up_until_good_node

We need to take all needed intent locks when relocking an iterator:
bch2_btree_path_traverse() had a special cased, faster version of this,
but it really should be in up_until_good_node() so that set_pos() can
use it too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# db92f2ea 07-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize btree lookups in write path

This patch significantly reduces the number of btree lookups required in
the extent update path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c404f203 07-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a missing btree_path_make_mut() call

Also add another small helper, btree_path_clone().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f48361b0 04-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop some fast path tracepoints

These haven't turned out to be useful

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1d3ecd7e 04-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tighten up btree locking invariants

New rule is: if a btree path holds any locks it should be holding
precisely the locks wanted (accoringing to path->level and
path->locks_want).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1ae29c1f 04-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Extent btree iterators are no longer special

Since iter->real_pos was introduced, we no longer have to deal with
extent btree iterators that have skipped past deleted keys - this is a
real performance improvement on btree updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 068bcaa5 03-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add more assertions for locking btree iterators out of order

btree_path_traverse_all() traverses btree iterators in sorted order, and
thus shouldn't see transaction restarts due to potential deadlocks - but
sometimes we do. This patch adds some more assertions and tracks some
more state to help track this down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 807dda8c 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bpos_diff() XXX check for perf regression

This improves the btree iterator lookup code by using
trans_for_each_iter_inorder().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f21566f1 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_ITER_NODES

We really only need to distinguish between btree iterators and btree key
cache iterators - this is more prep work for btree_path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# deb0e573 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_ITER_NEED_PEEK

This was used for an optimization that hasn't existing in quite awhile
- iter->uptodate will probably be going away as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a0a56879 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More renaming

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f7a966a3 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clean up/rename bch2_trans_node_* fns

These utility functions are for managing btree node state within a
btree_trans - rename them for consistency, and drop some unneeded
arguments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 78cf784e 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Further reduce iter->trans usage

This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5f8077cc 29-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_ITER_SET_POS_AFTER_COMMIT

BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically
advance extent btree iterators on sucessful commit.

But with the upcomnig btree_path patch it's getting more awkward to
support, and it adds overhead to core data structures that's only used
in a few places, and can be easily done by the caller instead.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f6bd307 24-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce iter->trans usage

Disfavoured, and should go away.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 84841b0d 24-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_dump_trans_iters_updates()

This factors out bch2_dump_trans_iters_updates() from the iter alloc
overflow path, and makes some small improvements to what it prints.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e6e024e9 24-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure iter->real_pos is consistent with key returned

iter->real_pos needs to match the key returned or bad things will happen
when we go to update the key at that position. When we returned a
pending update from btree_trans_peek_updates(), this wasn't necessarily
the case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# dc02bed6 23-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Free iterator if we have duplicate

This helps - but does not fully fix - the outstanding "transaction
iterator overflow" bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c8476a4e 07-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Minor btree iter refactoring

This makes the flow control in bch2_btree_iter_peek() and
bch2_btree_iter_peek_prev() a bit cleaner.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d2c50773 07-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree_trans_peek_updates()

Should have been using bpos_cmp(), not bkey_cmp().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0423fb71 12-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Keep a sorted list of btree iterators

This will be used to make other operations on btree iterators within a
transaction more efficient, and enable some other improvements to how we
manage btree iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0d32711e 27-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: traverse_all() shouldn't be restarting the transaction

We're only called by bch2_trans_begin() now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 955af634 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: __bch2_trans_commit() no longer calls bch2_trans_reset()

It's now the caller's responsibility to call bch2_trans_begin.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e829b717 21-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure btree_iter_traverse() obeys iter->should_be_locked

iter->should_be_locked means that if bch2_btree_iter_relock() fails, we
need to restart the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b4e09b35 27-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_traverse() shouldn't normally call traverse_all()

If there's more than one iterator in the btree_trans, it's requried to
call bch2_trans_begin() to handle transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e5af273f 25-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans->restarted

Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a88171c9 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clean up interior update paths

Btree node merging now happens prior to transaction commit, not after,
so we don't need to pay attention to BTREE_INSERT_NOUNLOCK.

Also, foreground_maybe_merge shouldn't be calling
bch2_btree_iter_traverse_all() - this is becoming private to the btree
iterator code and should only be called by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8b3e9bd6 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67b07638 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: traverse_all() is responsible for clearing should_be_locked

bch2_btree_iter_traverse_all() may loop, and it needs to clear
iter->should_be_locked on every iteration.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fe523397 27-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_relock() only relocks iters that should be locked

This avoids unexpected lock restarts in bch2_btree_iter_traverse_all().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2b4e4b8c 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Minor tracepoint improvements

Btree iterator tracepoints should print whether they're for the key
cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6e075b54 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_relock_intent()

This adds a new helper for btree_cache.c that does what we want where
the iterator is still being traverse - and also eliminates some
unnecessary transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 71f892a4 14-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_rewind()

We'd hit a BUG() when rewinding at the start of the btree on btrees with
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5aab6635 14-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tighten up btree_iter locking assertions

We weren't correctly verifying that we had interior node intent locks -
this patch also fixes bugs uncovered by the new assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# d38494c4 07-Jul-2021 Dan Robertson <dan@dlrobertson.com>

bcachefs: docs: add docs for bch2_trans_reset

Add basic kernel docs for bch2_trans_reset and bch2_trans_begin.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c21affdd 05-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_peek_slot() assertion

This assertion is checking that what the iterator points to is
consistent with iter->real_pos, and since it's an internal btree
ordering property it should be using bpos_cmp.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 618b1c0e 05-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out SPOS_MAX

Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 508b1f71 03-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_peek_prev()

In !BTREE_ITER_IS_EXTENTS mode, we shouldn't be looking at k->size, i.e.
we shouldn't use bkey_start_pos().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 729608a6 24-Jun-2021 Christopher James Halse Rogers <raof@ubuntu.com>

bcachefs: Fix unused variable warning when !BCACHEFS_DEBUG

Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bb6bbf4a 12-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clear iter->should_be_locked in bch2_trans_reset

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 953ee28a 10-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_btree_iter_peek_cached()

It's now been rolled into bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a49e9a05 12-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix null ptr deref when splitting compressed extents

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7ed158f2 07-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always zero memory from bch2_trans_kmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# cd8319fd 07-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill trans->updates2

Now that extent handling has been lifted to bch2_trans_update(), we
don't need to keep two different lists of updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8e6bbc41 01-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent_handle_overwrites() to bch2_trans_update()

This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.

Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b1d87f52 30-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_peek_slot() now saves initial position when searching

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1d214eb1 30-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill __bch2_btree_iter_peek_slot_extents()

This codepath won't just be for extents in the future, it'll also be for
BTREE_ITER_FILTER_SNAPSHOTS mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e750296b 30-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_peek_slot() now supports BTREE_ITER_WITH_UPDATES

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5288e66a 03-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_WITH_UPDATES

This drops bch2_btree_iter_peek_with_updates() and replaces it with a
new flag, BTREE_ITER_WITH_UPDATES, and also reworks
bch2_btree_iter_peek_slot() to respect it too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 509d3e0a 20-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Child btree iterators

This adds the ability for btree iterators to own child iterators - to be
used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can
scan forwards while maintaining our current position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c205321b 08-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop all btree locks when submitting btree node reads

As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7138f220 08-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a spurious debug mode assertion

When we switched to using bch2_btree_bset_insert_key() for extents it
turned out it started leaving invalid keys around - of type deleted but
nonzero size - but this is fine (if ugly) because they're never written
out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 66a0a497 04-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_iter->should_be_locked

Add a field to struct btree_iter for tracking whether it should be
locked - this fixes spurious transaction restarts in
bch2_trans_relock().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 531a0095 04-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree iterator tracepoints

This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c0ebe3e4 23-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted endianness fixes

Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2cd05634 16-May-2021 Brett Holman <bpholman5@gmail.com>

bcachefs: made changes to support clang, fixed a couple bugs

fs/bcachefs/bset.c edited prefetch macro to add clang support
fs/bcachefs/btree_iter.c bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use
fs/bcachefs/io.c bugfix: eliminated undefined behavior (negative bitshift)
fs/bcachefs/buckets.c bugfix: invert sign to handle 64bit abs()

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 360746bf 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_peek_with_updates()

By not re-fetching the next update we were going into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3dea728c 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New tracepoint for bch2_trans_get_iter()

Trying to debug an issue where after traverse_all() we shouldn't have to
traverse any iterators... yet we are

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d36cdb04 27-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __bch2_trans_get_iter()

We need to also set iter->uptodate to indicate it needs to be traversed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5e6a668b 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix transaction restarts due to upgrading of cloned iterators

This fixes a regression from
52d86202fd bcachefs: Improve bch2_btree_iter_traverse_all()

We want to avoid mucking with other iterators in the btree transaction
in operations that are only supposed to be touching individual iterators
- that patch was a cleanup to move lock ordering handling to
bch2_btree_iter_traverse_all(). But it broke upgrading of cloned
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73a117d2 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve trans_restart_mem_realloced tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 558509aa 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't downgrade iterators in bch2_trans_get_iter()

This fixes a livelock with btree node splits.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2527dd91 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bch2_btree_iter_traverse_all()

By changing it to upgrade iterators to intent locks to avoid lock
restarts we can simplify __bch2_btree_node_lock() quite a bit - this
fixes a probable bug where it could potentially drop a lock on an
unrelated error but still succeed instead of causing a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0c9cc17 14-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Better iterator picking

Avoid cloning iterators if we don't have to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b69ac13c 12-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_trans_relock()

The patch that changed bch2_trans_relock() to not look at iter->uptodate
also tried to add an optimization by only having it relock
btree_iter_key() iterators (iterators that are live or have been marked
as keep). But, this wasn't thought through - this pops internal iterator
assertions because on transaction restart, when we're traversing
iterators we traverse all iterators marked as linked, and having
bch2_trans_relock() skip some of those mean that it can skil the
iterator that bch2_btree_iter_traverse_one() is currently traversing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d7f35163 09-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix BTREE_ITER_NOT_EXTENTS

bch2_btree_iter_peek() wasn't properly checking for
BTREE_ITER_IS_EXTENTS when updating iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 241e2636 31-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't flush btree writes more aggressively because of btree key cache

We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.

This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 35d5aff2 03-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_fs_usage_scratch_get()

This is an important cleanup, eliminating an unnecessary copy in the
transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d587674 02-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Increase commality between BTREE_ITER_NODES and BTREE_ITER_KEYS

Eventually BTREE_ITER_NODES should be going away. This patch is to fix a
transaction iterator overflow in the btree node merge path because
BTREE_ITER_NODES iterators couldn't be reused.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c1d808a 31-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop trans->nounlock

Since we're no longer doing btree node merging post commit, we can now
delete a bunch of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d5a43661 30-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bch2_trans_relock()

We're getting away from relying on iter->uptodate - this changes
bch2_trans_relock() to more directly specify which iterators should be
relocked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# acb3b26e 31-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move btree lock debugging to slowpath fn

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4cf91b02 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 43d00243 03-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mechanism for running callbacks at trans commit time

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f793fd85 27-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to

When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to
unlock after a successful commit, but it was calling
bch2_trans_cond_resched() - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a9d79c6e 23-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use pcpu mode of six locks for interior nodes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 08070cba 23-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse()

External (to the btree iterator code) users of bch2_btree_iter_traverse
expect that on success the iterator will be pointed at iter->pos and
have that position locked - but since we split iter->pos and
iter->real_pos, that means it has to update iter->real_pos if necessary.

Internal users don't expect it to modify iter->real_pos, so we need two
separate functions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcad5622 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update iter->real_pos lazily

peek() has to update iter->real_pos - there's no need for
bch2_btree_iter_set_pos() to update it as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 818664f5 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates()

Ideally we'll be getting rid of peek_with_updates(), but the callers
will need to be checked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ca58cbd4 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve iter->real_pos handling

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3b0baf6f 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Internal btree iterator renaming

This just gives some internal helpers some better names.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07fc72e1 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill btree_iter_peek_uptodate()

Since we're no longer doing next() immediately followed by peek(), this
optimization isn't doing anything anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5cde51cd 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Iterators are now always consistent with iter->real_pos

This means bch2_btree_iter_traverse_one() can be made more efficient.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 345ca825 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Have btree_iter_next_node() use btree_iter_set_search_pos()

btree node iterators need to obey the regular btree node invarionts
w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have
less that it needs to check.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0ba3b64 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4ce41957 20-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize bch2_btree_iter_verify_level()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c1ec980 20-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix iterator picking

comparison was wrong

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e9895f0a 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assert that iterators aren't being double freed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8d956c2f 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_iter_set_dontneed()

This is a bit clearer than using bch2_btree_iter_free().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# abcecb49 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fsck code refactoring

Change fsck code to always put btree iterators - also, make some flow
control improvements to deal with lock restarts better, and refactor
check_extents() to not walk extents twice for counting/checking
i_sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2eaea2f 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill btree_iter_pos_changed()

this is used in only one place now, so just inline it into the caller.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 57447b7a 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a btree iterator leak

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c7bb769c 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a045be5a 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify bch2_btree_iter_peek_prev()

Since we added iter->real_pos, btree_iter_set_pos_to_(next|prev)_leaf no
longer modify iter->pos, so we don't have to save it at the start
anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61a19ce4 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bpos_diff()

Previously, bpos_diff() did not handle borrows correctly. Minor thing
considering how it was used, but worth fixing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f020bfcd 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_bpos_to_text() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 18fc6ae5 02-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_iter_prev_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f7fdc0a 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_iter_live()

New helper to clean things up a bit - also, improve iter->flags
handling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c052cf82 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_discard is no longer used

KEY_TYPE_discard used to be used for extent whiteouts, but when handling
over overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9620c3ec 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mempool for the replicas delta list

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e131b6aa 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mempool for btree_trans bump allocator

This allocation is required for filesystem operations to make forward
progress, thus needs a mempool.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8042b5b7 10-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Extents may now cross btree node boundaries

When snapshots arrive, we won't necessarily be able to arbitrarily split
existis - when we need to split an existing extent, we'll have to check
if the extent was overwritten in child snapshots and if so emit a
whiteout for the split in the child snapshot.

Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7e1a3aa9 11-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: iter->real_pos

We need to differentiate between the search position of a btree
iterator, vs. what it actually points at (what we found). This matters
for extents, where iter->pos will typically be the start of the key we
found and iter->real_pos will be the end of the key we found (which soon
won't necessarily be in the same btree node!) and it will also matter
for snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d495595 07-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_peek_prev()

This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev()
consistent with peek() and next(), w.r.t. iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 434094be 07-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_advance_pos()

This adds a new common helper for advancing past the last key returned
by peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 792e2c4c 07-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_btree_iter_set_pos_same_leaf()

The only reason we were keeping this around was for
BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances
to the next leaf node, it'll drop the lock on the node that we just
inserted to.

But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents
btree, just the inodes btree, and if we do need it for the extents btree
in the future we can do it more cleanly by cloning the iterator - this
lets us delete some special cases in the btree iterator code, which is
complicated enough as it is.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2b2c1a89 07-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify btree_iter_(next|prev)_leaf()

There's no good reason for these functions to not be using
bch2_btree_iter_set_pos().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# edfbba58 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk()

bch2_btree_and_journal_walk() walks the btree overlaying keys from the
journal; it was introduced so that we could read in the alloc btree
prior to journal replay being done, when journalling of updates to
interior btree nodes was introduced.

But it didn't have btree node prefetching, which introduced a severe
regression with mount times, particularly on spinning rust. This patch
implements btree node prefetching for the btree + journal walk,
hopefully fixing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07bd4c28 19-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree lock being incorrectly dropped

__btree_trans_get_iter() was using bch2_btree_iter_upgrade, but it
shouldn't have been because on failure bch2_btree_iter_upgrade may drop
locks in other iterators, expecting the transaction to be restarted. But
__btree_trans_get_iter can't return an error to indicate that we need to
restart thet transaction - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 537c49d6 10-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree node merge -> split operations

If a btree node merger is followed by a split or compact of the parent
node, we could end up with the parent btree node iterator pointing to
the whiteout inserted by the btree node merge operation - the fix is to
ensure that interior btree node iterators always point to the first non
whiteout.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cc578a36 09-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __btree_iter_next() when all iters are in use_next() when all iters are in use

Also, print out more information on btree transaction iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a2bfc841 06-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Try to print full btree error message

Metadata corruption bugs are hard to debug if we can't see exactly what
went wrong - try to allocate a bigger buffer so we can print out
everything we have.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3eb26d01 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_get_iter() no longer returns errors

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbd1e825 16-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Dont' use percpu btree_iter buf in userspace

bcachefs-tools doesn't have a real percpu (per thread) implementation
yet

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0b5c9f59 15-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Set preallocated transaction mem to avoid restarts

this will reduce transaction restarts, from observation of tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 876c7af3 15-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Take a SRCU lock in btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b3d1e6ca 06-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=n

this function is only used by debug code, but we'd like to always build
it so we know that it does build.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7e7ae6ca 05-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix spurious transaction restarts

The checks for lock ordering violations weren't quite right.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1a21bf98 05-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a single slot percpu buf for btree iters

Allocating our array of btree iters is a big enough allocation that it
hits the buddy allocator, and we're seeing lots of lock contention.
Sticking a single element buffer in front of it should help.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae1ede58 02-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't embed btree iters in btree_trans

These haven't been in used since reallocing iterators has been disabled,
and saves us a lot of stack if we get rid of it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 29364f34 02-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop sysfs interface to debug parameters

It's not used much anymore, the module paramter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dcf141b9 28-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix spurious transaction restarts

The check for whether locking a btree node would deadlock was wrong - we
have to check that interior nodes are locked before descendents, but
this check was wrong when consider cached vs. non cached iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a301dc38 28-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve tracing for transaction restarts

We have a bug where we can get stuck with a process spinning in
transaction restarts - need more information.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 645d72aa 26-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree updates when mixing cached and non cached iterators

There was a bug where bch2_trans_update() would incorrectly delete a
pending update where the new update did not actually overwrite the
existing update, because we were incorrectly using BTREE_ITER_TYPE when
sorting pending btree updates.

This affects the pending patch to use cached iterators for inode
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c61b7e21 26-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a null ptr deref in bch2_btree_iter_traverse_one()

We use sentinal values that aren't NULL to indicate there's a btree node
at a higher level; occasionally, this may result in
btree_iter_up_until_good_node() stopping at one of those sentinal
values.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d211b408 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix lock ordering with new btree cache code

The code that checks lock ordering was recently changed to go off of the
pos of the btree node, rather than the iterator, but the btree cache
code didn't update to handle iterators that point to cached bkeys. Oops

Also, update various debug code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ca88e5a 07-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree key cache

This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.

Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.

These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bd2bb273 12-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't deadlock when btree node reuse changes lock ordering

Btree node lock ordering is based on the logical key. However, 'struct
btree' may be reused for a different btree node under memory pressure.
This patch uses the new six lock callback to check if a btree node is no
longer the node we wanted to lock before blocking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 515282ac 12-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock

__bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for
btree node lock ordering, this caused an off by one error that was
triggered by bch2_btree_node_get_sibling() getting the previous node.

This refactors the code to compare against btree node keys directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72545b5e 08-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_downgrade()

bch2_btree_iter_downgrade() was looping over all iterators in a
transaction; bch2_trans_downgrade() should be doing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f96c0df4 02-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()

There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(),
which can leave a btree node locked that is just outside iter->pos,
breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally
we should get rid of this corner case, but for now fix it locally with
verbose comments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 495aabed 02-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add debug code to print btree transactions

Intented to help debug deadlocks, since we can't use lockdep to check
btree node lock ordering.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96e2aa1b 25-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a mechanism for passing extra journal entries to bch2_trans_commit()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0329b150 01-Apr-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Trace where btree iterators are allocated

This will help with iterator overflow bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fb2983 07-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: prevously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_sucessor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8666a9ad 18-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an iterator bug

We were incorrectly not restarting the transaction when re-traversing
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3e464ac 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splittin existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 57b0b3db 05-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_iter_peek_with_updates()

Introduce a new iterator method that provides a consistent view of the
btree plus uncommitted updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7d6f9b64 15-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix build when CONFIG_BCACHEFS_DEBUG=n

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e70ce56 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More btree iter invariants

Ensure that iter->pos always lies between the start and end of iter->k
(the last key returned). Also, bch2_btree_iter_set_pos() now invalidates
the key that peek() or next() returned.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c3801239 13-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2dac0eae 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Iterator debug code improvements

More aggressively checking iterator invariants, and fixing the resulting
bugs. Also greatly simplifying iter_next() and iter_next_slot() - they
were hyper optimized before, but the optimizations were getting too
brittle.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3ecf4f5 02-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Some btree iterator improvements

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72141e1f 24-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookup

Nice performance optimization

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 163e885a 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERS

All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.

Also convert more code to __bch2_trans_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b606c8aa 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix traversing to interior nodes

NULL is used to mean "reach end of traversal" - we were only
initializing the leaf node in the iterator to the right sentinal value.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c4a94ae3 31-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make BTREE_ITER_IS_EXTENTS private to iter code

Prep work for changing the core btree update path to handle extents like
regular keys; we need to reduce the scope of what BTREE_ITER_IS_EXTENTS
means

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6a9ec828 31-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: __bch2_btree_iter_set_pos()

This one takes an additional argument for whether we're searching for >=
or > the search key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8b53852d 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make sure we're releasing btree iterators

This wasn't originally required, but this is the model we're moving
towards.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a965ef49 15-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_sort_keys() to not modify src keys

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9626aeb1 06-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework iter->pos handling

- Rework some of the helper comparison functions for consistency

- Currently trying to refactor all the logic that's different for
extents in the btree iterator code. The main difference is that for non
extents we search for a key greater than or equal to the search key,
while for extents we search for a key strictly greater than the search
key (iter->pos).

So that logic is now handled by btree_iter_search_key(), which computes
the real search key based on iter->pos and whether or not we're
searching for a key >= or > iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d5cdf033 03-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an iterator error path

On transaction restart (-EINTR), we need to traverse all iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 24326cd1 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Sort & deduplicate updates in bch2_trans_update()

Previously, when doing multiple update in the same transaction commit
that overwrote each other, we relied on doing the updates in the same
order as the bch2_trans_update() calls in order to get the correct
result. But that wasn't correct for triggers; bch2_trans_mark_update()
when marking overwrites would do the wrong thing because it hadn't seen
the update that was being overwritten.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 54e86b58 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make btree_insert_entry more private to update path

This should be private to btree_update_leaf.c, and we might end up
removing it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f21539a5 29-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_trans_reset in bch2_trans_commit()

Clean up a bit of duplicated code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 780c4e43 20-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop a faulty assertion

This assertion was wrong for interior nodes (and wasn't terribly useful
to begin with)

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b5a5c4c1 16-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a null ptr deref in btree_iter_traverse_one()

When traversing nodes and we've reached the end of the btree, the
current btree node will be NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f58c22e7 04-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Avoid calling bch2_btree_iter_relock() in bch2_btree_iter_traverse()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 887c2a4e 02-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_fix_key_modified()

This is considerably cheaper than bch2_btree_node_iter_fix(), for cases
where the key was only modified and key ordering isn't changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b7ba66c8 28-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inline more of bch2_trans_commit hot path

The main optimization here is that if we let
bch2_replicas_delta_list_apply() fail, we can completely skip calling
bch2_bkey_replicas_marked_locked().

And assorted other small optimizations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c4e065c2 23-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More bset.c microoptimization

Improve a few paper cuts that've shown up during profiling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 538abcb8 12-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a debug assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ea3532cb 11-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a subtle race in the btree split path

We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 14989d54 09-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_next() after peek_slot()

this deserves a unit test

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ab9ff733 01-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an error path

It's possible to get -EIO in __btree_iter_traverse_all() after looping,
with orig_iter NULL.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 64bc0011 26-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework btree iterator lifetimes

The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.

This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bbd8d203 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_SLOTS isn't a type of btree iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ccf5a109 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_peek_prev()

Last of the basic operations for iterating forwards and backwards over
the btree: we now have
- peek(), returns key >= iter->pos
- next(), returns key > iter->pos
- peek_prev(), returns key <= iter->pos
- prev(), returns key < iter->pos

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 059e4134 19-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Debug assertion improvements

Call bch2_btree_iter_verify from bch2_btree_node_iter_fix(); also verify
in btree_iter_peek_uptodate() that iter->k matches what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f4b61341 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More btree iter improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3745efd6 13-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_iter_pos_in_node()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9b02d1c4 08-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize calls to bch2_btree_iter_traverse()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c0fc30da 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: __bch2_btree_node_iter_fix() improvements

Being more rigorous about noting when the key the iterator currently
poins to has changed - which should also give us a nice performance
improvement due to not having to check if we have to skip other bsets
backwards as much.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36e9d698 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Do updates in order they were queued up in

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 23bbd2bb 20-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_node_iter_fix()

bch2_btree_node_iter_prev_all() depends on an invariant that wasn't
being maintained for extent leaf nodes - specifically, the node iterator
may not have advanced past any keys that compare after the key the node
iterator points to.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9df27940 17-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __bch2_btree_iter_peek_slot_extents()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63f1a598 17-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improved debug checks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1904a65a 12-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure bch2_trans_get_iter() returns iters with correct locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 37dd7834 24-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an error path in bch2_btree_iter_traverse()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3838be78 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use a fixed size buffer for fs_usage_deltas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61011ea2 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rip out old hacky transaction restart tracing

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7d825866 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Avoid spurious transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0e6dd8fb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure bch2_btree_iter_next() always advances

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58fbf808 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete duplicate code

Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ed8413fd 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: improved btree locking tracepoints

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 60755344 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill BTREE_ITER_NOUNLOCK

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b03b81df 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't pass around may_drop_locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b7607ce9 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill remaining bch2_btree_iter_unlock() uses

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 932aa837 11-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_mark_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c43a6ef9 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_bkey_cached_common

This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33eb63e5 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a bug with multiple iterators being traversed

If upgrade fails on one iterator, but it was copied from another
iterator and will be freed before transaction restart, then the original
iterator will get traversed first, so we need to make required btree
nodes on the original iterator will be traversed too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ab5c63f5 11-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't hardcode BTREE_ID_EXTENTS

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba5c6557 22-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add actual tracepoints for transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1dd7f9d9 04-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rewrite journal_seq_blacklist machinery

Now, we store blacklisted journal sequence numbers in the superblock,
not the journal: this helps to greatly simplify the code, and more
importantly it's now implemented in a way that doesn't require all btree
nodes to be visited before starting the journal - instead, we
unconditionally blacklist the next 4 journal sequence numbers after an
unclean shutdown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ece254b2 04-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: don't lose errors from iterators that have been freed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4c1c1e39 31-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix bch2_trans_unlock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a2b6b072 29-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add missing bch2_btree_iter_node_drop() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f13f5a8c 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: move some checks to expensive_debug_checks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4afe7000 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Unlink not-touched iters on successful transaction commit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bf7b87a4 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: traverse all iterators on transaction restart

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e1120a4c 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add iter->idx

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ecc892e4 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill btree_iter->next

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e542029e 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change btree_iter_traverse_error() to not use iter->next

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7c26ecae 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Better bch2_trans_copy_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9e5e5b9e 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree iterators now always have a btree_trans

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5df4be3f 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree iter improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8e00bd4 07-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: increase BTREE_ITER_MAX

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 446c562c 04-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Remove direct use of bch2_btree_iter_link()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 56e0e7c7 05-Dec-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix an incorrect bkey_debugcheck() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad7ae8d6 23-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Btree locking fix, refactoring

Hit an assertion, probably spurious, indicating an iterator was unlocked
when it shouldn't have been (spurious because it wasn't locked at all
when the caller called btree_insert_at()).

Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8812600c 21-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix btree iterator bug when using depth > 0

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 319f9ac3 08-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: revamp to_text methods

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cbdf24ce 21-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a btree iter bug when iter pos == POS_MAX

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a00fd8c5 21-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Comparison function cleanups

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 216c9fac 11-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Pass around bset_tree less

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc3268c1 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill extent_insert_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 08af47df 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert bchfs_write_index_update() to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 581edb63 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: mempoolify btree_trans

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e4ccb251 05-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: make struct btree_iter a bit smaller

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 271a3d3a 21-Jul-2016 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: lift ordering restriction on 0 size extents

This lifts the restriction that 0 size extents must not overlap with
other extents, which means we can now sort extents and non extents the
same way, and will let us simplify a bunch of other stuff as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 647d7b60 24-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an assertion in the btree node merge path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c7a0adf 12-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trace transaction restarts

exceptionally crappy "tracing", but it's a start at documenting the
places restarts can be triggered

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>