History log of /linux-master/fs/bcachefs/btree_cache.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# e879389f 12-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_btree_node_fill() for !path

We shouldn't be doing the unlock/relock dance when we're not using a
path - this fixes an assertion pop when called from btree node scan.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8cf2036e 12-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: add safety checks in bch2_btree_node_fill()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b4c4ccf 11-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix race in bch2_btree_node_evict()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79032b07 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved topology repair checks

Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aa6e130e 24-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add an assertion for trying to evict btree root

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91dcad18 22-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Pin btree cache in ram for random access in fsck

Various phases of fsck involve checking references from one btree to
another: this means doing a sequential scan of one btree, and then
mostly random access into the second.

This is particularly painful for checking extents <-> backpointers; we
can prefetch btree node access on the sequential scan, but not on the
random access portion, and this is particularly painful on spinning
rust, where we'd like to keep the pipeline fairly full of btree node
reads so that the elevator can reduce seeking.

This patch implements prefetching and pinning of the portion of the
btree that we'll be doing random access to. We already calculate how
much of the random access btree will fit in memory so it's a fairly
straightforward change.

This will put more pressure on system memory usage, so we introduce a
new option, fsck_memory_usage_percent, which is the percentage of total
system ram that fsck is allowed to pin.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52946d82 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill more -EIO error codes

This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb6fc943 01-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill kvpmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f43b013 22-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree node prefetching in check_topology

btree_and_journal_iter is old code that we want to get rid of, but we're
not ready to yet.

lack of btree node prefetching is, it turns out, a real performance
issue for fsck on spinning rust, so - add it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec4edd7b 16-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Prep work for variable size btree node buffers

bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b819f308 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: don't clear accessed bit in btree node fill

Seeing strange performance issues that might be caused by memory
pressure causing prefetched nodes to be evicted before they're used.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a564c9fa 02-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Include btree_trans in more tracepoints

This gives us more context information - e.g. which codepath is invoking
btree node reads.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0117591e 30-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't drop journal pins in exit path

There's no need to drop journal pins in our exit paths - the code was
trying to have everything cleaned up on any shutdown, but better to just
tweak the assertions a bit.

This fixes a bug where calling into journal reclaim in the exit path
would cass a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1d97d84 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix shrinker names

Shrinkers are now exported to debugfs, so the names can't have slashes
in them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ecae0bd5 02-Nov-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:

- Kemeng Shi has contributed some compation maintenance work in the
series 'Fixes and cleanups to compaction'

- Joel Fernandes has a patchset ('Optimize mremap during mutual
alignment within PMD') which fixes an obscure issue with mremap()'s
pagetable handling during a subsequent exec(), based upon an
implementation which Linus suggested

- More DAMON/DAMOS maintenance and feature work from SeongJae Park i
the following patch series:

mm/damon: misc fixups for documents, comments and its tracepoint
mm/damon: add a tracepoint for damos apply target regions
mm/damon: provide pseudo-moving sum based access rate
mm/damon: implement DAMOS apply intervals
mm/damon/core-test: Fix memory leaks in core-test
mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

- In the series 'Do not try to access unaccepted memory' Adrian
Hunter provides some fixups for the recently-added 'unaccepted
memory' feature. To increase the feature's checking coverage. 'Plug
a few gaps where RAM is exposed without checking if it is
unaccepted memory'

- In the series 'cleanups for lockless slab shrink' Qi Zheng has done
some maintenance work which is preparation for the lockless slab
shrinking code

- Qi Zheng has redone the earlier (and reverted) attempt to make slab
shrinking lockless in the series 'use refcount+RCU method to
implement lockless slab shrink'

- David Hildenbrand contributes some maintenance work for the rmap
code in the series 'Anon rmap cleanups'

- Kefeng Wang does more folio conversions and some maintenance work
in the migration code. Series 'mm: migrate: more folio conversion
and unification'

- Matthew Wilcox has fixed an issue in the buffer_head code which was
causing long stalls under some heavy memory/IO loads. Some cleanups
were added on the way. Series 'Add and use bdev_getblk()'

- In the series 'Use nth_page() in place of direct struct page
manipulation' Zi Yan has fixed a potential issue with the direct
manipulation of hugetlb page frames

- In the series 'mm: hugetlb: Skip initialization of gigantic tail
struct pages if freed by HVO' has improved our handling of gigantic
pages in the hugetlb vmmemmep optimizaton code. This provides
significant boot time improvements when significant amounts of
gigantic pages are in use

- Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
rationalization and folio conversions in the hugetlb code

- Yin Fengwei has improved mlock()'s handling of large folios in the
series 'support large folio for mlock'

- In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
added statistics for memcg v1 users which are available (and
useful) under memcg v2

- Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
prctl so that userspace may direct the kernel to not automatically
propagate the denial to child processes. The series is named 'MDWE
without inheritance'

- Kefeng Wang has provided the series 'mm: convert numa balancing
functions to use a folio' which does what it says

- In the series 'mm/ksm: add fork-exec support for prctl' Stefan
Roesch makes is possible for a process to propagate KSM treatment
across exec()

- Huang Ying has enhanced memory tiering's calculation of memory
distances. This is used to permit the dax/kmem driver to use 'high
bandwidth memory' in addition to Optane Data Center Persistent
Memory Modules (DCPMM). The series is named 'memory tiering:
calculate abstract distance based on ACPI HMAT'

- In the series 'Smart scanning mode for KSM' Stefan Roesch has
optimized KSM by teaching it to retain and use some historical
information from previous scans

- Yosry Ahmed has fixed some inconsistencies in memcg statistics in
the series 'mm: memcg: fix tracking of pending stats updates
values'

- In the series 'Implement IOCTL to get and optionally clear info
about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
which permits us to atomically read-then-clear page softdirty
state. This is mainly used by CRIU

- Hugh Dickins contributed the series 'shmem,tmpfs: general
maintenance', a bunch of relatively minor maintenance tweaks to
this code

- Matthew Wilcox has increased the use of the VMA lock over
file-backed page faults in the series 'Handle more faults under the
VMA lock'. Some rationalizations of the fault path became possible
as a result

- In the series 'mm/rmap: convert page_move_anon_rmap() to
folio_move_anon_rmap()' David Hildenbrand has implemented some
cleanups and folio conversions

- In the series 'various improvements to the GUP interface' Lorenzo
Stoakes has simplified and improved the GUP interface with an eye
to providing groundwork for future improvements

- Andrey Konovalov has sent along the series 'kasan: assorted fixes
and improvements' which does those things

- Some page allocator maintenance work from Kemeng Shi in the series
'Two minor cleanups to break_down_buddy_pages'

- In thes series 'New selftest for mm' Breno Leitao has developed
another MM self test which tickles a race we had between madvise()
and page faults

- In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
and an optimization to the core pagecache code

- Nhat Pham has added memcg accounting for hugetlb memory in the
series 'hugetlb memcg accounting'

- Cleanups and rationalizations to the pagemap code from Lorenzo
Stoakes, in the series 'Abstract vma_merge() and split_vma()'

- Audra Mitchell has fixed issues in the procfs page_owner code's new
timestamping feature which was causing some misbehaviours. In the
series 'Fix page_owner's use of free timestamps'

- Lorenzo Stoakes has fixed the handling of new mappings of sealed
files in the series 'permit write-sealed memfd read-only shared
mappings'

- Mike Kravetz has optimized the hugetlb vmemmap optimization in the
series 'Batch hugetlb vmemmap modification operations'

- Some buffer_head folio conversions and cleanups from Matthew Wilcox
in the series 'Finish the create_empty_buffers() transition'

- As a page allocator performance optimization Huang Ying has added
automatic tuning to the allocator's per-cpu-pages feature, in the
series 'mm: PCP high auto-tuning'

- Roman Gushchin has contributed the patchset 'mm: improve
performance of accounted kernel memory allocations' which improves
their performance by ~30% as measured by a micro-benchmark

- folio conversions from Kefeng Wang in the series 'mm: convert page
cpupid functions to folios'

- Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
kmemleak'

- Qi Zheng has improved our handling of memoryless nodes by keeping
them off the allocation fallback list. This is done in the series
'handle memoryless nodes more appropriately'

- khugepaged conversions from Vishal Moola in the series 'Some
khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
resolved as per Stephen Rothwell in

https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

with help from Qi Zheng.

The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
mm/damon/sysfs: update monitoring target regions for online input commit
mm/damon/sysfs: remove requested targets when online-commit inputs
selftests: add a sanity check for zswap
Documentation: maple_tree: fix word spelling error
mm/vmalloc: fix the unchecked dereference warning in vread_iter()
zswap: export compression failure stats
Documentation: ubsan: drop "the" from article title
mempolicy: migration attempt to match interleave nodes
mempolicy: mmap_lock is not needed while migrating folios
mempolicy: alloc_pages_mpol() for NUMA policy without vma
mm: add page_rmappable_folio() wrapper
mempolicy: remove confusing MPOL_MF_LAZY dead code
mempolicy: mpol_shared_policy_init() without pseudo-vma
mempolicy trivia: use pgoff_t in shared mempolicy tree
mempolicy trivia: slightly more consistent naming
mempolicy trivia: delete those ancient pr_debug()s
mempolicy: fix migrate_pages(2) syscall return nr_failed
kernfs: drop shared NUMA mempolicy hooks
hugetlbfs: drop shared NUMA mempolicy pretence
mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
...


# 51c801bc 19-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Minor bch2_btree_node_get() smatch fixes

- it's no longer possible for trans to be NULL
- also, move "wait for read to complete" to the slowpath,
__bch2_btree_node_get().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 6c643965 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_format helper improvements

- add a to_text() method for bkey_format

- convert bch2_bkey_format_validate() to modern error message style,
where we pass a printbuf for the error string instead of returning a
static string

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 067d228b 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate recovery passes

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# c8b4534d 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete redundant log messages

Now that we have distinct error codes for different memory allocation
failures, the early init log messages are no longer needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# b3591acc 26-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: unregister_shrinker() now safe on not-registered shrinker

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 25aa8c21 18-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_unlock_noassert()

This fixes a spurious assert in the btree node read path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e1d29c5f 28-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure bch2_btree_node_get() calls relock() after unlock()

Fix a bug where bch2_btree_node_get() might call bch2_trans_unlock() (in
fill) without calling bch2_trans_relock(); this is a bug when it's done
in the core btree code.

Also, twea bch2_btree_node_mem_alloc() to drop btree locks before doing
a blocking memory allocation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 1fb4fe63 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Kill six_lock_state union

As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.

On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.

More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.

The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.

On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.

Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 0d2234a7 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Kill six_lock_pcpu_(alloc|free)

six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate
or free the percpu reader count on an existing lock that's in use, the
only safe time to allocate percpu readers is when the lock is first
being initialized.

This patch adds a flags parameter to six_lock_init(), and instead of
six_lock_pcpu_free() we now expose six_lock_exit(), which does the same
thing but is less likely to be misused.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 0b438c5b 21-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clear btree_node_just_written() when node reused or evicted

This fixes the following bug:

Journal reclaim attempts to flush a node, but races with the node being
evicted from the btree node cache; when we lock the node, the data
buffers have already been freed.

We don't evict a node that's dirty, so calling btree_node_write() is
fine - it's a noop - except that the btree_node_just_written bit causes
bch2_btree_post_write_cleanup() to run (resorting the node), which then
causes a null ptr deref.

00078 Unable to handle kernel NULL pointer dereference at virtual address 000000000000009e
00078 Mem abort info:
00078 ESR = 0x0000000096000005
00078 EC = 0x25: DABT (current EL), IL = 32 bits
00078 SET = 0, FnV = 0
00078 EA = 0, S1PTW = 0
00078 FSC = 0x05: level 1 translation fault
00078 Data abort info:
00078 ISV = 0, ISS = 0x00000005
00078 CM = 0, WnR = 0
00078 user pgtable: 4k pages, 39-bit VAs, pgdp=000000007ed64000
00078 [000000000000009e] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00078 Internal error: Oops: 0000000096000005 [#1] SMP
00078 Modules linked in:
00078 CPU: 75 PID: 1170 Comm: stress-ng-utime Not tainted 6.3.0-ktest-g5ef5b466e77e #2078
00078 Hardware name: linux,dummy-virt (DT)
00078 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00078 pc : btree_node_sort+0xc4/0x568
00078 lr : bch2_btree_post_write_cleanup+0x6c/0x1c0
00078 sp : ffffff803e30b350
00078 x29: ffffff803e30b350 x28: 0000000000000001 x27: ffffff80076e52a8
00078 x26: 0000000000000002 x25: 0000000000000000 x24: ffffffc00912e000
00078 x23: ffffff80076e52a8 x22: 0000000000000000 x21: ffffff80076e52bc
00078 x20: ffffff80076e5200 x19: 0000000000000000 x18: 0000000000000000
00078 x17: fffffffff8000000 x16: 0000000008000000 x15: 0000000008000000
00078 x14: 0000000000000002 x13: 0000000000000000 x12: 00000000000000a0
00078 x11: ffffff803e30b400 x10: ffffff803e30b408 x9 : 0000000000000001
00078 x8 : 0000000000000000 x7 : ffffff803e480000 x6 : 00000000000000a0
00078 x5 : 0000000000000088 x4 : 0000000000000000 x3 : 0000000000000010
00078 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80076e52a8
00078 Call trace:
00078 btree_node_sort+0xc4/0x568
00078 bch2_btree_post_write_cleanup+0x6c/0x1c0
00078 bch2_btree_node_write+0x108/0x148
00078 __btree_node_flush+0x104/0x160
00078 bch2_btree_node_flush0+0x1c/0x30
00078 journal_flush_pins.constprop.0+0x184/0x2d0
00078 __bch2_journal_reclaim+0x4d4/0x508
00078 bch2_journal_reclaim+0x1c/0x30
00078 __bch2_journal_preres_get+0x244/0x268
00078 bch2_trans_journal_preres_get_cold+0xa4/0x180
00078 __bch2_trans_commit+0x61c/0x1bb0
00078 bch2_setattr_nonsize+0x254/0x318
00078 bch2_setattr+0x5c/0x78
00078 notify_change+0x2bc/0x408
00078 vfs_utimes+0x11c/0x218
00078 do_utimes+0x84/0x140
00078 __arm64_sys_utimensat+0x68/0xa8
00078 invoke_syscall.constprop.0+0x54/0xf0
00078 do_el0_svc+0x48/0xd8
00078 el0_svc+0x14/0x48
00078 el0t_64_sync_handler+0xb0/0xb8
00078 el0t_64_sync+0x14c/0x150
00078 Code: 8b050265 910020c6 8b060266 910060ac (79402cad)
00078 ---[ end trace 0000000000000000 ]---

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# a345b0f3 06-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_node_to_text() const correctness

This is for the Rust interface - Rust cares more about const than C
does.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 3329cf1b 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Centralize btree node lock initialization

This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 1306f87d 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Plumb btree_trans through btree cache code

Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 94c69faf 04-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use six_lock_ip()

This uses the new _ip() interface to six locks and hooks it up to
btree_path->ip_allocated, when available.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 87ced107 13-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert EAGAIN errors to private error codes

More error code cleanup, for better error messages and debugability.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 447e9227 25-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't set accessed bit on btree node fill

Btree nodes shouldn't have their accessed bit set when entering the
btree cache by being read in from disk - this fixes linear scans
thrashing the cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 001783e2 22-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out __bch2_btree_node_get()

Standard splitting out of the slow path from the fast path of a
function. We may follow this up in another patch with inlining the fast
path into btree_iter.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 42af0ad5 17-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a race with b->write_type

b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 46fee692 28-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved btree write statistics

This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.

Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# b5ac23c4 05-Oct-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: improve behaviour of btree_cache_scan()

Appending new nodes to the end of the list means we're more likely to
evict old entries when btree_cache_scan() is started.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# c36ff038 25-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_cache_scan() improvement

We're still seeing OOM issues caused by the btree node cache shrinker
not sufficiently freeing memory: thus, this patch changes the shrinker
to not exit if __GFP_FS was not supplied.

Instead, tweak btree node memory allocation so that we never invoke
memory reclaim while holding the btree node cache lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 0d7009d7 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete old deadlock avoidance code

This deletes our old lock ordering based deadlock avoidance code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# ca7d8fca 21-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New locking functions

In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.

This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 14599cce 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch btree locking code to struct btree_bkey_cached_common

This is just some type safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 9f96568c 09-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 8bfe14e8 14-Jul-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: lock time stats prep work.

We need the caller name and a place to store our results, btree_trans provides this.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 1d8a2689 07-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_bad_header()

In the future printbufs will be mempool-ified, so we shouldn't be using
more than one at a time if we don't have to.

This also fixes an extra trailing newline.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 7c7e071d 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't normalize to pages in btree cache shrinker

This behavior dates from the early, early days of bcache, and upon
further delving appears to not make any sense. The shrinker only works
in terms of 'objects' of unknown size; normalizing to pages only had the
effect of changing the batch size, which we could do directly - if we
wanted; we probably don't. Normalizing to pages meant our batch size was
very small, which seems to have been keeping us from doing as much
shrinking as we should be under heavy memory pressure; this patch
appears to alleviate some OOMs we've been seeing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 30985537 04-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix usage of six lock's percpu mode

Six locks have a percpu mode, which we use for interior btree nodes, as
well as btree key cache keys for the subvolumes btree. We've been
switching locks back and forth between percpu and non percpu mode as
needed, but it turns out this is racy - when we're reusing an existing
node, other threads could be attempting to lock it while we're switching
it between modes.

This patch fixes this by never switching 'struct btree' between the two
modes, and instead segragating them between two different freed lists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 5b3f7805 04-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_btree_node_mem_alloc()

This is prep work for the next patch, which is going to fix our usage of
the percpu mode of six locks by never switching struct btree between the
two modes - which means we need separate freed lists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 05a49d22 03-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bch2_btree_cache_scan() try harder

Previously, when bch2_btree_cache_scan() attempted to reclaim a node but
failed (because trylock failed, because it was dirty, etc.), it would
count that against the number of nodes it was scanning and attempting to
free. This patch changes that behaviour, so that now we only count nodes
that we then don't free if they have the accessed bit (which we also
clear).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# bf3efff5 27-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix race leading to btree node write getting stuck

Checking btree_node_may_write() isn't atomic with the other btree flags,
dirty and need_write in particular. There was a rare race where we'd
unblock a node from writing while __btree_node_flush() was setting
need_write, and no thread would notice that the node was now both able
to write and needed to be written.

Fix this by adding btree node flags for will_make_reachable and
write_blocked that can be checked in the cmpxchg loop in
__bch2_btree_node_write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 82732ef5 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_node_write_if_need()

btree_node_write_if_need() kicks off a btree node write only if
need_write is set; this makes the locking easier to reason about by
moving the check into the cmpxchg loop in __bch2_btree_node_write().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# de517c95 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for btree node flags

This is for adding an array of strings for btree node flag names.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 55334d78 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BCH_FS_HOLD_BTREE_WRITES

This was just dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 8f9ad91a 17-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix failure to allocate btree node in cache

The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# bc82d08b 08-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensures that there's a tracepoint for every
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 669f87a5 03-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch to __func__for recording where btree_trans was initialized

Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# c7ce813f 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a tracepoint for the btree cache shrinker

This is to help with diagnosing why the btree node can doesn't seem to
be shrinking - we've had issues in the past with granularity/batch size,
since btree nodes are so big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 1aeed454 19-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize memory accesses in bch2_btree_node_get()

This puts a load behind some branches before where it's used, so that it
can execute in parallel with other loads.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 54b2db3d 11-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix infinite loop in bch2_btree_cache_scan()

When attempting to free btree nodes, we might not be able to free all
the nodes that were requested. But the code was looping until it had
freed _all_ the nodes requested, when it should have only been
attempting to free nr nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# aae4eea6 13-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_node_mem_ptr optimization

This patch checks b->hash_val before attempting to lock the node in the
btree, which makes it more equivalent to the "lookup in hash table"
path - and potentially avoids an unnecessary transaction restart if
btree_node_mem_ptr(k) no longer points to the node we want.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# cab8e233 31-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an assertion for removing btree nodes from cache

Chasing a bug that has something to do with the btree node cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 78cf784e 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Further reduce iter->trans usage

This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# e5af273f 25-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans->restarted

Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 8b3e9bd6 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# a32b9573 26-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an option for btree node mem ptr optimization

bch2_btree_node_ptr_v2 has a field for stashing a pointer to the in
memory btree node; this is safe because we clear this field when reading
in nodes from disk and we never free in memory btree nodes - but, we
have bug reports that indicate something might be faulty with this
optimization, so let's add an option for it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 6e075b54 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_relock_intent()

This adds a new helper for btree_cache.c that does what we want where
the iterator is still being traverse - and also eliminates some
unnecessary transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# f8f86c6a 15-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_bad_header() error message

We should always print out the full btree node ptr.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 5aab6635 14-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tighten up btree_iter locking assertions

We weren't correctly verifying that we had interior node intent locks -
this patch also fixes bugs uncovered by the new assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 0a700890 11-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kick off btree node writes from write completions

This is a performance improvement by removing the need to wait for the
in flight btree write to complete before kicking one off, which is going
to be needed to avoid a performance regression with the upcoming patch
to update btree ptrs after every btree write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 19d54324 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Really don't hold btree locks while btree IOs are in flight

This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there's a few remaining paths
that this patch fixes.

This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# c205321b 08-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop all btree locks when submitting btree node reads

As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# 531a0095 04-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree iterator tracepoints

This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# e68031fb 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark newly allocated btree nodes as accessed

This was a major oversight - this means under memory pressure we can end
up reading in a btree node, then having it evicted before we get to use
it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# ceda1b9a 25-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Evict btree nodes we're deleting

There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch evicts nodes
that we're deleting to, which nicely solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 65c0601a 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use mmap() instead of vmalloc_exec() in userspace

Calling mmap() directly is much better than malloc() then mprotect(), we
end up with much less address space fragmentation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 537c32f5 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't BUG_ON() btree topology error

This replaces an assertion in the btree merge path with a
bch2_inconsistent_error() - fsck will fix it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 6adaac0b 20-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update bch2_btree_verify()

bch2_btree_verify() verifies that the btree node on disk matches what we
have in memory. This patch changes it to verify every replica, and also
fixes it for interior btree nodes - there's a mem_ptr field which is
used as a scratch space and needs to be zeroed out for comparing with
what's on disk.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 2177147b 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bset compaction

The previous patch that fixed btree nodes being written too aggressively
now meant that we weren't sorting btree node bsets optimally - this
patch fixes that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 54ca47e1 28-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_btree_node_get_sibling()

This patch reworks the btree node merge path to use a second btree
iterator to get the sibling node - which means
bch2_btree_iter_get_sibling() can be deleted. Also, it uses
bch2_btree_iter_traverse_all() if necessary - which means it should be
more reliable. We don't currently even try to make it work when
trans->nounlock is set - after a BTREE_INSERT_NOUNLOCK transaction
commit, hopefully this will be a worthwhile tradeoff.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 4cf91b02 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# a9d79c6e 23-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use pcpu mode of six locks for interior nodes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# f020bfcd 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_bpos_to_text() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# c043a330 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_cache_scan()

It was counting nodes on the freed list that it skips - because we want
to leave a few so that btree splits don't touch the allocator - as nodes
that it touched, meaning that if it was called with <= 3 nodes to
reclaim, and those nodes were on the freed list, it would never do any
work.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 18a7b972 23-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM

bch2_btree_node_get_noiter() isn't used from the btree iterator code,
which retries with the btree node cache cannibalize lock held on
-ENOMEM, so we should do it ourself if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# a0b73c1c 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add (partial) support for fixing btree topology

When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.

Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# edfbba58 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk()

bch2_btree_and_journal_walk() walks the btree overlaying keys from the
journal; it was introduced so that we could read in the alloc btree
prior to journal replay being done, when journalling of updates to
interior btree nodes was introduced.

But it didn't have btree node prefetching, which introduced a severe
regression with mount times, particularly on spinning rust. This patch
implements btree node prefetching for the btree + journal walk,
hopefully fixing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# b929bbef 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add cannibalize lock to btree_cache_to_text()

More debugging info is always a good thing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# d8ebed7d 19-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add btree cache stats to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# d8b46004 15-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from register_shrinker()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e648448c 11-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix missing memalloc_nofs_restore()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 6a747c46 09-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add accounting for dirty btree nodes/keys

This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 29364f34 02-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop sysfs interface to debug parameters

It's not used much anymore, the module paramter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# a301dc38 28-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve tracing for transaction restarts

We have a bug where we can get stuck with a process spinning in
transaction restarts - need more information.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 645d72aa 26-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree updates when mixing cached and non cached iterators

There was a bug where bch2_trans_update() would incorrectly delete a
pending update where the new update did not actually overwrite the
existing update, because we were incorrectly using BTREE_ITER_TYPE when
sorting pending btree updates.

This affects the pending patch to use cached iterators for inode
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 97c0e195 15-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix another lockdep splat

vfree() can allocate memory, so we need to call memalloc_nofs_save().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 5d0b7f90 11-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a lockdep splat

We can't allocate memory with GFP_FS while holding the btree cache lock,
and vfree() can allocate memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 4580baec 25-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Remove some uses of PAGE_SIZE in the btree code

For portability to userspace, we should try to avoid working in kernel
pages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 760992aa 25-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure we wake up threads locking node when reusing it

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 47a5649a 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix incorrect gfp check

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# bd2bb273 12-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't deadlock when btree node reuse changes lock ordering

Btree node lock ordering is based on the logical key. However, 'struct
btree' may be reused for a different btree node under memory pressure.
This patch uses the new six lock callback to check if a btree node is no
longer the node we wanted to lock before blocking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e38821f3 09-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't allocate memory under the btree cache lock

The btree cache lock is needed for reclaiming from the btree node cache,
and memory allocation can potentially spin and sleep (for 100 ms at a
time), so.. don't do that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 8c9eef95 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check gfp_flags correctly in bch2_btree_cache_scan()

bch2_btree_node_mem_alloc() uses memalloc_nofs_save()/GFP_NOFS, but
GFP_NOFS does include __GFP_IO - oops. We used to use GFP_NOIO, but as
we're a filesystem now GFP_NOFS makes more sense now and is looser.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# f96c0df4 02-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()

There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(),
which can leave a btree node locked that is just outside iter->pos,
breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally
we should get rid of this corner case, but for now fix it locally with
verbose comments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 692c3f06 27-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix memalloc_nofs_restore() usage

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 39fb2983 07-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: prevously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_sucessor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e62d65f2 15-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_commit() path can now insert to interior nodes

This will be needed for the upcoming patches to journal updates to
interior btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 7f81d4cf 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix setting btree_node_accessed()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 72141e1f 24-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookup

Nice performance optimization

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 237e8048 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: introduce b->hash_val

This is partly prep work for introducing bch_btree_ptr_v2, but it'll
also be a bit of a performance boost by moving the full key out of the
hot part of struct btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# c9bebae6 29-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Whiteout changes

More prep work for snapshots: extents will soon be using
KEY_TYPE_deleted for whiteouts, with 0 size. But we wen't be able to
keep these whiteouts with the rest of the extents in the btree node, due
to sorting invariants breaking.

We can deal with this by immediately moving the new whiteouts to the
unwritten whiteouts area - this just means those whiteouts won't be
sorted, so we need new code to sort them prior to merging them with the
rest of the keys to be written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 58404bb2 23-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fall back to slowpath on exact comparison

This is basically equivalent to the original strategy of falling back to
checking against the original key when the original key and previous key
didn't differ in the required bits - except, now we only fall back when
the search key doesn't differ in the required bits, which ends up being
a bit faster.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 1bdb67e8 06-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill BFLOAT_FAILED_PREV

The assumption underlying BFLOAT_FAILED_PREV was wrong; the comparison
we're doing in bset_search_tree() doesn't have to tell the pivot apart
from the previous key, it just has to tell if search is definitely
greater than or equal to the pivot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# fb975d14 21-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop unnecessary rcu_read_lock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# e0dfc08b 11-Jun-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: use memalloc_nofs_save() for vmalloc allocation

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 61011ea2 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rip out old hacky transaction restart tracing

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 58fbf808 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete duplicate code

Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 60755344 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill BTREE_ITER_NOUNLOCK

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# b03b81df 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't pass around may_drop_locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# c43a6ef9 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_bkey_cached_common

This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# ba5c6557 22-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add actual tracepoints for transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# d0cc3def 13-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 319f9ac3 08-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: revamp to_text methods

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 1c7a0adf 12-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trace transaction restarts

exceptionally crappy "tracing", but it's a start at documenting the
places restarts can be triggered

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

# 51c801bc 19-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Minor bch2_btree_node_get() smatch fixes

- it's no longer possible for trans to be NULL
- also, move "wait for read to complete" to the slowpath,
__bch2_btree_node_get().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6c643965 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_format helper improvements

- add a to_text() method for bkey_format

- convert bch2_bkey_format_validate() to modern error message style,
where we pass a printbuf for the error string instead of returning a
static string

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 067d228b 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate recovery passes

Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.

This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.

The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.

By consolidating the "should run this recovery pass" logic, in the
future on disk format upgrades will be able to say "upgrading to this
version requires x passes to run", instead of forcing all of fsck to
run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c8b4534d 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete redundant log messages

Now that we have distinct error codes for different memory allocation
failures, the early init log messages are no longer needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b3591acc 26-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: unregister_shrinker() now safe on not-registered shrinker

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 25aa8c21 18-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_unlock_noassert()

This fixes a spurious assert in the btree node read path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e1d29c5f 28-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure bch2_btree_node_get() calls relock() after unlock()

Fix a bug where bch2_btree_node_get() might call bch2_trans_unlock() (in
fill) without calling bch2_trans_relock(); this is a bug when it's done
in the core btree code.

Also, twea bch2_btree_node_mem_alloc() to drop btree locks before doing
a blocking memory allocation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1fb4fe63 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Kill six_lock_state union

As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.

On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.

More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.

The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.

On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.

Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0d2234a7 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Kill six_lock_pcpu_(alloc|free)

six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate
or free the percpu reader count on an existing lock that's in use, the
only safe time to allocate percpu readers is when the lock is first
being initialized.

This patch adds a flags parameter to six_lock_init(), and instead of
six_lock_pcpu_free() we now expose six_lock_exit(), which does the same
thing but is less likely to be misused.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0b438c5b 21-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clear btree_node_just_written() when node reused or evicted

This fixes the following bug:

Journal reclaim attempts to flush a node, but races with the node being
evicted from the btree node cache; when we lock the node, the data
buffers have already been freed.

We don't evict a node that's dirty, so calling btree_node_write() is
fine - it's a noop - except that the btree_node_just_written bit causes
bch2_btree_post_write_cleanup() to run (resorting the node), which then
causes a null ptr deref.

00078 Unable to handle kernel NULL pointer dereference at virtual address 000000000000009e
00078 Mem abort info:
00078 ESR = 0x0000000096000005
00078 EC = 0x25: DABT (current EL), IL = 32 bits
00078 SET = 0, FnV = 0
00078 EA = 0, S1PTW = 0
00078 FSC = 0x05: level 1 translation fault
00078 Data abort info:
00078 ISV = 0, ISS = 0x00000005
00078 CM = 0, WnR = 0
00078 user pgtable: 4k pages, 39-bit VAs, pgdp=000000007ed64000
00078 [000000000000009e] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00078 Internal error: Oops: 0000000096000005 [#1] SMP
00078 Modules linked in:
00078 CPU: 75 PID: 1170 Comm: stress-ng-utime Not tainted 6.3.0-ktest-g5ef5b466e77e #2078
00078 Hardware name: linux,dummy-virt (DT)
00078 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00078 pc : btree_node_sort+0xc4/0x568
00078 lr : bch2_btree_post_write_cleanup+0x6c/0x1c0
00078 sp : ffffff803e30b350
00078 x29: ffffff803e30b350 x28: 0000000000000001 x27: ffffff80076e52a8
00078 x26: 0000000000000002 x25: 0000000000000000 x24: ffffffc00912e000
00078 x23: ffffff80076e52a8 x22: 0000000000000000 x21: ffffff80076e52bc
00078 x20: ffffff80076e5200 x19: 0000000000000000 x18: 0000000000000000
00078 x17: fffffffff8000000 x16: 0000000008000000 x15: 0000000008000000
00078 x14: 0000000000000002 x13: 0000000000000000 x12: 00000000000000a0
00078 x11: ffffff803e30b400 x10: ffffff803e30b408 x9 : 0000000000000001
00078 x8 : 0000000000000000 x7 : ffffff803e480000 x6 : 00000000000000a0
00078 x5 : 0000000000000088 x4 : 0000000000000000 x3 : 0000000000000010
00078 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80076e52a8
00078 Call trace:
00078 btree_node_sort+0xc4/0x568
00078 bch2_btree_post_write_cleanup+0x6c/0x1c0
00078 bch2_btree_node_write+0x108/0x148
00078 __btree_node_flush+0x104/0x160
00078 bch2_btree_node_flush0+0x1c/0x30
00078 journal_flush_pins.constprop.0+0x184/0x2d0
00078 __bch2_journal_reclaim+0x4d4/0x508
00078 bch2_journal_reclaim+0x1c/0x30
00078 __bch2_journal_preres_get+0x244/0x268
00078 bch2_trans_journal_preres_get_cold+0xa4/0x180
00078 __bch2_trans_commit+0x61c/0x1bb0
00078 bch2_setattr_nonsize+0x254/0x318
00078 bch2_setattr+0x5c/0x78
00078 notify_change+0x2bc/0x408
00078 vfs_utimes+0x11c/0x218
00078 do_utimes+0x84/0x140
00078 __arm64_sys_utimensat+0x68/0xa8
00078 invoke_syscall.constprop.0+0x54/0xf0
00078 do_el0_svc+0x48/0xd8
00078 el0_svc+0x14/0x48
00078 el0t_64_sync_handler+0xb0/0xb8
00078 el0t_64_sync+0x14c/0x150
00078 Code: 8b050265 910020c6 8b060266 910060ac (79402cad)
00078 ---[ end trace 0000000000000000 ]---

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a345b0f3 06-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_node_to_text() const correctness

This is for the Rust interface - Rust cares more about const than C
does.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3329cf1b 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Centralize btree node lock initialization

This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1306f87d 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Plumb btree_trans through btree cache code

Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94c69faf 04-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use six_lock_ip()

This uses the new _ip() interface to six locks and hooks it up to
btree_path->ip_allocated, when available.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 87ced107 13-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert EAGAIN errors to private error codes

More error code cleanup, for better error messages and debugability.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 447e9227 25-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't set accessed bit on btree node fill

Btree nodes shouldn't have their accessed bit set when entering the
btree cache by being read in from disk - this fixes linear scans
thrashing the cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 001783e2 22-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out __bch2_btree_node_get()

Standard splitting out of the slow path from the fast path of a
function. We may follow this up in another patch with inlining the fast
path into btree_iter.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 42af0ad5 17-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a race with b->write_type

b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 46fee692 28-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved btree write statistics

This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.

Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b5ac23c4 05-Oct-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: improve behaviour of btree_cache_scan()

Appending new nodes to the end of the list means we're more likely to
evict old entries when btree_cache_scan() is started.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c36ff038 25-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_cache_scan() improvement

We're still seeing OOM issues caused by the btree node cache shrinker
not sufficiently freeing memory: thus, this patch changes the shrinker
to not exit if __GFP_FS was not supplied.

Instead, tweak btree node memory allocation so that we never invoke
memory reclaim while holding the btree node cache lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0d7009d7 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete old deadlock avoidance code

This deletes our old lock ordering based deadlock avoidance code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ca7d8fca 21-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New locking functions

In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.

This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 14599cce 22-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch btree locking code to struct btree_bkey_cached_common

This is just some type safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f96568c 09-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8bfe14e8 14-Jul-2022 Daniel Hill <daniel@gluo.nz>

bcachefs: lock time stats prep work.

We need the caller name and a place to store our results, btree_trans provides this.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1d8a2689 07-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_bad_header()

In the future printbufs will be mempool-ified, so we shouldn't be using
more than one at a time if we don't have to.

This also fixes an extra trailing newline.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7c7e071d 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't normalize to pages in btree cache shrinker

This behavior dates from the early, early days of bcache, and upon
further delving appears to not make any sense. The shrinker only works
in terms of 'objects' of unknown size; normalizing to pages only had the
effect of changing the batch size, which we could do directly - if we
wanted; we probably don't. Normalizing to pages meant our batch size was
very small, which seems to have been keeping us from doing as much
shrinking as we should be under heavy memory pressure; this patch
appears to alleviate some OOMs we've been seeing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 30985537 04-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix usage of six lock's percpu mode

Six locks have a percpu mode, which we use for interior btree nodes, as
well as btree key cache keys for the subvolumes btree. We've been
switching locks back and forth between percpu and non percpu mode as
needed, but it turns out this is racy - when we're reusing an existing
node, other threads could be attempting to lock it while we're switching
it between modes.

This patch fixes this by never switching 'struct btree' between the two
modes, and instead segragating them between two different freed lists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b3f7805 04-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_btree_node_mem_alloc()

This is prep work for the next patch, which is going to fix our usage of
the percpu mode of six locks by never switching struct btree between the
two modes - which means we need separate freed lists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 05a49d22 03-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bch2_btree_cache_scan() try harder

Previously, when bch2_btree_cache_scan() attempted to reclaim a node but
failed (because trylock failed, because it was dirty, etc.), it would
count that against the number of nodes it was scanning and attempting to
free. This patch changes that behaviour, so that now we only count nodes
that we then don't free if they have the accessed bit (which we also
clear).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bf3efff5 27-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix race leading to btree node write getting stuck

Checking btree_node_may_write() isn't atomic with the other btree flags,
dirty and need_write in particular. There was a rare race where we'd
unblock a node from writing while __btree_node_flush() was setting
need_write, and no thread would notice that the node was now both able
to write and needed to be written.

Fix this by adding btree node flags for will_make_reachable and
write_blocked that can be checked in the cmpxchg loop in
__bch2_btree_node_write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 82732ef5 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_node_write_if_need()

btree_node_write_if_need() kicks off a btree node write only if
need_write is set; this makes the locking easier to reason about by
moving the check into the cmpxchg loop in __bch2_btree_node_write().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# de517c95 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for btree node flags

This is for adding an array of strings for btree node flag names.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 55334d78 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BCH_FS_HOLD_BTREE_WRITES

This was just dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8f9ad91a 17-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix failure to allocate btree node in cache

The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bc82d08b 08-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tracepoint improvements

This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensures that there's a tracepoint for every
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 669f87a5 03-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch to __func__for recording where btree_trans was initialized

Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c7ce813f 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a tracepoint for the btree cache shrinker

This is to help with diagnosing why the btree node can doesn't seem to
be shrinking - we've had issues in the past with granularity/batch size,
since btree nodes are so big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1aeed454 19-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize memory accesses in bch2_btree_node_get()

This puts a load behind some branches before where it's used, so that it
can execute in parallel with other loads.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 54b2db3d 11-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix infinite loop in bch2_btree_cache_scan()

When attempting to free btree nodes, we might not be able to free all
the nodes that were requested. But the code was looping until it had
freed _all_ the nodes requested, when it should have only been
attempting to free nr nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# aae4eea6 13-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_node_mem_ptr optimization

This patch checks b->hash_val before attempting to lock the node in the
btree, which makes it more equivalent to the "lookup in hash table"
path - and potentially avoids an unnecessary transaction restart if
btree_node_mem_ptr(k) no longer points to the node we want.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cab8e233 31-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an assertion for removing btree nodes from cache

Chasing a bug that has something to do with the btree node cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 78cf784e 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Further reduce iter->trans usage

This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e5af273f 25-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans->restarted

Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8b3e9bd6 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a32b9573 26-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an option for btree node mem ptr optimization

bch2_btree_node_ptr_v2 has a field for stashing a pointer to the in
memory btree node; this is safe because we clear this field when reading
in nodes from disk and we never free in memory btree nodes - but, we
have bug reports that indicate something might be faulty with this
optimization, so let's add an option for it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6e075b54 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_btree_iter_relock_intent()

This adds a new helper for btree_cache.c that does what we want where
the iterator is still being traverse - and also eliminates some
unnecessary transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f8f86c6a 15-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_bad_header() error message

We should always print out the full btree node ptr.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5aab6635 14-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Tighten up btree_iter locking assertions

We weren't correctly verifying that we had interior node intent locks -
this patch also fixes bugs uncovered by the new assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 0a700890 11-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kick off btree node writes from write completions

This is a performance improvement by removing the need to wait for the
in flight btree write to complete before kicking one off, which is going
to be needed to avoid a performance regression with the upcoming patch
to update btree ptrs after every btree write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 19d54324 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Really don't hold btree locks while btree IOs are in flight

This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there's a few remaining paths
that this patch fixes.

This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c205321b 08-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop all btree locks when submitting btree node reads

As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 531a0095 04-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree iterator tracepoints

This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e68031fb 29-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark newly allocated btree nodes as accessed

This was a major oversight - this means under memory pressure we can end
up reading in a btree node, then having it evicted before we get to use
it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ceda1b9a 25-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Evict btree nodes we're deleting

There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch evicts nodes
that we're deleting to, which nicely solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65c0601a 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use mmap() instead of vmalloc_exec() in userspace

Calling mmap() directly is much better than malloc() then mprotect(), we
end up with much less address space fragmentation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 537c32f5 23-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't BUG_ON() btree topology error

This replaces an assertion in the btree merge path with a
bch2_inconsistent_error() - fsck will fix it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6adaac0b 20-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update bch2_btree_verify()

bch2_btree_verify() verifies that the btree node on disk matches what we
have in memory. This patch changes it to verify every replica, and also
fixes it for interior btree nodes - there's a mem_ptr field which is
used as a scratch space and needs to be zeroed out for comparing with
what's on disk.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2177147b 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bset compaction

The previous patch that fixed btree nodes being written too aggressively
now meant that we weren't sorting btree node bsets optimally - this
patch fixes that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 54ca47e1 28-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_btree_node_get_sibling()

This patch reworks the btree node merge path to use a second btree
iterator to get the sibling node - which means
bch2_btree_iter_get_sibling() can be deleted. Also, it uses
bch2_btree_iter_traverse_all() if necessary - which means it should be
more reliable. We don't currently even try to make it work when
trans->nounlock is set - after a BTREE_INSERT_NOUNLOCK transaction
commit, hopefully this will be a worthwhile tradeoff.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4cf91b02 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a9d79c6e 23-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use pcpu mode of six locks for interior nodes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f020bfcd 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_bpos_to_text() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c043a330 27-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_cache_scan()

It was counting nodes on the freed list that it skips - because we want
to leave a few so that btree splits don't touch the allocator - as nodes
that it touched, meaning that if it was called with <= 3 nodes to
reclaim, and those nodes were on the freed list, it would never do any
work.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 18a7b972 23-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM

bch2_btree_node_get_noiter() isn't used from the btree iterator code,
which retries with the btree node cache cannibalize lock held on
-ENOMEM, so we should do it ourself if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a0b73c1c 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add (partial) support for fixing btree topology

When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.

Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# edfbba58 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk()

bch2_btree_and_journal_walk() walks the btree overlaying keys from the
journal; it was introduced so that we could read in the alloc btree
prior to journal replay being done, when journalling of updates to
interior btree nodes was introduced.

But it didn't have btree node prefetching, which introduced a severe
regression with mount times, particularly on spinning rust. This patch
implements btree node prefetching for the btree + journal walk,
hopefully fixing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b929bbef 11-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add cannibalize lock to btree_cache_to_text()

More debugging info is always a good thing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d8ebed7d 19-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add btree cache stats to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d8b46004 15-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from register_shrinker()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e648448c 11-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix missing memalloc_nofs_restore()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6a747c46 09-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add accounting for dirty btree nodes/keys

This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 29364f34 02-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop sysfs interface to debug parameters

It's not used much anymore, the module paramter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a301dc38 28-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve tracing for transaction restarts

We have a bug where we can get stuck with a process spinning in
transaction restarts - need more information.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 645d72aa 26-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree updates when mixing cached and non cached iterators

There was a bug where bch2_trans_update() would incorrectly delete a
pending update where the new update did not actually overwrite the
existing update, because we were incorrectly using BTREE_ITER_TYPE when
sorting pending btree updates.

This affects the pending patch to use cached iterators for inode
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 97c0e195 15-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix another lockdep splat

vfree() can allocate memory, so we need to call memalloc_nofs_save().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5d0b7f90 11-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a lockdep splat

We can't allocate memory with GFP_FS while holding the btree cache lock,
and vfree() can allocate memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4580baec 25-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Remove some uses of PAGE_SIZE in the btree code

For portability to userspace, we should try to avoid working in kernel
pages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 760992aa 25-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure we wake up threads locking node when reusing it

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 47a5649a 15-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix incorrect gfp check

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bd2bb273 12-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't deadlock when btree node reuse changes lock ordering

Btree node lock ordering is based on the logical key. However, 'struct
btree' may be reused for a different btree node under memory pressure.
This patch uses the new six lock callback to check if a btree node is no
longer the node we wanted to lock before blocking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e38821f3 09-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't allocate memory under the btree cache lock

The btree cache lock is needed for reclaiming from the btree node cache,
and memory allocation can potentially spin and sleep (for 100 ms at a
time), so.. don't do that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8c9eef95 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check gfp_flags correctly in bch2_btree_cache_scan()

bch2_btree_node_mem_alloc() uses memalloc_nofs_save()/GFP_NOFS, but
GFP_NOFS does include __GFP_IO - oops. We used to use GFP_NOIO, but as
we're a filesystem now GFP_NOFS makes more sense now and is looser.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f96c0df4 02-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()

There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(),
which can leave a btree node locked that is just outside iter->pos,
breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally
we should get rid of this corner case, but for now fix it locally with
verbose comments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 692c3f06 27-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix memalloc_nofs_restore() usage

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fb2983 07-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: prevously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_sucessor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e62d65f2 15-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_commit() path can now insert to interior nodes

This will be needed for the upcoming patches to journal updates to
interior btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f81d4cf 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix setting btree_node_accessed()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72141e1f 24-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookup

Nice performance optimization

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 237e8048 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: introduce b->hash_val

This is partly prep work for introducing bch_btree_ptr_v2, but it'll
also be a bit of a performance boost by moving the full key out of the
hot part of struct btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c9bebae6 29-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Whiteout changes

More prep work for snapshots: extents will soon be using
KEY_TYPE_deleted for whiteouts, with 0 size. But we wen't be able to
keep these whiteouts with the rest of the extents in the btree node, due
to sorting invariants breaking.

We can deal with this by immediately moving the new whiteouts to the
unwritten whiteouts area - this just means those whiteouts won't be
sorted, so we need new code to sort them prior to merging them with the
rest of the keys to be written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58404bb2 23-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fall back to slowpath on exact comparison

This is basically equivalent to the original strategy of falling back to
checking against the original key when the original key and previous key
didn't differ in the required bits - except, now we only fall back when
the search key doesn't differ in the required bits, which ends up being
a bit faster.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1bdb67e8 06-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill BFLOAT_FAILED_PREV

The assumption underlying BFLOAT_FAILED_PREV was wrong; the comparison
we're doing in bset_search_tree() doesn't have to tell the pivot apart
from the previous key, it just has to tell if search is definitely
greater than or equal to the pivot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fb975d14 21-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop unnecessary rcu_read_lock()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0dfc08b 11-Jun-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: use memalloc_nofs_save() for vmalloc allocation

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61011ea2 14-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rip out old hacky transaction restart tracing

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58fbf808 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete duplicate code

Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 60755344 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill BTREE_ITER_NOUNLOCK

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b03b81df 10-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't pass around may_drop_locks

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c43a6ef9 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_bkey_cached_common

This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba5c6557 22-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add actual tracepoints for transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0cc3def 13-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 319f9ac3 08-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: revamp to_text methods

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c7a0adf 12-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trace transaction restarts

exceptionally crappy "tracing", but it's a start at documenting the
places restarts can be triggered

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>