History log of /linux-master/fs/bcachefs/btree_io.c
Revision Date Author Comments
# 9fd5a48a 16-Apr-2024 Nathan Chancellor <nathan@kernel.org>

bcachefs: Fix format specifier in validate_bset_keys()

When building for 32-bit platforms, for which size_t is 'unsigned int',
there is a warning from a format string in validate_bset_keys():

fs/bcachefs/btree_io.c: In function 'validate_bset_keys':
fs/bcachefs/btree_io.c:891:34: error: format '%lu' expects argument of type 'long unsigned int', but argument 12 has type 'unsigned int' [-Werror=format=]
  891 |     "bad k->u64s %u (min %u max %lu)", k->u64s,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/bcachefs/btree_io.c:603:32: note: in definition of macro 'btree_err'
  603 |     msg, ##__VA_ARGS__); \
      |     ^~~
fs/bcachefs/btree_io.c:887:21: note: in expansion of macro 'btree_err_on'
  887 |     if (btree_err_on(!bkeyp_u64s_valid(&b->format, k),
      |         ^~~~~~~~~~~~
fs/bcachefs/btree_io.c:891:64: note: format string is defined here
  891 |     "bad k->u64s %u (min %u max %lu)", k->u64s,
      |                                 ~~^
      |                                   |
      |                                   long unsigned int
      |                                 %u
cc1: all warnings being treated as errors

BKEY_U64s is size_t so the entire expression is promoted to size_t. Use
the '%zu' specifier so that there is no warning regardless of the width
of size_t.
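
A minimal userspace sketch of the same portability point, not the kernel code itself. The helper format_u64s_bounds() and its parameters are hypothetical; only the use of '%zu' for a size_t argument comes from the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from btree_io.c): formats a message similar
 * to the one in validate_bset_keys(). 'max' is a size_t, so it is
 * printed with %zu, which matches size_t whether it is 32 or 64 bits.
 */
static int format_u64s_bounds(char *buf, size_t len,
			      unsigned u64s, unsigned min, size_t max)
{
	return snprintf(buf, len, "bad k->u64s %u (min %u max %zu)",
			u64s, min, max);
}
```

With '%lu' in place of '%zu', the same call would only be correct on platforms where size_t happens to be 'unsigned long'.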

Fixes: 031ad9e7dbd1 ("bcachefs: Check for packed bkeys that are too big")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404130747.wH6Dd23p-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404131536.HdAMBOVc-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba8ed36e 11-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: don't queue btree nodes for rewrites during scan

many nodes found during scan will be old nodes, overwritten by newer
nodes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 031ad9e7 11-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for packed bkeys that are too big

add missing validation; fixes assertion pop in bkey unpack

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 55936afe 15-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Flag btrees with missing data

We need this to know when we should attempt to reconstruct the snapshots
btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e2a316b3 01-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_WATERMARK_interior_updates

This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.

One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 812a9297 26-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix btree node keys accounting in topology repair path

When dropping keys that are now outside a node because we're changing the
node min/max, we need to redo the node's accounting as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79032b07 23-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved topology repair checks

Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remain BUG_ON()s, because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be existing on-disk corruption; they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3ed94062 17-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve bch2_fatal_error()

error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a5860368 16-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52946d82 06-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill more -EIO error codes

This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb6fc943 01-Feb-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill kvpmalloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94817db9 08-Mar-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Correctly validate k->u64s in btree node read path

validate_bset_keys() never properly validated k->u64s; it checked if it
was 0, but not if it was smaller than keys for the given packed format;
this fixes that small oversight.

This patch was backported, so it's adding quite a few error enums so
that they don't get renumbered and we don't have confusing gaps.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec4edd7b 16-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Prep work for variable size btree node buffers

bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analogous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4819b66e 05-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: improve checksum error messages

new helpers:
- bch2_csum_to_text()
- bch2_csum_err_msg()

standardize our checksum error messages a bit, and print out the
checksums a bit more nicely.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d02bfb0 05-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: improve validate_bset_keys()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e9bc59f9 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: add missing bch2_latency_acct() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c72e4d7a 03-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: add time_stats for btree_node_read_done()

Seeing weird latency issues in the btree node read path - add time stats
for bch2_btree_node_read_done().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0beebd92 21-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_for_each_ptr() now declares loop iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 53b67d8d 23-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: better error message in btree_node_write_work()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 483dea44 05-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve error message when finding wrong btree node

single_device.merge_torture_flakey is, very rarely, finding a btree node
that doesn't match the key that points to it: this patch improves the
error message to print out more fields from the btree node header, so
that we can see what else does or does not match the key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a564c9fa 02-Dec-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Include btree_trans in more tracepoints

This gives us more context information - e.g. which codepath is invoking
btree node reads.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb52d23e 11-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0117591e 30-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't drop journal pins in exit path

There's no need to drop journal pins in our exit paths - the code was
trying to have everything cleaned up on any shutdown, but better to just
tweak the assertions a bit.

This fixes a bug where calling into journal reclaim in the exit path
would cause a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d4e3b928 17-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

closures: CLOSURE_CALLBACK() to fix type punning

Control flow integrity is now checking that type signatures match on
indirect function calls. That breaks closures, which embed a work_struct
in a closure in such a way that a closure_fn may also be used as a
workqueue fn by the underlying closure code.

So we have to change closure fns to take a work_struct as their
argument - but that results in a loss of clarity, as closure fns have
different semantics from normal workqueue functions (they run owning a
ref on the closure, which must be released with continue_at() or
closure_return()).

Thus, this patch introduces CLOSURE_CALLBACK() and closure_type() macros
as suggested by Kees, to smooth things over a bit.
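
The idea can be sketched in userspace C as follows. This is a simplified illustration under stated assumptions, not the kernel's definitions: every demo_* and DEMO_* name is hypothetical, and struct demo_work merely stands in for work_struct.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's work_struct and closure. */
struct demo_work { int pending; };
struct demo_closure { struct demo_work work; int remaining; };

#define demo_container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

/*
 * Declares a callback with the workqueue signature, so an indirect call
 * through a workqueue function pointer matches the function's real type.
 */
#define DEMO_CLOSURE_CALLBACK(name) void name(struct demo_work *ws)

/* Recovers the object that embeds the closure from the work pointer. */
#define demo_closure_type(name, type, member)				\
	struct demo_closure *cl =					\
		demo_container_of(ws, struct demo_closure, work);	\
	type *name = demo_container_of(cl, type, member)

struct demo_write_op { int done; struct demo_closure cl; };

static DEMO_CLOSURE_CALLBACK(demo_write_done)
{
	demo_closure_type(op, struct demo_write_op, cl);

	op->done = 1;
}
```

The callback body still works with its own typed object, while the declared signature is the one the workqueue code actually calls through.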

Suggested-by: Kees Cook <keescook@chromium.org>
Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8958a1a 02-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_copy() is no longer a macro

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b65db750 24-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Enumerate fsck errors

This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repaired,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94119eeb 25-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add IO error counts to bch_member

We now track IO errors per device since filesystem creation.

IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 88dfe193 19-Oct-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_id_str()

Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
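
A minimal sketch of a bounds-checked name lookup of this kind. The table contents, the "(unknown)" string, and the demo_* names are assumptions for illustration; the actual helper may behave differently.

```c
#include <stddef.h>

/* Hypothetical table of known btree names. */
static const char * const demo_btree_names[] = {
	"extents", "inodes", "dirents",
};

#define DEMO_BTREE_NR \
	(sizeof(demo_btree_names) / sizeof(demo_btree_names[0]))

/*
 * Bounds-checked lookup: IDs beyond the table are reported as unknown
 * instead of indexing past the end of the fixed size array.
 */
static const char *demo_btree_id_str(unsigned id)
{
	return id < DEMO_BTREE_NR ? demo_btree_names[id] : "(unknown)";
}
```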

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96dea3d5 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix W=12 build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1809b8cb 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Break up io.c

More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5cfd6977 09-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Array bounds fixes

It's no longer legal to use a zero size array as a flexible array
member - this causes UBSAN to complain.

This patch switches our zero size arrays to normal flexible array
members when possible, and inserts casts in other places (e.g. where we
use the zero size array as a marker partway through an array).
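
For illustration, a small userspace sketch of a C99 flexible array member. struct demo_set and demo_set_alloc() are hypothetical and not taken from bcachefs.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical record with a C99 flexible array member. The older
 * spelling 'vals[0]' is a zero size array (a GNU extension), which
 * bounds checkers may treat as having no elements at all.
 */
struct demo_set {
	unsigned	nr;
	unsigned long	vals[];		/* flexible array member */
};

/* Allocates a set with room for 'nr' values; release it with free(). */
static struct demo_set *demo_set_alloc(unsigned nr)
{
	struct demo_set *s = malloc(sizeof(*s) + nr * sizeof(s->vals[0]));

	if (!s)
		return NULL;
	s->nr = nr;
	memset(s->vals, 0, nr * sizeof(s->vals[0]));
	return s;
}
```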

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e08e63e4 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_COMPAT_bformat_overflow_done no longer required

Awhile back, we changed bkey_format generation to ensure that the packed
representation could never represent fields larger than the unpacked
representation.

This was to ensure that bkey_packed_successor() always gave a sensible
result, but in the current code bkey_packed_successor() is only used in
a debug assertion - not for anything important.

This kills the requirement that we've gotten rid of those weird bkey
formats, and instead changes the assertion to check if we're dealing
with an old weird bkey format.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 56046e3e 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert btree_err_type to normal error codes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73adfcaf 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix btree_err() macro

Error code wasn't being propagated correctly, change it to match
fsck_err()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad52bac2 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Log a message when running an explicit recovery pass

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6c643965 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_format helper improvements

- add a to_text() method for bkey_format

- convert bch2_bkey_format_validate() to modern error message style,
where we pass a printbuf for the error string instead of returning a
static string

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 922bc5a0 16-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Make topology repair a normal recovery pass

This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and
explicitly running a specific recovery pass - this is a more general
replacement for how we were running topology repair before.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba8eeae8 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bcachefs_metadata_version_major_minor

This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.

With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.
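
A rough sketch of how a major/minor compatibility check could look. The bit layout (minor in the low 10 bits) and all DEMO_* and demo_* names are assumptions made for this example, not the on-disk encoding.

```c
#include <stdbool.h>

/*
 * Assumed encoding, for illustration only: major version in the high
 * bits, minor version in the low 10 bits.
 */
#define DEMO_VERSION(major, minor)	(((major) << 10) | (minor))
#define DEMO_VERSION_MAJOR(v)		((v) >> 10)
#define DEMO_VERSION_MINOR(v)		((v) & ((1U << 10) - 1))

/*
 * A filesystem is usable by the running code if its major version is
 * not newer than ours; a newer minor version alone stays compatible.
 */
static bool demo_version_compatible(unsigned on_disk, unsigned running)
{
	return DEMO_VERSION_MAJOR(on_disk) <= DEMO_VERSION_MAJOR(running);
}
```

Under this sketch, running code at 1.3 accepts an on-disk 1.7 filesystem but rejects 2.0.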

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# faa6cb6c 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Allow for unknown btree IDs

We need to allow filesystems with metadata from newer versions to be
mountable and usable by older versions.

This patch enables us to roll out new btrees without a new major version
number; we can now handle btree roots for unknown btree types.

The unknown btree roots will be retained, and fsck (including
backpointers) will check them, the same as other btree types.

We add a dynamic array for the extra, unknown btree roots, in addition
to the fixed size btree root array, and add new helpers for looking up
btree roots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a02a0121 28-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_version_compatible()

This adds a new helper for checking if an on-disk version is compatible
with the running version of bcachefs - prep work for introducing
major:minor version numbers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f33c58fc 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill BTREE_INSERT_USE_RESERVE

Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e4eb661d 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix btree node write error message

Error messages should include the error code, when available.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 19c304be 28-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: GFP_NOIO -> GFP_NOFS

GFP_NOIO dates from the bcache days, when we operated under the block
layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to
GFP_NOFS.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1fb4fe63 20-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

six locks: Kill six_lock_state union

As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.

On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.

More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.

The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.

On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.

Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.
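
The overflow concern can be sketched with explicit masks. The layout below (sequence number in the top 32 bits) and the DEMO_* and demo_* names are assumptions for illustration, not the six lock layout.

```c
#include <stdint.h>

/*
 * Hypothetical layout: the low 32 bits hold lock state bits, the high
 * 32 bits hold a sequence number. With explicit masks the position of
 * the sequence number is fixed by the code, not by the compiler.
 */
#define DEMO_SEQ_SHIFT	32
#define DEMO_SEQ_ONE	(UINT64_C(1) << DEMO_SEQ_SHIFT)
#define DEMO_LOW_MASK	(DEMO_SEQ_ONE - 1)

/*
 * Bumps the sequence number. Because it occupies the top bits, a wrap
 * falls off the end of the word and cannot carry into the low bits.
 */
static uint64_t demo_seq_inc(uint64_t state)
{
	return state + DEMO_SEQ_ONE;
}

/* Extracts the sequence number from the state word. */
static uint32_t demo_seq(uint64_t state)
{
	return (uint32_t) (state >> DEMO_SEQ_SHIFT);
}
```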

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 09ebfa61 21-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Drop a redundant error message

When we're already read-only, we don't need to print out errors from
writing btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac2ccddc 04-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Drop some anonymous structs, unions

Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenience.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 45dd05b3 04-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BKEY_PADDED_ONSTACK()

Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED()
only needs the anonymous struct when it's used on the stack, to
guarantee layout, not when it's embedded in another struct.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3329cf1b 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Centralize btree node lock initialization

This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1306f87d 02-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Plumb btree_trans through btree cache code

Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 12795a19 10-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add some logging for btree node rewrites due to errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8b3a677 02-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Nocow support

This adds support for nocow mode, where we do writes in-place when
possible. Patch components:

- New boolean filesystem and inode option, nocow: note that when nocow
is enabled, data checksumming and compression are implicitly disabled

- To prevent in-place writes from racing with data moves
(data_update.c) or bucket reuse (i.e. a bucket being reused and
re-allocated while a nocow write is in flight), we have a new locking
mechanism.

Buckets can be locked for either data update or data move, using a
fixed size hash table of two_state_shared locks. We don't have any
chaining, meaning updates and moves to different buckets that hash to
the same lock will wait unnecessarily - we'll want to watch for this
becoming an issue.

- The allocator path also needs to check for in-place writes in flight
to a given bucket before giving it out: thus we add another counter
to bucket_alloc_state so we can track this.

- Fsync now may need to issue cache flushes to block devices instead of
flushing the journal. We add a device bitmask to bch_inode_info,
ei_devs_need_flush, which tracks devices that need to have flushes
issued - note that this will lead to unnecessary flushes when other
codepaths have already issued flushes, we may want to replace this with
a sequence number.

- New nocow write path: look up extents, and if they're writable write
to them - otherwise fall back to the normal COW write path.

XXX: switch to sequence numbers instead of bitmask for devs needing
journal flush

XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to
run in process context - see if we can improve this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e984040 01-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve btree node read error path

This ensures that failure to read a btree node is treated as a
topology error, and returns the correct error so that the topology
repair pass will be run.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 494dcc57 03-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Plumb saw_error through to btree_err()

The btree node read path has the ability to kick off an asynchronous
btree node rewrite if we saw and corrected an error. Previously this was
only used for errors that caused one of the replicas to be unusable -
this patch plumbs it through to all error paths, so that normal fsck
errors can be corrected.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b8fe1b1d 03-Jan-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert btree_err() to a function

This makes the code more readable, and reduces text size by 8 kb.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 149651dc 25-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fix fsck error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
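
A sketch of what such boolean comparison helpers might look like. struct demo_pos, its field names, and the comparison order are assumptions for illustration and are not the bcachefs definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position type; field names and order are assumptions. */
struct demo_pos { uint64_t inode, offset; uint32_t snapshot; };

static bool demo_pos_eq(struct demo_pos l, struct demo_pos r)
{
	return l.inode == r.inode && l.offset == r.offset &&
	       l.snapshot == r.snapshot;
}

static bool demo_pos_lt(struct demo_pos l, struct demo_pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode;
	if (l.offset != r.offset)
		return l.offset < r.offset;
	return l.snapshot < r.snapshot;
}

/* The remaining predicates follow from demo_pos_lt(). */
static bool demo_pos_le(struct demo_pos l, struct demo_pos r)
{
	return !demo_pos_lt(r, l);
}

static bool demo_pos_gt(struct demo_pos l, struct demo_pos r)
{
	return demo_pos_lt(r, l);
}

static bool demo_pos_ge(struct demo_pos l, struct demo_pos r)
{
	return !demo_pos_lt(l, r);
}
```

A caller then writes demo_pos_lt(a, b) rather than testing the sign of a three-way compare, and each predicate returns as soon as the answer is known.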

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 42af0ad5 17-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a race with b->write_type

b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.
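
A userspace sketch of why folding the type into the flags word helps, using C11 atomics rather than kernel primitives. The bit layout and all DEMO_* and demo_* names are hypothetical.

```c
#include <stdatomic.h>

/*
 * Hypothetical flag layout: bit 0 is "need write", bits 1-2 hold the
 * write type. Keeping both in one word lets a single atomic
 * compare-and-swap publish them together.
 */
#define DEMO_NEED_WRITE		(1UL << 0)
#define DEMO_WRITE_TYPE_SHIFT	1
#define DEMO_WRITE_TYPE_MASK	(3UL << DEMO_WRITE_TYPE_SHIFT)

static void demo_set_need_write(atomic_ulong *flags, unsigned long type)
{
	unsigned long old = atomic_load(flags), new;

	do {
		new = old & ~DEMO_WRITE_TYPE_MASK;
		new |= (type << DEMO_WRITE_TYPE_SHIFT) & DEMO_WRITE_TYPE_MASK;
		new |= DEMO_NEED_WRITE;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(flags, &old, new));
}
```

A reader that sees the need-write bit set is then guaranteed to see the matching type in the same load, which a separate field could not promise.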

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1019576 22-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More style fixes

Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2cb75179 28-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: should_compact_all()

This factors out a properly-documented helper for deciding when we want
to sort a btree node with MAX_BSETS bsets down to a single bset.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 46fee692 28-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improved btree write statistics

This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.

Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8cbb0002 30-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Write new btree nodes after parent update

In order to avoid locking all btree nodes up to the root for btree node
splits, we're going to have to introduce a new error path into
bch2_btree_insert_node(); this means we can't have done any writes or
modified global state before that point.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d704d623 25-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree_err() now uses bch2_print_string_as_lines()

We've seen long error messages get truncated here, so convert to the new
bch2_print_string_as_lines().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ca7d8fca 21-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New locking functions

In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.

This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 674cfc26 26-Aug-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bbf42884 17-Aug-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always rebuild aux search trees when node boundaries change

Topology repair may change btree node min/max keys: when it does so, we
need to always rebuild eytzinger search trees because nodes directly
depend on those values.

This fixes a bug found by the 'kill_btree_node' test, where we'd pop an
assertion in bch2_bset_search_linear().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# efa8a701 15-Aug-2022 Olexa Bilaniuk <obilaniu@gmail.com>

bcachefs: remove dead whiteout_u64s argument.

Signed-off-by: Olexa Bilaniuk <obilaniu@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1ed0a5d2 19-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert fsck errors to errcode.h

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c9bd6732 13-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree node read retries

b->written wasn't being reset to 0 in the btree node read retry path,
causing decrypting & validation of previously read bsets to not be
re-run - ouch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 401ec4db 03-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Printbuf rework

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 652018d6 06-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree node read error path

We were forgetting to clear the read_in_flight flag - oops. This also
fixes it to not call bch2_fatal_error() before topology repair has had a
chance to do its thing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c7372678 26-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Print message on btree node read retry success

Right now, we print an error message on btree node read error, and we
print that we're retrying, but we don't explicitly say if the retry
succeeded - this makes things a little clearer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ae21f74e 18-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve invalid bkey error message

Bkeys have gotten a lot bigger since this code was written and now are
often formatted across multiple lines - while the reason a bkey is
invalid will still be short and fit on a single line. This patch prints
the error before the bkey, making it a bit more readable.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c0960603 17-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Shutdown path improvements

We're seeing occasional firings of the assertion in the key cache
shutdown code that nr_dirty == 0, which means we must sometimes be doing
transaction commits after we've gone read only.

Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if
it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree
writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the
shutdown loop: btree write completion does a transaction update, to
update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# cf0dd697 09-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't trigger extra assertions in journal replay

We now pass a rw argument to .key_invalid methods so they can trigger
assertions for updates but not on existing keys. We shouldn't trigger
these extra assertions in journal replay - this patch changes the
transaction commit path accordingly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 275c8426 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add rw to .key_invalid()

This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
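
A minimal userspace sketch of the idea described above, in which a validation hook takes a read/write flag and applies stricter checks only to keys being written. The names demo_rw, demo_key and demo_key_invalid() are hypothetical stand-ins, not the bcachefs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the bcachefs types. */
enum demo_rw { DEMO_READ, DEMO_WRITE };

struct demo_key {
	unsigned		u64s;	/* key size in 64-bit words */
	unsigned long long	size;	/* extent size */
};

/* Returns NULL if the key is acceptable, else a short reason string. */
const char *demo_key_invalid(const struct demo_key *k, enum demo_rw rw)
{
	assert(k);

	/* Checks applied to every key, whether existing or new. */
	if (!k->u64s)
		return "u64s is zero";

	/*
	 * Stricter check for newly written keys only: an existing key that
	 * fails it is left alone rather than being deleted on read.
	 */
	if (rw == DEMO_WRITE && !k->size)
		return "zero size key being written";

	return NULL;
}
```

With this shape, a key that would be rejected on the write path can still be read back without being treated as invalid.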


# f0ac7df2 03-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert .key_invalid methods to printbufs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c6b2826c 11-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Freespace, need_discard btrees

This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.

We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3756111d 21-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add printf format attribute to bch2_pr_buf()

This tells the compiler to check printf format strings, and catches a
few bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
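
For illustration, this is roughly how the GCC/Clang printf format attribute is attached to a variadic function. demo_pr_buf() is a hypothetical helper and not the bch2_pr_buf() signature; the attribute arguments give the position of the format string and of the first variadic argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Argument 3 is the format string, variadic arguments start at 4.
 * With this attribute, a mismatched specifier is reported at compile time.
 */
__attribute__((format(printf, 3, 4)))
int demo_pr_buf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int ret;

	assert(buf && fmt);

	va_start(args, fmt);
	ret = vsnprintf(buf, size, fmt, args);
	va_end(args);

	return ret;
}
```

A call such as demo_pr_buf(buf, sizeof(buf), "%s", 42) would then draw a -Wformat warning instead of misbehaving at runtime.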


# 74b33393 20-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: x-macro metadata version enum

Now we've got strings for metadata versions - this changes
bch2_sb_to_text() and our mount log message to use it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# cc23255e 10-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a missing wakeup

This fixes a rare bug with bch2_btree_flush_all_writes() getting stuck.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 30985537 04-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix usage of six lock's percpu mode

Six locks have a percpu mode, which we use for interior btree nodes, as
well as btree key cache keys for the subvolumes btree. We've been
switching locks back and forth between percpu and non percpu mode as
needed, but it turns out this is racy - when we're reusing an existing
node, other threads could be attempting to lock it while we're switching
it between modes.

This patch fixes this by never switching 'struct btree' between the two
modes, and instead segregating them between two different freed lists.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bf3efff5 27-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix race leading to btree node write getting stuck

Checking btree_node_may_write() isn't atomic with the other btree flags,
dirty and need_write in particular. There was a rare race where we'd
unblock a node from writing while __btree_node_flush() was setting
need_write, and no thread would notice that the node was now both able
to write and needed to be written.

Fix this by adding btree node flags for will_make_reachable and
write_blocked that can be checked in the cmpxchg loop in
__bch2_btree_node_write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
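
The race described above goes away once every condition is tested on the same snapshot of the flags word inside a compare-and-exchange loop. A minimal sketch using C11 atomics follows; the flag names echo the commit message, but the layout, demo_node and demo_node_write_start() are assumptions for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real flags live in struct btree. */
enum {
	DEMO_DIRTY			= 1 << 0,
	DEMO_NEED_WRITE			= 1 << 1,
	DEMO_WRITE_BLOCKED		= 1 << 2,
	DEMO_WILL_MAKE_REACHABLE	= 1 << 3,
	DEMO_WRITE_IN_FLIGHT		= 1 << 4,
};

struct demo_node {
	atomic_ulong flags;
};

/* Returns true if the caller now owns the write and should issue it. */
bool demo_node_write_start(struct demo_node *b)
{
	unsigned long old, new;

	assert(b);
	old = atomic_load(&b->flags);

	do {
		/* All checks look at the same snapshot of the flags. */
		if (!(old & DEMO_DIRTY) || !(old & DEMO_NEED_WRITE))
			return false;
		if (old & (DEMO_WRITE_BLOCKED |
			   DEMO_WILL_MAKE_REACHABLE |
			   DEMO_WRITE_IN_FLIGHT))
			return false;

		new = old & ~(unsigned long) (DEMO_DIRTY | DEMO_NEED_WRITE);
		new |= DEMO_WRITE_IN_FLIGHT;
		/* On failure, old is reloaded and the checks are redone. */
	} while (!atomic_compare_exchange_weak(&b->flags, &old, new));

	return true;
}
```

Because a concurrent flag change makes the exchange fail and the loop retry, no thread can unblock the node between the check and the update without it being noticed.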


# 82732ef5 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve btree_node_write_if_need()

btree_node_write_if_need() kicks off a btree node write only if
need_write is set; this makes the locking easier to reason about by
moving the check into the cmpxchg loop in __bch2_btree_node_write().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 39dcace8 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix locking in btree_node_write_done()

There was a rare recursive locking bug, in __bch2_btree_node_write()
nowrite path -> btree_node_write_done(), in the path that kicks off
another write.

This splits out an inner __btree_node_write_done() that expects to be
run with the btree node lock held.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 75ef2c59 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start moving debug info from sysfs to debugfs

In sysfs, files can only output at most PAGE_SIZE. This is a problem for
debug info that needs to list an arbitrary number of items, and because
of this limit some of our debug info has been terser and harder to read
than we'd like.

This patch moves info about journal pins and cached btree nodes to
debugfs, and greatly expands and improves the output we return.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 55334d78 26-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BCH_FS_HOLD_BTREE_WRITES

This was just dead code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# fa8e94fa 25-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Heap allocate printbufs

This patch changes printbufs to dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
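
A rough userspace sketch of a heap-backed print buffer that grows on demand and must be released when done. demo_printbuf, demo_prt_printf() and demo_printbuf_exit() are hypothetical names chosen to mirror the description above; the real printbuf API differs.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable buffer; zero-initialize before first use. */
struct demo_printbuf {
	char	*buf;
	size_t	size;	/* bytes allocated */
	size_t	pos;	/* bytes used, excluding the trailing NUL */
};

/* Appends formatted text, growing the buffer as needed. 0 on success. */
int demo_prt_printf(struct demo_printbuf *out, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(out && fmt);

	/* First pass: measure how much space the output needs. */
	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);
	va_end(args);
	if (len < 0)
		return -1;

	if (out->pos + (size_t) len + 1 > out->size) {
		size_t new_size = out->size ? out->size : 32;
		char *n;

		while (new_size < out->pos + (size_t) len + 1)
			new_size *= 2;

		n = realloc(out->buf, new_size);
		if (!n)
			return -1;
		out->buf = n;
		out->size = new_size;
	}

	/* Second pass: format into the (possibly reallocated) buffer. */
	va_start(args, fmt);
	vsnprintf(out->buf + out->pos, out->size - out->pos, fmt, args);
	va_end(args);
	out->pos += (size_t) len;
	return 0;
}

/* Every buffer that was printed to must be exited to avoid a leak. */
void demo_printbuf_exit(struct demo_printbuf *out)
{
	free(out->buf);
	out->buf = NULL;
	out->size = out->pos = 0;
}
```

The cost of moving the storage off the stack is the extra exit call on every path, including error paths.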


# 78a8f362 23-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve some btree node read error messages

On btree node read error, it's helpful to see what we were trying to
read - was it all zeroes?

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a9de137b 18-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for errors from crypto_skcipher_encrypt()

Apparently it actually is possible for crypto_skcipher_encrypt() to
return an error - not sure why that would be - but we need to replace
our assertion with actual error handling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 03ea3962 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Log & error message improvements

- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work
in userspace

- We don't need to print the bcachefs: or the filesystem name prefix in
userspace

- Improve a few error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8244f320 14-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Option improvements

This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).

Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 62d5bd95 19-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_sort_repack_merge()

The main function of bch2_sort_repack_merge() was to call .key_normalize
on every key, which drops stale (cached) pointers - it hasn't actually
merged extents in quite some time.

But bch2_gc_gens() now works on individual keys - we used to gc old gens
by rewriting entire btree nodes. With that gone, there's no need for
internal btree code to be calling .key_normalize anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2a863c6c 14-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix debug build in userspace

This fixes some compiler warnings that only trigger in userspace - dead
code, a maybe uninitialized variable, a maybe null ptr passed to printk.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# c79272d1 09-Sep-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some compiler warnings

gcc couldn't always deduce that written wasn't used uninitialized

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f7a966a3 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clean up/rename bch2_trans_node_* fns

These utility functions are for managing btree node state within a
btree_trans - rename them for consistency, and drop some unneeded
arguments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f6bd307 24-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce iter->trans usage

Disfavoured, and should go away.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e719fc34 15-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BSET_OFFSET()

Add a field to struct bset for the sector offset within the btree node
where it was written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f1833ca 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Update btree ptrs after every write

This closes a significant hole (and last known hole) in our ability to
verify metadata. Previously, since btree nodes are log structured, we
couldn't detect lost btree writes that weren't the first write to a
given node. Additionally, this seems to have led to some significant
metadata corruption on multi device filesystems with metadata
replication: since a write may have made it to one device and not
another, if we read that btree node back from the replica that did have
that write and started appending after that point, the other replica
would have a gap in the bset entries and reading from that replica
wouldn't find the rest of the bsets.

But, since updates to interior btree nodes are now journalled, we can
close this hole by updating pointers to btree nodes after every write
with the currently written number of sectors, without negatively
affecting performance. This means we will always detect lost or corrupt
metadata - it also means that our btree is now a curious hybrid of COW
and non COW btrees, with all the benefits of both (excluding
complexity).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
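
The detection this enables can be sketched as a comparison between what the parent's pointer recorded and what the read path actually found. demo_btree_ptr and demo_node_lost_write() are hypothetical names, and the single-field struct is a simplification for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the pointer records how much of the child was written. */
struct demo_btree_ptr {
	unsigned sectors_written;
};

/*
 * sectors_found is how many sectors of valid bsets the read path found
 * in this replica. Finding fewer than the pointer recorded means a write
 * never reached this replica, or was corrupted.
 */
bool demo_node_lost_write(const struct demo_btree_ptr *ptr,
			  unsigned sectors_found)
{
	assert(ptr);
	return sectors_found < ptr->sectors_written;
}
```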


# 0a700890 11-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kick off btree node writes from write completions

This is a performance improvement by removing the need to wait for the
in flight btree write to complete before kicking one off, which is going
to be needed to avoid a performance regression with the upcoming patch
to update btree ptrs after every btree write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 19d54324 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Really don't hold btree locks while btree IOs are in flight

This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there's a few remaining paths
that this patch fixes.

This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3a67bdb 10-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Regularize argument passing of btree_trans

btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 50ad5d09 22-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix btree_node_read_all_replicas() error handling

We weren't checking bch2_btree_node_read_done() for errors, oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ee757054 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a deadlock

Waiting on a btree node write with btree locks held can deadlock, if the
write errors: the write error path has to do a btree update to drop
the pointer to the replica that errored.

The interior update path has to wait on in flight btree writes before
freeing nodes on disk. Previously, this was done in
bch2_btree_interior_update_will_free_node(), and could deadlock; now, we
just stash a pointer to the node and do it in
btree_update_nodes_written(), just prior to the transactional part of
the update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9f2772c4 27-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_error_wq

We can't use btree_update_wq because btree updates may be waiting on
btree writes to complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9dd89a05 22-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown

After unclean shutdown, btree writes may have completed on one device
and not others - and this inconsistency could lead us to writing new
bsets with a gap in our btree node in one of our replicas.

Fortunately, this is only an issue with bsets that are newer than the
most recent journal flush, and we already have a mechanism for detecting
and blacklisting those. We just need to make sure to start new btree
writes after the most recent _non_ blacklisted bset.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 731bdd2e 22-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a workqueue for btree io completions

Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 1ce0cf5f 21-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a debug mode that always reads from every btree replica

There's a new module parameter, verify_all_btree_replicas, that enables
reading from every btree replica when reading in btree nodes and
comparing them against each other. We've been seeing some strange btree
corruption - this will hopefully aid in tracking it down and catching it
more often.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5bc38f44 07-May-2021 Dan Robertson <dan@dlrobertson.com>

bcachefs: Fix oob write in __bch2_btree_node_write

Fix a possible out of bounds write in __bch2_btree_node_write when
the data buffer padding is cleared up to the block size. The out of
bounds write is possible if the data buffers size is not a multiple
of the block size.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
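
One way to guard against this class of bug is to clamp the rounded-up end of the padding to the buffer size before clearing. The sketch below is an assumption about the general technique, not the actual fix; demo_clear_padding() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zeroes the padding after 'used' bytes up to the next block boundary,
 * but never past the end of the buffer. Returns the end of the cleared
 * region.
 */
size_t demo_clear_padding(unsigned char *buf, size_t buf_size,
			  size_t used, size_t block_size)
{
	size_t end;

	assert(buf && block_size && used <= buf_size);

	/* Round up to a multiple of the block size. */
	end = (used + block_size - 1) / block_size * block_size;

	/* If buf_size is not a multiple of block_size, end can overshoot. */
	if (end > buf_size)
		end = buf_size;

	memset(buf + used, 0, end - used);
	return end;
}
```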


# aae15aaf 24-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New and improved topology repair code

This splits out btree topology repair into a separate pass, and makes
some improvements:
- When we have to pick which of two overlapping nodes to drop keys
from, we use the btree node header sequence number to preserve the
newer node

- the gc code has been changed so that it doesn't bail out if we're
continuing/ignoring on fsck error - this way the dump tool can skip
running the repair pass but still walk all reachable metadata

- add a new superblock flag indicating when a filesystem is known to
have btree topology issues, and the topology repair pass should be
run

- changing the start/end of a node might mean keys in that node have to
be deleted: this patch handles that better by splitting it out into a
separate function and running it explicitly in the topology repair
code, previously those keys were only being dropped when the btree
node was read in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcd25dac 24-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rewrite btree nodes with errors

This patch adds self healing functionality for btree nodes - if we
notice a problem when reading a btree node, we just rewrite it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 51c804ed 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Punt btree writes to workqueue to submit

We don't want to be submitting IO with btree locks held, and btree
writes usually aren't latency sensitive.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2177147b 06-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve bset compaction

The previous patch that fixed btree nodes being written too aggressively
now meant that we weren't sorting btree node bsets optimally - this
patch fixes that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ba5f03d3 31-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a sysfs var for average btree write size

Useful number for performance tuning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f65d74d 28-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add repair code for out of order keys in a btree node.

This just drops the offending key - in the bug report where this was
seen, it was clearly a single bit memory error, and fsck will fix the
missing key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e751c01a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Start using bpos.snapshot field

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4cf91b02 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bpos_cmp() and bkey_cmp()

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0390ea8a 24-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop bkey noops

Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.

But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility code
for old style extent btree nodes is gone, so we can completely drop this
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 84cc758d 21-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Validate bset version field against sb version fields

The superblock version fields need to be accurate to know whether a
filesystem is supported, thus we should be verifying them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f020bfcd 04-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_bpos_to_text() more consistently

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2436cb9f 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for more enums

This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c052cf82 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: KEY_TYPE_discard is no longer used

KEY_TYPE_discard used to be used for extent whiteouts, but when handling
of overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f2785955 19-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE()

bcachefs has been aggressively migrating filesystems and btree nodes to
the new format for quite some time - this shouldn't affect anyone
anymore, and lets us delete a _lot_ of code. Also, it frees up
KEY_TYPE_discard for a new whiteout key type for snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 006d69aa 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't drop ptrs to btree nodes

If a ptr gen doesn't match the bucket gen, the bucket likely doesn't
contain the data we want - but it's still possible the data we want
might have been overwritten, and for btree node pointers we can verify
whether or not the node is the one we wanted with the node's sequence
number, so it's better to keep the pointer and try reading from it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1889ad5a 14-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add code to scan for/rewrite old btree nodes

This adds a new data job type to scan for btree nodes in the old extent
format, and rewrite them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 91f6ad6f 02-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Include device in btree IO error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 51d2dfb8 26-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add BTREE_PTR_RANGE_UPDATED

This is so that when we discover btree topology issues, we can just
update the pointer to a btree node and signal btree read path that the
min/max keys in the node header should be updated from the node pointer.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a5cd80ea 20-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an assertion pop

There was a race: btree node writes drop their reference on journal pins
before clearing the btree_node_write_in_flight flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ed9d58a2 14-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Run jset_validate in write path as well

This is because we had a bug where we were writing out journal entries
with garbage last_seq, and not catching it.

Also, completely ignore jset->last_seq when JSET_NO_FLUSH is true,
because of aforementioned bug, but change the write path to set last_seq
to 0 when JSET_NO_FLUSH is true.

Minor other cleanups and comments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a2bfc841 06-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Try to print full btree error message

Metadata corruption bugs are hard to debug if we can't see exactly what
went wrong - try to allocate a bigger buffer so we can print out
everything we have.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5db43418 03-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't issue btree writes that weren't journalled

If we have an error in the btree interior update path that prevents us
from journalling the update, we can't issue the corresponding btree node
write - we didn't get a journal sequence number that would cause it to
be ignored in recovery.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0fefe8d8 03-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve some IO error messages

it's useful to know whether an error was for a read or a write - this
also standardizes error messages a bit more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c74cec1 16-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add more debug checks

tracking down a bug where we see a btree node pointer in the wrong node

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6d9378f3 10-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Hack around bch2_varint_decode invalid reads

bch2_varint_decode can do reads up to 7 bytes past the end ptr, for the
sake of performance - these extra bytes are always masked off.

This won't be a problem in practice if we make sure to burn 8 bytes in
any buffer that has bkeys in it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6a747c46 09-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add accounting for dirty btree nodes/keys

This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 811d2bcd 06-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop typechecking from bkey_cmp_packed()

This only did anything in two places, and those can just be replaced
with bkey_cmp_left_packed().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 29364f34 02-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Drop sysfs interface to debug parameters

It's not used much anymore, the module parameter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e00711d2 24-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve some error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9f115ce9 04-Aug-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a bug with the journal_seq_blacklist mechanism

Previously, we would start doing btree updates before writing the first
journal entry; if this was after an unclean shutdown, this could cause
those btree updates to not be blacklisted.

Also, move some code to headers for userspace debug tools.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7807e143 25-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert various code to printbuf

printbufs know how big the buffer is that was allocated, so we can get
rid of the random PAGE_SIZEs all over the place.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4580baec 25-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Remove some uses of PAGE_SIZE in the btree code

For portability to userspace, we should try to avoid working in kernel
pages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63b214e7 21-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add bch2_blk_status_to_str()

We define our own BLK_STS_REMOVED, so we need our own to_str helper too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89fd25be 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use x-macros for data types

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fff899b1 03-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark btree nodes as needing rewrite when not all replicas are RW

This fixes a bug where recovery fails when one of the devices is read
only.

Also - consolidate the "must rewrite this node to insert it" behind a
new btree node flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 306d40df 02-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use blk_status_to_str()

Improved error messages are always a good thing

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a34782a0 17-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change bch2_dump_bset() to also print key values

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ef846a7 03-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve assorted error messages

This also consolidates the various checks in bch2_mark_pointer() and
bch2_trans_mark_pointer().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f36dff28 12-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Validate that we read the correct btree node

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bc970cec 02-May-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix two more deadlocks

Deadlock on shutdown:

btree_update_nodes_written() unblocks btree nodes from being written;
after doing so, it has to check if they were marked as needing to be
written and if so kick off those writes - if that doesn't happen, we'll
never release journal pins and shutdown will get stuck when flushing the
journal.

There was an error path where this didn't happen, because in the error
path we don't actually want those btree node writes to happen; however,
we still have to kick off the write path so the journal pins get
released. The btree write path checks if we're in a journal error state
and doesn't do the actual write if we are.

Also - there was another deadlock because btree_update_nodes_written()
was taking the btree update off of the unwritten_list too soon - before
getting a journal reservation, which could fail and have to be retried.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 39fb2983 07-Jan-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bkey_type_successor

Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: previously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.
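
As a rough illustration of the new rule, the sketch below checks that one node's min_key is the successor of the previous node's max_key. The struct pos type and the helper names are hypothetical simplifications made up for this example, not the real struct bpos or bkey_successor():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified position: ordered by inode, then offset. */
struct pos {
	uint64_t inode;
	uint64_t offset;
};

/* Smallest position strictly greater than p; p must not be the maximum. */
static struct pos pos_successor(struct pos p)
{
	assert(!(p.inode == UINT64_MAX && p.offset == UINT64_MAX));

	if (++p.offset == 0)	/* offset wrapped around, carry into inode */
		p.inode++;
	return p;
}

static bool pos_eq(struct pos a, struct pos b)
{
	return a.inode == b.inode && a.offset == b.offset;
}

/* The invariant described above: next node starts right after the previous one ends. */
static bool nodes_adjacent(struct pos prev_max_key, struct pos next_min_key)
{
	return pos_eq(pos_successor(prev_max_key), next_min_key);
}
```

With this rule the same check applies to extent and non-extent btrees, which is what makes the per-type successor/predecessor helpers unnecessary.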

This means we can completely get rid of
btree_type_sucessor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4e4758c6 27-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use memalloc_nofs_save()

vmalloc allocations don't always obey GFP_NOFS - memalloc_nofs_save() is
the preferred approach for the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6357d607 08-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Journal updates to interior nodes

Previously, the btree has always been self contained and internally
consistent on disk without anything from the journal - the journal just
contained pointers to the btree roots.

However, this meant that btree node split or compact operations - i.e.
anything that changes btree node topology and involves updates to
interior nodes - would require that interior btree node to be written
immediately, which means emitting a btree node write that's mostly empty
(using 4k of space on disk if the filesystem blocksize is 4k to only
write perhaps ~100 bytes of new keys).

More importantly, this meant most btree node writes had to be FUA, and
consumer drives have a history of slow and/or buggy FUA support - other
filesystems have been bitten by this.

This patch changes the interior btree update path to journal updates to
interior nodes, after the writes for the new btree nodes have completed.
Best of all, it turns out to simplify the interior node update path
somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e3e464ac 30-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Move extent overwrite handling out of core btree code

Ever since the btree code was first written, handling of overwriting
existing extents - including partially overwriting and splitting existing
extents - was handled as part of the core btree insert path. The modern
transaction and iterator infrastructure didn't exist then, so that was
the only way for it to be done.

This patch moves that outside of the core btree code to a pass that runs
at transaction commit time.

This is a significant simplification to the btree code and overall
reduction in code size, but more importantly it gets us much closer to
the core btree code being completely independent of extents and is
important prep work for snapshots.

This introduces a new feature bit; the old and new extent update models
are incompatible when the filesystem needs journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f1f5f114 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve an error message

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 72141e1f 24-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use btree_ptr_v2.mem_ptr to avoid hash table lookup

Nice performance optimization

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 548b3d20 07-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_ptr_v2

Add a new btree ptr type which contains the sequence number (random 64
bit cookie, actually) for that btree node - this lets us verify that
when we read in a btree node it really is the btree node we wanted.
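
A minimal sketch of the check this enables, assuming the cookie is stored both in the pointer and in the node that was read. The struct and function names here are hypothetical and much simpler than the real on-disk types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pointer key that records the node's cookie. */
struct node_ptr {
	uint64_t sector;	/* where the node lives on disk */
	uint64_t seq;		/* random 64 bit cookie assigned to the node */
};

/* Hypothetical stand-in for the header of a node as read from disk. */
struct node_header {
	uint64_t seq;
};

/* True if the node we read is the one the pointer was created for. */
static bool node_matches_ptr(const struct node_ptr *ptr,
			     const struct node_header *hdr)
{
	assert(ptr != NULL && hdr != NULL);
	return ptr->seq == hdr->seq;
}
```

A mismatch means the sectors hold some other (for example stale or overwritten) node, so the read can be treated as an error instead of trusting the contents.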

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 237e8048 18-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: introduce b->hash_val

This is partly prep work for introducing bch_btree_ptr_v2, but it'll
also be a bit of a performance boost by moving the full key out of the
hot part of struct btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1f49dafc 06-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_ptr_swab for indirect extents

bch2_ptr_swab was never updated when the code for generic keys with
pointers was added - it assumed the entire val was only used for
pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bcd6f3e0 26-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use KEY_TYPE_deleted whiteouts for extents

Previously, partial overwrites of existing extents were handled
implicitly by the btree code; when reading in a btree node, we'd do a
mergesort of the different bsets and detect and fix partially
overlapping extents during that mergesort.

That approach won't work with snapshots: this changes extents to work
like regular keys as far as the btree code is concerned, where a 0 size
KEY_TYPE_deleted whiteout will completely overwrite an existing extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae2f17d5 14-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill btree_node_iter_large

Long overdue cleanup - this converts btree_node_iter_large uses to
sort_iter.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8f82280e 14-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use one buffer for sorting whiteouts

We're not really supposed to allocate from the same mempool more than
once.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c297a763 13-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor whiteouts compaction

The whiteout compaction path - as opposed to just dropping whiteouts -
is now only needed for extents, and soon will only be needed for extent
btree nodes in the old format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c9bebae6 29-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Whiteout changes

More prep work for snapshots: extents will soon be using
KEY_TYPE_deleted for whiteouts, with 0 size. But we won't be able to
keep these whiteouts with the rest of the extents in the btree node, due
to sorting invariants breaking.

We can deal with this by immediately moving the new whiteouts to the
unwritten whiteouts area - this just means those whiteouts won't be
sorted, so we need new code to sort them prior to merging them with the
rest of the keys to be written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ad44bdc3 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey noops

For upcoming inline data extents, we're going to need to be able to
shorten the value of existing bkeys in the btree - and to make that work
we're going to need to be able to pad out the space the value previously
took up with something.

This patch changes the various code that iterates over bkeys to handle
k->u64s == 0 as meaning "skip the next 8 bytes".
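
The iteration rule can be sketched as follows. This is a simplified model, assuming keys are laid out in an array of 64 bit words and that the first byte of each key holds its size in u64s; the helper names are hypothetical and not taken from the bcachefs source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the first byte of a key is its size in 64 bit words. */
static uint8_t key_u64s(const uint64_t *k)
{
	return *(const uint8_t *) k;
}

static void set_key_u64s(uint64_t *k, uint8_t u64s)
{
	*(uint8_t *) k = u64s;
}

/* Count the real keys in [start, end), stepping over noop padding. */
static size_t count_keys(const uint64_t *start, const uint64_t *end)
{
	size_t nr = 0;

	assert(start <= end);

	while (start < end) {
		uint8_t u64s = key_u64s(start);

		if (!u64s) {
			start++;	/* noop: skip the next 8 bytes */
			continue;
		}
		if (u64s > (size_t) (end - start))
			break;		/* truncated key, stop iterating */
		nr++;
		start += u64s;
	}
	return nr;
}
```

Shrinking a value then only requires zeroing the words it no longer uses; the keys that follow do not have to move.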

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cdd775e6 21-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use FUA unnecessarily

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 885678f6 03-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill direct access to bi_io_vec

Switch to always using bio_add_page(), which merges contiguous pages now
that we have multipage bvecs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c43a6ef9 05-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_bkey_cached_common

This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1dd7f9d9 04-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rewrite journal_seq_blacklist machinery

Now, we store blacklisted journal sequence numbers in the superblock,
not the journal: this helps to greatly simplify the code, and more
importantly it's now implemented in a way that doesn't require all btree
nodes to be visited before starting the journal - instead, we
unconditionally blacklist the next 4 journal sequence numbers after an
unclean shutdown.
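
A sketch of the unconditional blacklist, assuming "the next 4" means the four sequence numbers immediately after the last one found in the journal. The range type and function names are hypothetical; only the count of 4 comes from the text above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLACKLIST_NR	4	/* count taken from the commit message */

/* Half-open range [start, end) of blacklisted journal sequence numbers. */
struct seq_range {
	uint64_t start;
	uint64_t end;
};

/* Range to record in the superblock after an unclean shutdown. */
static struct seq_range blacklist_after_unclean_shutdown(uint64_t last_seq)
{
	struct seq_range r;

	assert(last_seq < UINT64_MAX - BLACKLIST_NR);

	r.start = last_seq + 1;
	r.end = r.start + BLACKLIST_NR;
	return r;
}

static bool seq_is_blacklisted(struct seq_range r, uint64_t seq)
{
	return seq >= r.start && seq < r.end;
}
```

Because the range is computed from the last journal sequence number alone, it can be decided without first visiting every btree node.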

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dc3b63dc 21-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add time stats for btree updates

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d0cc3def 13-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More allocator startup improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5b8a9227 27-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bkey_sort.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 319f9ac3 08-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: revamp to_text methods

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ac10a961 03-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Some fixes for building in userspace

userspace allocators don't align allocations as nicely as kernel
allocators, which meant that in some cases we weren't allocating big
enough bvec arrays - just make the calculations more rigorous and
explicit to fix it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5bd95a37 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: new avoid mechanism for io retries

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 198d6700 21-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: add functionality for heaps to update backpointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a2753581 30-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_extent_drop_ptrs()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4cb13156 02-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: extent_ptr_decoded

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a00fd8c5 21-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Comparison function cleanups

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 271a3d3a 21-Jul-2016 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: lift ordering restriction on 0 size extents

This lifts the restriction that 0 size extents must not overlap with
other extents, which means we can now sort extents and non extents the
same way, and will let us simplify a bunch of other stuff as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1fe08f31 05-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey_written()

also cleanups of btree node offsets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>