History log of /linux-master/fs/bcachefs/alloc_types.h
Revision Date Author Comments
# e2a316b3 01-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_WATERMARK_interior_updates

This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.

One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e81f89b 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix assorted checkpatch nits

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bf5a261c 01-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted fixes for clang

clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ec14fc60 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill JOURNAL_WATERMARK

This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards
specifying watermarks once in the transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 494036d8 27-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_WATERMARK_reclaim

Add another watermark for journal reclaim - this is needed for the next
patches, that unify BCH_WATERMARK with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e53a961c 24-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename enum alloc_reserve -> bch_watermark

This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7635e1a6 25-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rework open bucket partial list allocation

Now, any open_bucket can go on the partial list: allocating from the
partial list has been moved to its own dedicated function,
open_bucket_add_bucets() -> bucket_alloc_set_partial().

In particular, this means that erasure coded buckets can safely go on
the partial list; the new location works with the "allocate an ec bucket
first, then the rest" logic.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e84face6 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: RESERVE_stripe

Rework stripe creation path - new algorithm for deciding when to create
new stripes or reuse existing stripes.

We add a new allocation watermark, RESERVE_stripe, above RESERVE_none.
Then we always try to create a new stripe by doing RESERVE_stripe
allocations; if this fails, we reuse an existing stripe and allocate
buckets for it with the reserve watermark for the given write
(RESERVE_none or RESERVE_movinggc).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b1cfe5ed 01-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve dev_alloc_debug_to_text()

Now we also print the number of buckets reserved for each watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8b3a677 02-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Nocow support

This adds support for nocow mode, where we do writes in-place when
possible. Patch components:

- New boolean filesystem and inode option, nocow: note that when nocow
is enabled, data checksumming and compression are implicitly disabled

- To prevent in-place writes from racing with data moves
(data_update.c) or bucket reuse (i.e. a bucket being reused and
re-allocated while a nocow write is in flight, we have a new locking
mechanism.

Buckets can be locked for either data update or data move, using a
fixed size hash table of two_state_shared locks. We don't have any
chaining, meaning updates and moves to different buckets that hash to
the same lock will wait unnecessarily - we'll want to watch for this
becoming an issue.

- The allocator path also needs to check for in-place writes in flight
to a given bucket before giving it out: thus we add another counter
to bucket_alloc_state so we can track this.

- Fsync now may need to issue cache flushes to block devices instead of
flushing the journal. We add a device bitmask to bch_inode_info,
ei_devs_need_flush, which tracks devices that need to have flushes
issued - note that this will lead to unnecessary flushes when other
codepaths have already issued flushes, we may want to replace this with
a sequence number.

- New nocow write path: look up extents, and if they're writable write
to them - otherwise fall back to the normal COW write path.

XXX: switch to sequence numbers instead of bitmask for devs needing
journal flush

XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to
run in process context - see if we can improve this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ae10fe01 04-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bucket_alloc_state

This refactoring puts our various allocation path counters into a
dedicated struct - the upcoming nocow patch is going to add another
counter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 822835ff 31-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fold bucket_state in to BCH_DATA_TYPES()

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f25d8215 09-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill allocator threads & freelists

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b17d3cec 31-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Run btree updates after write out of write_point

In the write path, after the write to the block device(s) complete we
have to punt to process context to do the btree update.

Instead of using the work item embedded in op->cl, this patch switches
to a per write-point work item. This helps with two different issues:

- lock contention: btree updates to the same writepoint will (usually)
be updating the same alloc keys
- context switch overhead: when we're bottlenecked on btree updates,
having a thread (running out of a work item) checking the write point
for completed ops is cheaper than queueing up a new work item and
waking up a kworker.

In an arbitrary benchmark, 4k random writes with fio running inside a
VM, this patch resulted in a 10% improvement in total iops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e154711 13-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: x-macroize alloc_reserve enum

This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ddffaf8 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Put open_buckets in a hashtable

This is so that the copygc code doesn't have to refer to
bucket_mark.owned_by_allocator - assisting in getting rid of the in
memory bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# abe19d45 25-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor open_bucket code

Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# bae895a5 18-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add allocator thread state to sysfs

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2abe5420 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Persist 64 bit io clocks

Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd perodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestaps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 890e3f5b 07-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reserve some open buckets for btree allocations

This reverts part of the change from "bcachefs: Don't use
BTREE_INSERT_USE_RESERVE so much" - it turns out we still should be
reserving open buckets for btree node allocations, because otherwise
data bucket allocations (especially with erasure coding enabled) can use
up all our open buckets and we won't be able to do the metadata update
that lets us release those open bucket references. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8deed5f4 14-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use separate new stripes for copygc and non-copygc

Allocations for copygc have to be kept separate from everything else,
so that copygc doesn't get starved.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3187aa8d 21-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much

Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.

- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.

- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 74ed7e56 21-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't let copygc buckets be stolen by other threads

And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 374153c2 09-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More open buckets

We need a larger open bucket reserve now that the btree interior update
path holds onto open bucket references; filesystems with many high
through devices may need more open buckets now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b030f691 19-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix some reserve calculations

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cd575ddf 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b092dadd 04-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Scale down number of writepoints when low on space

this means we don't have to reserve space for them when calculating
filesystem capacity

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ef337c54 06-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Allocation code refactoring

bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.

This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.

Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>