#
e858beed |
|
21-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: If we run merges at a lower watermark, they must be nonblocking Fix another deadlock related to the merge path; previously, we switched to always running merges at a lower watermark (because they are noncritical); but when we run at a lower watermark we also need to run nonblocking or we've introduced a new deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-and-tested-by: s@m-h.ug
|
#
27c15ed2 |
|
12-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_member.btree_allocated_bitmap This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_mitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3f100489 |
|
13-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Disable merges from interior update path There's been a bug in the btree write buffer where it wasn't triggering btree node merges - and leaving behind a bunch of nearly empty btree nodes. Then during journal replay, when updates to the backpointers btree aren't using the btree write buffer (because we require synchronization with journal replay), we end up doing those merges all at once. Then if it's the interior update path running them, we deadlock because those run with the highest watermark. There's no real need for the interior update path to be doing btree node merges; other code paths can handle that at lower watermarks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9054ef2e |
|
13-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Run merges at BCH_WATERMARK_btree This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2b3e79fe |
|
10-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't use bch2_btree_node_lock_write_nofail() in btree split path It turns out - btree splits happen with the rest of the transaction still locked, to avoid unnecessary restarts, which means using nofail doesn't work here - we can deadlock. Fortunately, we now have the ability to return errors here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
beccf291 |
|
09-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a race in btree_update_nodes_written() One btree update might have terminated in a node update, and then while it is in flight another btree update might free that original node. This race has to be handled in btree_update_nodes_written() - we were missing a READ_ONCE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6088234c |
|
05-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: JOURNAL_SPACE_LOW "bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d880a438 |
|
03-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Further improve btree_update_to_text() Print start and end level of the btree update; also a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f2f61f41 |
|
14-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e0319af2 |
|
02-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve bch2_btree_update_to_text() Print out the mode as a string, and also print out the btree and watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e2a316b3 |
|
01-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_WATERMARK_interior_updates This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba947ecd |
|
01-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix btree node reserve Sign error when checking the watermark - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d2554263 |
|
23-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out recovery_passes.c We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7f9e5080 |
|
14-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_btree_increase_depth() When we haven't yet allocated any btree nodes for a given btree, we first need to call the regular split path to allocate one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
812a9297 |
|
26-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix btree node keys accounting in topology repair path When dropping keys now outside a now because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
79032b07 |
|
23-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved topology repair checks Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
40cb2623 |
|
26-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Be careful about btree node splits during journal replay Don't pick a pivot that's going to be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c502b5b8 |
|
18-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs; Fix deadlock in bch2_btree_update_start() BCH_TRANS_COMMIT_journal_reclaim with watermark != BCH_WATERMARK_reclaim means nonblocking, and we need the journal_res_get() in btree_update_start() to respect that. In a future refactoring we'll be deleting BCH_TRANS_COMMIT_journal_reclaim and replacing it with an explicit BCH_TRANS_COMMIT_nonblocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b38114dd |
|
17-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: ratelimit errors from async_btree_node_rewrite Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3ed94062 |
|
17-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve bch2_fatal_error() error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3becdd48 |
|
17-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve sysfs internal/btree_updates Print out the function that launched the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a0a466ea |
|
17-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out btree_node_rewrite_worker This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1fdb9685 |
|
08-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill unused flags argument to btree_split() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
52946d82 |
|
06-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill more -EIO error codes This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
82fdc1dc |
|
05-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bump max_active on btree_interior_update_worker WQ_UNBOUND with max_active 1 means ordered workqueue, but we don't actually need or want ordered semantics - and probably want a higher concurrency limit anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba89083e |
|
08-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix journal replay with unreadable btree roots When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e074475 |
|
10-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Clamp replicas_required to replicas This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec4edd7b |
|
16-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Prep work for variable size btree node buffers bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f0431c5f |
|
31-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Combine .trans_trigger, .atomic_trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
717296c3 |
|
27-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans_mark now takes bkey_s Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ff70ad2c |
|
15-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix interior update path btree_path uses Since the btree_paths array is now about to become growable, we have to be careful not to refer to paths by pointer across contexts where they may be reallocated. This fixes the remaining btree_interior_update() paths - split and merge. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d7e14035 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: get_unlocked_mut_path() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b0b67378 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans_for_each_path_with_node() no longer uses path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ccb7b08f |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans_for_each_path() no longer uses path->idx path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
07f383c7 |
|
03-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_iter -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96ed47d1 |
|
08-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_path_traverse() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f6363aca |
|
08-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
74e600c1 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs; bch2_path_put() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
255ebbbf |
|
08-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_path_get() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1a2a9f9f |
|
21-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: for_each_keylist_key() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a7dc10ce |
|
19-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Make sure allocation failure errors are logged The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly. Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non IO errors were missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cf904c8d |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_err_(fn|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f3360005 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_trans_node_add no longer uses trans_for_each_path() In the future we'll be making trans->paths resizable and potentially having _many_ more paths (for fsck); we need to start fixing algorithms that walk each path in a transaction where possible. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24de63da |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans->extra_journal_entries Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a564c9fa |
|
02-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Include btree_trans in more tracepoints This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3c471b65 |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: convert bch_fs_flags to x-macro Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cb52d23e |
|
11-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rename BTREE_INSERT flags BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aa62aabb |
|
11-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill dead BTREE_INSERT flags BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used, and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3eedfe1a |
|
09-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Journal pins must always have a flush_fn flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7ba1f6ec |
|
17-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs; guard against overflow in btree node split Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0fa3b977 |
|
17-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_node_u64s_with_format() takes nr keys Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
87b0d8d3 |
|
30-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a journal deadlock in replay Recently, journal pre-reservations were removed. They were for reserving space ahead of time in the journal for operations that are required for journal reclaim, e.g. btree key cache flushing and interior node btree updates. Instead we have watermarks - only operations for journal reclaim are allowed when the journal is low on space, and in general we're quite good about doing operations in the order that will free up space in the journal quickest when we're low on space. If we're doing a journal reclaim operation out of order, we usually do it in nonblocking mode if it's not freeing up space at the end of the journal. There's an exceptino though - interior btree node update operations have to be BCH_WATERMARK_reclaim - once they've been started, and they can't be nonblocking. Generally this is fine because they'll only be a very small fraction of transaction commits - but there's an exception, which is during journal replay. Journal replay does many btree operations, but doesn't need to commit them to the journal since they're already in the journal. So killing off of pre-reservation, plus another change to make journal replay more efficient by initially doing the replay in sorted btree order, made it possible for the interior update operations replay generates to fill and deadlock the journal. Fix this by introducing a new check on journal space at the _start_ of an interior update operation. This causes us to block if necessary in exactly the same way as we used to when interior updates took a journal pre-reservaiton, but without all the expensive accounting pre-reservations required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2111f394 |
|
28-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix race between btree writes and metadata drop btree writes update the btree node key after every write, in order to update sectors_written, and they also might need to drop pointers if one of the writes failed in a replicated btree node. But the btree node might also have had a pointer dropped while the write was in flight, by bch2_dev_metadata_drop(), and thus there was a bug where the btree node write would ovewrite the btree node's key with what it had at the start of the write. Fix this by dropping pointers not currently in the btree node key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5510a4af |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix split_race livelock bch2_btree_update_start() calculates which nodes are going to have to be split/rewritten, so that we know how many nodes to reserve and how deep in the tree we have to take locks. But btree node merges require inserting two keys into the parent node, not just splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d4e3b928 |
|
17-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
closures: CLOSURE_CALLBACK() to fix type punning Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
006ccc30 |
|
04-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
769b3600 |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't iterate over journal entries just for btree roots Small performance optimization, and a bit of a code cleanup too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6dfa10ab |
|
31-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix build errors with gcc 10 gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
be9e782d |
|
27-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't downgrade locks on transaction restart We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b65db750 |
|
24-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Enumerate fsck errors This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6bd68ec2 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7cb0e699 |
|
12-Sep-2023 |
Colin Ian King <colin.i.king@gmail.com> |
bcachefs: remove redundant initialization of pointer d The pointer d is being initialized with a value that is never read, it is being re-assigned later on when it is used in a for-loop. The initialization is redundant and can be removed. Cleans up clang-scan build warning: fs/bcachefs/buckets.c:1303:25: warning: Value stored to 'd' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e46c181a |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert more code to bch_err_msg() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5b7fbdcd |
|
09-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix silent enum conversion error This changes mark_btree_node_locked() to take an enum btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED is -1 which may cause problems converting back and forth to six_lock_type if short enums are in use. With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type enum. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
93ee2c4b |
|
12-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't open code closure_nr_remaining() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
401585fe |
|
05-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_journal_iter.c Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bf5a261c |
|
01-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted fixes for clang clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
01e691e8 |
|
10-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a write buffer flush deadlock We're not supposed to block if BTREE_INSERT_JOURNAL_RECLAIM && watermark != BCH_WATERMARK_reclaim. This should really be a separate BTREE_INSERT_NONBLOCK flag - add some comments to that effect, it's not important for this patch. btree write buffer flush depends on this behaviour though - the first loop tries to flush sequentially, which doesn't free up space in the journal optimally. If that can't proceed we bail out and flush in journal order - that won't work if we're blocked instead of returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
faa6cb6c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Allow for unknown btree IDs We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f33c58fc |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill BTREE_INSERT_USE_RESERVE Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
65db6049 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a null ptr deref in bch2_fs_alloc() error path This fixes a null ptr deref in bch2_free_pending_node_rewrites() when the list head wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec14fc60 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill JOURNAL_WATERMARK This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e53a961c |
|
24-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rename enum alloc_reserve -> bch_watermark This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
45a1ab57 |
|
16-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_btree_update_start() The calculation for number of nodes to allocate in bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the small nodes test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
25c70097 |
|
11-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Delete weird hacky transaction restart injection since we currently don't have a good fault injection library, bch2_btree_insert_node() was randomly injecting faults based on local_clock(). At the very least this should have been a debug mode only thing, but this is a brittle method so let's just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
49c7cd9d |
|
30-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More drop_locks_do() conversions Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b5fd7566 |
|
28-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: drop_locks_do() Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
19c304be |
|
28-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: GFP_NOIO -> GFP_NOFS GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1fb4fe63 |
|
20-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
six locks: Kill six_lock_state union As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3d86f13d |
|
30-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans_restart_split_race tracepoint Seeing occasional test failures where we get stuck in a livelock that involves this event - this will help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
abab7609 |
|
17-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_extent_fallocate() in nocow mode When we allocate disk space, we need to be incrementing the WRITE io clock, which perhaps should be renamed to sectors allocated - copygc uses this io clock to know when to run. Also, we should be incrementing the same clock when allocating btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
65d48e35 |
|
14-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac2ccddc |
|
04-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Drop some anonymous structs, unions Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
45dd05b3 |
|
04-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BKEY_PADDED_ONSTACK() Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1306f87d |
|
02-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Plumb btree_trans through btree cache code Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e151580d |
|
20-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add tracepoint & counter for btree split race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f6db127 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_journal_entries_postprocess() This brings back journal_entries_compact(), but in a more efficient form - we need to do multiple postprocess steps, so iterate over the journal entries being written just once to make it more efficient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1f26d70 |
|
10-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Handle btree node rewrites before going RW Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
12795a19 |
|
10-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add some logging for btree node rewrites due to errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d94189ad |
|
08-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Debug mode for c->writes references This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
06ab86d5 |
|
09-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix btree_node_write_blocked() not being cleared The btree_node_write_blocked bit was a later addition to this code, it only mirrors the state of the b->write_blocked list (empty or nonempty) - unfortunately, when it was added it wasn't correctly kept in sync - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
adf6360b |
|
01-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree_reserve_get_fail tracepoint Now we include the return code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
87ced107 |
|
13-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert EAGAIN errors to private error codes More error code cleanup, for better error messages and debugability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ee2c6ea7 |
|
08-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_iter->ip_allocated In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e88a75eb |
|
24-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
42af0ad5 |
|
17-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a race with b->write_type b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4fcdd6ec |
|
15-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree split improvement This improves the bkey_format calculation when splitting btree nodes. Previously, we'd use a format calculated for the original node for the lower of the two new nodes. This was particularly bad on sequential insertions, where we iteratively split the last btree node, whos format has to include KEY_MAX. Now, we calculate formats precisely for the keys the two new nodes will contain. This also should make splitting a bit more efficient, since we're only copying keys once (from the original node to the new node, instead of new node, replacement node, then upper split). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
46fee692 |
|
28-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved btree write statistics This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
df6a24f8 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Make error messages more uniform Use __func__ in error messages that refer to function name, and do so more uniformly - suggested by checkpatch.pl Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e3e02e6 |
|
19-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
de107dc8 |
|
12-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Call bch2_btree_update_add_new_node() before dropping write lock btree nodes can be written by other threads (shrinker, journal reclaim) with only a read lock, but brand new nodes should only be written by the thread doing the split/interior update. bch2_btree_update_add_new_node() sets btree node flags to indicate that this is a new node and should not be written out by other threads, thus we need to call it before dropping our write lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1f0f731f |
|
27-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree splits now only take the locks they need Previously, bch2_btree_update_start() would always take all intent locks, all the way up to the root. We've finally got data from users where this became a scalability issue - so, this patch fixes bch2_btree_update_start() to only take the locks we need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ff7849f |
|
09-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_insert_node() no longer uses lock_write_nofail Now that we have an error path plumbed through, there's no need to be using bch2_btree_node_lock_write_nofail(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8eefbd3 |
|
01-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add error path to btree_split() The next patch in the series is (finally!) going to change btree splits (and interior updates in general) to not take intent locks all the way up to the root - instead only locking the nodes they'll need to modify. However, this will be introducing a race since if we're not holding a write lock on a btree node it can be written out by another thread, and then we might not have enough space for a new bset entry. We can handle this by retrying - we just need to introduce a new error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8cbb0002 |
|
30-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Write new btree nodes after parent update In order to avoid locking all btree nodes up to the root for btree node splits, we're going to have to introduce a new error path into bch2_btree_insert_node(); this mean we can't have done any writes or modified global state before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8aaee94d |
|
03-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a deadlock in btree_update_nodes_written() btree_node_lock_nopath() is something we'd like to get rid of, it's always prone to deadlocks if we accidentally are holding other locks, because it doesn't mark the lock it's taking in a path: we'll want to get rid of it in the future, but for now this patch works it by calling bch2_trans_unlock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c298fd7d |
|
26-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs; Mark __bch2_trans_iter_init as inline This function is fairly small and only used in two places: one very hot, the other cold, so it should definitely be inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6cf49a9 |
|
23-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix blocking with locks held This is a major oopsy - we should always be unlocking before calling closure_sync(), else we'll cause a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
01ed3359 |
|
22-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_update_nodes_written() needs BTREE_INSERT_USE_RESERVE This fixes an obvious deadlock - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d602657c |
|
22-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix error handling in bch2_btree_update_start() We were checking for -EAGAIN, but we're not returned that when we didn't pass a closure to wait with - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e4215d0f |
|
16-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: All held locks must be in a btree path With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the btree_path to allocate some btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
367d72dd |
|
17-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_path_upgrade() now emits transaction restart Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
38474c26 |
|
02-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Avoid using btree_node_lock_nopath() With the upcoming cycle detector, we have to be careful about using btree_node_lock_nopath - in particular, using it to take write locks can cause deadlocks. All held locks need to be tracked in a btree_path, so that the cycle detector knows about them - unless we know that we cannot cause deadlocks for other reasons: e.g. we are only taking read locks, or we're in very early fsck (topology repair). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
da4474f2 |
|
03-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert more locking code to btree_bkey_cached_common Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_key_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d5024b01 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_node_lock_write_nofail() Taking a write lock will be able to fail, with the new cycle detector - unless we pass it nofail, which is possible but not preferred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ca7d8fca |
|
21-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New locking functions In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
674cfc26 |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d97e6aae |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_btree_update_start() to return -BCH_ERR_journal_reclaim_would_deadlock On failure to get a journal pre-reservation because we're called from journal reclaim we're not supposed to return a transaction restart error - this fixes a livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ce56bf7f |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans_restart_journal_preres_get tracepoint It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f0d2e9f2 |
|
06-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add assertions for unexpected transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f96568c |
|
09-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
315c9ba6 |
|
10-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fd211bc7 |
|
10-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't set should_be_locked on paths that aren't locked It doesn't make any sense to set should_be_locked on btree_paths that aren't locked, and is often a bug - this patch adds assertions and fixes some of those bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
549d173c |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: EINTR -> BCH_ERR_transaction_restart Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8bfe14e8 |
|
14-Jul-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: lock time stats prep work. We need the caller name and a place to store our results, btree_trans provides this. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e68914ca |
|
13-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename __bch2_trans_do() -> commit_do() Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a3d7afa5 |
|
18-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Always use percpu_ref_tryget_live() on c->writes If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
401ec4db |
|
03-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Printbuf rework This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1f93726e |
|
17-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fd4cecd2 |
|
18-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Lock ordering fix Can't take btree node locks while holding btree_reserve_cache_lock - it would be nice if we could check this with lockdep. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c0960603 |
|
17-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Shutdown path improvements We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only. Cleanups & changes: - BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN - new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait - bch2_btree_flush_writes() now also returns true if there were btree writes in flight - __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node - assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7419646b |
|
14-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_update_interior.c prep for backpointers Previously, btree_update_interior.c passed keys to bch2_trans_mark_* that hadn't been fully initialized - they didn't have the key field filled out, just the value. With backpointers, we need to make sure keys are fully initialized before marking them - because the backpointer points back to the original key. This patch tweaks the interior update paths to fix this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e1b8f5f5 |
|
31-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Plumb btree_id & level to trans_mark For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
275c8426 |
|
03-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add rw to .key_invalid() This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f0ac7df2 |
|
03-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert .key_invalid methods to printbufs Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f25d8215 |
|
09-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill allocator threads & freelists Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b17d3cec |
|
31-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Run btree updates after write out of write_point In the write path, after the write to the block device(s) complete we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item. This helps with two different issues: - lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys - context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker. In an arbitrary benchmark, 4k random writes with fio running inside a VM, this patch resulted in a 10% improvement in total iops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5f417394 |
|
11-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_btree_update_start() refactoring This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
31f63fd1 |
|
14-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Introduce a separate journal watermark for copygc Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e154711 |
|
13-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: x-macroize alloc_reserve enum This makes an array of strings available, like our other enums. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2a6870ad |
|
29-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use darray for extra_journal_entries Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7071878b |
|
30-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a missing btree_path_set_dirty() calls bch2_btree_iter_next_node() was mucking with other btree_path state without setting path->update to be consistent with the fact that the path is very much no longer uptodate - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
30985537 |
|
04-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix usage of six lock's percpu mode Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segragating them between two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ee68105f |
|
04-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Simplify parameters to bch2_btree_update_start() We don't need to pass the number of nodes required to bch2_btree_update_start, just whether we're doing a split at @level. This is prep work for a fix to our usage of six lock's percpu mode, which is going to require us to count up and allocate interior nodes and leaf nodes seperately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b66fbf33 |
|
01-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop unneeded journal pin in bch2_btree_update_start() When we do an interior btree update, we create new btree nodes and link them into the btree in memory, but they don't become reachable on disk until later, when btree_update_nodes_written_trans() runs. Updates to the new nodes can thus happen before they're reachable on disk, and if the updates to those new nodes are written before the nodes become reachable, we would then drop the journal pin for those updates before the btree has them. This is what the journal pin in bch2_btree_update_start() was protecting against. However, it's not actually needed because we don't allow subsequent append writes to btree nodes until the node is reachable on disk. Dropping this unneeded pin also fixes a bug introduced by "bcachefs: Journal seq now incremented at entry open, not close" - in the new code, if the journal is completely empty a journal pin list for journal_cur_seq() won't exist. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
bf3efff5 |
|
27-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix race leading to btree node write getting stuck Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written. Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
82732ef5 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree_node_write_if_need() btree_node_write_if_need() kicks off a btree node write only if need_write is set; this makes the locking easier to reason about by moving the check into the cmpxchg loop in __bch2_btree_node_write(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
de517c95 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for btree node flags This is for adding an array of strings for btree node flag names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
55334d78 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BCH_FS_HOLD_BTREE_WRITES This was just dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a0a07c59 |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix btree path sorting In btree_update_interior.c, we were changing a path's level directly - which affects path sort order - without re-sorting paths, leading to assertions when bch2_path_get() verified paths were sorted correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fa8e94fa |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Heap allocate printbufs This patch changes printbufs dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae94c78f |
|
10-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_mark_key() now takes a bkey_i * We're now coming up with triggers that modify the update being done. A bkey_s_c is const - bkey_i is the correct type to be using here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
6e44568c |
|
22-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Set BTREE_NODE_SEQ() correctly in merge path BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes on disk, so that we can tell which btree node is newer if we ever have to scan the entire device to find btree nodes. The btree node merge path wasn't setting it correctly on the new node - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c7ce2732 |
|
15-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Also show when blocked on write locks This consolidates some of the btree node lock path, so that when we're blocked taking a write lock on a node it shows up in bch2_btree_trans_to_text(), along with intent and read locks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
35228ecb |
|
06-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't keep nodes in btree_reserve locked These nodes aren't reachable by other threads, so there's no need to keep it locked - and this fixes a bug with the assertion in bch2_trans_unlock() firing on transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
b674bfad |
|
08-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use BTREE_INSERT_USE_RESERVE in btree_update_key() bch2_btree_update_key() is used in the btree node write path - before delivering the completion we have to update the parent pointer with the number of sectors written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
669f87a5 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch to __func__for recording where btree_trans was initialized Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d8601afc |
|
27-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Simplify journal replay With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5222a460 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_JOURNAL This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
57af63b2 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_alloc_sectors_append_ptrs() now takes cached flag Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8244f320 |
|
14-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Option improvements This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f3e1f444 |
|
21-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_NOPRESERVE This adds a flag to not mark the initial btree_path as preserve, for paths that we expect to be cheap to reconstitute if necessary - this solves a btree_path overflow caused by need_whiteout_for_snapshot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
991ba021 |
|
10-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more time_stats This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f3cf0999 |
|
24-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_btree_node_rewrite() now returns transaction restarts We have been getting away from handling transaction restarts locally - convert bch2_btree_node_rewrite() to the newer style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d355c6f4 |
|
19-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: for_each_btree_node() now returns errors directly This changes for_each_btree_node() to work like for_each_btree_key(), and to that end bch2_btree_iter_peek_node() and next_node() also return error ptrs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d697b9ab |
|
07-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More btree iterator fixes - check for getting to the end of the btree in bch2_path_verify_locks and __btree_path_traverse_all(), this fixes an infinite loop in __btree_path_traverse_all(). - relax requirement in bch2_btree_node_upgrade() that we must want an intent lock, this fixes bugs with paths that point to interior nodes (nonzero level). - bch2_btree_node_update_key(): fix it to upgrade the path to an intent lock, if necessary Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
71ed0056 |
|
26-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an assertion We can end up in a strange situation where a btree_path points to a node being freed even after pointers to it should have been replaced by pointers to the new node - if the btree node has been reused since the pointer to it was created. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
22b383ad |
|
04-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill retry loop in btree merge path Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1d3ecd7e |
|
04-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tighten up btree locking invariants New rule is: if a btree path holds any locks it should be holding precisely the locks wanted (accoringing to path->level and path->locks_want). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cab8e233 |
|
31-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an assertion for removing btree nodes from cache Chasing a bug that has something to do with the btree node cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a0a56879 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More renaming Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f7a966a3 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Clean up/rename bch2_trans_node_* fns These utility functions are for managing btree node state within a btree_trans - rename them for consistency, and drop some unneeded arguments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
78cf784e |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Further reduce iter->trans usage This is prep work for splitting btree_path out from btree_iter - btree_path will not have a pointer to btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f6bd307 |
|
24-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce iter->trans usage Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1a488e73 |
|
27-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_INSERT_NOUNLOCK With the recent transaction restart changes, it's no longer needed - all transaction commits have BTREE_INSERT_NOUNLOCK semantics. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e5af273f |
|
25-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans->restarted Start tracking when btree transactions have been restarted - and assert that we're always calling bch2_trans_begin() immediately after transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a88171c9 |
|
24-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Clean up interior update paths Btree node merging now happens prior to transaction commit, not after, so we don't need to pay attention to BTREE_INSERT_NOUNLOCK. Also, foreground_maybe_merge shouldn't be calling bch2_btree_iter_traverse_all() - this is becoming private to the btree iterator code and should only be called by bch2_trans_begin(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f6e1f7b |
|
13-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an allocator shutdown deadlock On fstest generic/388, we were seeing sporadic deadlocks in the emergency shutdown, where we'd get stuck shutting down the allocator because bch2_btree_update_start() -> bch2_btree_reserve_get() allocated and then deallocated some btree nodes, putting them back on the btree_reserve_cache, after the allocator shutdown code had already cleared out that cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
19d54324 |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Really don't hold btree locks while btree IOs are in flight This is something we've attempted to stick to for quite some time, as it helps guarantee filesystem latency - but there's a few remaining paths that this patch fixes. This is also necessary for an upcoming patch to update btree pointers after every btree write - since the btree write completion path will now be doing btree operations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e3a67bdb |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Regularize argument passing of btree_trans btree_trans should always be passed when we have one - iter->trans is disfavoured. This mainly updates old code in btree_update_interior.c, some of which predates btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
618b1c0e |
|
05-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out SPOS_MAX Internal btree code really wants a POS_MAX with all fields ~0; external code more likely wants the snapshot field to be 0, because when we're passing it to bch2_trans_get_iter() it's used for the snapshot we're operating in, which should be 0 for most btrees that don't use snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
297d8934 |
|
10-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Extensive triggers cleanups - We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8c3f6da9 |
|
14-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve iter->should_be_locked Adding iter->should_be_locked introduced a regression where it ended up not being set on the iterator passed to bch2_btree_update_start(), which is definitely not what we want. This patch requires it to be set when calling bch2_trans_update(), and adds various fixups to make that happen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
531a0095 |
|
04-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree iterator tracepoints This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ee757054 |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a deadlock Waiting on a btree node write with btree locks held can deadlock, if the write errors: the write error path has to do do a btree update to drop the pointer to the replica that errored. The interior update path has to wait on in flight btree writes before freeing nodes on disk. Previously, this was done in bch2_btree_interior_update_will_free_node(), and could deadlock; now, we just stash a pointer to the node and do it in btree_update_nodes_written(), just prior to the transactional part of the update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
731bdd2e |
|
22-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a workqueue for btree io completions Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
596d3bdc |
|
21-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't repair btree nodes until after interior journal replay is done We need the btree to be in a consistent state before we can rewrite btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
304b7e08 |
|
20-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an uninitialized var this fixes a valgrind complaint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
aae15aaf |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0098376f |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New helper __bch2_btree_insert_keys_interior() Consolidate common parts of bch2_btree_insert_keys_interior() and btree_split_insert_keys() - prep work for adding some new topology assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bcd25dac |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite btree nodes with errors This patch adds self healing functionality for btree nodes - if we notice a problem when reading a btree node, we just rewrite it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
537c32f5 |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't BUG_ON() btree topology error This replaces an assertion in the btree merge path with a bch2_inconsistent_error() - fsck will fix it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4d47b21c |
|
19-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a use after free Turns out, we weren't waiting on in flight btree writes when freeing existing btree nodes. This lead to stray btree writes overwriting newly allocated buckets, but only started showing itself with some of the recent allocator work and another patch to move submitting of btree writes to worqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
241e2636 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't flush btree writes more aggressively because of btree key cache We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
35d5aff2 |
|
03-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill bch2_fs_usage_scratch_get() This is an important cleanup, eliminating an unnecessary copy in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2940295c |
|
03-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ecab6be7 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_foreground_maybe_merge() now correctly reports lock restarts This means that btree node splits don't have to automatically trigger a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
54ca47e1 |
|
28-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill bch2_btree_node_get_sibling() This patch reworks the btree node merge path to use a second btree iterator to get the sibling node - which means bch2_btree_iter_get_sibling() can be deleted. Also, it uses bch2_btree_iter_traverse_all() if necessary - which means it should be more reliable. We don't currently even try to make it work when trans->nounlock is set - after a BTREE_INSERT_NOUNLOCK transaction commit, hopefully this will be a worthwhile tradeoff. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1259cc31 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Change where merging of interior btree nodes is trigger from Previously, we were doing btree node merging from bch2_btree_insert_node() - but this is called from the split path, when we're in the middle of creating new nodes and deleting new nodes and the iterators are in a weird state. Also, this means we're starting a new btree_update while in the middle of an existing one, and that's asking for deadlocks. Much simpler and saner to trigger btree node merging _after_ the whole btree node split path is finished. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e264b2f6 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve bch2_btree_update_start() bch2_btree_update_start() is now responsible for taking gc_lock and upgrading the iterator to lock parent nodes - greatly simplifying error handling and all of the callers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e751c01a |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4cf91b02 |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out bpos_cmp() and bkey_cmp() With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3bf57160 |
|
26-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix packed bkey format calculation for new btree roots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2da5d000 |
|
26-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Generate better bkey formats when splitting nodes On btree node split, we weren't ensuring the min_key of the new larger node packs in the new format for this node. This triggers some painful slowpaths in the bset.c aux search tree code - this patch fixes that by calculating a new format for the new node with the new min_key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0390ea8a |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop bkey noops Bkey noops were introduced to deal with trimming inline data extents in place in the btree: if the u64s field of a bkey was 0, that u64 was a noop and we'd start looking for the next bkey immediately after it. But extent handling has been lifted above the btree - we no longer modify existing extents in place in the btree, and the compatibilty code for old style extent btree nodes is gone, so we can completely drop this code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a9d79c6e |
|
23-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use pcpu mode of six locks for interior nodes Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f020bfcd |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use bch2_bpos_to_text() more consistently Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c052cf82 |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: KEY_TYPE_discard is no longer used KEY_TYPE_discard used to be used for extent whiteouts, but when handling over overlapping extents was lifted above the core btree code it became unused. This patch updates various code to reflect that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f2785955 |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE() bcachefs has been aggressively migrating filesystems and btree nodes to the new format for quite some time - this shouldn't affect anyone anymore, and lets us delete a _lot_ of code. Also, it frees up KEY_TYPE_discard for a new whiteout key type for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1889ad5a |
|
14-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add code to scan for/rewite old btree nodes This adds a new data job type to scan for btree nodes in the old extent format, and rewrite them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d042b040 |
|
29-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for metadata_target Also, make journal writes obey foreground_target and metadata_target. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
51d2dfb8 |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add BTREE_PTR_RANGE_UPDATED This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dcf64dfb |
|
08-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix btree node split after merge operations A btree node merge operation deletes a key in the parent node; if when inserting into the parent node we split the parent node, we can end up with a whiteout in the parent node that we don't want. The existing code drops them before doing the split, because they can screw up picking the pivot, but we forgot about the unwritten writeouts area - that needs to be cleared out too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
890e3f5b |
|
07-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reserve some open buckets for btree allocations This reverts part of the change from "bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much" - it turns out we still should be reserving open buckets for btree node allocations, because otherwise data bucket allocations (especially with erasure coding enabled) can use up all our open buckets and we won't be able to do the metadata update that lets us release those open bucket references. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
07a1006a |
|
17-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3187aa8d |
|
21-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
719fe7fb |
|
10-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update transactional triggers interface to pass old & new keys This is needed to fix a bug where we're overflowing iterators within a btree transaction, because we're updating the stripes btree (to update block counts) and the stripes btree trigger is unnecessarily updating the alloc btree - it doesn't need to update the alloc btree when the pointers within a stripe aren't changing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5db43418 |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't issue btree writes that weren't journalled If we have an error in the btree interior update path that prevents us from journalling the update, we can't issue the corresponding btree node write - we didn't get a journal sequence number that would cause it to be ignored in recovery. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7bfbbd88 |
|
02-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix spurious alloc errors on forced shutdown Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
04e23a56 |
|
30-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure we always have a journal pin in interior update path For the new nodes an interior btree update makes reachable, updates to those nodes may be journalled after the btree update starts but before the transactional part - where we make those nodes reachable. Those updates need to be kept in the journal until after the btree update completes, hence we should always get a journal pin at the start of the interior update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7b489207 |
|
20-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete dead code The interior btree node update path has changed, this is no longer needed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e92cbb6 |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More debug code improvements Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c74cec1 |
|
16-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more debug checks tracking down a bug where we see a btree node pointer in the wrong node Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0b5c9f59 |
|
15-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Set preallocated transaction mem to avoid restarts this will reduce transaction restarts, from observation of tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6a747c46 |
|
09-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add accounting for dirty btree nodes/keys This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
811d2bcd |
|
06-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop typechecking from bkey_cmp_packed() This only did anything in two places, and those can just be replaced wiht bkey_cmp_left_packed()). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7807e143 |
|
25-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert various code to printbuf printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
760992aa |
|
25-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure we wake up threads locking node when reusing it Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4fe7efa1 |
|
22-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an error path We were missing a 'goto retry' and continuing on with an error pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a2b5313a |
|
21-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a faulty assertion Now that updates to interior nodes are journalled, we shouldn't be checking topology of interior nodes until we've finished replaying updates to that node. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fff899b1 |
|
03-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Mark btree nodes as needing rewrite when not all replicas are RW This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
937f5036 |
|
18-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use btree reserve when appropriate Whenever we're doing an update that has pointers, that generally means we need to do the update in order to release open bucket references - so we should be using the btree open bucket reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ca88e5a |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bd2bb273 |
|
12-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't deadlock when btree node reuse changes lock ordering Btree node lock ordering is based on the logical key. However, 'struct btree' may be reused for a different btree node under memory pressure. This patch uses the new six lock callback to check if a btree node is no longer the node we wanted to lock before blocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4efe71a6 |
|
09-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Always give out journal pre-res if we already have one This is better than skipping the journal pre-reservation if we already have one - we should still acount for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
966885ee |
|
09-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a linked list bug Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
40ca39b5 |
|
09-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_update_nodes_written() requires alloc reserve Also, in the btree_update_start() path, if we already have a journal pre-reservation we don't want to take another - that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c823c339 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Factor out bch2_fs_btree_interior_update_init() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bc970cec |
|
02-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix two more deadlocks Deadlock on shutdown: btree_update_nodes_written() unblocks btree nodes from being written; after doing so, it has to check if they were marked as needing to be written and if so kick off those writes - if that doesn't happen, we'll never release journal pins and shutdown will get stuck when flushing the journal. There was an error path where this didn't happen, because in the error path we don't actually want those btree nodes write to happen; however, we still have to kick off the write path so the journal pins get released. The btree write path checks if we're in a journal error state and doesn't do the actual write if we are. Also - there was another deadlock because btree_update_nodes_written() was taking the btree update off of the unwritten_list too soon - before getting a journal reservation, which could fail and have to be retried. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5b6d505a |
|
01-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix another deadlock in btree_update_nodes_written() We also can't be blocking on btree node write locks while holding btree_interior_update_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
297604c9 |
|
10-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a few tracepoints Transaction restart tracing should probably be overhaulled at some point. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
58fb3e51 |
|
07-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix another deadlock in the btree interior update path Can't take read locks on btree nodes while holding btree_interior_update_lock. Also, fix a bug where we were leaking journal prereservations. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0f9dda47 |
|
05-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock on starting an interior btree update Not legal to block on a journal prereservation with btree locks held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1e3b1f9a |
|
04-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a debug mode assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8707ab0d |
|
04-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix another error path locking bug btree_update_nodes_written() was leaking a btree node lock on failure to get a journal reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
501e1bda |
|
31-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journalling of interior node updates We weren't journalling updates done while splitting/compacting nodes - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a0e491c0 |
|
29-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't allocate memory while holding journal reservation This fixes a lockdep splat - allocating memory can call bch2_clear_page_bits() which takes mark_lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39fb2983 |
|
07-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill bkey_type_successor Previously, BTREE_ID_INODES was special - inodes were indexed by the inode field, which meant the offset field of struct bpos wasn't used, which led to special cases in e.g. the btree iterator code. Now, inodes in the inodes btree are indexed by the offset field. Also: prevously min_key was special for extents btrees, min_key for extents would equal max_key for the previous node. Now, min_key = bkey_successor() of the previous node, same as non extent btrees. This means we can completely get rid of btree_type_sucessor/predecessor. Also make some improvements to the metadata IO validate/compat code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
56a40fbc |
|
28-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Various fixes for interior update path The locking was wrong, and we could get a use after free in the error path where we weren't taking the entrie being freed off the unwritten list. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6357d607 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to interior nodes Previously, the btree has always been self contained and internally consistent on disk without anything from the journal - the journal just contained pointers to the btree roots. However, this meant that btree node split or compact operations - i.e. anything that changes btree node topology and involves updates to interior nodes - would require that interior btree node to be written immediately, which means emitting a btree node write that's mostly empty (using 4k of space on disk if the filesystemm blocksize is 4k to only write perhaps ~100 bytes of new keys). More importantly, this meant most btree node writes had to be FUA, and consumer drives have a history of slow and/or buggy FUA support - other filesystes have been bit by this. This patch changes the interior btree update path to journal updates to interior nodes, after the writes for the new btree nodes have completed. Best of all, it turns out to simplify the interior node update path somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e62d65f2 |
|
15-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans_commit() path can now insert to interior nodes This will be needed for the upcoming patches to journal updates to interior btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2dac0eae |
|
18-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Iterator debug code improvements More aggressively checking iterator invariants, and fixing the resulting bugs. Also greatly simplifying iter_next() and iter_next_slot() - they were hyper optimized before, but the optimizations were getting too brittle. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3f58a197 |
|
27-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal pin cleanups Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00aad62a |
|
26-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix incorrect initialization of btree_node_old_extent_overwrite() b->level and b->btree_id weren't set when the code was checking btree_node_is_extents() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac7c51b2 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Seralize btree_update operations at btree_update_nodes_written() Prep work for journalling updates to interior nodes - enforcing ordering will greatly simplify those changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
548b3d20 |
|
07-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_ptr_v2 Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
237e8048 |
|
18-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: introduce b->hash_val This is partly prep work for introducing bch_btree_ptr_v2, but it'll also be a bit of a performance boost by moving the full key out of the hot part of struct btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5525f632 |
|
15-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Change btree split threshold to be in u64s This fixes a bug with very small btree nodes where splitting would end up with one of the new nodes empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9626aeb1 |
|
06-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rework iter->pos handling - Rework some of the helper comparison functions for consistency - Currently trying to refactor all the logic that's different for extents in the btree iterator code. The main difference is that for non extents we search for a key greater than or equal to the search key, while for extents we search for a key strictly greater than the search key (iter->pos). So that logic is now handled by btree_iter_search_key(), which computes the real search key based on iter->pos and whether or not we're searching for a key >= or > iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2d594dfb |
|
31-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bcd6f3e0 |
|
26-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use KEY_TYPE_deleted whitouts for extents Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
27b3e523 |
|
27-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an assertion to track down a heisenbug Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad44bdc3 |
|
09-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bkey noops For upcoming inline data extents, we're going to need to be able to shorten the value of existing bkeys in the btree - and to make that work we're going to be able to need to pad out the space the value previously took up with something. This patch changes the various code that iterates over bkeys to handle k->u64s == 0 as meaning "skip the next 8 bytes". Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
928c839c |
|
11-Oct-2019 |
Justin Husted <sigstop@gmail.com> |
bcachefs: Initialize btree_node flags field in bch2_btree_root_alloc. Valgrind data indicated that the flags field was only partially initialized when written to disk. Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ea3532cb |
|
11-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a subtle race in the btree split path We have to free the old (in memory) btree node _before_ unlocking the new nodes - else, some other thread with a read lock on the old node could see stale data after another thread has already updated the new node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2cbe5cfe |
|
09-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rework calling convention for marking overwrites Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6e738539 |
|
24-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve key marking interface Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
20bceecb |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More work to avoid transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
58fbf808 |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete duplicate code Also rename for consistency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ed8413fd |
|
14-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: improved btree locking tracepoints Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b03b81df |
|
10-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't pass around may_drop_locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b7607ce9 |
|
10-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill remaining bch2_btree_iter_unlock() uses Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c43a6ef9 |
|
05-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_bkey_cached_common This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5e82a9a1 |
|
10-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Write out fs usage consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
36e916e1 |
|
29-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Caller now responsible for calling mark_key for gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a2b6b072 |
|
29-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: add missing bch2_btree_iter_node_drop() call Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0f238367 |
|
27-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans_for_each_iter() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dc3b63dc |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add time stats for btree updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4d8100da |
|
15-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocate fs_usage in do_btree_insert_at() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8c96cfcc |
|
13-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fix more locking bugs Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ecc6171 |
|
12-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix double counting when gc is running Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39fbc5a4 |
|
11-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc lock no longer needed for disk reservations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
42b72e0b |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: journal_replay_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
430735cd |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist alloc info on clean shutdown - Does not persist alloc info for stripes yet - Also does not yet include filesystem block/sector counts yet, from struct fs_usage - Not made use of just yet Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7ef2a73a |
|
21-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix check for if extent update is allocating Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d0cc3def |
|
13-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More allocator startup improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9166b41d |
|
25-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: s/usage_lock/mark_lock better describes what it's for, and we're going to call a new lock usage_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
eb863265 |
|
24-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: drop bogus percpu_ref_tryget caller should already be guarding against rw, and checking here breaks when caller needs to finish updates for going RO Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
eeb83e25 |
|
22-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Hold usage_lock over mark_key and fs_usage_apply Fixes an inconsistency at the end of gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad7ae8d6 |
|
23-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree locking fix, refactoring Hit an assertion, probably spurious, indicating an iterator was unlocked when it shouldn't have been (spurious because it wasn't locked at all when the caller called btree_insert_at()). Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9ca53b55 |
|
23-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc now operates on second set of bucket marks This means we can now use gc to verify the allocation information - important for testing persistant alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f1a79365 |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't block on journal reservation with btree locks held Fixes a deadlock between the allocator thread, when it first starts up, and journal replay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cd575ddf |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
319f9ac3 |
|
08-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: revamp to_text methods Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
47799326 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: more key marking refactoring prep work for erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
103e2127 |
|
30-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: replicas: prep work for stripes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ef337c54 |
|
06-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocation code refactoring bch2_alloc_sectors_start() was a nightmare to work with - it's got some tricky stuff to do, since it wants to use the buckets the writepoint already has, unless they're not in the target it wants to write to, unless it can't allocate from any other devices in which case it will use those buckets if it has to - et cetera. This restructures the code to start with a new empty list of open buckets we're going to use for the new allocation, pulling buckets from the write point's list as we decide that we really are going to use them - making the code somewhat more functional and drastically easier to understand. Also fixes a bug where we could end up waiting on c->freelist_wait (because allocating from one device failed) but return success from bch2_bucket_alloc(), because allocating from a different device succeeded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7b3f84ea |
|
05-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
34b8e552 |
|
15-Sep-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a00fd8c5 |
|
21-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Comparison function cleanups Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
271a3d3a |
|
21-Jul-2016 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: lift ordering restriction on 0 size extents This lifts the restriction that 0 size extents must not overlap with other extents, which means we can now sort extents and non extents the same way, and will let us simplify a bunch of other stuff as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6eac2c2e |
|
24-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Change how replicated data is accounted Due to compression, the different replicas of a replicated extent don't necessarily have to take up the same amount of space - so replicated data sector counts shouldn't be stored divided by the number of replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5b650fd1 |
|
24-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Account for internal fragmentation better Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
09f3297a |
|
24-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill s_alloc, use bch_data_type Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a7c7a309 |
|
23-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_mark_key() now takes bch_data_type Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b29e197a |
|
22-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Invalidate buckets when writing to alloc btree Prep work for persistent alloc information. Refactoring also lets us make free_inc much smaller, which means a lot fewer buckets stranded on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4077991c |
|
16-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a use after free in the journal code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|