#
9e203c43 |
|
12-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix missing write refs in fs fio paths bch2_journal_flush_seq requires us to have a write ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a292be3b |
|
27-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Reconstruct missing snapshot nodes When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
55936afe |
|
15-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Flag btrees with missing data We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4409b808 |
|
11-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Repair pass for scanning for btree nodes If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d2554263 |
|
23-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out recovery_passes.c We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
63332394 |
|
21-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Move snapshot table size to struct snapshot_table We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a0a466ea |
|
17-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out btree_node_rewrite_worker This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
273960b8 |
|
01-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
bcachefs: time_stats: split stats-with-quantiles into a separate structure Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f1ca1abf |
|
13-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: pull out time_stats.[ch] prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
894d0622 |
|
23-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rename journal_keys.d -> journal_keys.data This will let us use some darray helpers in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a393f331 |
|
15-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out discard fastpath Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache eans those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b63570f7 |
|
12-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_print_opts() Make sure early error messages get redirected, for kernel-fsck-from-userland. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b26d7914 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_subvolume_children Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e074475 |
|
10-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Clamp replicas_required to replicas This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec4edd7b |
|
16-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Prep work for variable size btree node buffers bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c72e4d7a |
|
03-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: add time_stats for btree_node_read_done() Seeing weird latency issues in the btree node read path - add one bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d55ddf6e |
|
31-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Online fsck can now fix errors BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96f37eab |
|
31-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: factor out thread_with_file, thread_with_stdio thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
89056f24 |
|
23-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: track transaction durations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a7dc10ce |
|
19-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Make sure allocation failure errors are logged The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly. Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non IO errors were missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cf904c8d |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_err_(fn|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e06af207 |
|
15-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: fix userspace build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
67997234 |
|
11-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: kill btree_trans->wb_updates the btree write buffer path now creates a journal entry directly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
09caeabe |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree write buffer now slurps keys from journal Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24de63da |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans->extra_journal_entries Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
249bf593 |
|
09-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix snapshot.c assertion for online fsck c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
267b801f |
|
04-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_IOCTL_FSCK_ONLINE This adds a new ioctl for running fsck on a mounted, in use filesystem. This reuses the fsck_thread code from the previous patch for running fsck on an offline, unmounted filesystem, so that log messages for the fsck thread are redirected to userspace. Only one running fsck instance is allowed at a time; a new semaphore (since the lock will be taken by one thread and released by another) is added for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7f391b2f |
|
06-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_run_online_recovery_passes() Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2b41226d |
|
04-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add ability to redirect log output Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
63508b75 |
|
06-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: c->ro_ref Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
74644030 |
|
27-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: count_event() Small helper for event counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
183bcc89 |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Clean up btree write buffer write ref handling __bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3c471b65 |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: convert bch_fs_flags to x-macro Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
066a2646 |
|
09-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: track_event_change() This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b16413c |
|
29-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_sb.recovery_passes_required Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bbc3a460 |
|
24-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix zstd compress workspace size zstd apparently lies about the size of the compression workspace it requires; if we double it compression succeeds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8a443d3e |
|
17-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Proper refcounting for journal_keys The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
274c2f8f |
|
06-Nov-2023 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
bcachefs: Fix multiple -Warray-bounds warnings Transform zero-length array `entries` into a proper flexible-array member in `struct journal_seq_blacklist_table`; and fix the following -Warray-bounds warnings: fs/bcachefs/journal_seq_blacklist.c:148:26: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:150:30: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:154:27: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:176:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:177:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:297:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:298:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:300:31: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] This results in no differences in binary output. This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f5d26fa3 |
|
25-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_sb_field_errors Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
94119eeb |
|
25-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add IO error counts to bch_member We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fb3f57bb |
|
20-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: rebalance_work This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96a363a7 |
|
23-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: move: move_stats refactoring data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
37707bb1 |
|
22-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out disk_groups_types.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b0b5bbf9 |
|
19-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarily Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
37fad949 |
|
28-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: snapshot_create_lock Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d2a990d1 |
|
26-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errors These errors aren't actual errors, and should never be printed - do this in the common helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6bd68ec2 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aaad530a |
|
27-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_logged_ops Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e691b391 |
|
06-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add logging to bch2_inode_peek() & related Add error messages when we fail to lookup an inode, and also add a few missing bch2_err_class() calls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0ed4ca14 |
|
03-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Ensure topology repair runs This fixes should_restart_for_topology_repair() - previously it was returning false if the btree io path had already seleceted topology repair to run, even if it hadn't run yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad52bac2 |
|
03-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Log a message when running an explicit recovery pass Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dde8cb11 |
|
16-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bcachefs_metadata_version_deleted_inodes Add a new bitset btree for inodes pending deletion; this means we no longer have to scan the full inodes btree after an unclean shutdown. Specifically, this adds: - a trigger to update the deleted_inodes btree based on changes to the inodes btree - a new recovery pass - and check_inodes is now only a fsck pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1074a21c |
|
02-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: recovery_types.h Move some code out of bcachefs.h, which is too much of an everything header. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
922bc5a0 |
|
16-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Make topology repair a normal recovery pass This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae2e13d7 |
|
16-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_run_explicit_recovery_pass() This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
813e0cec |
|
15-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Upgrade path fixes Some minor fixes to not print errors that are actually due to a verson upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8479938d |
|
12-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert snapshot table to RCU array This switches the generic radix tree for the in-memory table of snapshot nodes to a simple rcu array. This means we have to add new locking to deal with reallocations, but is faster than traversing the radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
065bd335 |
|
10-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Version table now lists required recovery passes Now that we've got forward compatibility sorted out, we should be doing more frequent version upgrades in the future. To avoid having to run a full fsck for every version upgrade, this improves the BCH_METADATA_VERSIONS() table to explicitly specify a bitmask of recovery passes to run when upgrading to or past a given version. This means we can also delete PASS_UPGRADE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba8eeae8 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bcachefs_metadata_version_major_minor This introduces major/minor versioning to the superblock version number. Major version number changes indicate incompatible releases; we can move forward to a new major version number, but not backwards. Minor version numbers indicate compatible changes - these add features, but can still be mounted and used by old versions. With the recent patches that make it possible to roll out new btrees and key types without breaking compatibility, we should be able to roll out most new features without incompatible changes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
067d228b |
|
07-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Enumerate recovery passes Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
78328fec |
|
08-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Stash journal replay params in bch_fs For the upcoming enumeration of recovery passes, we need all recovery passes to be called the same way - including journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3045bb95 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: version_upgrade is now an enum The version_upgrade parameter is now an enum, not a bool, and it's persistent in the superblock: - compatible (default): upgrade to the latest compatible version - incompatible: upgrade to latest incompatible version - none Currently all upgrades are incompatible upgrades, but the next release will introduce major:minor versions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24964e1c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Version upgrades are not atomic operations: when we do a version upgrade we need to update the superblock before we start using new features, and then when the upgrade completes we need to update the superblock again. This adds a new superblock field so we can detect and handle incomplete version upgrades. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
faa6cb6c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Allow for unknown btree IDs We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1bb3c2a9 |
|
20-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New error message helpers Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a5b696ee |
|
19-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: seqmutex; fix a lockdep splat We can't be holding btree_trans_lock while copying to user space, which might incur a page fault. To fix this, convert it to a seqmutex so we can unlock/relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fc0ee376 |
|
25-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't reuse reflink btree keyspace We've been seeing difficult to debug "missing indirect extent" bugs, that fsck doesn't seem to find. One possibility is that there was a missing indirect extent, but then a new indirect extent was created at the location of the previous indirect extent. This patch eliminates that possibility by always creating new indirect extents right after the last one, at the end of the reflink btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c59b483 |
|
29-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_snapshot_tree This adds a new btree which gets us a persistent per-snapshot-tree identifier. - BTREE_ID_snapshot_trees - KEY_TYPE_snapshot_tree - bch_snapshot now has a field that points to a snapshot_tree This is going to be used to designate one snapshot ID/subvolume out of a given tree of snapshots as the "main" subvolume, so that we can do quota accounting in that subvolume and not the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4f77dcde |
|
29-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: move snapshot_t to subvolume_types.h this doesn't need to be in bcachefs.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0fb11e08 |
|
17-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved copygc wait debugging This just adds a line for how long copygc has been waiting to sysfs copygc_wait, helpful for debugging why copygc isn't running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8bff9875 |
|
23-Mar-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: use dedicated workqueue for tasks holding write refs A workqueue resource deadlock has been observed when running fsck on a filesystem with a full/stuck journal. fsck is not currently able to repair the fs due to fairly rapid emergency shutdown, but rather than exit gracefully the fsck process hangs during the shutdown sequence. Fortunately this is easily recoverable from userspace, but the root cause involves code shared between the kernel and userspace and so should be addressed. The deadlock scenario involves the main task in the bch2_fs_stop() -> bch2_fs_read_only() path waiting on write references to drain with the fs state lock held. A bch2_read_only_work() workqueue task is scheduled on the system_long_wq, blocked on the state lock. Finally, various other write ref holding workqueue tasks are scheduled to run on the same workqueue and must complete in order to release references that the initial task is waiting on. To avoid this problem, we can split the dependent workqueue tasks across different workqueues. It's a bit of a waste to create a dedicated wq for the read-only worker, but there are several tasks throughout the fs that follow the pattern of acquiring a write reference and then scheduling to the system wq. Use a local wq for such tasks to break the subtle dependency between these and the read-only worker. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9edbcc72 |
|
15-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_evict_subvolume_inodes() This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache() doesn't handle the case where i_count is already 0, we need to grab and put the inode in order for it to be dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b40901b0 |
|
13-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New erasure coding shutdown path This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path. The process is: - Cancel new stripes being built up - Close out/cancel open buckets on write points or the partial list that are for stripes - Shutdown rebalance/copygc - Then wait for in flight new stripes to finish With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b9fa375b |
|
11-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_fs_moving_ctxts_to_text() This also adds bch2_write_op_to_text(): now we can see outstand moves, useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC and likely for other things in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
45dd05b3 |
|
04-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BKEY_PADDED_ONSTACK() Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39a1ea12 |
|
24-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Single open_bucket_partial list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4b1e6699 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Erasure coding: Track open stripes This adds a new hash table for stripes being created or updated, instead of hackily relying on the stripes heap. This lets us reserve the slot for the new stripe up front, at the same time as we would pick an existing stripe - if we were updating an existing stripe - making the overall code more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
627a2312 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Switch ec_stripes_heap_lock to a mutex Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
80c33085 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1f26d70 |
|
10-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Handle btree node rewrites before going RW Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
350175bf |
|
14-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved nocow locking This improves the nocow lock table so that hash table entries have multiple locks, and locks specify which bucket they're for - i.e. we can now resolve hash collisions. This is important because the allocator has to skip buckets that are locked in the nocow lock table, and previously hash collisions would cause it to spuriously skip unlocked buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5250b74d |
|
25-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bucket_gens btree To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds - trigger for keeping it in sync with the all btree - initialization code, for filesystems from previous versions - new path for reading bucket gens - new fsck code And a new on disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8b3a677 |
|
02-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Nocow support This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight, we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8c752bb |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New on disk format: Backpointers This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
920e69bc |
|
03-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f2b542ba |
|
11-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Go RW before check_alloc_info() It's possible to do btree updates before going RW by adding them to the list of updates for journal replay to do, but this is limited by what fits in RAM. This patch switches the second alloc info phase to run after going RW - btree_gc has already ensured the alloc btree itself is correct - and tweaks the allocation path to deal with the potential small inconsistencies. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d94189ad |
|
08-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Debug mode for c->writes references This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
42af0ad5 |
|
17-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a race with b->write_type b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7fec8266 |
|
15-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Error message improvement - Centralize format strings in bcachefs.h - Add bch2_fmt_inum_offset() and related helpers - Switch error messages for inodes to also print out the offset, in bytes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b2d1d56b |
|
13-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fixes for building in userspace - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1019576 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More style fixes Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
46fee692 |
|
28-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved btree write statistics This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e3e02e6 |
|
19-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
307e3c13 |
|
17-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Optimize bch2_trans_init() Now we store the transaction's fn idx in a local variable, instead of redoing the lookup every time we call bch2_trans_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dbb9936b |
|
25-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve bch2_fsck_err() - factor out fsck_err_get() - if the "bcachefs (%s):" prefix has already been applied, don't duplicate it - convert to printbufs instead of static char arrays - tidy up control flow a bit - use bch2_print_string_as_lines(), to avoid messages getting truncated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ffb876f |
|
12-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill journal_keys->journal_seq_base This removes an optimization that didn't actually save us any memory, due to alignment, but did make the code more complicated than it needed to be. We were also seeing a bug where journal_seq_base wasn't getting correctly initailized, so hopefully it'll fix that too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
534a591e |
|
27-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Delete time_stats for lock contended times Since we've now got time_stats for lock hold times (per btree transaction), we don't need this anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
674cfc26 |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
616928c3 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Track maximum transaction memory This patch - tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats - makes it available in debugfs - switches bch2_trans_init() to using that for the amount of memory to preallocate, instead of the parameter passed in This drastically reduces transaction restarts, and means we no longer need to track this in the source code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5c0bb66a |
|
11-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track the maximum btree_paths ever allocated by each transaction We need a way to check if the machinery for handling btree_paths with in a transaction is behaving reasonably, as it often has not been - we've had bugs with transaction path overflows caused by duplicate paths and plenty of other things. This patch tracks, per transaction fn, the most btree paths ever allocated by that transaction and makes it available in debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4aba7d45 |
|
11-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename lock_held_stats -> btree_transaction_stats Going to be adding more things to this in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c807ca95 |
|
14-Jul-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: added lock held time stats We now record the length of time btree locks are held and expose this in debugfs. Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4ab35c34 |
|
13-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix subvol/snapshot deleting in recovery fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs, otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c91996c5 |
|
15-Jun-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: data jobs, including rebalance wait for copygc. move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance, and run away bucket fragmentation. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1cab5a82 |
|
21-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Go RW before bch2_check_lrus() btree updates before going RW are expensive if they're in random order, since they use the list of keys for journal replay to insert, which is just a gap buffer. This patch improves the bucket invalidate path so that if bch2_check_lrus() hasn't finished it only prints warnings instead of doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW before bch2_check_lrus(). Also, the filesystem state bits are reorganized a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
104c6974 |
|
15-Mar-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: Add persistent counters This adds a new superblock field for persisting counters and adds a sysfs interface in counters/ exposing these counters. The superblock field is ignored by older versions letting us avoid an on disk version bump. Each sysfs file outputs a counter that tracks since filesystem creation and a counter for the current mount session. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c0960603 |
|
17-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Shutdown path improvements We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only. Cleanups & changes: - BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN - new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait - bch2_btree_flush_writes() now also returns true if there were btree writes in flight - __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node - assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
75c8d030 |
|
12-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill old rebuild_replicas option This option was useful when the replicas mechism was new and still being debugged, but hasn't been used in ages - let's delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ce6201c4 |
|
20-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use a genradix for reading journal entries Previously, the journal read path used a linked list for storing the journal entries we read from disk. But there's been a bug that's been causing journal_flush_delay to incorrectly be set to 0, leading to far more journal entries than is normal being written out, which then means filesystems are no longer able to start due to the O(n^2) behaviour of inserting into/searching that linked list. Fix this by switching to a radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d1d7737f |
|
03-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Gap buffer for journal keys Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5735608c |
|
10-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill main in-memory bucket array All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
caece7fe |
|
10-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New bucket invalidate path In the old allocator code, preparing an existing empty bucket was part of the same code path that invalidated buckets containing cached data. In the new allocator code this is no longer the case: the main allocator path finds empty buckets (via the new freespace btree), and can't allocate buckets that contain cached data. We now need a separate code path to invalidate buckets containing cached data when we're low on empty buckets, which this patch implements. When the number of free buckets decreases that triggers the new invalidate path to run, which uses the LRU btree to pick cached data buckets to invalidate until we're above our watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
59cc38b8 |
|
10-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New discard implementation In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f25d8215 |
|
09-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill allocator threads & freelists Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6b2826c |
|
11-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Freespace, need_discard btrees This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d326ab2f |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: LRU btree This implements new persistent LRUs, to be used for buckets containing cached data, as well as stripes ordered by time when a block became empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
91d961ba |
|
29-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: darrays Inspired by CCAN darray - simple, stupid resizable (dynamic) arrays. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
75ef2c59 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start moving debug info from sysfs to debugfs In sysfs, files can only output at most PAGE_SIZE. This is a problem for debug info that needs to list an arbitrary number of times, and because of this limit some of our debug info has been terser and harder to read than we'd like. This patch moves info about journal pins and cached btree nodes to debugfs, and greatly expands and improves the output we return. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
55334d78 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BCH_FS_HOLD_BTREE_WRITES This was just dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
78c8fe20 |
|
19-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Normal update/commit path now works before going RW This improves __bch2_trans_commit - early in the recovery process, when we're running btree_gc and before we want to go RW, it now uses bch2_journal_key_insert() to add the update to the list of updates for journal replay to do, instead of btree_gc having to use separate interfaces depending on whether we're running at bringup or, later, runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
10b93677 |
|
19-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete some flag bits that are no longer used Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2ce8fbd9 |
|
13-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill bch2_bkey_debugcheck The old .debugcheck methods are no more and this just calls the .invalid method, which doesn't add much since we already check that when doing btree updates and when reading metadata in. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c45c8667 |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_gc_gens() no longer uses bucket array Like the previous patches, this converts bch2_gc_gens() to use the alloc btree directly, and private arrays of generation numbers for its own recalculation of oldest_gen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ec061b21 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_gc no longer uses main in-memory bucket array This changes the btree_gc code to only use the second bucket array, the one dedicated to GC. On completion, it compares what's in its in memory bucket array to the allocation information in the btree and writes it directly, instead of updating the main in-memory bucket array and writing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7c8f6f98 |
|
12-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_id_cached() Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
80bf2f34 |
|
06-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix freeing in bch2_dev_buckets_resize() We were double-freeing old_buckets and not freeing old_buckets_gens: also, the code was supposed to free buckets, not old_buckets; old_buckets is only needed because we have to use rcu_assign_pointer() instead of swap(), and won't be set if we hit the error path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
21aec962 |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New data structure for buckets waiting on journal commit Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9b6e2f1e |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
Revert "bcachefs: Delete some obsolete journal_seq_blacklist code" This reverts commit f95b61228efd04c9c158123da5827c96e9773b29. It turns out, we're seeing filesystems in the wild end up with blacklisted btree node bsets - this should not be happening, and until we understand why and fix it we need to keep this code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
03ea3962 |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Log & error message improvements - Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
365f64f3 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add verbose log messages for journal read Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
eacb2574 |
|
02-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch_dev->dev Add a field to bch_dev for the dev_t of the underlying block device - this fixes a null ptr deref in tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d8601afc |
|
27-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Simplify journal replay With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5222a460 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_JOURNAL This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dfd41fb9 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix race between btree updates & journal replay Add a flag to indicate whether a journal replay key has been overwritten, and set/test it with appropriate btree locks held. This fixes a race between the allocator - invalidating buckets, and doing btree updates - and journal replay, which before this patch could clobber the allocator thread's update with an older version of the key from the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a7860877 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New in-memory array for bucket gens The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9ddffaf8 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Put open_buckets in a hashtable This is so that the copygc code doesn't have to refer to bucket_mark.owned_by_allocator - assisting in getting rid of the in memory bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
04f0f77d |
|
26-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete some obsolete journal_seq_blacklist code Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c64740ef |
|
30-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't start allocator threads too early If the allocator threads start before journal replay has finished replaying alloc keys, journal replay might overwrite the allocator's btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
09943313 |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite bch2_bucket_alloc_new_fs() This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e4099990 |
|
14-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Turn encoded_extent_max into a regular option It'll now be handled at format time and in sysfs like other options - it still can only be set at format time, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8244f320 |
|
14-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Option improvements This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
99fafb04 |
|
20-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix some shutdown path bugs This fixes some bugs when we hit an error very early in the filesystem startup path, before most things have been initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
991ba021 |
|
10-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more time_stats This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
990d42d1 |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out struct gc_stripe from struct stripe We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fc6c01e2 |
|
28-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bucket_alloc_ret to negative error codes Start a new header, errcode.h, for bcachefs-private error codes - more error codes will be converted later. This patch just converts bucket_alloc_ret so that they can be mixed with standard error codes and passed as ERR_PTR errors - the ec.c code was doing this already, but incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c714614b |
|
15-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Disk space accounting fix on brand-new fs The filesystem initialization path first marks superblock and journal buckets non transactionally, since the btree isn't functional yet. That path was updating the per-journal-buf percpu counters via bch2_dev_usage_update(), and updating the wrong set of counters so those updates didn't get written out until journal entry 4. The relevant code is going to get significantly rewritten in the future as we transition away from the in memory bucket array, so this just hacks around it for now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0a84a066 |
|
15-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Also log device name in userspace Change log messages in userspace to be closer to what they are in kernel space, and include the device name - it's also useful in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2027875b |
|
10-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add BCH_SUBVOLUME_UNLINKED Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
14b393ee |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8dd6ed94 |
|
23-Jul-2021 |
Brett Holman <bholman.devel@gmail.com> |
bcachefs: add progress stats to sysfs This adds progress stats to sysfs for copygc, rebalance, recovery, and the cmd_job ioctls. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d976a84e |
|
22-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't loop into topology repair Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
890b74f0 |
|
23-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fsck for reflink refcounts Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f2772c4 |
|
27-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out btree_error_wq We can't use btree_update_wq becuase btree updates may be waiting on btree writes to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ddc7dd62 |
|
27-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use uuid in tracepoints %pU for printing out pointers to uuids doesn't work in perf trace Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
731bdd2e |
|
22-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a workqueue for btree io completions Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1ce0cf5f |
|
21-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a debug mode that always reads from every btree replica There's a new module parameter, verify_all_btree_replicas, that enables reading from every btree replica when reading in btree nodes and comparing them against each other. We've been seeing some strange btree corruption - this will hopefully aid in tracking it down and catching it more often. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ef1b2092 |
|
18-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ratelimiting for writeback IOs Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
595c1e9b |
|
28-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix time handling There were some overflows in the time conversion functions - fix this by converting tv_sec and tv_nsec separately. Also, set sb->time_min and sb->time_max. Fixes xfstest generic/258. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aae15aaf |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4932e07e |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix key cache assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6adaac0b |
|
20-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update bch2_btree_verify() bch2_btree_verify() verifies that the btree node on disk matches what we have in memory. This patch changes it to verify every replica, and also fixes it for interior btree nodes - there's a mem_ptr field which is used as a scratch space and needs to be zeroed out for comparing with what's on disk. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dac1525d |
|
16-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc shouldn't care about owned_by_allocator The owned_by_allocator field is a purely in memory thing, even if/when we bring back GC at runtime there's no need for it to be recalculating this field. This is prep work for pulling it out of struct bucket, and eventually getting rid of the bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac516d0e |
|
13-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add the status of bucket gen gc to sysfs Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba5f03d3 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a sysfs var for average btree write size Useful number for performance tuning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
84cc758d |
|
21-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Validate bset version field against sb version fields The superblock version fields need to be accurate to know whether a filesystem is supported, thus we should be verifying them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9620c3ec |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mempool for the replicas delta list Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e131b6aa |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mempool for btree_trans bump allocator This allocation is required for filesystem operations to make forward progress, thus needs a mempool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bae895a5 |
|
18-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add allocator thread state to sysfs Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
51c66fed |
|
17-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rip out copygc pd controller We have a separate mechanism for ratelimiting copygc now - the pd controller has only been causing problems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5bbe4bf9 |
|
13-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add copygc wait to sysfs Currently debugging an issue with copygc not running when it's supposed to, and this is an obvious first step. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cb66fc5f |
|
13-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix copygc threshold Awhile back the meaning of is_available_bucket() and thus also bch_dev_usage->buckets_unavailable changed to include buckets that are owned by the allocator - this was so that the stat could be persisted like other allocation information, and wouldn't have to be regenerated by walking each bucket at mount time. This broke copygc, which needs to consider buckets that are reclaimable and haven't yet been grabbed by the allocator thread and moved onta freelist. This patch fixes that by adding dev_buckets_reclaimable() for copygc and the allocator thread, and cleans up some of the callers a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4b8f89af |
|
03-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fixes/improvements for journal entry reservations This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
180fb49d |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2abe5420 |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a28bd48a |
|
29-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an assertion to check for journal writes to same location Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a0b73c1c |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add (partial) support for fixing btree topology When we walk the btrees during recovery, part of that is checking that btree topology is correct: for every interior btree node, its child nodes should exactly span the range the parent node covers. Previously, we had checks for this, but not repair code. Now that we have the ability to do btree updates during initial GC, this patch adds that repair code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5b593ee1 |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add support for doing btree updates prior to journal replay Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4291a331 |
|
08-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_alloc_write() should be writing for all devices Alloc info isn't stored on a particular device, it makes no sense to only be writing it out for rw members - this was causing fsck to not fix alloc info errors, oops. Also, make sure we write out alloc info in other repair paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0fefe8d8 |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve some IO error messages it's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f299d573 |
|
13-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor filesystem usage accounting Various filesystem usage counters are kept in percpu counters, with one set per in flight journal buffer. Right now all the code that deals with it assumes that there's only two buffers/sets of counters, but the number of journal bufs is getting increased to 4 in the next patch - so refactor that code to not assume a constant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b7a9bbfc |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move journal reclaim to a kthread This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
876c7af3 |
|
15-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Take a SRCU lock in btree transactions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1a21bf98 |
|
05-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a single slot percpu buf for btree iters Allocating our array of btree iters is a big enough allocation that it hits the buddy allocator, and we're seeing lots of lock contention. Sticking a single element buffer in front of it should help. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b5e8a699 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improved inode create optimization This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
692d4031 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out debug_check_btree_accounting This check is very expensive Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
29364f34 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop sysfs interface to debug parameters It's not used much anymore, the module paramter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
45e4dcba |
|
27-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inode create optimization On workloads that do a lot of multithreaded creates all at once, lock contention on the inodes btree turns out to still be an issue. This patch adds a small buffer of inode numbers that are known to be free, so that we can avoid touching the btree on every create. Also, this changes inode creates to update via the btree key cache for the initial create. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d5e4dcc2 |
|
08-Sep-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix unmount path There was a long standing race in the mount/unmount code - the VFS intends for mount/unmount synchronizatino to be handled by the list of superblocks, but we were still holding devices open after tearing down our superblock in the unmount path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e6d11615 |
|
11-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make copygc thread global Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
703e2a43 |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move stripe creation to workqueue This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba6dd1dd |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve stripe triggers/heap code Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7dd1ebfa |
|
15-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Increase size of btree node reserve Also tweak the allocator to be more aggressive about keeping it full. The recent changes to make updates to interior nodes transactional (and thus generate updates to the alloc btree) all put more stress on the btree node reserves. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ca88e5a |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ada1606 |
|
15-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Turn c->state_lock into an rwsem Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
374153c2 |
|
09-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More open buckets We need a larger open bucket reserve now that the btree interior update path holds onto open bucket references; filesystems with many high through devices may need more open buckets now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a27443bc |
|
03-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill old allocator startup code It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
495aabed |
|
02-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add debug code to print btree transactions Intented to help debug deadlocks, since we can't use lockdep to check btree node lock ordering. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
039fc4c5 |
|
28-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fixes for going RO Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2340fd9d |
|
24-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Be more rigorous about marking the filesystem clean Previously, there was at least one error path where we could mark the filesystem clean when we hadn't sucessfully written out alloc info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f1d786a0 |
|
25-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for keeping journal entries after startup This will be used by the userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac7c51b2 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Seralize btree_update operations at btree_update_nodes_written() Prep work for journalling updates to interior nodes - enforcing ordering will greatly simplify those changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c3ff72c |
|
28-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert some enums to x-macros Helps for preventing things from getting out of sync. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bd7e82ee |
|
20-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill ca->freelist_lock All uses were supposed to be switched over to c->freelist_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
35189e09 |
|
09-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bkey_on_stack This implements code for storing small bkeys on the stack and allocating out of a mempool if they're too big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ff929515 |
|
28-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Trust btree alloc info at runtime This lets us avoid a cache miss in the write path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2a9101a9 |
|
19-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor bch2_trans_commit() path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad7e137e |
|
28-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch reconstruct_alloc to a mount option Right now this is the only way of repairing bucket gens in the future Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6671a708 |
|
27-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor bch2_alloc_write() Major simplification - gets rid of the need for marking buckets as dirty, instead we write buckets if the in memory mark is different from what's in the btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e1510c3 |
|
22-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a hint for allocating new stripes This way we aren't doing a full linear scan every time we create a new stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
76426098 |
|
16-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5e82a9a1 |
|
10-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Write out fs usage consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fca1223c |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Avoid write lock on mark_lock mark_lock is a frequently taken lock, and there's also potential for deadlocks since currently bch2_clear_page_bits which is called from memory reclaim has to take it to drop disk reservations. The disk reservation get path takes it when it recalculates the number of sectors known to be available, but it's not really needed for consistency. We just want to make sure we only have one thread updating the sectors_available count, which we can do with a dedicated mutex. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3811aa6d |
|
11-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_bkey_ptrs_invalid() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ea416023 |
|
16-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: use same timesource as current_time() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f80b4e64 |
|
16-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix hang while shutting down If the allocator thread exited before bch2_dev_allocator_stop() was called (because of an error), bch2_dev_allocator_quiesce() could hang. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1dd7f9d9 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite journal_seq_blacklist machinery Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac7f0d77 |
|
03-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: ratelimit copygc warning Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0bc166ff |
|
28-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track whether filesystem has errors in superblock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f13f5a8c |
|
27-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: move some checks to expensive_debug_checks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
03e183cb |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Verify fs hasn't been modified before going rw Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
134915f3 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Go rw lazily Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6122ab63 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More debug params for testing of recovery paths Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dc3b63dc |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add time stats for btree updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
49a67206 |
|
18-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more time stats for being blocked on allocator Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4d8100da |
|
15-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocate fs_usage in do_btree_insert_at() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
76f4c7b0 |
|
11-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix oldest_gen handling Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1df42b57 |
|
06-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: don't do initial gc if have alloc info feature Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c5af169 |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: reserve space in journal for fs usage entries Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b935a8a6 |
|
09-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a bug when shutting down before allocator started Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
430735cd |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist alloc info on clean shutdown - Does not persist alloc info for stripes yet - Also does not yet include filesystem block/sector counts yet, from struct fs_usage - Not made use of just yet Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7ef2a73a |
|
21-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix check for if extent update is allocating Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b030f691 |
|
19-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix some reserve calculations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0519b72d |
|
18-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a workqueue for journal reclaim journal reclaim writes btree nodes, which can end up waiting for in flight btree writes to complete, and btree write completions run out of workqueues - so we can't run out of the same workqueue or we risk deadlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0b847a19 |
|
18-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Lots of option handling improvements Add helptext to option definitions - so we can unify the option handling with the format command Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5663a415 |
|
27-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: refactor bch_fs_usage Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73e6ab95 |
|
01-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch replicas to mark_lock Prep work for upcoming disk accounting changes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9166b41d |
|
25-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: s/usage_lock/mark_lock better describes what it's for, and we're going to call a new lock usage_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8eb7f3ee |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: move dirty into bucket_mark Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
90541a74 |
|
21-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add new alloc fields prep work for persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f0cfb963 |
|
29-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track nr_inodes with the key marking machinery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dfe9bfb3 |
|
24-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Stripes now properly subject to gc gc now verifies the contents of the stripes radix tree, important for persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9ca53b55 |
|
23-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc now operates on second set of bucket marks This means we can now use gc to verify the allocation information - important for testing persistant alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
61274e9d |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocator startup improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cd575ddf |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b335bae |
|
04-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Assorted fixes for running on very small devices It's now possible to create and use a filesystem on a 512k device with 4k buckets (though at that size we still waste almost half to internal reserves) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b092dadd |
|
04-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Scale down number of writepoints when low on space this means we don't have to reserve space for them when calculating filesystem capacity Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7a920560 |
|
30-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill struct bch_replicas_cpu_entry Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
581edb63 |
|
08-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: mempoolify btree_trans Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a9bec520 |
|
01-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Better calculation of copygc threshold Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b29e197a |
|
22-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Invalidate buckets when writing to alloc btree Prep work for persistent alloc information. Refactoring also lets us make free_inc much smaller, which means a lot fewer buckets stranded on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b2be7c8b |
|
22-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill bucket mark sector count saturation Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6923995 |
|
21-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch journal_buf_switch is called from the foreground when getting a journal reservation and thus is somewhat latency sensitive; bch2_bucket_seq_cleanup has to run infrequently but is a bit expensive when it does run. Call it from the journal write path instead, and punt the journal write to worqueue context. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
88c07f73 |
|
14-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Only check inode i_nlink during full fsck Now that all filesystem operatinos that manipulate the filesystem heirachy and i_nlink are fully atomic, we can add a feature bit to indicate i_nlink doesn't need to be checked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|