79055f50 | 15-Apr-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: make sure to release last journal pin in replay
This fixes a deadlock when journal replay has many keys to insert that were from fsck, not the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5ab4beb7 | 08-Apr-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Don't scan for btree nodes when we can reconstruct
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a292be3b | 27-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Reconstruct missing snapshot nodes
When the snapshots btree is gone, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

55936afe | 15-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Flag btrees with missing data
We need this to know when we should attempt to reconstruct the snapshots btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4409b808 | 11-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Repair pass for scanning for btree nodes
If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning.
Fortunately btree node headers are fully self-describing, and additionally the magic number is xored with the filesystem UUID, so we can do so safely.
This implements the scanning - next patch will rework topology repair to make use of the found nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

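Scanning is only safe because a node's magic number is tied to the filesystem's UUID, so a stale node left on the disk by a different filesystem will not match. The following is a minimal C sketch of that check. It assumes a made-up magic constant (EXAMPLE_NODE_MAGIC) and an illustrative way of folding the 16-byte UUID into 64 bits. Neither is the actual bcachefs on-disk definition, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Made-up constant for illustration; not the real btree node magic. */
#define EXAMPLE_NODE_MAGIC	0x90135c78b99e07f5ULL

/* Fold a 16-byte UUID into 64 bits by XORing its two halves (assumption). */
static uint64_t example_uuid_fold(const uint8_t uuid[16])
{
	uint64_t lo, hi;

	memcpy(&lo, uuid, sizeof(lo));
	memcpy(&hi, uuid + 8, sizeof(hi));
	return lo ^ hi;
}

/* The magic a node written by the filesystem with this UUID would carry. */
static uint64_t example_expected_magic(const uint8_t uuid[16])
{
	return EXAMPLE_NODE_MAGIC ^ example_uuid_fold(uuid);
}

/*
 * Step through a raw device buffer in node-sized strides and count the
 * offsets whose first 8 bytes match this filesystem's magic.
 */
static size_t example_scan_for_nodes(const uint8_t *buf, size_t len,
				     size_t node_size, const uint8_t uuid[16])
{
	uint64_t want = example_expected_magic(uuid);
	size_t found = 0;

	for (size_t off = 0; off + sizeof(uint64_t) <= len; off += node_size) {
		uint64_t magic;

		memcpy(&magic, buf + off, sizeof(magic));
		if (magic == want)
			found++;
	}
	return found;
}
```

A node from a filesystem with a different UUID produces a different expected magic, so it is simply skipped rather than being mistaken for one of ours.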
f2f61f41 | 14-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bdbf953b | 19-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch2_shoot_down_journal_keys()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

27fcec6c | 30-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Clear recovery_passes_required as they complete without errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

13c1e583 | 28-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Improve -o norecovery; opts.recovery_pass_limit
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery.
Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay.
When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges.
recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

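A recovery pass limit amounts to "stop after pass N" in the loop that runs the passes in order. The sketch below assumes a hypothetical three-entry pass table. Setting the limit to the pass just before journal replay gives the "don't do journal replay" behaviour described for -o norecovery. The pass names, and the use of EXAMPLE_PASS_NR to mean "no limit", are illustrative rather than the real opts.h definitions.

```c
#include <stddef.h>

/* Hypothetical pass order; the real list is much longer. */
enum example_pass {
	EXAMPLE_PASS_alloc_read,
	EXAMPLE_PASS_journal_replay,
	EXAMPLE_PASS_check_extents,
	EXAMPLE_PASS_NR,	/* also used here to mean "no limit" */
};

typedef int (*example_pass_fn)(void *ctx);

/*
 * Run passes in order. Stop early on error, or once the pass equal to
 * `limit` has completed. *nr_run reports how many passes succeeded.
 */
static int run_passes_until(example_pass_fn const fns[EXAMPLE_PASS_NR],
			    void *ctx, unsigned limit, unsigned *nr_run)
{
	*nr_run = 0;
	for (unsigned p = 0; p < EXAMPLE_PASS_NR; p++) {
		int ret = fns[p](ctx);

		if (ret)
			return ret;
		(*nr_run)++;
		if (p == limit)
			break;
	}
	return 0;
}

/* Example pass body for illustration: counts how many times it ran. */
static int count_calls(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}
```

With the limit set to EXAMPLE_PASS_alloc_read, only the first pass runs and replay is never reached.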
0a34c058 | 30-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Ensure bch_sb_field_ext always exists
This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4fe0eeea | 28-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Flush journal immediately after replay if we did early repair
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d2554263 | 23-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Split out recovery_passes.c
We've grown a fair amount of code for managing recovery passes: tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery.
So it's worth splitting out into its own file; this code is pretty different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a5860368 | 16-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cdce1094 | 11-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: reconstruct_alloc cleanup
Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place.
Also - users no longer have to explicitly select fsck and fix_errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2cce3752 | 25-Feb-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: split out ignore_blacklisted, ignore_not_dirty
Prep work for replaying the journal backwards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

69426613 | 23-Feb-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: improve move_gap()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

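Journal keys are kept in a gap buffer. This is an array with a run of unused slots that can be moved to wherever the next insert lands, so a series of nearby inserts costs short memmoves rather than shifting the whole tail each time. The sketch below shows the basic move_gap() idea on a plain int array. The struct layout and all names are hypothetical, not the journal_keys definitions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Minimal gap buffer of ints. Live elements occupy physical slots
 * [0, gap) and [gap + (size - nr), size); the slots in between are the gap.
 */
struct example_gap_buf {
	int	*data;
	size_t	nr;	/* live elements */
	size_t	size;	/* total slots */
	size_t	gap;	/* logical index at which the gap sits */
};

/* Move the gap so that it starts at logical index new_gap. */
static void example_move_gap(struct example_gap_buf *b, size_t new_gap)
{
	size_t gap_len = b->size - b->nr;

	if (new_gap < b->gap)
		/* Elements [new_gap, gap) move up to sit just after the gap. */
		memmove(b->data + new_gap + gap_len, b->data + new_gap,
			(b->gap - new_gap) * sizeof(*b->data));
	else if (new_gap > b->gap)
		/* Elements that logically follow the gap move down to fill it. */
		memmove(b->data + b->gap, b->data + b->gap + gap_len,
			(new_gap - b->gap) * sizeof(*b->data));
	b->gap = new_gap;
}

/* Physical slot holding logical element idx. */
static size_t example_idx_to_pos(const struct example_gap_buf *b, size_t idx)
{
	return idx < b->gap ? idx : idx + (b->size - b->nr);
}

/* Insert v at logical index idx; returns -1 if the buffer is full. */
static int example_insert_at(struct example_gap_buf *b, size_t idx, int v)
{
	if (b->nr == b->size)
		return -1;
	example_move_gap(b, idx);
	b->data[idx] = v;	/* first slot of the gap */
	b->gap++;
	b->nr++;
	return 0;
}
```

When successive inserts land near each other, the gap barely moves. That is the access pattern a gap buffer is designed for.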
95ffc7fb | 23-Feb-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: journal_keys now uses darray helpers
Nice bit of code cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

894d0622 | 23-Feb-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Rename journal_keys.d -> journal_keys.data
This will let us use some darray helpers in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

52946d82 | 06-Feb-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba89083e | 08-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix journal replay with unreadable btree roots
When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist.
This adds bch2_btree_increase_depth(), so that journal replay can handle this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6fa30fe7 | 06-Mar-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: journal_seq_blacklist_add() now handles entries being added out of order
bch2_journal_seq_blacklist_add() was bugged when the new entry overlapped with multiple existing entries, and it also assumed new entries are being added in increasing order.
This is true on any sane filesystem, but when trying to recover from very badly mangled filesystems we might end up with the journal sequence number rewinding vs. what the blacklist list knows about - easiest to just handle that here.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

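The fix described here amounts to inserting a range into a sorted list of non-overlapping ranges without assuming it belongs at the end, and merging every existing entry it overlaps. The following is a minimal sketch over half-open [start, end) sequence ranges. The struct and function names are hypothetical, and the caller is assumed to provide room for one extra entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A blacklisted range of journal sequence numbers, half-open: [start, end). */
struct example_seq_range {
	uint64_t	start, end;
};

/*
 * Add [start, end) to r[0..nr), which is sorted and non-overlapping.
 * The new range may land anywhere (not just at the end) and may overlap or
 * touch several existing entries; all of those are merged into one.
 * The caller must have room for nr + 1 entries. Returns the new count.
 */
static size_t example_blacklist_add(struct example_seq_range *r, size_t nr,
				    uint64_t start, uint64_t end)
{
	size_t i = 0, j;

	/* Skip entries that end before the new range begins. */
	while (i < nr && r[i].end < start)
		i++;

	/* Absorb every entry that overlaps or touches the (growing) range. */
	for (j = i; j < nr && r[j].start <= end; j++) {
		if (r[j].start < start)
			start = r[j].start;
		if (r[j].end > end)
			end = r[j].end;
	}

	/* Replace r[i..j) with the single merged range. */
	memmove(&r[i + 1], &r[j], (nr - j) * sizeof(*r));
	r[i] = (struct example_seq_range) { start, end };
	return nr - (j - i) + 1;
}
```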
2eeccee8 | 12-Feb-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix check_version_upgrade()
When also downgrading, check_version_upgrade() could pick a new version greater than the latest supported version.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8e7834a8 | 16-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch_fs_usage_base
Split out base filesystem usage into its own type; prep work for breaking up bch2_trans_fs_usage_apply().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

72e2c920 | 05-Jan-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Restart recovery passes more reliably
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

15eaaa4c | 03-Jan-2024 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Upgrades now specify errors to fix, like downgrades
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d55ddf6e | 31-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Online fsck can now fix errors
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors.
Also: set fix_errors to ask by default when fsck is running.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

eff1f728 | 29-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Upgrading uses bch_sb.recovery_passes_required
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

62719cf3 | 23-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix nochanges/read_only interaction
nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode.
But we do always want to allow early read-write, during recovery - this patch clarifies that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cea07a7b | 17-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: vstruct_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9fea2274 | 16-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: for_each_member_device() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

defd9e39 | 16-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: darray_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cf904c8d | 16-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch_err_(fn|msg) check if should print
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

249bf593 | 09-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix snapshot.c assertion for online fsck
c->curr_recovery_pass can go backwards; this adds a non-rewinding version, c->recovery_pass_done.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

27b2df98 | 07-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Kill for_each_btree_key()
for_each_btree_key() handles transaction restarts, like for_each_btree_key2(), but only calls bch2_trans_begin() after a transaction restart - for_each_btree_key2() wraps every loop iteration in a transaction.
The for_each_btree_key() behaviour is problematic when it leads to holding the SRCU lock that prevents key cache reclaim for an unbounded amount of time - there's no real need to keep it around.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7f391b2f | 06-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch2_run_online_recovery_passes()
Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2b41226d | 04-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Add ability to redirect log output
Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck.
This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method.
The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option.
For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

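The routing rule described here can be sketched in userspace C. A message goes to the redirected output only when one is set and the filter accepts it; otherwise it falls back to the normal log. Everything below is a hypothetical stand-in. struct example_log_sink takes the place of struct log_output, a FILE * takes the place of dmesg, and the filter takes an opaque argument instead of checking whether the caller is the fsck thread.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical in-memory sink standing in for struct log_output. */
struct example_log_sink {
	char	buf[256];
	size_t	len;
};

struct example_fs {
	struct example_log_sink	*output;	/* NULL: use the fallback log */
	bool			(*output_filter)(void *arg);
	void			*filter_arg;
};

/* Route one message to c->output if the filter allows it, else to fallback. */
static void example_log(struct example_fs *c, FILE *fallback, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	if (c->output && (!c->output_filter || c->output_filter(c->filter_arg))) {
		size_t room = sizeof(c->output->buf) - c->output->len;
		int n = vsnprintf(c->output->buf + c->output->len, room, fmt, args);

		/* On truncation keep only what fit; vsnprintf NUL-terminates. */
		if (n > 0)
			c->output->len += (size_t)n < room ? (size_t)n : room - 1;
	} else {
		vfprintf(fallback, fmt, args);
	}
	va_end(args);
}

/* Example filter for illustration: redirect while *(bool *)arg is true. */
static bool flag_filter(void *arg)
{
	return *(const bool *)arg;
}
```

Keeping the filter separate from the output pointer is what lets one filesystem-wide output coexist with other threads that should still log normally.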
3f0e297d | 28-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Explicity go RW for fsck
This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3c471b65 | 26-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: convert bch_fs_flags to x-macro
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9b34f02c | 23-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Kill dev_usage->buckets_ec
This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b27d7afb | 09-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Don't flush journal after replay
The flush_all_pins() after journal replay was unnecessary, and trying to completely flush the journal while RW is not a great idea - it's not guaranteed to terminate if other threads keep adding things to the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cb52d23e | 11-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

57322430 | 09-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Make journal replay more efficient
Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works.
Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

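The two-phase scheme can be sketched independently of the btree code. First, sort by btree position and insert whatever can be inserted. Then sort the leftovers by journal sequence number and insert them in that order. The key struct and the insert hook are hypothetical. The example hook's "refuse odd sequence numbers in the sorted pass" rule is only a stand-in for "would risk journal deadlock".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct replay_key {
	uint64_t	pos;		/* btree position: phase 1 order */
	uint64_t	journal_seq;	/* journal order: phase 2 order */
	bool		replayed;
};

/* Hypothetical hook: nonzero means "can't insert this key right now". */
typedef int (*replay_insert_fn)(struct replay_key *k, bool in_journal_order, void *ctx);

static int cmp_pos(const void *a, const void *b)
{
	const struct replay_key *l = a, *r = b;

	return (l->pos > r->pos) - (l->pos < r->pos);
}

static int cmp_seq(const void *a, const void *b)
{
	const struct replay_key *l = a, *r = b;

	return (l->journal_seq > r->journal_seq) - (l->journal_seq < r->journal_seq);
}

static int replay_keys(struct replay_key *keys, size_t nr,
		       replay_insert_fn insert, void *ctx)
{
	/* Phase 1: sorted order, leaving behind anything the hook refuses. */
	qsort(keys, nr, sizeof(*keys), cmp_pos);
	for (size_t i = 0; i < nr; i++)
		if (!insert(&keys[i], false, ctx))
			keys[i].replayed = true;

	/* Phase 2: leftovers in journal order; a failure here is returned. */
	qsort(keys, nr, sizeof(*keys), cmp_seq);
	for (size_t i = 0; i < nr; i++) {
		if (keys[i].replayed)
			continue;

		int ret = insert(&keys[i], true, ctx);
		if (ret)
			return ret;
		keys[i].replayed = true;
		/* Journal entries would be unpinned as replay proceeds, around here. */
	}
	return 0;
}

/* Example hook for illustration: logs insert order, refuses odd seqs in phase 1. */
struct insert_log {
	uint64_t	pos[16];
	size_t		nr;
};

static int example_insert(struct replay_key *k, bool in_journal_order, void *ctx)
{
	struct insert_log *log = ctx;

	if (!in_journal_order && (k->journal_seq & 1))
		return -1;
	log->pos[log->nr++] = k->pos;
	return 0;
}
```

With four keys where the odd-sequence ones are refused at first, the log shows the sorted keys first, then the deferred ones in journal order.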
bdde9829 | 09-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Go rw before journal replay
This gets us slightly nicer log messages.
Also, this slightly clarifies synchronization of c->journal_keys; after we go RW it's in use by multiple threads (so that the btree iterator code can overlay keys from the journal); so it has to be prepped before that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9a71de67 | 08-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"
This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it.
Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fbf92708 | 16-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Print old version when scanning for old metadata
Also: we should be using bch2_fs_read_write_early().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

30418de0 | 13-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Flush fsck errors before running twice
It's confusing if we run fsck a second time (in debug mode, to verify the second run is clean), but errors are still ratelimited from the first run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

84f16387 | 29-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch_sb_field_downgrade
Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } - that is, a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix.
The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also when downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every journal entry write to maintain compatibility is going to be too expensive and impractical.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8b16413c | 29-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch_sb.recovery_passes_required
Add two new superblock fields. Since the main section of the superblock is now full, we have to add a new variable length section for them - bch_sb_field_ext.
- recovery_passes_required: recovery passes that must be run on the next mount
- errors_silent: errors that will be silently fixed
These are to improve upgrading and downgrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

808c680f | 29-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Add persistent identifiers for recovery passes
The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

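One way to separate run order from on-disk identity is to give each pass a fixed ID alongside its position in the list, and translate bitmasks whenever they cross the superblock boundary. The sketch below uses an x-macro list. A pass added later ("check_snapshots") runs in the middle but gets the next free ID. The pass names, their IDs and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical pass list: x(name, stable_id), in run order. */
#define EXAMPLE_PASSES()			\
	x(alloc_read,		0)		\
	x(journal_replay,	1)		\
	x(check_snapshots,	3)		\
	x(check_extents,	2)

/* Run-order enum: values change whenever a pass is inserted. */
enum example_pass {
#define x(n, id)	EXAMPLE_PASS_##n,
	EXAMPLE_PASSES()
#undef x
	EXAMPLE_PASS_NR
};

/* Stable IDs: never change once assigned, safe to store on disk. */
static const uint8_t example_pass_stable_id[] = {
#define x(n, id)	[EXAMPLE_PASS_##n] = id,
	EXAMPLE_PASSES()
#undef x
};

/* Run-order bitmask -> stable (on-disk) bitmask. */
static uint64_t passes_to_stable(uint64_t run_mask)
{
	uint64_t ret = 0;

	for (unsigned i = 0; i < EXAMPLE_PASS_NR; i++)
		if (run_mask & (1ULL << i))
			ret |= 1ULL << example_pass_stable_id[i];
	return ret;
}

/* Stable (on-disk) bitmask -> run-order bitmask. */
static uint64_t passes_from_stable(uint64_t stable_mask)
{
	uint64_t ret = 0;

	for (unsigned i = 0; i < EXAMPLE_PASS_NR; i++)
		if (stable_mask & (1ULL << example_pass_stable_id[i]))
			ret |= 1ULL << i;
	return ret;
}
```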
f87bf892 | 29-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: fix setting version_upgrade_complete
If a superblock write hasn't happened (i.e. we never had to go rw), then c->sb.version will be out of date w.r.t. c->disk_sb.sb->version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4a147af2 | 09-Dec-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix uninitialized var in bch2_journal_replay()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8a443d3e | 17-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Proper refcounting for journal_keys
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

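With more than one thread overlaying journal keys, the keys may only be freed when the last user drops them. The following is a minimal sketch of that get/put pattern using C11 atomics. struct example_journal_keys and the jk_get()/jk_put() names are hypothetical, not the bcachefs API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the journal keys array shared by several threads. */
struct example_journal_keys {
	atomic_int	ref;
	int		*data;
	size_t		nr;
};

/* Take a reference before overlaying keys from another thread. */
static struct example_journal_keys *jk_get(struct example_journal_keys *k)
{
	atomic_fetch_add(&k->ref, 1);
	return k;
}

/* Drop a reference; the last put frees the keys. Returns true if it freed them. */
static bool jk_put(struct example_journal_keys *k)
{
	if (atomic_fetch_sub(&k->ref, 1) != 1)
		return false;
	free(k->data);
	k->data = NULL;
	k->nr = 0;
	return true;
}
```

Recovery holds the initial reference. Freeing then happens when replay finishes or when the last concurrent reader lets go, whichever is later.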
5a53f851 | 03-Nov-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entry
When we didn't find anything in the journal that we'd like to use, and we're forced to use whatever we can find - that entry will have been a JSET_NO_FLUSH entry with a garbage last_seq value, since it's not normally used.
Initialize it to something sane, for bch2_fs_journal_start().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6dfa10ab | 31-Oct-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix build errors with gcc 10
gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious.
This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs.
Reported-by: John Stoffel <john@stoffel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b65db750 | 24-Oct-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Enumerate fsck errors
This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repaired, and hence what bugs to look for.
Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fb3f57bb | 20-Oct-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: rebalance_work
This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options.
rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos.
A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file.
Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field.
Scanning is still required after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan.
Future possible work:
- Propagate options to indirect extents when being changed
- Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change
- Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

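The purpose of the cookie counter is easiest to see in isolation. The scanner remembers the counter value when it starts, and clears the "scan required" state only if the counter is unchanged when it finishes. A change that arrives mid-scan therefore still forces another pass. The struct and function names below are hypothetical stand-ins for the KEY_TYPE_cookie key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a KEY_TYPE_cookie entry in rebalance_work. */
struct example_scan_cookie {
	uint64_t	v;		/* bumped on every option change */
	bool		pending;	/* a scan is still required */
};

/* An IO option changed: request a (re)scan. */
static void example_options_changed(struct example_scan_cookie *c)
{
	c->v++;
	c->pending = true;
}

/* Scanner: remember the cookie value seen when the scan starts. */
static uint64_t example_scan_begin(const struct example_scan_cookie *c)
{
	return c->v;
}

/*
 * Scanner: at the end of a scan, clear the request only if no option changed
 * while we were scanning. Returns true if another scan is still required.
 */
static bool example_scan_end(struct example_scan_cookie *c, uint64_t seen)
{
	if (c->v == seen)
		c->pending = false;
	return c->pending;
}
```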
bbe682c7 | 21-Oct-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Ensure devices are always correctly initialized
We can't mark device superblocks or allocate journal on a device that isn't online.
That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

88dfe193 | 19-Oct-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch2_btree_id_str()
Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

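The hazard is that a btree ID read from disk may be newer than the code reading it, so using it directly as an index into a name table can read past the end. The following is a sketch of a bounds-checked lookup. The name table, the helper name and the fallback text are illustrative, and are not what bch2_btree_id_str() actually returns.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative subset of names; index = btree ID. */
static const char * const example_btree_names[] = { "extents", "inodes", "dirents" };

/*
 * Name a btree ID without indexing past the table: IDs this code doesn't
 * know about (e.g. from a newer version) get a generic name written to buf.
 */
static const char *example_btree_id_str(unsigned id, char *buf, size_t len)
{
	if (id < sizeof(example_btree_names) / sizeof(example_btree_names[0]))
		return example_btree_names[id];
	snprintf(buf, len, "(unknown btree %u)", id);
	return buf;
}
```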
b0b5bbf9 | 19-Oct-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarily
Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run.
Also:
- Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS
- Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot()
- Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

795413c5 | 29-Sep-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix drop_alloc_keys()
For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4fc1f402 | 27-Sep-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix another smatch complaint
This should be harmless, but initialize last_seq anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7dcf62c0 | 26-Sep-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Make btree root read errors recoverable
The entire btree will be lost, but that is better than the entire filesystem not being recoverable.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6bd68ec2 | 12-Sep-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

96dea3d5 | 12-Sep-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6bf3766b | 12-Sep-2023 | Colin Ian King <colin.i.king@gmail.com>
bcachefs: Fix a handful of spelling mistakes in various messages
There are several spelling mistakes in error messages. Fix these.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

aaad530a | 27-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: BTREE_ID_logged_ops
Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash.
Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8e877caa | 16-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Split out snapshot.c
subvolume.c has gotten a bit large; this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e0a2b00a | 11-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix check_version_upgrade()
We were failing to upgrade to the latest compatible version - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

401585fe | 05-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: btree_journal_iter.c
Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a37ad1a3 | 05-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: sb-clean.c
Pull code for bch_sb_field_clean out into its own file.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1e81f89b | 06-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Fix assorted checkpatch nits
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e08e63e4 | 06-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: BCH_COMPAT_bformat_overflow_done no longer required
A while back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation.
This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important.
This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0ed4ca14 | 03-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Ensure topology repair runs
This fixes should_restart_for_topology_repair() - previously it was returning false if the btree io path had already selected topology repair to run, even if it hadn't run yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ad52bac2 | 03-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Log a message when running an explicit recovery pass
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a1d1072f | 03-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Print out required recovery passes on version upgrade
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b56b787c | 02-Aug-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: In debug mode, run fsck again after fixing errors
We want to ensure that fsck actually fixed all the errors it found - the second fsck run should be clean.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

922bc5a0 | 16-Jul-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Make topology repair a normal recovery pass
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ae2e13d7 | 16-Jul-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch2_run_explicit_recovery_pass()
This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snapshots cleanup, and makes dead snapshot cleanup more like a normal recovery pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a0f8faea | 11-Jul-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: fix_errors option is now a proper enum
Before, it was parsed as a bool but internally it was really an enum: this lets us pass in all the possible values.
But we special case the option parsing: no supplied value is parsed as FSCK_FIX_yes, to match the previous behaviour.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

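The special case is that a string is parsed into an enum, but a bare option with no value is treated as FSCK_FIX_yes. Only FSCK_FIX_yes appears in the message. The other enumerator names, their order, the accepted strings and the parse function are assumptions made for this sketch.

```c
#include <string.h>

/* Only FSCK_FIX_yes is named in the commit message; the rest is assumed. */
enum example_fsck_fix {
	FSCK_FIX_exit,
	FSCK_FIX_yes,
	FSCK_FIX_no,
	FSCK_FIX_ask,
};

/* Same order as the enum above. */
static const char * const example_fix_names[] = { "exit", "yes", "no", "ask" };

/*
 * Parse the value of a fix_errors option. A NULL value means the option was
 * given bare, which keeps the old bool meaning: yes.
 * Returns -1 for an unrecognized value.
 */
static int example_parse_fix_errors(const char *val)
{
	int nr = (int)(sizeof(example_fix_names) / sizeof(example_fix_names[0]));

	if (!val)
		return FSCK_FIX_yes;
	for (int i = 0; i < nr; i++)
		if (!strcmp(val, example_fix_names[i]))
			return i;
	return -1;
}
```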
f26c67f4 | 25-Jun-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Snapshot depth, skiplist fields
This extends KEY_TYPE_snapshot to include some new fields:
- depth, to indicate depth of this particular node from the root
- skip[3], skiplist entries for quickly walking back up to the root
These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n)) instead of O(n) in the snapshot tree depth.
Skiplist nodes are picked at random from the set of ancestor nodes, not some fixed fraction.
This introduces bcachefs_metadata_version 1.1, snapshot_skiplists.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

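A sketch of how depth plus a few random skip pointers can speed up an ancestor check. Walk up from the candidate descendant, at each step taking the farthest skip entry that does not go above the candidate ancestor's depth, and compare once the depths match. This version compares depths rather than snapshot IDs, and the struct and names are hypothetical. Because the skip entries are chosen at random, the walk is expected, not guaranteed, to take a logarithmic number of steps.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_NO_NODE	UINT32_MAX

struct example_snap {
	uint32_t	parent;		/* EXAMPLE_NO_NODE for the root */
	uint32_t	depth;		/* root has depth 0 */
	uint32_t	skip[3];	/* random ancestors, or EXAMPLE_NO_NODE */
};

/* One step up from id: the farthest ancestor not above target_depth. */
static uint32_t example_step_up(const struct example_snap *t, uint32_t id,
				uint32_t target_depth)
{
	uint32_t best = t[id].parent;	/* always valid: id is below target */

	for (int i = 0; i < 3; i++) {
		uint32_t s = t[id].skip[i];

		if (s != EXAMPLE_NO_NODE &&
		    t[s].depth >= target_depth && t[s].depth < t[best].depth)
			best = s;
	}
	return best;
}

/* Is anc an ancestor of id (or id itself)? */
static bool example_is_ancestor(const struct example_snap *t, uint32_t id, uint32_t anc)
{
	if (t[anc].depth > t[id].depth)
		return false;
	while (t[id].depth > t[anc].depth)
		id = example_step_up(t, id, t[anc].depth);
	return id == anc;
}
```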
065bd335 | 10-Jul-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Version table now lists required recovery passes
Now that we've got forward compatibility sorted out, we should be doing more frequent version upgrades in the future.
To avoid having to run a full fsck for every version upgrade, this improves the BCH_METADATA_VERSIONS() table to explicitly specify a bitmask of recovery passes to run when upgrading to or past a given version.
This means we can also delete PASS_UPGRADE().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

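Given a table where each version carries a bitmask of recovery passes, the passes for an upgrade are the union of the masks of every version crossed. The sketch below uses a hypothetical table type with made-up versions and masks. In bcachefs the table is the BCH_METADATA_VERSIONS() x-macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: passes needed when upgrading to or past version. */
struct example_version {
	uint16_t	version;
	uint64_t	recovery_passes;
};

/* Union of the pass masks of every version in (old_v, new_v]. */
static uint64_t example_upgrade_passes(const struct example_version *tbl, size_t nr,
				       uint16_t old_v, uint16_t new_v)
{
	uint64_t passes = 0;

	for (size_t i = 0; i < nr; i++)
		if (tbl[i].version > old_v && tbl[i].version <= new_v)
			passes |= tbl[i].recovery_passes;
	return passes;
}
```

Upgrading from 10 to 12 picks up the masks for 11 and 12 but not 10, since the filesystem is already at 10.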
6619d846 | 09-Jul-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bch2_sb_maybe_downgrade(), bch2_sb_upgrade()
Add some new helpers, and fix upgrade/downgrade in bch2_fs_initialize().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba8eeae8 | 27-Jun-2023 | Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: bcachefs_metadata_version_major_minor
This introduces major/minor versioning to the superblock version number. Major version number changes indicate incompatible releases; we can move forward to a new major version number, but not backwards. Minor version numbers indicate compatible changes - these add features, but can still be mounted and used by old versions.
With the recent patches that make it possible to roll out new btrees and key types without breaking compatibility, we should be able to roll out most new features without incompatible changes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

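A sketch of the compatibility rule. It assumes a 16-bit version number with the major version above bit 10 and the minor version in the low 10 bits. That packing is an assumption made for this example, not a statement of the on-disk encoding, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding for this example: major << 10 | minor. */
#define EXAMPLE_VERSION(major, minor)	((uint16_t)(((major) << 10) | (minor)))
#define EXAMPLE_VERSION_MAJOR(v)	((uint16_t)((v) >> 10))
#define EXAMPLE_VERSION_MINOR(v)	((uint16_t)((v) & 0x3ff))

/*
 * Can code that supports up to `supported` use a filesystem at `on_disk`?
 * A newer minor version is compatible; a newer major version is not.
 */
static bool example_can_mount(uint16_t on_disk, uint16_t supported)
{
	return EXAMPLE_VERSION_MAJOR(on_disk) <= EXAMPLE_VERSION_MAJOR(supported);
}
```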
#
067d228b |
|
07-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Enumerate recovery passes Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
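A hypothetical sketch of enumerated recovery passes, with three parts:

- A single X-macro list generates both the pass enum and a table of flags.
- One function decides whether a pass should run.
- The loop records progress in a curr_recovery_pass field instead of ad-hoc state bits.

The flag names and the pass-to-flag assignments are illustrative, not bcachefs's own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags describing when a pass must run. */
#define PASS_ALWAYS	(1U << 0)
#define PASS_FSCK	(1U << 1)
#define PASS_UNCLEAN	(1U << 2)

/* One list, used to generate both the enum and the flags table. */
#define SKETCH_RECOVERY_PASSES()				\
	x(journal_replay,		PASS_ALWAYS)		\
	x(check_alloc_info,		PASS_FSCK)		\
	x(check_lrus,			PASS_FSCK)		\
	x(delete_dead_snapshots,	PASS_FSCK|PASS_UNCLEAN)

enum sketch_recovery_pass {
#define x(n, flags)	SKETCH_PASS_##n,
	SKETCH_RECOVERY_PASSES()
#undef x
	SKETCH_PASS_NR
};

static const unsigned sketch_pass_flags[] = {
#define x(n, flags)	[SKETCH_PASS_##n] = flags,
	SKETCH_RECOVERY_PASSES()
#undef x
};

struct sketch_fs {
	bool		fsck;
	bool		unclean;
	unsigned	curr_recovery_pass;	/* replaces ad-hoc "pass done" bits */
	unsigned	ran_mask;		/* records which passes ran, for the demo */
};

/* All "should this pass run?" logic lives in one place. */
static bool sketch_should_run(const struct sketch_fs *c, unsigned pass)
{
	unsigned f = sketch_pass_flags[pass];

	return (f & PASS_ALWAYS) ||
	       (c->fsck && (f & PASS_FSCK)) ||
	       (c->unclean && (f & PASS_UNCLEAN));
}

static void sketch_run_recovery_passes(struct sketch_fs *c)
{
	for (c->curr_recovery_pass = 0;
	     c->curr_recovery_pass < SKETCH_PASS_NR;
	     c->curr_recovery_pass++)
		if (sketch_should_run(c, c->curr_recovery_pass))
			/* stand-in for actually calling the pass */
			c->ran_mask |= 1U << c->curr_recovery_pass;
}
```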
#
78328fec |
|
08-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Stash journal replay params in bch_fs For the upcoming enumeration of recovery passes, we need all recovery passes to be called the same way - including journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
10a6ced2 |
|
08-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill bch2_bucket_gens_read() This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the version check there. This is prep work for enumerating all recovery passes: we need some cleanup first to make calling all the recovery passes consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3045bb95 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: version_upgrade is now an enum The version_upgrade parameter is now an enum, not a bool, and it's persistent in the superblock: - compatible (default): upgrade to the latest compatible version - incompatible: upgrade to latest incompatible version - none Currently all upgrades are incompatible upgrades, but the next release will introduce major:minor versions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
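One way to model the three version_upgrade settings. The sketch assumes the caller already knows two values: the newest compatible version and the newest version overall. Both parameters and all names are hypothetical; the commit does not show how the real option is applied.

```c
#include <assert.h>

enum sketch_version_upgrade {
	SKETCH_UPGRADE_compatible,	/* default */
	SKETCH_UPGRADE_incompatible,
	SKETCH_UPGRADE_none,
};

static unsigned sketch_max(unsigned a, unsigned b)
{
	return a > b ? a : b;
}

/* Pick the version to upgrade to; never moves the version backwards. */
static unsigned sketch_upgrade_target(unsigned cur, unsigned latest_compatible,
				      unsigned latest,
				      enum sketch_version_upgrade opt)
{
	switch (opt) {
	case SKETCH_UPGRADE_compatible:
		return sketch_max(cur, latest_compatible);
	case SKETCH_UPGRADE_incompatible:
		return sketch_max(cur, latest);
	case SKETCH_UPGRADE_none:
	default:
		return cur;	/* leave the on-disk version alone */
	}
}
```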
#
24964e1c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Version upgrades are not atomic operations: when we do a version upgrade we need to update the superblock before we start using new features, and then when the upgrade completes we need to update the superblock again. This adds a new superblock field so we can detect and handle incomplete version upgrades. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c8b4534d |
|
07-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Delete redundant log messages Now that we have distinct error codes for different memory allocation failures, the early init log messages are no longer needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
faa6cb6c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Allow for unknown btree IDs We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e3804b55 |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_version_to_text() Add a new helper for printing out metadata versions in a standard format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec14fc60 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill JOURNAL_WATERMARK This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1bb3c2a9 |
|
20-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New error message helpers Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e47a390a |
|
27-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert -ENOENT to private error codes As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c59b483 |
|
29-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_snapshot_tree This adds a new btree which gets us a persistent per-snapshot-tree identifier. - BTREE_ID_snapshot_trees - KEY_TYPE_snapshot_tree - bch_snapshot now has a field that points to a snapshot_tree This is going to be used to designate one snapshot ID/subvolume out of a given tree of snapshots as the "main" subvolume, so that we can do quota accounting in that subvolume and not the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bcb79a51 |
|
29-Apr-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_bkey_get_iter() helpers Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
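The zero-extending copy described for bch2_bkey_get_val_typed() can be sketched with memcpy() and memset(). The function name below is hypothetical. Truncating a value that is longer than the destination is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a value read from the btree into a fixed-size, typically stack
 * allocated struct. Values written by older versions may be shorter; the
 * fields they lack are zeroed. A longer value is truncated (assumption).
 */
static void sketch_get_val_typed(void *dst, size_t dst_size,
				 const void *src, size_t src_size)
{
	size_t n = src_size < dst_size ? src_size : dst_size;

	memcpy(dst, src, n);
	memset((char *) dst + n, 0, dst_size - n);
}
```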
#
27763692 |
|
04-Apr-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add a cond_resched() call to journal_keys_sort() We're just doing CPU work here and it could take a while, so a cond_resched() is definitely needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
62a03559 |
|
31-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rip out code for storing backpointers in alloc keys We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
349b1d83 |
|
22-Mar-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: use reservation for log messages during recovery If we block on journal reservation attempting to log journal messages during recovery, particularly for the first message(s) before we start doing actual work, chances are the filesystem ends up deadlocked. Allow logged messages to use reserved journal space to mitigate this problem. In the worst case where no space is available whatsoever, this at least allows the fs to recognize that the journal is stuck and fail the mount gracefully. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26559553 |
|
15-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add a fallback when journal_keys doesn't fit in ram We may end up in a situation where allocating the buffer for the sorted journal_keys fails - but it would likely succeed, post compaction where we drop duplicates. We've had reports of this allocation failing, so this adds a slowpath to do the compaction incrementally. This is only a band-aid fix; we need to look at limiting the number of keys in the journal based on the amount of system RAM. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
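A sketch of the compaction step the fallback relies on:

1. Sort journal keys by position, then by sequence number.
2. Keep only the newest key for each position.

Keys are reduced to a (pos, seq) pair here. The incremental batching that the actual slowpath performs is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_jkey {
	uint64_t	pos;
	uint64_t	seq;	/* journal sequence number the key came from */
};

static int sketch_jkey_cmp(const void *_l, const void *_r)
{
	const struct sketch_jkey *l = _l, *r = _r;

	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	if (l->seq != r->seq)
		return l->seq < r->seq ? -1 : 1;
	return 0;
}

/* Sort, then keep only the newest key for each position; returns the new count. */
static size_t sketch_sort_and_compact(struct sketch_jkey *k, size_t nr)
{
	size_t src, dst = 0;

	qsort(k, nr, sizeof(*k), sketch_jkey_cmp);

	for (src = 0; src < nr; src++) {
		/* A later entry at the same pos overwrites this one: drop it. */
		if (src + 1 < nr && k[src + 1].pos == k[src].pos)
			continue;
		k[dst++] = k[src];
	}
	return dst;
}
```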
#
40a18fe2 |
|
14-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add error message for failing to allocate sorted journal keys Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
65d48e35 |
|
14-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac2ccddc |
|
04-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Drop some anonymous structs, unions Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenience. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
27616a31 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Simplify ec stripes heap Now that we have a separate data structure for tracking open stripes, the stripes heap can track all existing stripes, which is a nice simplification. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
80c33085 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
806c8a6a |
|
12-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix failure to read btree roots If we failed to read a btree root - or if we're not using a btree root because of the reconstruct_alloc option - make sure we update the corresponding key/level info for the root on disk. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
83f33d68 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rework lru btree This patch changes how the LRU index works: Instead of using KEY_TYPE_lru where the bucket the lru entry points to is part of the value, this switches to KEY_TYPE_set and encoding the bucket we refer to in the low bits of the key. This means that we no longer have to check for collisions when inserting LRU entries. We'll be making use of this in the next patch, which adds a btree write buffer - a pure write buffer for btree updates, where updates are appended to a simple array and then periodically sorted and batch inserted. This is a new on disk format version, and a forced upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
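Encoding the bucket in the low bits of the key makes every LRU entry unique, so inserts need no collision check. The entry is then just a set member with no value. The 40/24-bit split and the helper names below are assumptions chosen for the illustration, not the on-disk layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split: bucket number in the low 40 bits, LRU time above it. */
#define SKETCH_LRU_BUCKET_BITS	40
#define SKETCH_LRU_BUCKET_MASK	((UINT64_C(1) << SKETCH_LRU_BUCKET_BITS) - 1)

static uint64_t sketch_lru_key(uint64_t time, uint64_t bucket)
{
	return (time << SKETCH_LRU_BUCKET_BITS) | (bucket & SKETCH_LRU_BUCKET_MASK);
}

static uint64_t sketch_lru_key_bucket(uint64_t key)
{
	return key & SKETCH_LRU_BUCKET_MASK;
}

static uint64_t sketch_lru_key_time(uint64_t key)
{
	return key >> SKETCH_LRU_BUCKET_BITS;
}
```

Keys still sort primarily by time, so walking the index in order visits the least recently used buckets first.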
#
5250b74d |
|
25-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bucket_gens btree To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds - trigger for keeping it in sync with the alloc btree - initialization code, for filesystems from previous versions - new path for reading bucket gens - new fsck code And a new on disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
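With 256 gens per key, finding a bucket's gen is a shift and a mask. This sketch assumes 8-bit gens and uses an in-memory array of keys as a stand-in for the btree. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GENS_PER_KEY_BITS	8	/* 256 gens per key */
#define SKETCH_GENS_PER_KEY		(1U << SKETCH_GENS_PER_KEY_BITS)

struct sketch_bucket_gens_key {
	uint8_t gens[SKETCH_GENS_PER_KEY];
};

/* Which key holds this bucket's gen... */
static uint64_t sketch_gens_key_offset(uint64_t bucket)
{
	return bucket >> SKETCH_GENS_PER_KEY_BITS;
}

/* ...and where within that key. */
static unsigned sketch_gens_key_idx(uint64_t bucket)
{
	return bucket & (SKETCH_GENS_PER_KEY - 1);
}

/* Number of keys needed to cover nbuckets buckets. */
static uint64_t sketch_gens_nr_keys(uint64_t nbuckets)
{
	return (nbuckets + SKETCH_GENS_PER_KEY - 1) >> SKETCH_GENS_PER_KEY_BITS;
}

static uint8_t sketch_bucket_gen(const struct sketch_bucket_gens_key *keys,
				 uint64_t bucket)
{
	return keys[sketch_gens_key_offset(bucket)].gens[sketch_gens_key_idx(bucket)];
}
```

For example, a device with 1,000,000 buckets needs 3907 such keys rather than one alloc key per bucket.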
#
8dd69d9f |
|
21-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3 Move bi_size and bi_sectors into the non-varint portion of the inode, so that the write path can update them without going through the relatively expensive unpack/pack operations. Other changes: - Add a field for the offset of the varint section, so we can add new non-varint fields without needing a new inode type, like alloc_v3 - Move bi_mode into the flags field, so that the varint section can be u64 aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
47b323a0 |
|
19-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Start snapshots before bch2_gc() bch2_gc may require snapshots to be started - the repair path when checking the reflink btree may do updates to the extents btree. This moves bch2_fs_initialize_subvolumes() and bch2_fs_snapshots_start() to before bch2_gc() - since we haven't gone RW yet, the updates in bch2_fs_initialize_subvolumes() are done via the journal replay keys list, so it's fine to do this before bch2_gc(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8c752bb |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New on disk format: Backpointers This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f2b542ba |
|
11-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Go RW before check_alloc_info() It's possible to do btree updates before going RW by adding them to the list of updates for journal replay to do, but this is limited by what fits in RAM. This patch switches the second alloc info phase to run after going RW - btree_gc has already ensured the alloc btree itself is correct - and tweaks the allocation path to deal with the potential small inconsistencies. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5f5c7466 |
|
17-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Start copygc when first going read-write In the distant past, it wasn't possible to start copygc until after journal replay had finished. Now, the btree iterator code overlays keys from the journal, so there's no reason not to start it earlier - and it solves a rare deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
858536c7 |
|
11-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert EROFS errors to private error codes More error code improvements - this gets us more useful error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5bbe3f2d |
|
14-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Log more messages in the journal This patch - Adds a mechanism for queuing up journal entries prior to the journal being started, which will be used for early journal log messages - Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now take format strings. bch2_fs_log_msg() can be used before or after the journal has been started, and will use the appropriate mechanism. - Deletes the now obsolete bch2_journal_log_msg() - And adds more log messages to the recovery path - messages for journal/filesystem started, journal entries being blacklisted, and journal replay starting/finishing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
67ace272 |
|
22-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add a missing bch2_err_str() call Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ba8a796 |
|
14-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Recover from blacklisted journal entries If it so happens that we crash while dirty, meaning we don't have the superblock clean section, and we erroneously mark a journal entry we wrote as blacklisted, we won't be able to recover. This patch fixes this by adding a fallback: if we've got no superblock clean section, and no non-ignored journal entries, we try the most recent ignored journal entry. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4f948723 |
|
09-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_journal_keys_peek_upto() bch2_journal_keys_peek_upto() was comparing against btree_id & level incorrectly - fix this by using __journal_key_cmp(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e0de429a |
|
01-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't error out when just reading the journal This tweaks the recovery and journal paths so that we don't error out before we need to: the list_journal command should work, even if we wouldn't be able to replay successfully. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e88a75eb |
|
24-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
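A simplified sketch of the boolean comparison helpers. It assumes positions are ordered by inode, then offset, then snapshot. Each predicate answers one question directly rather than going through a three-way -1/0/1 result. The struct and function names are hypothetical stand-ins for bpos and bpos_eq() and friends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified position: ordered by inode, then offset, then snapshot. */
struct sketch_bpos {
	uint64_t	inode;
	uint64_t	offset;
	uint32_t	snapshot;
};

static inline bool sketch_bpos_eq(struct sketch_bpos l, struct sketch_bpos r)
{
	return l.inode == r.inode && l.offset == r.offset && l.snapshot == r.snapshot;
}

static inline bool sketch_bpos_lt(struct sketch_bpos l, struct sketch_bpos r)
{
	return  l.inode  != r.inode  ? l.inode  < r.inode  :
		l.offset != r.offset ? l.offset < r.offset :
		l.snapshot < r.snapshot;
}

static inline bool sketch_bpos_le(struct sketch_bpos l, struct sketch_bpos r)
{
	return !sketch_bpos_lt(r, l);
}

static inline bool sketch_bpos_gt(struct sketch_bpos l, struct sketch_bpos r)
{
	return sketch_bpos_lt(r, l);
}

static inline bool sketch_bpos_ge(struct sketch_bpos l, struct sketch_bpos r)
{
	return !sketch_bpos_lt(l, r);
}
```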
#
b2d1d56b |
|
13-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fixes for building in userspace - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1019576 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More style fixes Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c167f9e5 |
|
23-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Journal keys overlay fixes - In the btree iterator code that overlays keys from the journal, we were incorrectly specifying level=0 instead of the btree_path's current level in a few places - When we didn't do journal replay, we shouldn't free the journal keys: this fixes cmd_list and cmd_dump, which run in norecovery mode Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e3e02e6 |
|
19-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ffb876f |
|
12-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill journal_keys->journal_seq_base This removes an optimization that didn't actually save us any memory, due to alignment, but did make the code more complicated than it needed to be. We were also seeing a bug where journal_seq_base wasn't getting correctly initialized, so hopefully it'll fix that too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ed0a5d2 |
|
19-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert fsck errors to errcode.h Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d4bf5eec |
|
18-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use bch2_err_str() in error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3ab25c1b |
|
22-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: We can handle missing btree roots for all alloc btrees We can rebuild alloc info if these btree roots are missing - no need to bail out and say the filesystem is unrecoverable Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4ab35c34 |
|
13-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix subvol/snapshot deleting in recovery fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs, otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
401ec4db |
|
03-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Printbuf rework This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
57617902 |
|
06-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix btree_and_journal_iter We had a bug where btree_and_journal_iter would return the same key twice - after deleting it (perhaps because it was present in both the btree and the journal?) This reworks btree_and_journal_iter to track the current position, much like btree_paths, which makes the logic considerably simpler and more robust. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f2aa0265 |
|
06-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for cmd_list_journal cmd_list_journal wasn't correctly listing the most recent journal entries as blacklisted - because in the recovery path when just reading the journal, we were failing to add those to the blacklist table. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
30525f68 |
|
21-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal_keys_search() overhead Previously, on every btree_iter_peek() operation we were searching the journal keys, doing a full binary search - which was slow. This patch fixes that by saving our position in the journal keys, so that we only do a full binary search when moving our position backwards or a large jump forwards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
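A sketch of caching the search position. If the new target is at or slightly ahead of the previous result, the search steps forward linearly. Otherwise it falls back to a full binary search. Keys are plain sorted u64 values here, and the window size is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how far forward to step linearly before falling back. */
#define SKETCH_LINEAR_WINDOW	8

struct sketch_keys {
	const uint64_t	*k;		/* sorted */
	size_t		nr;
	size_t		pos;		/* cached result of the previous lookup */
	unsigned	full_searches;	/* counts binary searches, for the demo */
};

/* Index of the first key >= search. */
static size_t sketch_lower_bound(const uint64_t *k, size_t nr, uint64_t search)
{
	size_t l = 0, r = nr;

	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (k[m] < search)
			l = m + 1;
		else
			r = m;
	}
	return l;
}

static size_t sketch_keys_search(struct sketch_keys *s, uint64_t search)
{
	size_t i = s->pos;
	size_t end = i + SKETCH_LINEAR_WINDOW < s->nr ? i + SKETCH_LINEAR_WINDOW : s->nr;

	/* Target is not behind the cached position: step forward from there. */
	if (i == 0 || s->k[i - 1] < search) {
		while (i < end && s->k[i] < search)
			i++;
		if (i == s->nr || s->k[i] >= search) {
			s->pos = i;
			return i;
		}
	}

	/* Moved backwards, or jumped too far ahead: full binary search. */
	s->full_searches++;
	s->pos = sketch_lower_bound(s->k, s->nr, search);
	return s->pos;
}
```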
#
11f5e595 |
|
25-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Always print when doing journal replay in fsck This logging improvement helps see when the previous fsck pass has completed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d8a161ad |
|
14-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: LRU repair tweaks - Drop old unneeded parameter for whether we're in initial GC - which was from when btree updates had to be done differently before we went RW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
099989c1 |
|
20-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal_iters_fix() journal_iters_fix() was incorrectly rewinding iterators past keys they had already returned, leading to those keys being double counted in the bch2_gc() path - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1cab5a82 |
|
21-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Go RW before bch2_check_lrus() btree updates before going RW are expensive if they're in random order, since they use the list of keys for journal replay to insert, which is just a gap buffer. This patch improves the bucket invalidate path so that if bch2_check_lrus() hasn't finished it only prints warnings instead of doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW before bch2_check_lrus(). Also, the filesystem state bits are reorganized a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
75c8d030 |
|
12-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill old rebuild_replicas option This option was useful when the replicas mechanism was new and still being debugged, but hasn't been used in ages - let's delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c609947b |
|
11-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for getting stuck in journal replay In journal replay, we weren't immediately dropping journal pins when we start doing updates that weren't from journal replay - leading to journal reclaim getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5650bb46 |
|
11-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Introduce bch2_journal_keys_peek_(upto|slot)() When many journal replay keys have been overwritten, bch2_journal_keys_peek() was taking excessively long to scan before it found a key to return. Fix this by introducing bch2_journal_keys_peek_upto() which takes a parameter for the end of the range we want, so that we can terminate the search much sooner, and replace all uses of bch2_journal_keys_peek() with peek_upto() or peek_slot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
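A sketch of what the upper bound buys. The scan skips overwritten keys but gives up as soon as it passes end, instead of walking every overwritten key until a live one turns up. The key layout is simplified. A linear scan replaces the binary search a real implementation would use to find start. Under these assumptions, a slot lookup is the special case start == end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_jkey {
	uint64_t	pos;
	bool		overwritten;	/* superseded by a later update */
};

/*
 * Return the first live key with pos in [start, end], or NULL.
 * Keys are sorted by pos; the scan stops as soon as it passes end.
 */
static const struct sketch_jkey *
sketch_peek_upto(const struct sketch_jkey *k, size_t nr,
		 uint64_t start, uint64_t end)
{
	size_t i = 0;

	while (i < nr && k[i].pos < start)	/* a real version would binary search */
		i++;
	for (; i < nr && k[i].pos <= end; i++)
		if (!k[i].overwritten)
			return &k[i];
	return NULL;
}
```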
#
502f973d |
|
09-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a few warnings on 32 bit These showed up when building for mips. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ce6201c4 |
|
20-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use a genradix for reading journal entries Previously, the journal read path used a linked list for storing the journal entries we read from disk. But there's been a bug that's been causing journal_flush_delay to incorrectly be set to 0, leading to far more journal entries than is normal being written out, which then means filesystems are no longer able to start due to the O(n^2) behaviour of inserting into/searching that linked list. Fix this by switching to a radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
95752a02 |
|
20-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor journal_keys_sort() to return an error code When there weren't any keys in the journal, there's no need to allocate the buffer - but attempting that allocation anyway was causing a spurious -ENOMEM. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
822835ff |
|
31-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fold bucket_state in to BCH_DATA_TYPES() Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be counted when checking the current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e1effd42 |
|
05-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More improvements for alloc info checks - Move checks for whether the device & bucket are valid from the .key_invalid method to bch2_check_alloc_key(). This is because .key_invalid() is called on keys that may no longer exist (post journal replay), which is a problem when removing/resizing devices. - We weren't checking the need_discard btree to ensure that every set bucket has a corresponding alloc key. This refactors the code for checking the freespace btree, so that it now checks both. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d1d7737f |
|
03-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Gap buffer for journal keys Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happen in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
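A minimal gap buffer over sorted u64 keys. Inserting at logical index idx first moves the gap there. That move costs time proportional to the distance, so a run of inserts at increasing positions stays cheap. The fixed capacity and the names are assumptions of this sketch; a real buffer would grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GAP_CAP	16

struct sketch_gap_buf {
	uint64_t	d[SKETCH_GAP_CAP];
	size_t		nr;	/* number of live elements */
	size_t		gap;	/* logical index where the gap starts */
};

/* Physical slot of logical index idx: elements after the gap sit at the end. */
static size_t sketch_gap_phys(const struct sketch_gap_buf *b, size_t idx)
{
	return idx < b->gap ? idx : idx + (SKETCH_GAP_CAP - b->nr);
}

/* Move the gap so it starts at logical index idx: O(|idx - gap|). */
static void sketch_gap_move(struct sketch_gap_buf *b, size_t idx)
{
	size_t gap_size = SKETCH_GAP_CAP - b->nr;

	if (idx < b->gap)
		memmove(b->d + idx + gap_size, b->d + idx,
			(b->gap - idx) * sizeof(b->d[0]));
	else if (idx > b->gap)
		memmove(b->d + b->gap, b->d + b->gap + gap_size,
			(idx - b->gap) * sizeof(b->d[0]));
	b->gap = idx;
}

static int sketch_gap_insert(struct sketch_gap_buf *b, size_t idx, uint64_t v)
{
	if (b->nr == SKETCH_GAP_CAP || idx > b->nr)
		return -1;
	sketch_gap_move(b, idx);
	b->d[b->gap++] = v;
	b->nr++;
	return 0;
}

static uint64_t sketch_gap_get(const struct sketch_gap_buf *b, size_t idx)
{
	return b->d[sketch_gap_phys(b, idx)];
}
```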
#
5735608c |
|
10-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill main in-memory bucket array All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5add07d5 |
|
17-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fsck for need_discard & freespace btrees Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f25d8215 |
|
09-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill allocator threads & freelists Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6b2826c |
|
11-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Freespace, need_discard btrees This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
31f63fd1 |
|
14-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Introduce a separate journal watermark for copygc Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
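A sketch of replacing a single may-use-reserve bit with ordered watermarks. As the journal fills, lower-priority users are refused earlier, and copygc is blocked only when the journal is nearly full. The watermark names and percentages are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordered from most to least restrictive; names are illustrative. */
enum sketch_watermark {
	SKETCH_WATERMARK_normal,
	SKETCH_WATERMARK_copygc,
	SKETCH_WATERMARK_reclaim,
	SKETCH_WATERMARK_NR,
};

/* Hypothetical: journal space (in %) that must stay free at each watermark. */
static const unsigned sketch_reserve_pct[SKETCH_WATERMARK_NR] = {
	[SKETCH_WATERMARK_normal]	= 50,
	[SKETCH_WATERMARK_copygc]	= 25,
	[SKETCH_WATERMARK_reclaim]	= 0,
};

static bool sketch_journal_res_ok(unsigned free_pct, enum sketch_watermark wm)
{
	return free_pct > sketch_reserve_pct[wm];
}
```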
#
d5d3be7d |
|
10-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_journal_log_msg() This adds bch2_journal_log_msg(), which just logs a message to the journal, and uses it to mark startup and when journal replay finishes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fa8e94fa |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Heap allocate printbufs This patch changes printbufs to dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
78c8fe20 |
|
19-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Normal update/commit path now works before going RW This improves __bch2_trans_commit - early in the recovery process, when we're running btree_gc and before we want to go RW, it now uses bch2_journal_key_insert() to add the update to the list of updates for journal replay to do, instead of btree_gc having to use separate interfaces depending on whether we're running at bringup or, later, runtime. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
12bf93a4 |
|
20-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add .to_text() methods for all superblock sections This patch improves the superblock .to_text() methods and adds methods for all types that were missing them. It also improves printbufs by allowing them to specify what units we want to be printing in, and adds new wrapper methods for unifying our kernel and userspace environments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8ccf4dff |
|
19-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: opts.read_journal_only Add an option that tells recovery to only read the journal, to be used by the list_journal command. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
10b93677 |
|
19-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete some flag bits that are no longer used Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0f78264a |
|
13-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Print a better message for mark and sweep pass Btree gc, aka mark and sweep, checks allocations - so let's just print that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ec061b21 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_gc no longer uses main in-memory bucket array This changes the btree_gc code to only use the second bucket array, the one dedicated to GC. On completion, it compares what's in its in memory bucket array to the allocation information in the btree and writes it directly, instead of updating the main in-memory bucket array and writing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
45e4cd9e |
|
24-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: run_one_trigger() now checks journal keys Previously, when doing updates and running triggers before journal replay completes, triggers would see the incorrect key for the old key being overwritten - this patch updates the trigger code to check the journal keys when necessary, needed for the upcoming allocator rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9b6e2f1e |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
Revert "bcachefs: Delete some obsolete journal_seq_blacklist code" This reverts commit f95b61228efd04c9c158123da5827c96e9773b29. It turns out, we're seeing filesystems in the wild end up with blacklisted btree node bsets - this should not be happening, and until we understand why and fix it we need to keep this code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
03ea3962 |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Log & error message improvements - Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
365f64f3 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add verbose log messages for journal read Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fe312f81 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use kvmalloc() for array of sorted keys in journal replay Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d8601afc |
|
27-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Simplify journal replay With BTREE_ITER_WITH_JOURNAL, there are no longer any restrictions on the order in which we have to replay keys from the journal, and we can also start up journal reclaim right away - and delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5222a460 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_JOURNAL This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
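A simplified sketch of overlaying journal keys on btree keys, assuming both are sorted arrays of integer positions. This is not the real iterator code. `struct key`, `lower_bound()` and `peek_with_journal()` are hypothetical, and whiteouts and deletions are ignored. When both sources have a key at the same position, the journal key wins because it is newer. Like the unoptimized version the commit describes, the sketch redoes the binary search over the journal keys on every peek.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified key: a position plus a value. */
struct key {
	uint64_t pos;
	int	 val;
};

/* Index of the first key with k[i].pos >= pos (binary search). */
static size_t lower_bound(const struct key *k, size_t nr, uint64_t pos)
{
	size_t l = 0, r = nr;

	assert(k || !nr);
	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (k[m].pos < pos)
			l = m + 1;
		else
			r = m;
	}
	return l;
}

/*
 * Return the first key at or after @pos in the btree overlaid with the
 * journal, or NULL if there is none.  On equal positions the journal key
 * is returned, since it supersedes what is in the btree.
 */
static const struct key *peek_with_journal(const struct key *btree, size_t btree_nr,
					   const struct key *journal, size_t journal_nr,
					   uint64_t pos)
{
	size_t b = lower_bound(btree, btree_nr, pos);
	size_t j = lower_bound(journal, journal_nr, pos);
	const struct key *bk = b < btree_nr ? &btree[b] : NULL;
	const struct key *jk = j < journal_nr ? &journal[j] : NULL;

	if (!bk)
		return jk;
	if (!jk)
		return bk;
	return jk->pos <= bk->pos ? jk : bk;
}
```

This approach only works if the journal keys stay sorted, which is why an earlier commit in this log stops journal replay from re-sorting them.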
#
f28620c1 |
|
01-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tweak journal reclaim order Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
cd7c2d3d |
|
01-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make sure BCH_FS_FSCK_DONE gets set If we're not running fsck we still want to set BCH_FS_FSCK_DONE, so that bch2_fsck_err() calls are interpreted as bch2_inconsistent_error() calls. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
dfd41fb9 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix race between btree updates & journal replay Add a flag to indicate whether a journal replay key has been overwritten, and set/test it with appropriate btree locks held. This fixes a race between the allocator - invalidating buckets, and doing btree updates - and journal replay, which before this patch could clobber the allocator thread's update with an older version of the key from the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
528b18e6 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_journal_entry_to_text() This adds a _to_text() pretty printer for journal entries - including every subtype - which will shortly be used by the 'bcachefs list_journal' subcommand. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5ba2fd11 |
|
29-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal replay doesn't resort main list of keys The upcoming BTREE_ITER_WITH_JOURNAL patch will require journal keys to stay in sorted order, so the btree iterator code can overlay them over btree keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d93cf685 |
|
27-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Run scan_old_btree_nodes after version upgrade In the recovery path, we scan for old btree nodes if we don't have certain compat bits set. If we do this, we should be doing it after we upgraded to the newest on disk format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
04f0f77d |
|
26-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete some obsolete journal_seq_blacklist code Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e75b2d4c |
|
23-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_journal_key_insert() no longer transfers ownership bch2_journal_key_insert() used to assume that the key passed to it was allocated with kmalloc(), and on success took ownership. This patch deletes that behaviour, making it more similar to bch2_trans_update()/bch2_trans_commit(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c64740ef |
|
30-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't start allocator threads too early If the allocator threads start before journal replay has finished replaying alloc keys, journal replay might overwrite the allocator's btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
77170d0d |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transactional bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
09943313 |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite bch2_bucket_alloc_new_fs() This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
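A minimal sketch of a bump allocator of the kind this commit describes. It hands out buckets in increasing order from a per-device cursor and never consults an in-memory bucket array. `struct new_fs_dev`, `bucket_alloc_new_fs()` and the in-use predicate are hypothetical, and the example predicate that marks bucket 3 as taken exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state for allocating on a brand new filesystem. */
struct new_fs_dev {
	uint64_t first_bucket;	/* first bucket usable for allocation */
	uint64_t nbuckets;	/* total buckets on the device */
	uint64_t next;		/* bump pointer: next candidate bucket */
};

/* Hypothetical predicate: is bucket @b already claimed (superblock etc.)? */
typedef bool (*bucket_in_use_fn)(uint64_t b);

/* Example predicate for illustration only: pretend bucket 3 is taken. */
static bool example_bucket_in_use(uint64_t b)
{
	return b == 3;
}

/*
 * Hand out buckets in increasing order, skipping claimed ones.
 * Returns -1 once the device is exhausted.
 */
static int64_t bucket_alloc_new_fs(struct new_fs_dev *ca, bucket_in_use_fn in_use)
{
	assert(ca->first_bucket <= ca->nbuckets);

	if (ca->next < ca->first_bucket)
		ca->next = ca->first_bucket;

	while (ca->next < ca->nbuckets) {
		uint64_t b = ca->next++;

		if (!in_use || !in_use(b))
			return (int64_t) b;
	}
	return -1;
}
```

A new filesystem starts out empty and allocations only move forward, so a single counter is enough state. A general-purpose allocator would need free-space tracking.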
#
fb0e4808 |
|
10-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_alloc_write() This adds a new helper, much like the one we have for inode updates, that allocates the packed alloc key, packs it and calls bch2_trans_update(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
990d42d1 |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out struct gc_stripe from struct stripe We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9be1efe9 |
|
15-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix error reporting from bch2_journal_flush_seq - bch2_journal_halt() was unconditionally overwriting j->err_seq, the sequence number that we failed to write - journal_write_done was updating seq_ondisk and flushed_seq_ondisk even for writes that errored, which broke the way bch2_journal_flush_seq_async() locklessly checked for completions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0a84a066 |
|
15-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Also log device name in userspace Change log messages in userspace to be closer to what they are in kernel space, and include the device name - it's also useful in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
697e546f |
|
26-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor journal replay code This consolidates duplicated code in journal replay - it's only a few flags that are different for replaying alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3e52c222 |
|
29-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add journal_seq to inode & alloc keys Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
904823de |
|
29-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bch2_mark_key() to take a btree_trans * This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8325cd1e |
|
27-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't do upgrades in nochanges mode nochanges mode is often used for getting data off of otherwise nonrecoverable filesystems, which is often because of errors hit during fsck. Don't force version upgrade & fsck in nochanges mode, so that it's more likely to mount. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
4db65027 |
|
11-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvol dirents are now only visible in parent subvol This changes the on disk format for dirents that point to subvols so that they also record the subvolid of the parent subvol, so that we can filter them out in other subvolumes. This also updates the dirent code to do that filtering, and in particular tweaks the rename code - we need to ensure that there's only ever one dirent (counting multiplicities in different snapshots) that points to a subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
bfe88863 |
|
19-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New on disk format to fix reflink_p pointers We had a bug where reflink_p pointers weren't being initialized to 0, and when we started using the second word, things broke badly. This patch revs the on disk format version and adds cleanup code to zero out the second word of reflink_p pointers before we start using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0476fa94 |
|
27-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rev the on disk format version for snapshots This will cause the compat code to be run that creates entries in the subvolumes and snapshots btrees. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
42d23732 |
|
16-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Snapshot creation, deletion This is the final patch in the patch series implementing snapshots. This patch implements two new ioctls that work like creation and deletion of directories, but fancier. - BCH_IOCTL_SUBVOLUME_CREATE, for creating new subvolumes and snapshots - BCH_IOCTL_SUBVOLUME_DESTROY, for deleting subvolumes and snapshots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
6fed42bb |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Plumb through subvolume id To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
14b393ee |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible component, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
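A heavily simplified sketch of the sharing scheme described above. Paths are reference counted, iterators share them freely, and an iterator clones its path only when it is about to mutate one that someone else also holds. `struct path`, `struct iter` and every function here are hypothetical stand-ins, not the real btree_path/btree_iter code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for btree_path and btree_iter. */
struct path {
	unsigned	ref;
	unsigned long	pos;
};

struct iter {
	struct path	*path;
};

static struct path *path_get(struct path *p)
{
	p->ref++;
	return p;
}

static void path_put(struct path *p)
{
	assert(p->ref);
	if (!--p->ref)
		free(p);
}

/* Iterators start out sharing a path instead of cloning it up front. */
static void iter_init_shared(struct iter *it, struct path *p)
{
	it->path = path_get(p);
}

/*
 * Before mutating, make sure this iterator is the path's only user:
 * if the path is shared, clone it and drop the reference to the original.
 * Returns NULL (leaving the path shared) if the clone can't be allocated.
 */
static struct path *iter_path_make_mut(struct iter *it)
{
	struct path *p = it->path, *n;

	if (p->ref == 1)
		return p;

	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	*n = *p;
	n->ref = 1;
	path_put(p);
	it->path = n;
	return n;
}

static void iter_set_pos(struct iter *it, unsigned long pos)
{
	struct path *p = iter_path_make_mut(it);

	if (p)
		p->pos = pos;
}
```

Clones are made only when a shared path is actually about to change, not whenever a path might change. That is the cost saving the commit message points to.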
#
78cf784e |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Further reduce iter->trans usage This is prep work for splitting btree_path out from btree_iter - btree_path will not have a pointer to btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8dd6ed94 |
|
23-Jul-2021 |
Brett Holman <bholman.devel@gmail.com> |
bcachefs: add progress stats to sysfs This adds progress stats to sysfs for copygc, rebalance, recovery, and the cmd_job ioctls. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
877da05f |
|
30-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Zero out mem_ptr field in btree ptr keys from journal replay This fixes a bad ptr deref on recovery from unclean shutdown in bch2_btree_node_get_noiter(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have led to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8c3f6da9 |
|
14-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve iter->should_be_locked Adding iter->should_be_locked introduced a regression where it ended up not being set on the iterator passed to bch2_btree_update_start(), which is definitely not what we want. This patch requires it to be set when calling bch2_trans_update(), and adds various fixups to make that happen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
90d22a66 |
|
10-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix overflow in journal_replay_entry_early If the filesystem on disk was used by a version with a larger BCH_DATA_NR than the currently running version, we don't want this to cause a buffer overrun. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
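A minimal sketch of the general defensive pattern behind this fix. The number of entries is clamped to the smaller of the on-disk count and the count this version knows about, so an entry written by a newer version can't overrun a fixed-size in-memory array. `DATA_NR_KNOWN`, `struct usage_in_memory` and `usage_copy_from_disk()` are hypothetical. The actual change in journal_replay_entry_early may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of data types this version knows about. */
#define DATA_NR_KNOWN 5

struct usage_in_memory {
	uint64_t sectors[DATA_NR_KNOWN];
};

/*
 * Copy per-data-type counters from an on-disk entry that may have been
 * written by a newer version with more data types.  Counters for types
 * we don't know about are ignored instead of overrunning @dst.
 * Returns the number of counters copied.
 */
static size_t usage_copy_from_disk(struct usage_in_memory *dst,
				   const uint64_t *src, size_t nr_on_disk)
{
	size_t nr = nr_on_disk < DATA_NR_KNOWN ? nr_on_disk : DATA_NR_KNOWN;

	assert(dst && (src || !nr_on_disk));
	for (size_t i = 0; i < nr; i++)
		dst->sectors[i] = src[i];
	return nr;
}
```

Entries from older versions, which have fewer types, copy only what they contain and leave the remaining counters untouched.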
#
c0ebe3e4 |
|
23-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Assorted endianness fixes Found by sparse Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3a402c8d |
|
07-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix some refcounting bugs We really need debug mode assertions that ca->ref and ca->io_ref are used correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac1019d3 |
|
29-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Clean up bch2_btree_and_journal_walk() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aae15aaf |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4932e07e |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix key cache assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d62ab355 |
|
14-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
423300e8 |
|
13-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_FEATURE_atomic_nlink is obsolete Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8a85b20c |
|
06-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inode backpointers are now required This lets us simplify fsck quite a bit, which we need for making fsck snapshot aware. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3a14d58e |
|
06-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop bch2_fsck_inode_nlink() We've had BCH_FEATURE_atomic_nlink for quite some time, we can drop this now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e751c01a |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and always U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
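A rough sketch of what including the snapshot field in the successor function means. The snapshot acts as the lowest-order part of the position, so incrementing it carries into offset and then inode when it wraps. A second variant leaves the snapshot alone, as iteration within a single snapshot would. `struct pos` and both functions are hypothetical simplifications, not bcachefs's bpos_successor().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified position, ordered by inode, then offset, then snapshot. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/*
 * Successor including the snapshot field: bump snapshot, carrying into
 * offset and then inode when a field wraps to zero.
 */
static struct pos pos_successor(struct pos p)
{
	if (++p.snapshot)
		return p;
	if (++p.offset)
		return p;
	++p.inode;
	assert(p.inode);	/* the maximum position has no successor */
	return p;
}

/* Successor that keeps the snapshot fixed, moving only the higher bits. */
static struct pos pos_successor_nosnap(struct pos p)
{
	if (!++p.offset) {
		++p.inode;
		assert(p.inode);
	}
	return p;
}
```

In the first variant, every snapshot version of a key sits at its own distinct position. In the second, the iterator steps straight to the next key within one snapshot.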
#
4cf91b02 |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out bpos_cmp() and bkey_cmp() With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
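A minimal sketch of the two kinds of comparison this commit separates. One compares the full position including the snapshot, which is what core btree code needs. The other ignores the snapshot, so a key compares equal to the same key in an ancestor snapshot. The struct and function names are hypothetical, not the real bpos_cmp()/bkey_cmp().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified position, ordered by inode, then offset, then snapshot. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way compare: returns -1, 0 or 1. */
static int cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Ignores the snapshot field: what upper-level filesystem code wants. */
static int pos_cmp_ignore_snapshot(struct pos l, struct pos r)
{
	int c = cmp_u64(l.inode, r.inode);

	return c ? c : cmp_u64(l.offset, r.offset);
}

/* Full comparison including the snapshot: what core btree code needs. */
static int pos_cmp_with_snapshot(struct pos l, struct pos r)
{
	int c = pos_cmp_ignore_snapshot(l, r);

	return c ? c : cmp_u64(l.snapshot, r.snapshot);
}
```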
#
73590619 |
|
21-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't unconditionally version_upgrade in initialize This is mkfs's job. Also, clean up the handling of feature bits some. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c7bb769c |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7d6f07ed |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix compat code for superblock The bkey compat code wasn't being run for btree roots in the superblock clean section - this patch fixes it to use the journal entry validate code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f2785955 |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE() bcachefs has been aggressively migrating filesystems and btree nodes to the new format for quite some time - this shouldn't affect anyone anymore, and lets us delete a _lot_ of code. Also, it frees up KEY_TYPE_discard for a new whiteout key type for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41e37786 |
|
16-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Bring back metadata only gc This is useful for the filesystem dump debugging tool - when we're hitting bugs we want to skip as much of the recovery process as possible, and the dump tool only needs to know where metadata lives. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
19dd3172 |
|
04-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for compat feature bits This is to generate strings for them, so that we can print them out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e01dacf7 |
|
20-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bkey format generation for 32 bit fields Having a packed format that can represent a field larger than the unpacked type breaks bkey_packed_successor() assertions - we need to fix this to start using the snapshot field. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a4805d66 |
|
22-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Scan for old btree nodes if necessary on mount We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned out there were people who still had filesystems with btree nodes in that format in the wild. This adds a new compat feature that indicates we've scanned for and rewritten nodes in the old format, and does that scan at mount time if the option isn't set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dab9ef0d |
|
23-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add error message for some allocation failures Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8042b5b7 |
|
10-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Extents may now cross btree node boundaries When snapshots arrive, we won't necessarily be able to arbitrarily split existing extents - when we need to split an existing extent, we'll have to check if the extent was overwritten in child snapshots and if so emit a whiteout for the split in the child snapshot. Because extents couldn't span btree nodes previously, journal replay would sometimes have to split existing extents. That's no good anymore, but fortunately since extent handling has already been lifted above most of the btree code there's no real need for that rule anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5d428c7c |
|
03-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Run fsck if BCH_FEATURE_alloc_v2 isn't set We're using BCH_FEATURE_alloc_v2 to also gate journalling updates to dev usage - we don't have the code for reconstructing this from buckets anymore, so we need to run fsck if it's not set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
180fb49d |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2abe5420 |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd periodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestamps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
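A small sketch of why 16-bit per-bucket counters need periodic rescaling and 64-bit ones don't. A 16-bit age is computed modulo 2^16, so once the true age reaches 65536 ticks it silently wraps. Both helper names are hypothetical and not bcachefs functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Age from 16-bit timestamps: the subtraction is effectively modulo 2^16,
 * so the result is only correct while the true age is below 65536.
 */
static uint16_t bucket_age16(uint64_t now, uint64_t last_io)
{
	return (uint16_t) ((uint16_t) now - (uint16_t) last_io);
}

/* Age from 64-bit timestamps: no practical risk of wraparound. */
static uint64_t bucket_age64(uint64_t now, uint64_t last_io)
{
	assert(now >= last_io);
	return now - last_io;
}
```

For a bucket last read at tick 0 and examined at tick 70000, the 16-bit age comes out as 4464 (70000 - 65536), and the 64-bit age is the correct 70000.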
#
a0b73c1c |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add (partial) support for fixing btree topology When we walk the btrees during recovery, part of that is checking that btree topology is correct: for every interior btree node, its child nodes should exactly span the range the parent node covers. Previously, we had checks for this, but not repair code. Now that we have the ability to do btree updates during initial GC, this patch adds that repair code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
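A minimal sketch of the topology invariant described above. The children of an interior node must exactly tile the parent's key range, with no gaps and no overlaps. Positions are modeled as plain integers over closed ranges, which real btree positions are not. `struct range` and `children_span_parent()` are hypothetical, and only the check is shown, not the repair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a node covers the closed key range [min, max]. */
struct range {
	uint64_t min;
	uint64_t max;
};

/*
 * The first child must start at parent.min, each child must start right
 * after the previous one ends, and the last child must end at parent.max.
 */
static bool children_span_parent(struct range parent,
				 const struct range *child, size_t nr)
{
	assert(child || !nr);

	if (!nr || child[0].min != parent.min)
		return false;

	for (size_t i = 0; i < nr; i++) {
		if (child[i].min > child[i].max)
			return false;
		if (i && (child[i - 1].max == UINT64_MAX ||
			  child[i].min != child[i - 1].max + 1))
			return false;
	}
	return child[nr - 1].max == parent.max;
}
```

A gap, an overlap, or a last child that stops short of the parent's end all fail the check. Those are the cases the repair code has to fix.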
#
5b593ee1 |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add support for doing btree updates prior to journal replay Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
079663d8 |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill metadata only gc This was useful before we had transactional updates to interior btree nodes - but now, it's just extra unneeded complexity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac958006 |
|
14-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Factor out bch2_ec_stripes_heap_start() This fixes a bug where mark and sweep gc incorrectly was clearing out the stripes heap and causing assertions to fire later - simpler to just create the stripes heap after gc has finished. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
edfbba58 |
|
11-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add btree node prefetching to bch2_btree_and_journal_walk() bch2_btree_and_journal_walk() walks the btree overlaying keys from the journal; it was introduced so that we could read in the alloc btree prior to journal replay being done, when journalling of updates to interior btree nodes was introduced. But it didn't have btree node prefetching, which introduced a severe regression with mount times, particularly on spinning rust. This patch implements btree node prefetching for the btree + journal walk, hopefully fixing that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4291a331 |
|
08-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_alloc_write() should be writing for all devices Alloc info isn't stored on a particular device, so it makes no sense to only be writing it out for rw members - this was causing fsck to not fix alloc info errors, oops. Also, make sure we write out alloc info in other repair paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
07a1006a |
|
17-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
719fe7fb |
|
10-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update transactional triggers interface to pass old & new keys This is needed to fix a bug where we're overflowing iterators within a btree transaction, because we're updating the stripes btree (to update block counts) and the stripes btree trigger is unnecessarily updating the alloc btree - it doesn't need to update the alloc btree when the pointers within a stripe aren't changing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
adbcada4 |
|
14-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't require flush/fua on every journal write This patch adds a flag to journal entries which, if set, indicates that they weren't done as flush/fua writes. - non flush/fua journal writes don't update last_seq (i.e. they don't free up space in the journal), thus the journal free space calculations now check whether nonflush journal writes are currently allowed (i.e. are we low on free space, or would doing a flush write free up a lot of space in the journal) - write_delay_ms, the user configurable option for when open journal entries are automatically written, is now interpreted as the max delay between flush journal writes (default 1 second). - bch2_journal_flush_seq_async is changed to ensure a flush write >= the requested sequence number has happened - journal read/replay must now ignore, and blacklist, any journal entries newer than the most recent flush entry in the journal. Also, the way the read_entire_journal option is handled has been improved; struct journal_replay now has an entry, 'ignore', for entries that were read but should not be used. - assorted refactoring and improvements related to journal read in journal_io.c and recovery.c Previously, we'd have to issue a flush/fua write every time we accumulated a full journal entry - typically the bucket size. Now we need to issue them much less frequently: when an fsync is requested, or it's been more than write_delay_ms since the last flush, or when we need to free up space in the journal. This is a significant performance improvement on many write heavy workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ebb84d09 |
|
13-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Increase journal pipelining This patch increases the maximum journal buffers in flight from 2 to 4 - this will be particularly helpful when in the future we stop requiring flush+fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3eb26d01 |
|
01-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_get_iter() no longer returns errors Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2e9f3b88 |
|
01-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use BTREE_ITER_PREFETCH in journal+btree iter Introducing the journal+btree iter introduced a regression where we stopped using BTREE_ITER_PREFETCH - this is a performance regression on rotating disks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5731cf01 |
|
29-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal reclaim spinning in recovery We can't run journal reclaim until we've finished replaying updates to interior btree nodes - the check for this was in the wrong place though, leading to journal reclaim spinning before it was allowed to proceed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6d758368 |
|
13-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a btree transaction iter overflow extent_replay_key dates from before putting iterators was required - fixed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a3e72262 |
|
05-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New varints Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
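The speedup described above comes from letting the decoder learn an integer's encoded length from its first byte alone, with no table lookup. The minimal C sketch below assumes one such scheme: the number of trailing 1 bits in the first byte, plus one, is the total length; lengths 1 to 8 carry 7 value bits per byte, and values wider than 56 bits are written as a 0xff marker byte followed by the raw 64-bit value. The exact on-disk layout and the varint_encode()/varint_decode() names are assumptions for illustration, not taken from the commit.

```c
#include <stdint.h>

/* Encode v into out (room for 9 bytes); returns the number of bytes used. */
static unsigned varint_encode(uint8_t *out, uint64_t v)
{
	unsigned bits = 1, bytes, i;

	while (bits < 64 && (v >> bits))
		bits++;			/* bits = significant bits in v, min 1 */
	bytes = (bits + 6) / 7;		/* 7 value bits per encoded byte */

	if (bytes < 9) {
		/* value goes above (bytes - 1) one bits followed by a zero bit */
		v = (v << bytes) | ((UINT64_C(1) << (bytes - 1)) - 1);
		for (i = 0; i < bytes; i++)
			out[i] = (uint8_t)(v >> (8 * i));	/* little endian */
		return bytes;
	}

	out[0] = 0xff;			/* marker: raw 64-bit value follows */
	for (i = 0; i < 8; i++)
		out[1 + i] = (uint8_t)(v >> (8 * i));
	return 9;
}

/* Decode one varint from in into *out; returns the number of bytes read. */
static unsigned varint_decode(const uint8_t *in, uint64_t *out)
{
	unsigned ones = 0, bytes, i;
	uint64_t v = 0;

	while (ones < 8 && ((in[0] >> ones) & 1))
		ones++;			/* count trailing 1 bits of the first byte */
	bytes = ones + 1;

	if (bytes < 9) {
		for (i = 0; i < bytes; i++)
			v |= (uint64_t)in[i] << (8 * i);
		*out = v >> bytes;	/* drop the length marker bits */
	} else {
		for (i = 0; i < 8; i++)
			v |= (uint64_t)in[1 + i] << (8 * i);
		*out = v;
	}
	return bytes;
}
```

Under this scheme a 96-bit timestamp would, as the commit describes, be stored as two separate varints, one for each 64-bit half, so no single encoding ever needs more than 9 bytes.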
#
33114c2d |
|
24-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop alloc keys from journal when -o reconstruct_alloc This fixes a bug where we'd pop an assertion due to replaying a key for an interior btree node when that node no longer exists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8d6b6222 |
|
16-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improvements to writing alloc info Now that we've got transactional alloc info updates (and have for a while), we don't need to write it out on shutdown, and we don't need to write it out on startup except when GC found errors - this is a big improvement to mount/unmount performance. This patch also fixes a few bugs where we weren't writing out alloc info (on new filesystems, and new devices) and should have been. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f115ce9 |
|
04-Aug-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a bug with the journal_seq_blacklist mechanism Previously, we would start doing btree updates before writing the first journal entry; if this was after an unclean shutdown, this could cause those btree updates to not be blacklisted. Also, move some code to headers for userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f621e152 |
|
20-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for rebuilding the replicas section There is a bug where we can end up clearing the data_has field in the superblock members section, which causes us to skip reading the journal and thus journal replay fails. This option tells the recovery path to not trust those fields. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
697e45b2 |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_TRIGGER_NOOVERWRITES This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5d20ba48 |
|
04-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use cached iterators for alloc btree Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7fffc85b |
|
13-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an internal option for reading entire journal To be used by the debug tool that dumps the contents of the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
61fc3c96 |
|
03-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Set filesystem features earlier in fs init path Before we were setting features after allocating btree nodes, which meant we were using the old btree pointer format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b2930396 |
|
24-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix reading of alloc info after unclean shutdown When updates to interior nodes started being journalled, that meant that after an unclean shutdown, until journal replay is done we can't walk the btree without overlaying the updates from the journal. The initial btree gc was changed to walk the btree overlaying keys from the journal - but bch2_alloc_read() and bch2_stripes_read() were missed. Major whoops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b58a181d |
|
30-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix iterating of journal keys within a btree node Extent btrees no longer have weird special behaviour for min_key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5a655f06 |
|
28-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Read journal when keep_journal on Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f1d786a0 |
|
25-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for keeping journal entries after startup This will be used by the userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2f194e16 |
|
25-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an assertion when nothing to replay Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f44a6a71 |
|
15-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Replay interior node keys This slightly modifies the journal replay code so that it can replay updates to interior nodes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e62d65f2 |
|
15-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans_commit() path can now insert to interior nodes This will be needed for the upcoming patches to journal updates to interior btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5d548743 |
|
16-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Clear BCH_FEATURE_extents_above_btree_updates on clean shutdown This is needed so that users can roll back to before "d9bb516b2d bcachefs: Move extent overwrite handling out of core btree code", which it appears may still be buggy. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e3e464ac |
|
30-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move extent overwrite handling out of core btree code Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splitting existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3186c80f |
|
05-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Skip 0 size deleted extents in journal replay These are created by the new extent update path, but not used yet by the recovery code and they break the existing recovery code, so we can just skip them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f6d0368e |
|
09-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Traverse iterator in journal replay This fixes a bug where we end up spinning in journal replay - in theory this shouldn't be necessary though, transaction reset should be re-traversing all iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
27beb810 |
|
07-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix another iterator leak Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b807a0c8 |
|
26-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_SB_FEATURES_ALL BCH_FEATURE_btree_ptr_v2 wasn't getting set on new filesystems, oops Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
548b3d20 |
|
07-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_ptr_v2 Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5c4a5cd5 |
|
27-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_and_journal_iter Introduce a new iterator that iterates over keys in the btree with keys from the journal overlaid on top. This factors out what the erasure coding init code was doing manually. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
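The overlay described above amounts to a two-way merge of two sorted key streams in which the journal's not-yet-replayed keys take precedence over what is on disk. The minimal C sketch below shows that merge rule. It is not the bcachefs btree_and_journal_iter: it assumes a hypothetical struct key with a single uint64_t position, a deleted flag standing in for a whiteout, and a hypothetical overlay_journal_keys() helper that materializes the result into an array instead of iterating lazily.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified key: position, value, and a whiteout flag. */
struct key {
	uint64_t pos;
	int	 val;
	bool	 deleted;
};

/*
 * Merge btree keys with journal keys that have not been replayed yet.
 * Both inputs must be sorted by pos with no duplicate positions. When
 * both have a key at the same position, the journal key is newer and
 * wins. Deleted keys are skipped, so a deleted journal key hides the
 * btree key at that position. out must have room for
 * nr_btree + nr_journal keys; returns the number of keys written.
 */
static size_t overlay_journal_keys(const struct key *btree, size_t nr_btree,
				   const struct key *journal, size_t nr_journal,
				   struct key *out)
{
	size_t b = 0, j = 0, n = 0;

	while (b < nr_btree || j < nr_journal) {
		const struct key *k;

		if (j == nr_journal ||
		    (b < nr_btree && btree[b].pos < journal[j].pos)) {
			k = &btree[b++];	/* only the btree has this position */
		} else {
			if (b < nr_btree && btree[b].pos == journal[j].pos)
				b++;		/* journal key overwrites btree key */
			k = &journal[j++];
		}

		if (!k->deleted)
			out[n++] = *k;
	}

	return n;
}
```

Giving the journal precedence is what lets recovery read the btree correctly before journal replay has written those updates back into btree nodes.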
#
2d594dfb |
|
31-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bcd6f3e0 |
|
26-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use KEY_TYPE_deleted whiteouts for extents Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c3ff72c |
|
28-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert some enums to x-macros Helps for preventing things from getting out of sync. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e731d466 |
|
26-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't export __bch2_fs_read_write BTREE_INSERT_LAZY_RW was added for this since this code was written; use it instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b3bbe2c |
|
24-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't reexecute triggers when retrying transaction commit This was causing a bug with transaction iterators overflowing; now, if triggers have to be reexecuted we always return -EINTR and retry from the start of the transaction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
58e2388f |
|
22-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_INSERT_ATOMIC Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b1fd23df |
|
22-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMIC BTREE_INSERT_ATOMIC should really be the default mode, and there's not that much code that doesn't need it - so this is prep work for getting rid of the flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba239c95 |
|
29-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_check_set_feature() New helper function for setting incompatible feature bits Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4de77495 |
|
16-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reorganize extents.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4be1a412 |
|
09-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inline data extents This implements extents that have their data inline, in the value, instead of the bkey value being pointers to the data - and the read and write paths are updated to read from these new extent types and write them out, when the write size is small enough. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
085ab693 |
|
09-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rework of cut_front & cut_back This changes bch2_cut_front and bch2_cut_back so that they're able to shorten the size of the value, and it also changes the extent update path to update the accounting in the btree node when this happens. When the size of the value is shortened, they zero out the space that's no longer used, so it's interpreted as noops (as implemented in the last patch). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b627c7d8 |
|
09-Nov-2019 |
Justin Husted <sigstop@gmail.com> |
bcachefs: Set lost+found mode to 0700 For security and conformance with other filesystems, the lost+found directory should not be world or group accessible. Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ff929515 |
|
28-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Trust btree alloc info at runtime This lets us avoid a cache miss in the write path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
05240ba6 |
|
11-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix creation of lost+found Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a40d97a7 |
|
07-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix incorrect use of bch2_extent_atomic_end() bch2_extent_atomic_end counts the number of iterators required for marking overwrites - but journal replay never marks overwrites, so that part was incorrect. And counting iterators for the key being inserted should be unnecessary because we did that prior to the key being inserted before it was first journalled. This should fix an iterator overflow bug - the iterators for walking overwrites were totally unneeded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
137b0ed9 |
|
04-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_extent_atomic_end() now traverses iter This fixes a bug in io.c bch2_write_index_default() - it was missing the traverse call, but bch2_extent_atomic_end returns an error now and can just call it itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96385742 |
|
02-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Factor out fs-common.c This refactoring makes the code easier to understand by separating the bcachefs btree transactional code from the linux VFS code - but more importantly, it's also to share code with the fuse port. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a7199432 |
|
22-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill deferred btree updates Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f9c55193 |
|
07-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop trans arg to bch2_extent_atomic_end() Just for consistency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
89b05118 |
|
06-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Flush fsck errors when looping in btree gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad7e137e |
|
28-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch reconstruct_alloc to a mount option Right now this is the only way of repairing bucket gens in the future Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
06f6c3ec |
|
27-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reflink pointers also have to be remarked if split in journal replay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
76426098 |
|
16-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3c7f3b7a |
|
16-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor bch2_extent_trim_atomic() for reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2cbe5cfe |
|
09-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rework calling convention for marking overwrites Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e222d206 |
|
12-Jul-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix ec_stripes_read() Change it to not mark keys that will be overwritten by keys in the journal - this fixes a bug where we pop an assertion in bucket_set_stripe() because of a stale pointer - because the stripe that has the stale pointer has been deleted. This code could be factored out and used elsewhere, at some point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ded276b |
|
24-Jun-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix array overrun with unknown btree roots Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f707e3d8 |
|
18-Jun-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fix kasan splat Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6e738539 |
|
24-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve key marking interface Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3838be78 |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use a fixed size buffer for fs_usage_deltas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
20bceecb |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More work to avoid transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
619f5bee |
|
17-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: some improvements to startup messages and options Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
932aa837 |
|
11-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_mark_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5e82a9a1 |
|
10-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Write out fs usage consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6dd04f8 |
|
15-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Mark overwrites from journal replay in initial gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d0734356 |
|
11-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deduplicate keys in the journal before replay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
644d180b |
|
11-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal replay refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
478259b7 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: delete duplicated code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1dd7f9d9 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite journal_seq_blacklist machinery Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1d58243 |
|
29-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: add ability to run gc on metadata only Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7b512638 |
|
29-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor bch2_fs_recovery() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0bc166ff |
|
28-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track whether filesystem has errors in superblock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
03e183cb |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Verify fs hasn't been modified before going rw Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
134915f3 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Go rw lazily Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6122ab63 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More debug params for testing of recovery paths Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
05235e99 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Run gc if failed to read alloc btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
082f0801 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix error handling in bch2_fs_recovery() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1633e492 |
|
28-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: improved flush_held_btree_writes() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
768ac639 |
|
14-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mechanism for blocking the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f7e76361 |
|
10-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: no need to run gc when initializing new fs Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1df42b57 |
|
06-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: don't do initial gc if have alloc info feature Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3577df5f |
|
09-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: serialize persistent_reserved Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e0745e2 |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: initialize fs usage summary in recovery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
42b72e0b |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: journal_replay_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c5af169 |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: reserve space in journal for fs usage entries Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
94cd106f |
|
09-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: delete a debug printk Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
61c8d7c8 |
|
25-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist stripe blocks_used Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73e6ab95 |
|
01-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch replicas to mark_lock Prep work for upcoming disk accounting changes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f0cfb963 |
|
29-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track nr_inodes with the key marking machinery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dfe9bfb3 |
|
24-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Stripes now properly subject to gc gc now verifies the contents of the stripes radix tree, important for persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e65431c |
|
23-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
Revert "bcachefs: start erasure coding after journal replay" This reverts commit 36f389604294dfc953e6f5624ceb683818d32f28. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
129550c4 |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: start erasure coding after journal replay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cd575ddf |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
af9d3bc2 |
|
30-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: stripe support for replicas tracking Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b335bae |
|
04-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Assorted fixes for running on very small devices It's now possible to create and use a filesystem on a 512k device with 4k buckets (though at that size we still lose almost half of it to internal reserves) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
72644db1 |
|
03-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an assertion when rebuilding replicas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
02f1a96c |
|
03-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename nofsck opt to fsck Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7b3f84ea |
|
05-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fc3268c1 |
|
08-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill extent_insert_hook Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
88c07f73 |
|
14-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Only check inode i_nlink during full fsck Now that all filesystem operations that manipulate the filesystem hierarchy and i_nlink are fully atomic, we can add a feature bit to indicate i_nlink doesn't need to be checked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|