#
605109ff |
|
17-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: KEY_TYPE_error is allowed for reflink KEY_TYPE_error is left behind when we have to delete all pointers in an extent in fsck; it allows errors to be correctly returned by reads later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
27c15ed2 |
|
12-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_member.btree_allocated_bitmap This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_mitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9abb6dd7 |
|
12-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Standardize helpers for printing enum strs with bounds checks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5ab4beb7 |
|
08-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't scan for btree nodes when we can reconstruct Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
55936afe |
|
15-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Flag btrees with missing data We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b07ce726 |
|
15-Feb-2024 |
Thomas Bertschinger <tahbertschinger@gmail.com> |
bcachefs: omit alignment attribute on big endian struct bkey This is needed for building Rust bindings on big endian architectures like s390x. Currently this is only done in userspace, but it might happen in-kernel in the future. When creating a Rust binding for struct bkey, the "packed" attribute is needed to get a type with the correct member offsets in the big endian case. However, rustc does not allow types to have both a "packed" and "align" attribute. Thus, in order to get a Rust type compatible with the C type, we must omit the "aligned" attribute in C. This does not affect the struct's size or member offsets, only its toplevel alignment, which should be an acceptable impact. The little endian version can have the "align" attribute because the "packed" attr is redundant, and rust-bindgen will omit the "packed" attr when an "align" attr is present and it can do so without changing a type's layout Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b26d7914 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_subvolume_children Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b8628a25 |
|
08-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_subvolume::fs_path_parent Record the filesystem path heirarchy for subvolumes in bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
52f7d75e |
|
27-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: jset_entry_datetime This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d826cc57 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: logged_ops_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8d52ba60 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: reflink_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b2fa1b63 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs; extents_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0560eb9a |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: ec_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6c4ff65 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: subvolume_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8fed323b |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: snapshot_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d455179f |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: alloc_background_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
72e08010 |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: xattr_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7ffc4daa |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: dirent_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b36425da |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: inode_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
82de6207 |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs; quota_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
43314801 |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: sb-counters_format.h bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
12207f49 |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: comment bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d32088f2 |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_snapshot::btime Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
15eaaa4c |
|
03-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Upgrades now specify errors to fix, like downgrades Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6b00de06 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_member->seq Add new fields for split brain detection: - bch_member->seq, which tracks the sequence number of the last superblock write that happened to each member device - bch_sb->write_time, which tracks the time of the last superblock write, to allow detection of when two members have diverged but had the same number of superblock writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5e329145 |
|
20-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Check journal entries for invalid keys in trans commit path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e06af207 |
|
15-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: fix userspace build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
09caeabe |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree write buffer now slurps keys from journal Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
56db2429 |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree write buffer tracepoints - add a tracepoint for write_buffer_flush_sync; this is expensive - fix the write_buffer_flush_slowpath tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9b34f02c |
|
23-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill dev_usage->buckets_ec This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
086a52f7 |
|
09-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1 Prep work for introducing bch_replicas_entry_v2 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ae8a090 |
|
16-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill memset() in bch2_btree_iter_init() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
84f16387 |
|
29-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_sb_field_downgrade Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b16413c |
|
29-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_sb.recovery_passes_required Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae4d612c |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trace_move_extent_start_fail() now includes errcode Renamed from trace_move_extent_alloc_mem_fail, because there are other reasons we colud fail (disk space allocation failure). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3f3ae125 |
|
24-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bpos is misaligned on big endian bkey embeds a bpos that is misaligned on big endian; this is so that bch2_bkey_swab() works correctly without having to differentiate between packed and non-packed keys (a debatable design decision). This means it can't have the __aligned() tag on big endian. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
103ffe9a |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: x-macro-ify inode flags enum This lets us use bch2_prt_bitflags to print them out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d3c7727b |
|
03-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: rebalance_work btree is not a snapshots btree rebalance_work entries may refer to entries in the extents btree, which is a snapshots btree, or they may also refer to entries in the reflink btree, which is not. Hence rebalance_work keys may use the snapshot field but it's not required to be nonzero - add a new btree flag to reflect this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6dfa10ab |
|
31-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix build errors with gcc 10 gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f5d26fa3 |
|
25-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_sb_field_errors Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
94119eeb |
|
25-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add IO error counts to bch_member We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fb3f57bb |
|
20-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: rebalance_work This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
40f7914e |
|
24-Sep-2023 |
Hunter Shaffer <huntershaffer182456@gmail.com> |
bcachefs: Add iops fields to bch_member Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9af26120 |
|
24-Sep-2023 |
Hunter Shaffer <huntershaffer182456@gmail.com> |
bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1 Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3f7b9713 |
|
24-Sep-2023 |
Hunter Shaffer <huntershaffer182456@gmail.com> |
bcachefs: New superblock section members_v2 members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f3e374ef |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Log finsert/fcollapse operations Now that we have the logged operations btree, we can make finsert/fcollapse atomic w.r.t. unclean shutdown as well. This adds bch_logged_op_finsert to represent the state of an finsert or fcollapse, which is a bit more complicated than truncate since we need to track our position in the "shift extents" operation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b030e262 |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Log truncate operations Previously, we guaranteed atomicity of truncate after unclean shutdown with the BCH_INODE_I_SIZE_DIRTY flag - which required a full scan of the inodes btree. Recently the deleted inodes btree was added so that we no longer have to scan for deleted inodes, but truncate was unfinished and that change left it broken. This patch uses the new logged operations btree to fix truncate atomicity; we now log an operation that can be replayed at the start of a truncate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aaad530a |
|
27-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_logged_ops Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5cfd6977 |
|
09-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Array bounds fixes It's no longer legal to use a zero size array as a flexible array member - this causes UBSAN to complain. This patch switches our zero size arrays to normal flexible array members when possible, and inserts casts in other places (e.g. where we use the zero size array as a marker partway through an array). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f55d6e07 |
|
17-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Cleanup redundant snapshot nodes After deleteing snapshots, we may be left with a snapshot tree where some nodes only have one child, and we have a linear chain. Interior snapshot nodes are never used directly (i.e. they never have subvolumes that point to them), they are only referered to by child snapshot nodes - hence, they are redundant. The existing code talks about redundant snapshot nodes as forming and equivalence class; i.e. nodes for which snapshot_t->equiv is equal. In a given equivalence class, we only ever need a single key at a given position - i.e. multiple versions with different snapshot fields are redundant. The existing snapshot cleanup code deletes these redundant keys, but not redundant nodes. It turns out this is buggy, because we assume that after snapshot deletion finishes we should only have a single key per equivalence class, but the btree update path doesn't preserve this - overwriting keys in old snapshots doesn't check for the equivalence class being equal, and thus we can end up with duplicate keys in the same equivalence class and fsck complaining about snapshot deletion not having run correctly. The equivalence class notion has been leaking out of the core snapshots code and into too much other code, i.e. fsck, so this patch takes a different approach: snapshot deletion now moves keys to the node in an equivalence class being kept (the leafiest node) and then deletes the redundant nodes in the equivalance class. Some work has to be done to correctly delete interior snapshot nodes; snapshot node depth and skiplist fields for descendent nodes have to be fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8e877caa |
|
16-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out snapshot.c subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a125c074 |
|
13-Aug-2023 |
Joshua Ashton <joshua@froggi.es> |
bcachefs: Lower BCH_NAME_MAX to 512 To ensure we aren't shooting ourselves in the foot after merge for potentially doing future revisions for dirent or for storing multiple names for casefolding, limit this to 512 for now. Previously this define was linked to the max size a d_name in bch_dirent could be. Signed-off-by: Joshua Ashton <joshua@froggi.es> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dde8cb11 |
|
16-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bcachefs_metadata_version_deleted_inodes Add a new bitset btree for inodes pending deletion; this means we no longer have to scan the full inodes btree after an unclean shutdown. Specifically, this adds: - a trigger to update the deleted_inodes btree based on changes to the inodes btree - a new recovery pass - and check_inodes is now only a fsck pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bf5a261c |
|
01-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted fixes for clang clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e8d2fe3b |
|
21-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Consolidate btree id properties This refactoring centralizes defining per-btree properties. bch2_key_types_allowed was also about to overflow a u32, so expand that to a u64. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e86e9124 |
|
12-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Extent sb compression type fields to 8 bits The upper 4 bits are for compression level. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a5cf5a4b |
|
12-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bcachefs_format.h should be using __u64 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f26c67f4 |
|
25-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Snapshot depth, skiplist fields This extents KEY_TYPE_snapshot to include some new fields: - depth, to indicate depth of this particular node from the root - skip[3], skiplist entries for quickly walking back up to the root These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n)) instead of O(n) in the snapshot tree depth. Skiplist nodes are picked at random from the set of ancestor nodes, not some fixed fraction. This introduces bcachefs_metadata_version 1.1, snapshot_skiplists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
065bd335 |
|
10-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Version table now lists required recovery passes Now that we've got forward compatibility sorted out, we should be doing more frequent version upgrades in the future. To avoid having to run a full fsck for every version upgrade, this improves the BCH_METADATA_VERSIONS() table to explicitly specify a bitmask of recovery passes to run when upgrading to or past a given version. This means we can also delete PASS_UPGRADE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba8eeae8 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bcachefs_metadata_version_major_minor This introduces major/minor versioning to the superblock version number. Major version number changes indicate incompatible releases; we can move forward to a new major version number, but not backwards. Minor version numbers indicate compatible changes - these add features, but can still be mounted and used by old versions. With the recent patches that make it possible to roll out new btrees and key types without breaking compatibility, we should be able to roll out most new features without incompatible changes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3045bb95 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: version_upgrade is now an enum The version_upgrade parameter is now an enum, not a bool, and it's persistent in the superblock: - compatible (default): upgrade to the latest compatible version - incompatible: upgrade to latest incompatible version - none Currently all upgrades are incompatible upgrades, but the next release will introduce major:minor versions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24964e1c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_SB_VERSION_UPGRADE_COMPLETE() Version upgrades are not atomic operations: when we do a version upgrade we need to update the superblock before we start using new features, and then when the upgrade completes we need to update the superblock again. This adds a new superblock field so we can detect and handle incomplete version upgrades. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a02a0121 |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_version_compatible() This adds a new helper for checking if an on-disk version is compatible with the running version of bcachefs - prep work for introducing major:minor version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2766876d |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: struct bch_extent_rebalance This adds the extent entry for extents that rebalance needs to do something with. We're adding this ahead of the main rebalance_work patchset, because adding new extent entries can't be done in a forwards-compatible way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e1430a7 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Expand BTREE_NODE_ID We now have 20 bits for the btree ID in the on disk format - sufficient for 1 million distinct btrees. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
653693be |
|
29-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add otime, parent to bch_subvolume Add two new fields to bch_subvolume: - otime: creation time - parent: For snapshots, this is the id of the subvolume the snapshot was created from Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c59b483 |
|
29-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_snapshot_tree This adds a new btree which gets us a persistent per-snapshot-tree identifier. - BTREE_ID_snapshot_trees - KEY_TYPE_snapshot_tree - bch_snapshot now has a field that points to a snapshot_tree This is going to be used to designate one snapshot ID/subvolume out of a given tree of snapshots as the "main" subvolume, so that we can do quota accounting in that subvolume and not the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
62a03559 |
|
31-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rip out code for storing backpointers in alloc keys We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac2ccddc |
|
04-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Drop some anonymous structs, unions Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
45dd05b3 |
|
04-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BKEY_PADDED_ONSTACK() Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e151580d |
|
20-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add tracepoint & counter for btree split race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
80c33085 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
83f33d68 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rework lru btree This patch changes how the LRU index works: Instead of using KEY_TYPE_lru where the bucket the lru entry points to is part of the value, this switches to KEY_TYPE_set and encoding the bucket we refer to in the low bits of the key. This means that we no longer have to check for collisions when inserting LRU entries. We'll be making using of this in the next patch, which adds a btree write buffer - a pure write buffer for btree updates, where updates are appended to a simple array and then periodically sorted and batch inserted. This is a new on disk format version, and a forced upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5250b74d |
|
25-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bucket_gens btree To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds - trigger for keeping it in sync with the all btree - initialization code, for filesystems from previous versions - new path for reading bucket gens - new fsck code And a new on disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8b3a677 |
|
02-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Nocow support This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight, we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
79203111 |
|
13-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Unwritten extents support - bch2_extent_merge checks unwritten bit - read path returns 0s for unwritten extents without actually reading - reflink path skips over unwritten extents - bch2_bkey_ptrs_invalid() checks for extents with both written and unwritten extents, and non-normal extents (stripes, btree ptrs) with unwritten ptrs - fiemap checks for unwritten extents and returns FIEMAP_EXTENT_UNWRITTEN Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8dd69d9f |
|
21-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3 Move bi_size and bi_sectors into the non-varint portion of the inode, so that the write path can update them without going through the relatively expensive unpack/pack operations. Other changes: - Add a field for the offset of the varint section, so we can add new non-varint fields without needing a new inode type, like alloc_v3 - Move bi_mode into the flags field, so that the varint section can be u64 aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8c752bb |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New on disk format: Backpointers This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
920e69bc |
|
03-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
19a614d2 |
|
30-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Better inlining for bch2_alloc_to_v4_mut This separates out the slowpath into a separate function, and inlines bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place it's called. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e1538212 |
|
02-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New magic number Add a new bcachefs-specific magic number for the superblock, instead of continuing to use the old bcache magic number3 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1019576 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More style fixes Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fd0c7679 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert to __packed and __aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e3e02e6 |
|
19-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
33bd5d06 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deadlock cycle detector We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
674cfc26 |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6f44a994 |
|
13-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a persistent counter for bucket discards Like the previous patch for bucket invalidates, add another counter for a core allocator path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
440c15cc |
|
13-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a persistent counter for bucket invalidation Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
cb685ce7 |
|
05-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Also log overwrites in journal Lately we've been doing a lot of debugging by looking at the journal to see what was changed, and by what code path. This patch adds a new journal entry type for recording overwrites, so that we don't have to search backwards through the journal to see what was being overwritten in order to work out what the triggers were supposed to be doing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
104c6974 |
|
15-Mar-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: Add persistent counters This adds a new superblock field for persisting counters and adds a sysfs interface in counters/ exposing these counters. The superblock field is ignored by older versions letting us avoid an on disk version bump. Each sysfs file outputs a counter that tracks since filesystem creation and a counter for the current mount session. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
502f973d |
|
09-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a few warnings on 32 bit These showed up when building for mips. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
822835ff |
|
31-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fold bucket_state in to BCH_DATA_TYPES() Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be conuted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
66d90823 |
|
13-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill struct bucket_mark This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6b2826c |
|
11-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Freespace, need_discard btrees This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3d48a7f8 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: KEY_TYPE_alloc_v4 This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d326ab2f |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: LRU btree This implements new persistent LRUs, to be used for buckets containing cached data, as well as stripes ordered by time when a block became empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
179e3434 |
|
05-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: KEY_TYPE_set A new empty key type, to be used when using a btree as a set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
25be2e5d |
|
10-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch_sb_field_journal_v2 Add a new superblock field which represents journal buckets as ranges: also move code for the superblock journal fields to journal_sb.c. This also reworks the code for resizing the journal to write the new superblock before using the new journal buckets, and thus be a bit safer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
74b33393 |
|
20-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: x-macro metadata version enum Now we've got strings for metadata versions - this changes bch2_sb_to_text() and our mount log message to use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
528b18e6 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_journal_entry_to_text() This adds a _to_text() pretty printer for journal entries - including every subtype - which will shortly be used by the 'bcachefs list_journal' subcommand. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fb64f3fd |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_JSET_ENTRY_log Add a journal entry type for logging messages, and add an option to use it to log the transaction name - this makes for a very handy debugging tool, as with it we can use the 'bcachefs list_journal' command to see not only what updates were done, but what was doing them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7243498d |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill non-lru cache replacement policies Prep work for persistent LRUs and getting rid of the in memory bucket array. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2430e72f |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert journal sysfs params to regular options This converts journal_write_delay, journal_flush_disabled, and journal_reclaim_delay to normal filesystems options, and also adds them to the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
6404dcc9 |
|
10-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More enum strings This patch converts more enums in the on disk format to our standard x-macro-with-strings deal - to enable better pretty-printing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3e52c222 |
|
29-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add journal_seq to inode & alloc keys Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2027875b |
|
10-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add BCH_SUBVOLUME_UNLINKED Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
4db65027 |
|
11-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvol dirents are now only visible in parent subvol This changes the on disk format for dirents that point to subvols so that they also record the subvolid of the parent subvol, so that we can filter them out in other subvolumes. This also updates the dirent code to do that filtering, and in particular tweaks the rename code - we need to ensure that there's only ever one dirent (counting multiplicities in different snapshots) that point to a subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e5fa91d7 |
|
20-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix restart handling in for_each_btree_key() Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
6d76aefe |
|
14-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for leaking of reflinked extents When a reflink pointer points to only part of an indirect extent, and then that indirect extent is fragmented (e.g. by copygc), if the reflink pointer only points to one of the fragments we leak a reference. Fix this by storing front/back pad values in reflink pointers - when inserting reflink pointesr, we initialize them to cover the full range of the indirect extents we reference. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
bfe88863 |
|
19-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New on disk format to fix reflink_p pointers We had a bug where reflink_p pointers weren't being initialized to 0, and when we started using the second word, things broke badly. This patch revs the on disk format version and adds cleanup code to zero out the second word of reflink_p pointers before we start using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0476fa94 |
|
27-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rev the on disk format version for snapshots This will cause the compat code to be run that creates entries in the subvolumes and snapshots btrees. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7a7d17b2 |
|
02-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Whiteouts for snapshots This patch adds KEY_TYPE_whiteout, a new type of whiteout for snapshots, when we're deleting and the key being deleted is in an ancestor snapshot - and updates the transaction update/commit path to use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
14b393ee |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e719fc34 |
|
15-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BSET_OFFSET() Add a field to struct bset for the sector offset within the btree node where it was written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
996fb577 |
|
13-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for whether inodes use the key cache We probably don't ever want to flip this off in production, but it may be useful for certain kinds of testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
618b1c0e |
|
05-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out SPOS_MAX Internal btree code really wants a POS_MAX with all fields ~0; external code more likely wants the snapshot field to be 0, because when we're passing it to bch2_trans_get_iter() it's used for the snapshot we're operating in, which should be 0 for most btrees that don't use snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
41e63382 |
|
17-Jun-2021 |
jpsollie <janpieter.sollie@edpnet.be> |
bcachefs: add bcachefs xxhash support xxhash is a much faster algorithm compared to crc32. could be used to speed up checksum calculation. xxhash 64-bit only, as it is much faster on 64-bit CPUs compared to xxh32. Signed-off-by: jpsollie <janpieter.sollie@edpnet.be> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b282a74f |
|
27-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option to control sharding new inode numbers We're seeing a bug where inode creates end up spinning in bch2_inode_create - disabling sharding will simplify what we're testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
aae15aaf |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ab2a29cc |
|
02-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inode backpointers This patch adds two new inode fields, bi_dir and bi_dir_offset, that point back to the inode's dirent. Since we're only adding fields for a single backpointer, files that have been hardlinked won't necessarily have valid backpointers: we also add a new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has ever had multiple links to it. That's ok, because we only really need this functionality for directories, which can never have multiple hardlinks - when we add subvolumes, we'll need a way to enemurate and print subvolumes, and this will let us reconstruct a path to a subvolume root given a subvolume root inode. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e751c01a |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73590619 |
|
21-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't unconditially version_upgrade in initialize This is mkfs's job. Also, clean up the handling of feature bits some. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
87a432f5 |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill reflink option An option was added to control whether reflink support was on or off because for a long time, reflink + inline data extent support was missing - but that's since been fixed, so we can drop the option now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7d6f07ed |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix compat code for superblock The bkey compat code wasn't being run for btree roots in the superblock clean section - this patch fixes it to use the journal entry validate code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2436cb9f |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for more enums This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
79f88eba |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename KEY_TYPE_whiteout -> KEY_TYPE_hash_whiteout Snapshots are going to need a different whiteout key type. Also, switch to using BCH_BKEY_TYPES() to define the bkey value accessors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
19dd3172 |
|
04-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for compat feature bits This is to generate strings for them, so that we can print them out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e01dacf7 |
|
20-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bkey format generation for 32 bit fields Having a packed format that can represent a field larger than the unpacked type breaks bkey_packed_successor() assertions - we need to fix this to start using the snapshot filed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a4805d66 |
|
22-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Scan for old btree nodes if necessary on mount We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned out there were people who still had filesystems with btree nodes in that format in the wild. This adds a new compat feature that indicates we've scanned for and rewritten nodes in the old format, and does that scan at mount time if the option isn't set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8042b5b7 |
|
10-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Extents may now cross btree node boundaries When snapshots arrive, we won't necessarily be able to arbitrarily split existis - when we need to split an existing extent, we'll have to check if the extent was overwritten in child snapshots and if so emit a whiteout for the split in the child snapshot. Because extents couldn't span btree nodes previously, journal replay would sometimes have to split existing extents. That's no good anymore, but fortunately since extent handling has already been lifted above most of the btree code there's no real need for that rule anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
180fb49d |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2abe5420 |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7f4e1d5d |
|
22-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: KEY_TYPE_alloc_v2 This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d042b040 |
|
29-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for metadata_target Also, make journal writes obey foreground_target and metadata_target. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
51d2dfb8 |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add BTREE_PTR_RANGE_UPDATED This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
07a1006a |
|
17-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
81d8599e |
|
14-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't read existing stripes synchronously in write path Previously, in the stripe creation path, when reusing an existing stripe we'd read the existing stripe synchronously - ouch. Now, we allocate two stripe bufs if we're using an existing stripe, so that we can do the read asynchronously - and, we read the full stripe so that we can run recovery, if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ffb7c3d3 |
|
16-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add BCH_BKEY_PTRS_MAX This now means "the maximum number of pointers within a bkey" - and bch_devs_list is updated to use it instead of BCH_REPLICAS_MAX, since stripes can contain more than BCH_REPLICAS_MAX pointers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
adbcada4 |
|
14-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't require flush/fua on every journal write This patch adds a flag to journal entries which, if set, indicates that they weren't done as flush/fua writes. - non flush/fua journal writes don't update last_seq (i.e. they don't free up space in the journal), thus the journal free space calculations now check whether nonflush journal writes are currently allowed (i.e. are we low on free space, or would doing a flush write free up a lot of space in the journal) - write_delay_ms, the user configurable option for when open journal entries are automatically written, is now interpreted as the max delay between flush journal writes (default 1 second). - bch2_journal_flush_seq_async is changed to ensure a flush write >= the requested sequence number has happened - journal read/replay must now ignore, and blacklist, any journal entries newer than the most recent flush entry in the journal. Also, the way the read_entire_journal option is handled has been improved; struct journal_replay now has an entry, 'ignore', for entries that were read but should not be used. - assorted refactoring and improvements related to journal read in journal_io.c and recovery.c Previously, we'd have to issue a flush/fua write every time we accumulated a full journal entry - typically the bucket size. Now we need to issue them much less frequently: when an fsync is requested, or it's been more than write_delay_ms since the last flush, or when we need to free up space in the journal. This is a significant performance improvement on many write heavy workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a3e72262 |
|
05-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New varints Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
801a3de6 |
|
24-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Indirect inline data extents When inline data extents were added, reflink was forgotten about - we need indirect inline data extents for reflink + inline data to work correctly. This patch adds them, and a new feature bit that's flipped when they're used. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
af4d05c4 |
|
09-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Account for stripe parity sectors separately Instead of trying to charge EC parity to the data within the stripe (which is subject to rounding errors), let's charge it to the stripe itself. It should also make -ENOSPC issues easier to deal with if we charge for parity blocks up front, and means we can also make more fine grained accounting available to the user. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
89fd25be |
|
09-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for data types Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
36b8372b |
|
02-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option to disable reflink support Reflink might be buggy, so we're adding an option so users can help bisect what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39fb2983 |
|
07-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill bkey_type_successor Previously, BTREE_ID_INODES was special - inodes were indexed by the inode field, which meant the offset field of struct bpos wasn't used, which led to special cases in e.g. the btree iterator code. Now, inodes in the inodes btree are indexed by the offset field. Also: prevously min_key was special for extents btrees, min_key for extents would equal max_key for the previous node. Now, min_key = bkey_successor() of the previous node, same as non extent btrees. This means we can completely get rid of btree_type_sucessor/predecessor. Also make some improvements to the metadata IO validate/compat code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6357d607 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to interior nodes Previously, the btree has always been self contained and internally consistent on disk without anything from the journal - the journal just contained pointers to the btree roots. However, this meant that btree node split or compact operations - i.e. anything that changes btree node topology and involves updates to interior nodes - would require that interior btree node to be written immediately, which means emitting a btree node write that's mostly empty (using 4k of space on disk if the filesystemm blocksize is 4k to only write perhaps ~100 bytes of new keys). More importantly, this meant most btree node writes had to be FUA, and consumer drives have a history of slow and/or buggy FUA support - other filesystes have been bit by this. This patch changes the interior btree update path to journal updates to interior nodes, after the writes for the new btree nodes have completed. Best of all, it turns out to simplify the interior node update path somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e3e464ac |
|
30-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move extent overwrite handling out of core btree code Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splittin existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b807a0c8 |
|
26-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_SB_FEATURES_ALL BCH_FEATURE_btree_ptr_v2 wasn't getting set on new filesystems, oops Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
548b3d20 |
|
07-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_ptr_v2 Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ab05de4c |
|
23-Feb-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track incompressible data This fixes the background_compression option: wihout some way of marking data as incompressible, rebalance will keep rewriting incompressible data over and over. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bcd6f3e0 |
|
26-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use KEY_TYPE_deleted whitouts for extents Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c3ff72c |
|
28-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert some enums to x-macros Helps for preventing things from getting out of sync. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
22502ac2 |
|
16-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Redo filesystem usage ioctls When disk space accounting was changed to be tracked by replicas entry, the ioctl interface was never update: this patch finally does that. Aditionally, the BCH_IOCTL_USAGE ioctl is now broken out into separate ioctls for filesystem and device usage. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4be1a412 |
|
09-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inline data extents This implements extents that have their data inline, in the value, instead of the bkey value being pointers to the data - and the read and write paths are updated to read from these new extent types and write them out, when the write size is small enough. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9ec211b0 |
|
08-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix ec_stripes_read() The bkey_s_c returned by btree_iter_(peek|next) points into the btree iter type, so advancing the iterator and then using the one previously returned is a bug... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73501ab8 |
|
04-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use sha256 for siphash str hash key With the refactoring that's coming to add fuse support, we want bch2_hash_info_init() to be cheaper so we don't have to rely on anything cached besides the inode in the btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5055b509 |
|
06-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rebalance now adds replicas if needed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
76426098 |
|
16-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d74dfe02 |
|
02-Jul-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for building with old gcc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
91052b9d |
|
24-Jun-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor trans_(get|update)_key these are still pretty ugly... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
932aa837 |
|
11-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_mark_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1dd7f9d9 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite journal_seq_blacklist machinery Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0bc166ff |
|
28-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track whether filesystem has errors in superblock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8fe826f9 |
|
13-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bucket invalidation to key marking path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
76f4c7b0 |
|
11-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix oldest_gen handling Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1df42b57 |
|
06-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: don't do initial gc if have alloc info feature Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3577df5f |
|
09-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: serialize persistent_reserved Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c5af169 |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: reserve space in journal for fs usage entries Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
721d4ad8 |
|
13-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add flags to indicate if inode opts were inherited or explicitly set Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d42dd4ad |
|
13-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: merge BCH_INODE_FIELDS_INHERIT/BCH_INODE_OPTS Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a3e70fb2 |
|
13-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: use x-macros more consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7121643e |
|
12-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for building in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
90541a74 |
|
21-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add new alloc fields prep work for persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cd575ddf |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
af9d3bc2 |
|
30-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: stripe support for replicas tracking Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c258f28e |
|
12-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Check for unsupported features Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b335bae |
|
04-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Assorted fixes for running on very small devices It's now possible to create and use a filesystem on a 512k device with 4k buckets (though at that size we still waste almost half to internal reserves) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7a920560 |
|
30-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill struct bch_replicas_cpu_entry Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
abce30b7 |
|
30-Sep-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_EXTENT_ENTRY_TYPES() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a50ed7c8 |
|
24-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_SB_RESERVE_BYTES Add an option, gc_reserve_bytes, to set the copygc reserve as a size instead of a percent Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|