#
ec438ac5 |
|
19-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix missing call to bch2_fs_allocator_background_exit() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9802ff48 |
|
20-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Print shutdown journal sequence number Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4409b808 |
|
11-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Repair pass for scanning for btree nodes If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesystem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
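The safety argument in the commit above (a magic number XORed with the filesystem UUID) can be illustrated with a minimal sketch. The constant, the header layout, and every `sketch_` name below are hypothetical and are not bcachefs's actual definitions; the point is only that a block whose magic was derived from a different UUID fails the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical base constant; not the real on-disk value. */
#define SKETCH_NODE_MAGIC 0x0123456789abcdefULL

/* Hypothetical, simplified node header: only the field the check needs. */
struct sketch_node_header {
	uint64_t magic;
};

/* Derive a per-filesystem magic from the first 8 bytes of the UUID. */
static uint64_t sketch_fs_magic(const uint8_t uuid[16])
{
	uint64_t lo;

	assert(uuid);
	memcpy(&lo, uuid, sizeof(lo));
	return lo ^ SKETCH_NODE_MAGIC;
}

/* A scanned block is a candidate node only if its magic matches this filesystem. */
static bool sketch_header_matches(const struct sketch_node_header *h,
				  const uint8_t uuid[16])
{
	assert(h);
	return h->magic == sketch_fs_magic(uuid);
}
```

A scanner built on this idea would read each candidate block, call `sketch_header_matches()` with the UUID of the filesystem being repaired, and skip anything that does not match.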
#
13c1e583 |
|
28-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve -o norecovery; opts.recovery_pass_limit This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0a34c058 |
|
30-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Ensure bch_sb_field_ext always exists This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3ed94062 |
|
17-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve bch2_fatal_error() error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f3589bfa |
|
16-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: fix for building in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
273960b8 |
|
01-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
bcachefs: time_stats: split stats-with-quantiles into a separate structure Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b63570f7 |
|
12-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_print_opts() Make sure early error messages get redirected, for kernel-fsck-from-userland. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
130d229f |
|
12-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve error messages in device remove path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
60e1baa8 |
|
04-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: thread_with_stdio: convert to darray - eliminate the dependency on printbufs, so that we can lift thread_with_file for use in xfs - add a nonblocking parameter to stdio_redirect_printf(), and either block if the buffer is full or drop it on the floor - don't buffer infinitely Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cb6fc943 |
|
01-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: kill kvpmalloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6b83aee8a4 |
|
22-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Workqueues should be WQ_HIGHPRI Most bcachefs workqueues are used for completions, and should be WQ_HIGHPRI - this helps reduce queuing delays, we want to complete quickly once we can no longer signal backpressure by blocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
067f244c |
|
26-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: fix split brain message Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2f300f09 |
|
08-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: no_splitbrain_check option This adds an option to disable kicking out devices when splitbrain is detected - it seems there are some issues with splitbrain detection and we're kicking out devices erroneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f8cdf65b |
|
03-Mar-2024 |
Li Zetao <lizetao1@huawei.com> |
bcachefs: Fix null-ptr-deref in bch2_fs_alloc() There is a null-ptr-deref issue reported by kasan: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Call Trace: <TASK> bch2_fs_alloc+0x1092/0x2170 [bcachefs] bch2_fs_open+0x683/0xe10 [bcachefs] ... When initializing the name of bch_fs, it needs to dynamically alloc memory to meet the length of the name. However, when name allocation failed, it will cause a null-ptr-deref access exception in subsequent string copy. Fix this issue by checking if name allocation is successful. Fixes: 401ec4db6308 ("bcachefs: Printbuf rework") Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e074475 |
|
10-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Clamp replicas_required to replicas This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
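The clamping described in the commit above amounts to taking the smaller of the two option values. A minimal sketch, assuming a simplified options struct whose field names merely mirror the option names in the message (the struct and function are hypothetical):

```c
#include <assert.h>

/* Hypothetical, simplified option set. */
struct sketch_opts {
	unsigned replicas;
	unsigned replicas_required;
};

/* Replicas a write must reach before completing: never more than `replicas`. */
static unsigned sketch_replicas_required(const struct sketch_opts *o)
{
	assert(o);
	return o->replicas_required < o->replicas
		? o->replicas_required
		: o->replicas;
}
```

With `replicas = 2` and `replicas_required = 3`, the effective requirement becomes 2, so a write that reaches every configured replica is no longer treated as a failure.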
#
3a58dfbc |
|
20-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: counters.c -> sb-counters.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec4edd7b |
|
16-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Prep work for variable size btree node buffers bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analogous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e58f963c |
|
06-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: helpers for printing data types We need bounds checking since new versions may introduce new data types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e28b0359 |
|
10-Jan-2024 |
Kees Cook <keescook@chromium.org> |
bcachefs: Replace strlcpy() with strscpy() strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Nothing checks the return value here, so a direct replacement with strscpy() is possible. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: <linux-bcachefs@vger.kernel.org> Link: https://lore.kernel.org/r/20240110235438.work.385-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
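The difference the commit above relies on is the strscpy() contract: the source is never read past the destination size, and truncation is reported as an error instead of as the source length. The following is a userspace sketch of that contract only; `sketch_strscpy` is a hypothetical name and this is not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Copy at most size - 1 bytes, always NUL-terminate when size > 0, and
 * return the number of bytes copied, or -E2BIG if the source did not fit.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst && src);
	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* Only src[i] with i < size is inspected, so src is never read past size bytes. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Because the call sites in this commit ignore the return value, swapping one function for the other does not change their behavior on strings that fit.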
#
1f5af5fc |
|
05-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: %pg is banished not portable to userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4798bd24 |
|
03-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: increase max_active on io_complete_wq this definitely should _not_ be 1, and we don't actually want any concurrency limiting at all here - btree node read completions are getting blocked behind btree node write submissions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96f37eab |
|
31-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: factor out thread_with_file, thread_with_stdio thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0d529663 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split brain detection Use the new bch_member->seq, sb->write_time fields to detect split brain and kick out devices when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
62719cf3 |
|
23-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix nochanges/read_only interaction nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41b84fb4 |
|
17-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: for_each_member_device_rcu() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9fea2274 |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: for_each_member_device() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e34ec13a |
|
23-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: add more verbose logging Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
defd9e39 |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: darray_for_each() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
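The "now declares loop iter" commits above change iteration macros so that the iterator variable is declared by the macro and scoped to the loop. A minimal sketch of the pattern, assuming a fixed-capacity stand-in for the dynamic array; all `sketch_` names are hypothetical and the real macro may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dynamic array of ints. */
struct sketch_darray {
	int	data[8];
	size_t	nr;
};

/*
 * The macro declares the iterator itself, so callers no longer need a
 * separate declaration. __typeof__ is a GNU C extension.
 */
#define sketch_darray_for_each(_d, _i)					\
	for (__typeof__(&(_d).data[0]) _i = (_d).data;			\
	     _i < (_d).data + (_d).nr;					\
	     _i++)

/* Sum all elements; `i` exists only inside the loop. */
static int sketch_sum(const struct sketch_darray *d)
{
	int sum = 0;

	assert(d && d->nr <= 8);
	sketch_darray_for_each(*d, i)
		sum += *i;
	return sum;
}
```

Scoping the iterator this way keeps it from being used after the loop and removes a declaration from every call site.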
#
cf904c8d |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_err_(fn|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
09caeabe |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree write buffer now slurps keys from journal Previously, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
267b801f |
|
04-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_IOCTL_FSCK_ONLINE This adds a new ioctl for running fsck on a mounted, in use filesystem. This reuses the fsck_thread code from the previous patch for running fsck on an offline, unmounted filesystem, so that log messages for the fsck thread are redirected to userspace. Only one running fsck instance is allowed at a time; a new semaphore (since the lock will be taken by one thread and released by another) is added for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2b41226d |
|
04-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add ability to redirect log output Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
63508b75 |
|
06-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: c->ro_ref Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3c471b65 |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: convert bch_fs_flags to x-macro Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
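The x-macro conversion in the commit above is what makes the flags printable: one list generates both the enum and a matching table of names. A minimal sketch of the technique, with a hypothetical flag list (the real bch_fs_flags entries are not reproduced here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag list: the single source of truth. */
#define SKETCH_FS_FLAGS()	\
	x(started)		\
	x(rw)			\
	x(stopping)

/* First expansion: one enum constant per flag. */
enum sketch_fs_flags {
#define x(n)	SKETCH_FS_##n,
	SKETCH_FS_FLAGS()
#undef x
	SKETCH_FS_FLAG_NR
};

/* Second expansion: matching names, usable when printing flags. */
static const char * const sketch_fs_flag_strs[] = {
#define x(n)	#n,
	SKETCH_FS_FLAGS()
#undef x
	NULL
};

/* Look up the printable name of a flag. */
static const char *sketch_fs_flag_name(unsigned flag)
{
	assert(flag < SKETCH_FS_FLAG_NR);
	return sketch_fs_flag_strs[flag];
}
```

Adding a flag means adding one `x(...)` line; the enum and the name table cannot drift apart.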
#
066a2646 |
|
09-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: track_event_change() This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e7f7dded |
|
22-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add extra verbose logging for ro path Also log time waiting for c->writes references to be dropped; this will help in debugging why unmounts are taking longer than they should. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
85c6db98 |
|
04-Dec-2023 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: improve modprobe support by providing softdeps We need to help modprobe load architecture specific modules so we don't fall back to generic software implementations, this should help performance when building as a module. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
50a8a732 |
|
14-Dec-2023 |
Thomas Bertschinger <tahbertschinger@gmail.com> |
bcachefs: fix invalid memory access in bch2_fs_alloc() error path When bch2_fs_alloc() gets an error before calling bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid memory access because btree_trans_list is uninitialized. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Fixes: 6bd68ec266ad ("bcachefs: Heap allocate btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8a443d3e |
|
17-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Proper refcounting for journal_keys The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
468035ca |
|
23-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Start gc, copygc, rebalance threads after initing writes ref This fixes a bug where copygc would occasionally race with going read-write and die, thinking we were read only, because it couldn't take a ref on c->writes. It's not necessary for copygc (or rebalance, or gc) to take write refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an easier fix than making sure that flag is passed correctly everywhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d4c8bb69 |
|
31-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert bch2_fs_open() to darray Open coded dynamic arrays are deprecated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f5d26fa3 |
|
25-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_sb_field_errors Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be adding counters for every distinct fsck error. The new superblock section has entries of the form [ id, count, time_of_last_error ]; this is intended to let us see what errors are occurring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
94119eeb |
|
25-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add IO error counts to bch_member We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e8484348 |
|
26-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a kasan splat in bch2_dev_add() This fixes a use after free - mi is dangling after the resize call. Additionally, resizing the device's member info section was useless - we were attempting to preallocate the space required before adding it to the filesystem superblock, but there's other sections that we should have been preallocating as well for that to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e677179b |
|
22-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_disk_path_to_text() no longer takes sb_lock We're going to be using bch2_target_to_text() -> bch2_disk_path_to_text() from bch2_bkey_ptrs_to_text() and bch2_bkey_ptrs_invalid(), which can be called in any context. This patch adds the actual label to bch_disk_group_cpu so that it can be used by bch2_disk_path_to_text, and splits out bch2_disk_path_to_text() into two variants - like the previous patch, one for when we have a running filesystem and another for when we only have a superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bbe682c7 |
|
21-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Ensure devices are always correctly initialized We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d0261559 |
|
21-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Delete duplicate time stats initialization This code duplicated initialization already done in bch2_fs_btree_iter_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
37fad949 |
|
28-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: snapshot_create_lock Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4637429e |
|
26-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_sb_field_get() refactoring Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
69d1f052 |
|
28-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Correctly initialize new buckets on device resize bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3f7b9713 |
|
24-Sep-2023 |
Hunter Shaffer <huntershaffer182456@gmail.com> |
bcachefs: New superblock section members_v2 members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing. Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable pointer to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1241df58 |
|
24-Sep-2023 |
Hunter Shaffer <huntershaffer182456@gmail.com> |
bcachefs: Add new helper to retrieve bch_member from sb Prep work for introducing bch_sb_field_members_v2 - introduce new helpers that will check for members_v2 if it exists, otherwise using v1. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1e3b4098 |
|
24-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More assertions for nocow locking - assert in shutdown path that no nocow locks are held - check for overflow when taking nocow locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
867c1fe0 |
|
13-Sep-2023 |
Dan Carpenter <dan.carpenter@linaro.org> |
bcachefs: fix error checking in bch2_fs_alloc() There is a typo here where it uses ";" instead of "?:". The result is that bch2_fs_fs_io_direct_init() is called unconditionally and the errors from it are not checked. Fixes: 0060c68159fc ("bcachefs: Split up fs-io.[ch]") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
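The bug fixed in the commit above comes from the GNU C `a ?: b` idiom for chaining init calls: the expression yields `a` if it is nonzero, otherwise `b`, so the chain stops at the first error. A minimal sketch of the idiom and of the `;` typo, using hypothetical `sketch_` init functions in place of the real ones:

```c
#include <assert.h>
#include <errno.h>

static int sketch_calls;	/* counts how many init steps actually ran */

static int sketch_init_ok(void)   { sketch_calls++; return 0; }
static int sketch_init_fail(void) { sketch_calls++; return -ENOMEM; }

/* Correct chain: stops at the first failing step and returns its error. */
static int sketch_init_chained(void)
{
	return sketch_init_ok() ?:
	       sketch_init_fail() ?:
	       sketch_init_ok();	/* skipped: previous step failed */
}

/*
 * With ';' in place of the last "?:", the chain ends early; the final
 * call runs unconditionally and its error is dropped.
 */
static int sketch_init_typo(void)
{
	int ret = sketch_init_ok() ?:
		  sketch_init_ok();
	sketch_init_fail();		/* always runs, result ignored */
	return ret;
}
```

Here `sketch_init_chained()` returns `-ENOMEM` after two calls, while `sketch_init_typo()` returns 0 even though its last step failed.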
#
0198b235 |
|
13-Sep-2023 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
bcachefs: Remove a redundant and harmless bch2_free_super() call Remove a redundant call to bch2_free_super(). This is harmless because bch2_free_super() has a memset() at its end. So a second call would only lead to kfree(NULL). Remove the redundant call and only rely on the error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
71933fb6 |
|
13-Sep-2023 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
bcachefs: Fix use-after-free in bch2_dev_add() If __bch2_dev_attach_bdev() fails, bch2_dev_free() is called twice. Once here and another time in the error handling path. This leads to several use-after-free. Remove the redundant call and only rely on the error handling path. Fixes: 6a44735653d4 ("bcachefs: Improved superblock-related error messages") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a9737e0b |
|
13-Sep-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: add module description to fix modpost warning modpost produces the following warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/bcachefs.o Add a module description for bcachefs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6bd68ec2 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aaad530a |
|
27-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_logged_ops Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1809b8cb |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Break up io.c More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39791d7d |
|
11-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill incorrect assertion In the bch2_fs_alloc() error path we call bch2_fs_free() without setting BCH_FS_STOPPING - this is fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e46c181a |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert more code to bch_err_msg() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
197763a7 |
|
30-Aug-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: restart journal reclaim thread on ro->rw transitions Commit c2d5ff36065a4 ("bcachefs: Start journal reclaim thread earlier") tweaked reclaim thread management to start a bit earlier in the mount sequence by moving the start call from __bch2_fs_read_write() to bch2_fs_journal_start(). This has the side effect of never starting the reclaim thread on a ro->rw transition, which can be observed by monitoring reclaim behavior via the journal_reclaim tracepoints. I.e. once an fs has remounted ro->rw, we only ever rely on direct reclaim from that point forward. Since bch2_journal_reclaim_start() properly handles the case where the reclaim thread has already been created, restore the start call in the read-write helper. This allows the reclaim thread to start early when appropriate and also exit/restart on remounts or freeze cycles. In the latter case it may be possible to simply allow the task to freeze rather than destroy it, but for now just fix the immediate bug. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7573041a |
|
18-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_mount error path In the bch2_mount() error path, we were calling deactivate_locked_super(), which calls ->kill_sb(), which in our case was calling bch2_fs_free() without __bch2_fs_stop(). This changes bch2_mount() to just call bch2_fs_stop() directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8e877caa |
|
16-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out snapshot.c subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
401585fe |
|
05-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_journal_iter.c Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a37ad1a3 |
|
05-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: sb-clean.c Pull code for bch_sb_field_clean out into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dbbfca9f |
|
03-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split up fs-io.[ch] fs-io.c is too big - time for some reorganization - fs-dio.c: direct io - fs-pagecache.c: pagecache data structures (bch_folio), utility code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e08e63e4 |
|
06-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BCH_COMPAT_bformat_overflow_done no longer required A while back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ef1634f0 |
|
20-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Print version, options earlier in startup path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ea28c867 |
|
10-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't start copygc until recovery is finished With "bcachefs: Snapshot depth, skiplist fields", we now can't run data move operations until after bch2_check_snapshots() is complete. Ideally we'd have the copygc (and rebalance) threads wait until c->curr_recovery_pass has advanced, but the waitlist handling is tricky - so for now, move starting copygc back to read_write_late(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7c50140f |
|
07-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert more -EROFS to private error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c8b4534d |
|
07-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Delete redundant log messages Now that we have distinct error codes for different memory allocation failures, the early init log messages are no longer needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
faa6cb6c |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Allow for unknown btree IDs We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
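The "fixed size array plus dynamic array" lookup described in "Allow for unknown btree IDs" can be pictured with the userspace sketch below. It is only an illustration under stated assumptions: `BTREE_ID_NR`, `struct fs_roots` and `root_lookup()` are hypothetical names made up for this example, not the helpers the patch adds.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BTREE_ID_NR 4u                  /* hypothetical: number of btree IDs this version knows */

struct btree_root { uint64_t node_addr; };

struct fs_roots {
	struct btree_root  known[BTREE_ID_NR]; /* fixed array for known btrees */
	struct btree_root *extra;              /* grows on demand for unknown IDs */
	size_t             nr_extra;
};

/*
 * Return the root slot for any btree ID, growing the dynamic array when the
 * ID is newer than this version understands. Returns NULL on allocation
 * failure. Pointers into 'extra' may move when the array grows again.
 */
static struct btree_root *root_lookup(struct fs_roots *r, unsigned id)
{
	if (id < BTREE_ID_NR)
		return &r->known[id];

	size_t idx = id - BTREE_ID_NR;

	if (idx >= r->nr_extra) {
		struct btree_root *n = realloc(r->extra, (idx + 1) * sizeof(*n));

		if (!n)
			return NULL;
		/* zero the newly added slots so unused roots read as empty */
		memset(n + r->nr_extra, 0, (idx + 1 - r->nr_extra) * sizeof(*n));
		r->extra    = n;
		r->nr_extra = idx + 1;
	}
	return &r->extra[idx];
}
```

With this shape, code that walks all roots (for example a checker) can treat known and unknown IDs uniformly by going through the one lookup helper.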
#
bc652905 |
|
30-Jun-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: flush journal to avoid invalid dev usage entries on recovery A crash immediately after device removal can result in an unmountable filesystem due to recovery failure. The following command reliably reproduces on a multi-device fs: bcachefs device remove <dev> && xfs_io -xc shutdown <mnt> The post-crash mount fails with an error similar to the following, reported by fsck: invalid journal entry dev_usage at offset 7994/8034 seq 12: bad dev, fixing This refers to a device usage entry in the journal that refers to the index of the just removed device. Recovery considers this an invalid entry and fails to proceed. Device usage entries are added to journal buffer writes via bch_journal_write() -> bch2_journal_super_entries_add_common(), which means any journal buffer write has content that refers to member devices at the time of the journal write. The device remove sequence already removes metadata references to the device being removed. It then flushes any pins that refer to the device, clears replica entries, removes the in-memory device object and lastly updates the superblock to reflect that the device is no longer present. The problem is that any journal writes that occur during this sequence will include a dev usage entry so long as the device is present. To avoid this problem, we can flush the journal once more after the device entry is removed from the in-core structures, but before the superblock is updated to fully remove the device on-disk. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e3804b55 |
|
28-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_version_to_text() Add a new helper for printing out metadata versions in a standard format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
65db6049 |
|
27-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a null ptr deref in bch2_fs_alloc() error path This fixes a null ptr deref in bch2_free_pending_node_rewrites() when the list head wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7724664f |
|
11-Jun-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New assertions when marking filesystem clean Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e47a390a |
|
27-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert -ENOENT to private error codes As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
02d51bb9 |
|
19-Apr-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: remove bucket_gens btree keys on device removal If a device has keys in the bucket_gens btree associated with its buckets and is removed from a bcachefs volume, fsck will complain about the presence of keys associated with an invalid device index. A repair removes the associated keys and restores correctness. Update bch2_dev_remove_alloc() to remove device related keys at device removal time to avoid the problem. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b1c945b3 |
|
22-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Run freespace init in device hot add path As in the recovery and device add paths, we have to check whether each device has its freespace btree initialized - this was missed in the device hot add path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8bff9875 |
|
23-Mar-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: use dedicated workqueue for tasks holding write refs A workqueue resource deadlock has been observed when running fsck on a filesystem with a full/stuck journal. fsck is not currently able to repair the fs due to fairly rapid emergency shutdown, but rather than exit gracefully the fsck process hangs during the shutdown sequence. Fortunately this is easily recoverable from userspace, but the root cause involves code shared between the kernel and userspace and so should be addressed. The deadlock scenario involves the main task in the bch2_fs_stop() -> bch2_fs_read_only() path waiting on write references to drain with the fs state lock held. A bch2_read_only_work() workqueue task is scheduled on the system_long_wq, blocked on the state lock. Finally, various other write ref holding workqueue tasks are scheduled to run on the same workqueue and must complete in order to release references that the initial task is waiting on. To avoid this problem, we can split the dependent workqueue tasks across different workqueues. It's a bit of a waste to create a dedicated wq for the read-only worker, but there are several tasks throughout the fs that follow the pattern of acquiring a write reference and then scheduling to the system wq. Use a local wq for such tasks to break the subtle dependency between these and the read-only worker. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9edbcc72 |
|
15-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_evict_subvolume_inodes() This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache() doesn't handle the case where i_count is already 0, we need to grab and put the inode in order for it to be dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b40901b0 |
|
13-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: New erasure coding shutdown path This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path. The process is: - Cancel new stripes being built up - Close out/cancel open buckets on write points or the partial list that are for stripes - Shutdown rebalance/copygc - Then wait for in flight new stripes to finish With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b9fa375b |
|
11-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_fs_moving_ctxts_to_text() This also adds bch2_write_op_to_text(): now we can see outstanding moves, useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC and likely for other things in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
65d48e35 |
|
14-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
83ec519a |
|
07-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: When shutting down, flush btree node writes last Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
93bd2f87 |
|
20-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve a verbose log message We should be using bch2_err_str() where applicable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
627a2312 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Switch ec_stripes_heap_lock to a mutex Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
80c33085 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1f26d70 |
|
10-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Handle btree node rewrites before going RW Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
350175bf |
|
14-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved nocow locking This improves the nocow lock table so that hash table entries have multiple locks, and locks specify which bucket they're for - i.e. we can now resolve hash collisions. This is important because the allocator has to skip buckets that are locked in the nocow lock table, and previously hash collisions would cause it to spuriously skip unlocked buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
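The idea in "Improved nocow locking", hash table entries that hold several locks, each tagged with the bucket it belongs to, might look roughly like the following. This is a single-threaded sketch with hypothetical names and sizes (`NR_ENTRIES`, `LOCKS_PER_ENTRY`, `bucket_lock()` and so on); the real lock table also needs actual synchronization, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_ENTRIES      8u      /* hypothetical table size */
#define LOCKS_PER_ENTRY 4       /* hypothetical number of locks per hash entry */

struct lock_entry {
	uint64_t bucket[LOCKS_PER_ENTRY];   /* which bucket each slot is for */
	int      count[LOCKS_PER_ENTRY];    /* 0 means the slot is free */
};

struct lock_table { struct lock_entry e[NR_ENTRIES]; };

static struct lock_entry *entry_for(struct lock_table *t, uint64_t bucket)
{
	return &t->e[bucket % NR_ENTRIES];  /* toy hash function */
}

/* Take a reference on 'bucket'; false means every slot in the entry is busy. */
static bool bucket_lock(struct lock_table *t, uint64_t bucket)
{
	struct lock_entry *e = entry_for(t, bucket);
	int free_slot = -1;

	for (int i = 0; i < LOCKS_PER_ENTRY; i++) {
		if (e->count[i] && e->bucket[i] == bucket) {
			e->count[i]++;
			return true;
		}
		if (!e->count[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return false;
	e->bucket[free_slot] = bucket;
	e->count[free_slot]  = 1;
	return true;
}

static void bucket_unlock(struct lock_table *t, uint64_t bucket)
{
	struct lock_entry *e = entry_for(t, bucket);

	for (int i = 0; i < LOCKS_PER_ENTRY; i++)
		if (e->count[i] && e->bucket[i] == bucket) {
			e->count[i]--;
			return;
		}
}

/* Because slots record their bucket, a colliding bucket is not reported as locked. */
static bool bucket_is_locked(struct lock_table *t, uint64_t bucket)
{
	struct lock_entry *e = entry_for(t, bucket);

	for (int i = 0; i < LOCKS_PER_ENTRY; i++)
		if (e->count[i] && e->bucket[i] == bucket)
			return true;
	return false;
}
```

In this toy table buckets 3 and 11 share an entry, yet locking 3 does not make 11 look locked, which is the spurious skip the commit message describes avoiding.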
#
15949c54 |
|
09-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't stop copygc while removing devices With the new backpointer based copygc we don't need an explicit copygc reserve, we're always evacuating buckets one at a time - so this is no longer needed, and in fact removing it fixes a deadlock in bch2_dev_allocator_remove(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8c752bb |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New on disk format: Backpointers This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
920e69bc |
|
03-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global state and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
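The "flat fixed size array" and the flush-and-retry behaviour described in "Btree write buffer" can be sketched in a few lines of userspace C. Everything named here (`WB_SIZE`, `struct write_buffer`, the `btree[]` stand-in array, `buffered_update()`) is a hypothetical simplification for illustration; it does not reflect the actual transaction commit path.

```c
#include <stddef.h>
#include <stdint.h>

#define WB_SIZE    4            /* hypothetical, deliberately tiny */
#define BTREE_SIZE 16

struct wb_key { uint64_t pos, val; };

struct write_buffer {
	struct wb_key keys[WB_SIZE];    /* flat, fixed size array */
	size_t        nr;
};

static uint64_t btree[BTREE_SIZE];      /* stand-in for the real btree */

/* Append without touching the btree; fails with -1 when there is no room. */
static int wb_insert(struct write_buffer *wb, uint64_t pos, uint64_t val)
{
	if (wb->nr == WB_SIZE)
		return -1;
	wb->keys[wb->nr++] = (struct wb_key){ pos, val };
	return 0;
}

/* Apply buffered updates in insertion order, so a later update to a key wins. */
static void wb_flush(struct write_buffer *wb)
{
	for (size_t i = 0; i < wb->nr; i++)
		btree[wb->keys[i].pos % BTREE_SIZE] = wb->keys[i].val;
	wb->nr = 0;
}

/* On "no room", flush and retry once, loosely mirroring the retry described above. */
static void buffered_update(struct write_buffer *wb, uint64_t pos, uint64_t val)
{
	if (wb_insert(wb, pos, val)) {
		wb_flush(wb);
		wb_insert(wb, pos, val);        /* cannot fail: the buffer is empty */
	}
}
```

The point of the design is visible even in the toy: inserts never read the btree, so they stay cheap until a flush is forced.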
#
5f5c7466 |
|
17-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Start copygc when first going read-write In the distant past, it wasn't possible to start copygc until after journal replay had finished. Now, the btree iterator code overlays keys from the journal, so there's no reason not to start it earlier - and it solves a rare deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d94189ad |
|
08-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Debug mode for c->writes references This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dd81a060 |
|
08-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: ec_stripe_delete_work() now takes ref on c->writes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
60573ff5 |
|
20-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Make log message at startup a bit cleaner Don't print out opts= if no options have been specified. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
78c0b75c |
|
19-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More errcode cleanup We shouldn't be overloading standard error codes now that we have provisions for bcachefs-specific errorcodes: this patch converts super.c and super-io.c to per error site errcodes, with a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b9004e85 |
|
02-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a "no journal entries found" bug On startup, we need to ensure the first journal entry written is a flush write: after a clean shutdown we generally don't read the journal, which means we might be overwriting whatever was there previously, and there must always be at least one flush entry in the journal or recovery will fail. Found by fstests generic/388. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e3e02e6 |
|
19-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF; UNSPECIFIED_INT (bcachefs coding style prefers single token type names); NEW_TYPEDEFS (typedefs are occasionally good); FUNCTION_ARGUMENTS (we prefer to look at functions in .c files, hopefully with docbook documentation, not .h file prototypes); MULTISTATEMENT_MACRO_USE_DO_WHILE (we have _many_ x-macros and other macros where we can't do this). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bf8f8b20 |
|
11-Aug-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: time stats now uses the mean_and_variance module. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f3b8403e |
|
25-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Run bch2_fs_counters_init() earlier We need counters to be initialized before initializing shrinkers - the shrinker callbacks will update those counters. This fixes a segfault in userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
098ef98d |
|
18-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add private error codes for ENOSPC Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
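One way to picture "error codes that are subtypes of ENOSPC" from "Add private error codes for ENOSPC" is an x-macro table that records a parent class for each private code. The sketch below is an assumption-laden illustration: `ERR_START`, the `ERR_*` names, `err_matches()`, `err_class()` and `err_name()` are all hypothetical, and it handles only one level of hierarchy.

```c
#include <errno.h>
#include <stdbool.h>

#define ERR_START 2048          /* hypothetical: private codes start above standard errnos */

/* x-macro table: x(parent class, private error name) */
#define PRIVATE_ERRCODES()                      \
	x(ENOSPC, ENOSPC_disk_reservation)      \
	x(ENOSPC, ENOSPC_bucket_alloc)          \
	x(ENOMEM, ENOMEM_btree_node)

enum private_errcode {
	ERR_START_MARKER = ERR_START - 1,
#define x(class, err) ERR_##err,
	PRIVATE_ERRCODES()
#undef x
	ERR_MAX
};

static const int err_parent[] = {
#define x(class, err) [ERR_##err - ERR_START] = class,
	PRIVATE_ERRCODES()
#undef x
};

static const char * const err_names[] = {
#define x(class, err) [ERR_##err - ERR_START] = #err,
	PRIVATE_ERRCODES()
#undef x
};

static bool err_is_private(int err)
{
	return err >= ERR_START && err < ERR_MAX;
}

/* Collapse a private code to the standard errno it is a subtype of. */
static int err_class(int err)
{
	return err_is_private(err) ? err_parent[err - ERR_START] : err;
}

/* True if 'err' is 'class' itself or a private subtype of it. */
static bool err_matches(int err, int class)
{
	return err == class || err_class(err) == class;
}

/* Name for log messages; says exactly which error site fired. */
static const char *err_name(int err)
{
	return err_is_private(err) ? err_names[err - ERR_START] : "(standard errno)";
}
```

Callers that only care about "out of space" can keep testing against ENOSPC via `err_matches()`, while logs carry the specific name, which is the debugging benefit the commit message is after.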
#
02afcb8c |
|
18-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix adding a device with a label Device labels are represented as pointers in the member info section: we need to get and then set the label for it to be kept correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1ed0a5d2 |
|
19-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert fsck errors to errcode.h Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d4bf5eec |
|
18-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use bch2_err_str() in error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0e96f5dc |
|
13-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Call bch2_do_invalidates() when going read write Like bch2_do_discards(), we should check if this needs to be done when going rw. Also, add some sysfs code for debugging bucket invalidation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
401ec4db |
|
03-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Printbuf rework This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9b688da3 |
|
28-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix error checking in bch2_fs_alloc() One of the init calls had a ; instead of a ?:, and errors after that got dropped - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
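The class of bug fixed in "Fix error checking in bch2_fs_alloc()" is easy to reproduce in isolation. The sketch below assumes the GNU C `a ?: b` extension (accepted by GCC and Clang) and uses hypothetical step functions; it is not the actual bch2_fs_alloc() code.

```c
#include <errno.h>

static int calls;               /* counts how many init steps actually ran */

/* Hypothetical init steps: 0 on success, negative errno on failure. */
static int step_ok(void)   { calls++; return 0; }
static int step_fail(void) { calls++; return -ENOMEM; }

/* Correct chain: "a ?: b" yields a if it is nonzero, otherwise evaluates b. */
static int init_chained(void)
{
	return step_ok() ?:
	       step_fail() ?:
	       step_ok();       /* never runs: the chain stops at the first error */
}

/* Buggy variant: one ';' where a '?:' was intended. */
static int init_broken(void)
{
	int ret;

	ret = step_ok() ?:
	      step_ok();        /* stray ';' ends the assignment here */
	(void)(step_fail() ?:   /* still runs, but its error is discarded */
	       step_ok());      /* (the cast only silences compiler warnings) */
	return ret;
}
```

`init_chained()` reports the failure, while `init_broken()` returns 0 even though a later step failed, matching the "errors after that got dropped" description.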
#
104c6974 |
|
15-Mar-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: Add persistent counters This adds a new superblock field for persisting counters and adds a sysfs interface in counters/ exposing these counters. The superblock field is ignored by older versions letting us avoid an on disk version bump. Each sysfs file outputs a counter that tracks since filesystem creation and a counter for the current mount session. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c0960603 |
|
17-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Shutdown path improvements We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only. Cleanups & changes: - BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN - new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait - bch2_btree_flush_writes() now also returns true if there were btree writes in flight - __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node - assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a9c0a4cb |
|
09-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Minor device removal fixes - We weren't clearing the LRU btree - bch2_alloc_read() runs before bch2_check_alloc_key() deletes alloc keys for devices/buckets that don't exist, so it needs to check for that - bch2_check_lrus() needs to check that buckets exist - improve some error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
84c72755 |
|
08-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initialize ec work structs early We need to ensure that work structs in bch_fs always get initialized - otherwise an error in filesystem initialization can pop a warning in the workqueue code when we try to cancel a work struct that wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ce6201c4 |
|
20-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use a genradix for reading journal entries Previously, the journal read path used a linked list for storing the journal entries we read from disk. But there's been a bug that's been causing journal_flush_delay to incorrectly be set to 0, leading to far more journal entries than is normal being written out, which then means filesystems are no longer able to start due to the O(n^2) behaviour of inserting into/searching that linked list. Fix this by switching to a radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
822835ff |
|
31-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fold bucket_state in to BCH_DATA_TYPES() Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be counted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e1effd42 |
|
05-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More improvements for alloc info checks - Move checks for whether the device & bucket are valid from the .key_invalid method to bch2_check_alloc_key(). This is because .key_invalid() is called on keys that may no longer exist (post journal replay), which is a problem when removing/resizing devices. - We weren't checking the need_discard btree to ensure that every set bucket has a corresponding alloc key. This refactors the code for checking the freespace btree, so that it now checks both. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
59cc38b8 |
|
10-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New discard implementation In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f25d8215 |
|
09-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill allocator threads & freelists Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c6b2826c |
|
11-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Freespace, need_discard btrees This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b17d3cec |
|
31-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Run btree updates after write out of write_point In the write path, after the write to the block device(s) complete we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item. This helps with two different issues: - lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys - context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker. In an arbitrary benchmark, 4k random writes with fio running inside a VM, this patch resulted in a 10% improvement in total iops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
74b33393 |
|
20-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: x-macro metadata version enum Now we've got strings for metadata versions - this changes bch2_sb_to_text() and our mount log message to use it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5521b1df |
|
04-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bch2_sb_to_text to master option list Options no longer have to be manually added to bch2_sb_to_text() - it now uses the master list of options in opts.h. Also, improve some of the formatting by converting it to tabstops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
102a6a8f |
|
04-Mar-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: respect superblock discard flag. We were accidentally using default mount options and overwriting the discard flag. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fa8e94fa |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Heap allocate printbufs This patch changes printbufs to dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
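As a rough picture of what "Heap allocate printbufs" implies for callers, here is a toy growable string buffer in userspace C. The names `struct toy_printbuf`, `toy_prt_printf()` and `toy_printbuf_exit()` are hypothetical stand-ins; only the requirement to call an exit function to release the buffer is taken from the commit message.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_printbuf {
	char  *buf;     /* heap allocated, NULL until first use */
	size_t size;    /* bytes allocated */
	size_t pos;     /* bytes used, not counting the trailing NUL */
};

/* Append formatted text, growing the buffer as needed. Returns 0 or -1. */
static int toy_prt_printf(struct toy_printbuf *out, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);    /* measure first */
	va_end(args);
	if (len < 0)
		return -1;

	if (out->pos + (size_t)len + 1 > out->size) {
		size_t new_size = (out->pos + (size_t)len + 1) * 2;
		char *n = realloc(out->buf, new_size);

		if (!n)
			return -1;
		out->buf  = n;
		out->size = new_size;
	}

	va_start(args, fmt);
	vsnprintf(out->buf + out->pos, out->size - out->pos, fmt, args);
	va_end(args);
	out->pos += (size_t)len;
	return 0;
}

/* Unlike a stack buffer, the heap buffer must be released explicitly. */
static void toy_printbuf_exit(struct toy_printbuf *out)
{
	free(out->buf);
	out->buf  = NULL;
	out->size = out->pos = 0;
}
```

The trade-off is the one the commit message names: no large fixed-size arrays on the stack, at the cost of every user needing a matching exit call on all paths.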
#
b66b2bc0 |
|
23-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Revert "Ensure journal doesn't get stuck in nochanges mode" This patch was originally to work around the journal getting stuck in nochanges mode - but that was just a hack, we needed to fix the actual bug. It should be fixed now, so revert it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
bf7e49a4 |
|
16-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Change bch2_dev_lookup() to not use lookup_bdev() bch2_dev_lookup() is used from the extended attribute set methods, for setting the target options, where we're already holding an inode lock - it turns out pathname lookups also take inode locks, so that was susceptible to deadlocks. Fortunately we already stash the device name in ca->name. This does change user-visible behaviour though: instead of specifying e.g. /dev/sda1, user must now specify sda1. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c45c8667 |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_gc_gens() no longer uses bucket array Like the previous patches, this converts bch2_gc_gens() to use the alloc btree directly, and private arrays of generation numbers for its own recalculation of oldest_gen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7c8f6f98 |
|
12-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_id_cached() Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
21aec962 |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New data structure for buckets waiting on journal commit Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
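For readers unfamiliar with the technique named in "New data structure for buckets waiting on journal commit", the following is a minimal cuckoo hashing sketch. It uses deliberately simple toy hash functions and hypothetical names (`struct cuckoo`, `cuckoo_insert()`, `cuckoo_lookup()`); it does not store journal sequence numbers and is not the table the patch implements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8u
#define MAX_KICKS  16

struct cuckoo {
	uint64_t slot[2][TABLE_SIZE];   /* two tables; key 0 marks an empty slot */
};

/* Two toy hash functions, one per table (a real table would use better ones). */
static size_t slot_hash(int which, uint64_t key)
{
	return (size_t)(which ? (key / TABLE_SIZE) % TABLE_SIZE
			      : key % TABLE_SIZE);
}

/* Lookup probes exactly two slots, whatever the load. */
static bool cuckoo_lookup(const struct cuckoo *t, uint64_t key)
{
	return t->slot[0][slot_hash(0, key)] == key ||
	       t->slot[1][slot_hash(1, key)] == key;
}

/*
 * Insert a nonzero key, evicting ("kicking") any occupant to its slot in the
 * other table. Returns false if no free slot was found within MAX_KICKS; in
 * that case the key in hand at that point is dropped, where a real
 * implementation would grow or rehash the table instead.
 */
static bool cuckoo_insert(struct cuckoo *t, uint64_t key)
{
	int which = 0;

	if (cuckoo_lookup(t, key))
		return true;

	for (int i = 0; i < MAX_KICKS; i++) {
		uint64_t *s = &t->slot[which][slot_hash(which, key)];
		uint64_t evicted = *s;

		*s = key;
		if (!evicted)
			return true;
		key = evicted;          /* re-home the displaced key */
		which ^= 1;
	}
	return false;
}
```

Keys 3, 11 and 19 all hash to slot 3 of the first table here, so inserting them in that order shows the eviction chain at work while each key stays findable in at most two probes.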
#
9b6e2f1e |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
Revert "bcachefs: Delete some obsolete journal_seq_blacklist code" This reverts commit f95b61228efd04c9c158123da5827c96e9773b29. It turns out, we're seeing filesystems in the wild end up with blacklisted btree node bsets - this should not be happening, and until we understand why and fix it we need to keep this code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
03ea3962 |
|
04-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Log & error message improvements - Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
efe68e1d |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improved superblock-related error messages This patch converts bch2_sb_validate() and the .validate methods for the various superblock sections to take printbuf, to which they can print detailed error messages, including printing the entire section that was invalid. This is a great improvement over the previous situation, where we could only return static strings that didn't have precise information about what was wrong. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
eacb2574 |
|
02-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch_dev->dev Add a field to bch_dev for the dev_t of the underlying block device - this fixes a null ptr deref in tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d248ee56 |
|
29-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add iter_flags arg to bch2_btree_delete_range() Will be used by the new snapshot tests, to pass in BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e8536925 |
|
28-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve error messages in device add path This converts the error messages in the device add to a better style, and adds some missing ones. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
04f0f77d |
|
26-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete some obsolete journal_seq_blacklist code Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
77170d0d |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transactional bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
09943313 |
|
24-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite bch2_bucket_alloc_new_fs() This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
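A "simple bump allocator", as mentioned in "Rewrite bch2_bucket_alloc_new_fs()", needs nothing more than a cursor. The sketch below is a generic illustration with hypothetical names (`struct bump_alloc`, `bump_alloc_bucket()`); how the real function chooses its starting bucket is not shown in the commit message and is not modelled here.

```c
#include <stdint.h>

struct bump_alloc {
	uint64_t next;  /* first bucket not yet handed out */
	uint64_t end;   /* one past the last usable bucket */
};

/*
 * Hand out buckets in increasing order. No per-bucket state is consulted,
 * which is why no in-memory bucket array is needed. Returns -1 when the
 * range is exhausted.
 */
static int64_t bump_alloc_bucket(struct bump_alloc *a)
{
	if (a->next >= a->end)
		return -1;
	return (int64_t)a->next++;
}
```

Such an allocator can never free or reuse a bucket, which is acceptable only for a one-shot job like laying out a brand new filesystem.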
#
8244f320 |
|
14-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Option improvements This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
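The two option flags in 8244f320 amount to a validation step and a unit conversion. The Python sketch below shows one way this could look; the helper names are hypothetical, and storing the base-2 logarithm of the byte value is an assumption about what "stored as a power of two" means.

```python
def is_power_of_2(n):
    """True for 1, 2, 4, 8, ...; False for 0, negatives and everything else."""
    return n > 0 and (n & (n - 1)) == 0


def opt_to_sb_log2(value_bytes):
    """Convert an in-memory option (bytes) to a log2-encoded on-disk field."""
    if not is_power_of_2(value_bytes):
        raise ValueError(f"{value_bytes} is not a power of two")
    return value_bytes.bit_length() - 1


def opt_from_sb_log2(stored):
    """Convert the log2-encoded on-disk field back to bytes."""
    return 1 << stored
```

Keeping the in-memory value in bytes means that display and parsing need no conversion; only the superblock getter and setter do.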
#
991ba021 |
|
10-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more time_stats This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2430e72f |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert journal sysfs params to regular options This converts journal_write_delay, journal_flush_disabled, and journal_reclaim_delay to normal filesystem options, and also adds them to the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e2b60560 |
|
05-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Clean up error reporting in the startup path It used to be that error reporting in the startup path was done by returning strings describing the error, but that turned out to be a rather silly idea - if there's something we can describe about the error, just print it right away. This converts a good chunk of code to returning error codes, as is more typical style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7be9ab63 |
|
04-Nov-2021 |
Chris Webb <chris@arachsys.com> |
bcachefs: Return -ENOKEY/EINVAL when mount decryption fails bch2_fs_encryption_init() correctly passes back -ENOKEY from request_key() when no unlock key is found, or -EINVAL if superblock decryption fails because of an invalid key. However, these get absorbed into a generic NULL return from bch2_fs_alloc() and later returned to user space as -ENOMEM, leading to a misleading error from mount(1): "mount(2) system call failed: Out of memory." Return explicit error pointers out of bch2_fs_alloc() and handle them in both callers, so the user instead sees "mount(2) system call failed: Required key not available." when attempting to mount a filesystem which is still locked. Signed-off-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fae1157d |
|
28-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure journal doesn't get stuck in nochanges mode This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f124345e |
|
26-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop bch2_journal_meta() call when going RW Back when we relied on the journal sequence number blacklist machinery for consistency between btree and the journal, we needed to ensure a new journal entry was written before any btree writes were done. But, this had the side effect of consuming some space in the journal prior to doing journal replay - which could lead to a very wedged filesystem, since we don't yet have a way to grow the journal prior to going RW. Fortunately, the journal sequence number blacklist machinery isn't needed anymore, as btree node pointers now record the number of sectors currently written to that node - that code should all be ripped out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
114eea75 |
|
24-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix dev accounting after device add This is a hacky but effective fix to device usage stats for superblock and journal being wrong on a newly added device (following the comment that already told us how it needed to be done!) Reported-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a9cb0a67 |
|
07-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_dev_remove_alloc() It was missing a lockrestart_do(), to call bch2_trans_begin() and also handle transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
14b393ee |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible component, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
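The sharing scheme in 67e0dd8f is a form of copy-on-write: iterators share a reference-counted path and clone it only when one of them is about to mutate it. A minimal Python model, with hypothetical class and method names that do not mirror the kernel structures:

```python
class Path:
    """Shared traversal state; `ref` counts the iterators pointing at it."""

    def __init__(self, pos):
        self.pos = pos
        self.ref = 0


class Iter:
    """Externally visible handle that points at a (possibly shared) Path."""

    def __init__(self, path):
        self.path = path
        path.ref += 1

    def set_pos(self, pos):
        # Clone only if another iterator still uses this path.
        if self.path.ref > 1:
            self.path.ref -= 1
            self.path = Path(self.path.pos)
            self.path.ref = 1
        self.path.pos = pos
```

An iterator that holds the only reference mutates its path in place, so the clone cost is paid only when sharing would otherwise be broken.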
#
8dd6ed94 |
|
23-Jul-2021 |
Brett Holman <bholman.devel@gmail.com> |
bcachefs: add progress stats to sysfs This adds progress stats to sysfs for copygc, rebalance, recovery, and the cmd_job ioctls. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have led to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
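The check that 9f1833ca makes possible can be pictured as a comparison between the sector count recorded in the pointer and the amount of valid data found on a replica. The Python sketch below is a simplified model under assumed names; representing a node as a list of bset sizes in sectors is an illustration, not the on-disk format.

```python
def check_replica(ptr_sectors_written, bset_sectors_on_disk):
    """Compare what the pointer says was written with what a replica holds.

    ptr_sectors_written: sector count recorded in the btree pointer.
    bset_sectors_on_disk: sizes (in sectors) of the valid bsets read back.
    """
    found = sum(bset_sectors_on_disk)
    if found < ptr_sectors_written:
        return "lost write"
    return "ok"


# Two replicas of one node; the second missed the final 4-sector write.
replica_a = [8, 4, 4]
replica_b = [8, 4]
```

With the pointer recording 16 sectors, `check_replica(16, replica_b)` flags the replica that missed a write, which is the case that previously went undetected when the lost write was not the first one to the node.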
#
224ec3e6 |
|
08-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't mark superblocks past end of usable space bcachefs-tools recently started putting a backup superblock at the end of the device. This causes a problem if the bucket size doesn't divide the device size - but we can fix it by just skipping marking that part. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
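The problem in 224ec3e6 is plain integer arithmetic: when the bucket size does not divide the device size, the tail of the device belongs to no bucket. A Python sketch of the clamping, using a hypothetical function name and sector-based units chosen for illustration:

```python
def clamp_to_usable(start, end, dev_sectors, bucket_sectors):
    """Clip the sector range [start, end) to the part covered by whole buckets.

    Returns the clipped range, or None if nothing is left to mark.
    """
    nbuckets = dev_sectors // bucket_sectors     # partial last bucket is dropped
    usable_end = nbuckets * bucket_sectors
    start = min(start, usable_end)
    end = min(end, usable_end)
    return (start, end) if start < end else None
```

For a 1000-sector device with 128-sector buckets, only the first 896 sectors are usable, so a backup superblock in the last 8 sectors has nothing to mark.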
#
c0ebe3e4 |
|
23-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Assorted endianness fixes Found by sparse Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f2772c4 |
|
27-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out btree_error_wq We can't use btree_update_wq because btree updates may be waiting on btree writes to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
731bdd2e |
|
22-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a workqueue for btree io completions Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ef1b2092 |
|
18-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ratelimiting for writeback IOs Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec4ab9d2 |
|
12-May-2021 |
Dan Robertson <dan@dlrobertson.com> |
bcachefs: Fix possible null deref on mount Ensure that the block device pointer in a superblock handle is not null before dereferencing it in bch2_dev_to_fs. The block device pointer may be null when mounting a new bcachefs filesystem given another mounted bcachefs filesystem exists that has at least one device that is offline. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3a402c8d |
|
07-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix some refcounting bugs We really need debug mode assertions that ca->ref and ca->io_ref are used correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
aae15aaf |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4932e07e |
|
24-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix key cache assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d62ab355 |
|
14-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9d8022db |
|
06-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Eliminate more PAGE_SIZE uses In userspace, we don't really have a well defined PAGE_SIZE and shouldn't be relying on it. This is some more incremental work to remove references to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c944fa1 |
|
19-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a print statement for when we go read-write Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2436cb9f |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for more enums This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9620c3ec |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mempool for the replicas delta list Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9ae28f82 |
|
21-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start journal reclaim thread earlier Especially in userspace, we sometimes run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ee47eec |
|
18-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for copygc getting stuck waiting for reserve to be filled This fixes a regression from the patch "bcachefs: Fix copygc dying on startup". In general only the allocator thread itself should be updating ca->allocator_state; the thread waking up the allocator setting it is an ugly hack only needed to avoid racing with the copygc threads when we're first starting up. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
51c66fed |
|
17-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rip out copygc pd controller We have a separate mechanism for ratelimiting copygc now - the pd controller has only been causing problems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
220d2062 |
|
11-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an allocator startup race Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
59a74051 |
|
05-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Create allocator threads when allocating filesystem We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fcb3431b |
|
06-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Redo checks for sufficient devices When the replicas mechanism was added, for tracking data by which drives it's replicated on, the check for whether we have sufficient devices was never updated to make use of it. This patch finally does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4b8f89af |
|
03-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fixes/improvements for journal entry reservations This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
180fb49d |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2abe5420 |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd periodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestamps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5b593ee1 |
|
26-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add support for doing btree updates prior to journal replay Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bfcf840d |
|
22-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Mark superblocks transactionally More work towards getting rid of the in memory struct bucket: this patch adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
72eab8da |
|
21-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor dev usage This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a5cd80ea |
|
20-Jan-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix an assertion pop There was a race: btree node writes drop their reference on journal pins before clearing the btree_node_write_in_flight flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f299d573 |
|
13-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor filesystem usage accounting Various filesystem usage counters are kept in percpu counters, with one set per in flight journal buffer. Right now all the code that deals with it assumes that there's only two buffers/sets of counters, but the number of journal bufs is getting increased to 4 in the next patch - so refactor that code to not assume a constant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b7a9bbfc |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move journal reclaim to a kthread This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
14ba3706 |
|
18-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a kmem_cache for btree_key_cache objects We allocate a lot of these, and we're seeing sporadic OOMs - this will help with tracking those down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a3e72262 |
|
05-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New varints Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
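The idea in a3e72262 of splitting a wide timestamp into two varints can be illustrated independently of the exact byte format. The Python sketch below uses LEB128 (7 data bits per byte plus a continuation bit) as a stand-in encoding; that is not the format the commit introduces, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def varint_encode(v):
    """LEB128-style encoding of a non-negative integer (stand-in format)."""
    out = bytearray()
    while True:
        byte = v & 0x7F
        v >>= 7
        if v:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(buf, off=0):
    """Return (value, offset just past the varint)."""
    v, shift = 0, 0
    while True:
        b = buf[off]
        off += 1
        v |= (b & 0x7F) << shift
        if not b & 0x80:
            return v, off
        shift += 7


def encode_wide(v):
    """Encode a value wider than 64 bits as two varints: low 64 bits, then the rest."""
    return varint_encode(v & MASK64) + varint_encode(v >> 64)


def decode_wide(buf, off=0):
    lo, off = varint_decode(buf, off)
    hi, off = varint_decode(buf, off)
    return lo | (hi << 64), off
```

Because each varint now carries at most 64 bits, the codec itself never has to handle the 96-bit case; the caller simply concatenates two ordinary values.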
#
1a21bf98 |
|
05-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a single slot percpu buf for btree iters Allocating our array of btree iters is a big enough allocation that it hits the buddy allocator, and we're seeing lots of lock contention. Sticking a single element buffer in front of it should help. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b5e8a699 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improved inode create optimization This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
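Sharding by processor ID, as b5e8a699 describes, places a shard number in the high bits of each new inode number so that concurrent creates land in different parts of the keyspace. A Python sketch with hypothetical names; the bit widths are chosen only for illustration and are not taken from the source:

```python
def sharded_inum(cpu_id, local_counter, shard_bits=4, inum_bits=32):
    """Build an inode number whose high bits come from the CPU that creates it."""
    low_bits = inum_bits - shard_bits
    shard = cpu_id % (1 << shard_bits)        # fold CPU IDs into the shard range
    if local_counter >= (1 << low_bits):
        raise OverflowError("shard is full")
    return (shard << low_bits) | local_counter
```

Inode numbers allocated on different CPUs then differ in their most significant bits, so they sort far apart and tend to fall into different btree nodes.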
#
2f33ece9 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Minor journal reclaim improvement With the btree key cache code, journal reclaim now has a lot more work to do. It could be the case that after journal reclaim has finished one iteration there's already more work to do, so put it in a loop to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
45e4dcba |
|
27-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inode create optimization On workloads that do a lot of multithreaded creates all at once, lock contention on the inodes btree turns out to still be an issue. This patch adds a small buffer of inode numbers that are known to be free, so that we can avoid touching the btree on every create. Also, this changes inode creates to update via the btree key cache for the initial create. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
28998019 |
|
17-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start/stop io clock hands in read/write paths This fixes a bug where the clock hands in the journal and superblock didn't match, because we were still incrementing the read clock hand while read-only. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8d6b6222 |
|
16-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improvements to writing alloc info Now that we've got transactional alloc info updates (and have for awhile), we don't need to write it out on shutdown, and we don't need to write it out on startup except when GC found errors - this is a big improvement to mount/unmount performance. This patch also fixes a few bugs where we weren't writing out alloc info (on new filesystems, and new devices) and should have been. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f20ed15 |
|
15-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix copygc dying on startup The copygc thread errors out and makes the filesystem go RO if it ever tries to run and discovers it has no reserve allocated - which is a problem if it races with the allocator thread and its reserve hasn't been filled yet. The allocator thread doesn't start filling the copygc reserve until after BCH_FS_STARTED has been set, so make sure to wake up the allocator threads after setting that and before starting copygc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
505b7a4c |
|
15-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix errors early in the fs init process At some point bch2_fs_alloc() was changed to always call bch2_fs_free() in the error path, which means we need c->cl to always be initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d5e4dcc2 |
|
08-Sep-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix unmount path There was a long standing race in the mount/unmount code - the VFS intends for mount/unmount synchronization to be handled by the list of superblocks, but we were still holding devices open after tearing down our superblock in the unmount path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
625104ea |
|
06-Sep-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't fail mount if device has been removed Also - make sure to show the devices we actually have open in /proc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f115ce9 |
|
04-Aug-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a bug with the journal_seq_blacklist mechanism Previously, we would start doing btree updates before writing the first journal entry; if this was after an unclean shutdown, this could cause those btree updates to not be blacklisted. Also, move some code to headers for userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
74ed7e56 |
|
21-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't let copygc buckets be stolen by other threads And assorted other copygc fixes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e6d11615 |
|
11-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make copygc thread global Per device copygc threads don't move data to different devices and they make fragmentation worse - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
89fd25be |
|
09-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for data types Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
703e2a43 |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move stripe creation to workqueue This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba6dd1dd |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve stripe triggers/heap code Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5d20ba48 |
|
04-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use cached iterators for alloc btree Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ca88e5a |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
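The write-cache behaviour that 2ca88e5a describes (journal the update now, write the btree later when journal reclaim flushes the entry) can be modelled with a dictionary in front of a backing store. This is a toy Python sketch with hypothetical names; it ignores locking and the cache coherency issues the commit message mentions.

```python
class KeyCache:
    """Hash-table cache in front of a btree, with deferred btree updates."""

    def __init__(self, btree):
        self.btree = btree      # backing store, modelled as a dict
        self.cache = {}         # cached keys
        self.dirty = set()      # keys whose btree copy is stale
        self.journal = []       # journalled updates not yet flushed

    def lookup(self, key):
        if key not in self.cache:
            self.cache[key] = self.btree.get(key)
        return self.cache[key]

    def update(self, key, value):
        # Journal the update, but leave the btree untouched for now.
        self.journal.append((key, value))
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Stand-in for journal reclaim: write dirty keys back to the btree.
        for key in sorted(self.dirty):
            self.btree[key] = self.cache[key]
        self.dirty.clear()
        self.journal.clear()
```

Repeated updates to the same key between flushes cost one btree write instead of many, which is where the benefit for frequently updated keys such as inodes and alloc info would come from.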
#
1ada1606 |
|
15-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Turn c->state_lock into an rwsem Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a27443bc |
|
03-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill old allocator startup code It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
039fc4c5 |
|
28-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fixes for going RO Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c823c339 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Factor out bch2_fs_btree_interior_update_init() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b2930396 |
|
24-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix reading of alloc info after unclean shutdown When updates to interior nodes started being journalled, that meant that after an unclean shutdown, until journal replay is done we can't walk the btree without overlaying the updates from the journal. The initial btree gc was changed to walk the btree overlaying keys from the journal - but bch2_alloc_read() and bch2_stripes_read() were missed. Major whoops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2340fd9d |
|
24-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Be more rigorous about marking the filesystem clean Previously, there was at least one error path where we could mark the filesystem clean when we hadn't successfully written out alloc info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a9310ab0 |
|
11-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fixes for startup on very full filesystems - Always pass BTREE_INSERT_USE_RESERVE when writing alloc btree keys - Don't strand buckets on the copygc freelist until after recovery is done and we're starting copygc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f1d786a0 |
|
25-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add an option for keeping journal entries after startup This will be used by the userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
883f1a7c |
|
27-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Dont't del sysfs dir until after we go RO This will help for debugging hangs during unmount Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac7c51b2 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Serialize btree_update operations at btree_update_nodes_written() Prep work for journalling updates to interior nodes - enforcing ordering will greatly simplify those changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
31ba2cd3 |
|
03-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Hacky fixes for device removal The device remove test was sporadically failing, because we hadn't finished dropping btree sector counts for the device when bch2_replicas_gc2() was called - mainly due to in flight journal writes. We don't yet have a good mechanism for flushing the counts that correspond to open journal entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e731d466 |
|
26-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't export __bch2_fs_read_write BTREE_INSERT_LAZY_RW was added for this since this code was written; use it instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae2f17d5 |
|
14-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill btree_node_iter_large Long overdue cleanup - this converts btree_node_iter_large uses to sort_iter. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bd7e82ee |
|
20-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill ca->freelist_lock All uses were supposed to be switched over to c->freelist_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
35189e09 |
|
09-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bkey_on_stack This implements code for storing small bkeys on the stack and allocating out of a mempool if they're too big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
36e9d698 |
|
07-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Do updates in order they were queued up in Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f516c872 |
|
10-Jul-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix stripe_idx_to_delete() There was a null ptr deref when there wasn't a stripes heap allocated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
97fd13ad |
|
10-Jul-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't try to delete stripes when RO We weren't checking for errors when trying to delete stripes, which meant ec_stripe_delete_work() would spin trying to delete the same stripe over and over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9516950c |
|
22-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix return code from bch2_fs_start() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
619f5bee |
|
17-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: some improvements to startup messages and options Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
460651ee |
|
17-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Various improvements to bch2_alloc_write() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5e82a9a1 |
|
10-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Write out fs usage consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fca1223c |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Avoid write lock on mark_lock mark_lock is a frequently taken lock, and there's also potential for deadlocks since currently bch2_clear_page_bits which is called from memory reclaim has to take it to drop disk reservations. The disk reservation get path takes it when it recalculates the number of sectors known to be available, but it's not really needed for consistency. We just want to make sure we only have one thread updating the sectors_available count, which we can do with a dedicated mutex. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a0e0bda1 |
|
06-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Pass flags arg to bch2_alloc_write() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d1170ce5 |
|
06-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: allocate sb_read_scratch with __get_free_page kmalloc allocations aren't guaranteed alignment for io Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
330581f1 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: disallow ever going rw if nochanges or noreplay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1dd7f9d9 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rewrite journal_seq_blacklist machinery Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
58a46dc5 |
|
29-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: allow journal replay on ro mount Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0bc166ff |
|
28-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track whether filesystem has errors in superblock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d5f70c1f |
|
28-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Write out alloc info more carefully In flight btree updates could update alloc info until they're flushed - so we have to try writing again after they've been flushed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
03e183cb |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Verify fs hasn't been modified before going rw Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
134915f3 |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Go rw lazily Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
49a67206 |
|
18-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more time stats for being blocked on allocator Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4d8100da |
|
15-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocate fs_usage in do_btree_insert_at() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3aea4342 |
|
09-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for shutting down before fs started marking it clean Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8e00bd4 |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: increase BTREE_ITER_MAX Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ecf37a4a |
|
14-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fs_usage_u64s() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e0745e2 |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: initialize fs usage summary in recovery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c5af169 |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: reserve space in journal for fs usage entries Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b935a8a6 |
|
09-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a bug when shutting down before allocator started Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
61c8d7c8 |
|
25-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist stripe blocks_used Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
430735cd |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Persist alloc info on clean shutdown - Does not persist alloc info for stripes yet - Also does not yet include filesystem block/sector counts from struct fs_usage - Not made use of just yet Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7ef2a73a |
|
21-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix check for if extent update is allocating Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dbaee468 |
|
20-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fix error message in device remove path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0519b72d |
|
18-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a workqueue for journal reclaim journal reclaim writes btree nodes, which can end up waiting for in flight btree writes to complete, and btree write completions run out of workqueues - so we can't run out of the same workqueue or we risk deadlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d3bb629d |
|
18-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fix device remove error path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5663a415 |
|
27-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: refactor bch_fs_usage Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73e6ab95 |
|
01-Dec-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch replicas to mark_lock Prep work for upcoming disk accounting changes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9166b41d |
|
25-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: s/usage_lock/mark_lock better describes what it's for, and we're going to call a new lock usage_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5b8a9227 |
|
27-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out bkey_sort.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dfe9bfb3 |
|
24-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Stripes now properly subject to gc gc now verifies the contents of the stripes radix tree, important for persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9ca53b55 |
|
23-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc now operates on second set of bucket marks This means we can now use gc to verify the allocation information - important for testing persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cd575ddf |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
319f9ac3 |
|
08-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: revamp to_text methods Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a420eea6 |
|
04-Nov-2018 |
Tim Schlueter <schlueter.tim@linux.com> |
bcachefs: Set the last mount time using the realtime clock This way the last mount time is actually meaningful instead of just being various times from 1970 (which happens with the monotonic clock). Also, roundup_pow_of_two() is undefined when passed in 0, so check before calling it. Signed-off-by: Tim Schlueter <schlueter.tim@linux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b092dadd |
|
04-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Scale down number of writepoints when low on space this means we don't have to reserve space for them when calculating filesystem capacity Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7b3f84ea |
|
05-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fc3268c1 |
|
08-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill extent_insert_hook Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
581edb63 |
|
08-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: mempoolify btree_trans Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6eac2c2e |
|
24-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Change how replicated data is accounted Due to compression, the different replicas of a replicated extent don't necessarily have to take up the same amount of space - so replicated data sector counts shouldn't be stored divided by the number of replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
af1c6871 |
|
21-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: add bch_verbose() statements for shutdown Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|