#
6088234c |
|
05-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: JOURNAL_SPACE_LOW "bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f1ca1abf |
|
13-Mar-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: pull out time_stats.[ch] prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4f70176c |
|
31-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill unnecessary wakeups in journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
097471f9 |
|
17-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_journal_flush_device_pins() If a journal write errored, the list of devices it was written to could be empty - we're not supposed to mark an empty replicas list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e074475 |
|
10-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Clamp replicas_required to replicas This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41b84fb4 |
|
17-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: for_each_member_device_rcu() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9fea2274 |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: for_each_member_device() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cf904c8d |
|
16-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_err_(fn|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
09caeabe |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree write buffer now slurps keys from journal Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0ba9375a |
|
07-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Unwritten journal buffers are always dirty Ensure that journal bufs that haven't been written can't be reclaimed from the journal pin fifo, and can thus have new pins taken. Prep work for changing the btree write buffer to pull keys from the journal directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
066a2646 |
|
09-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: track_event_change() This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3eedfe1a |
|
09-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Journal pins must always have a flush_fn flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
df8e13cc |
|
06-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add an assertion in bch2_journal_pin_set() Previously, bch2_journal_pin_set() would silently ignore a request to pin a journal sequence number that was no longer dirty, because it was used internally by bch2_journal_pin_copy() which could race with the src pin being flushed. Split these apart so that we can properly assert that @seq is a currently dirty journal sequence number - this is almost always a bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a66ff26b |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Close journal entry if necessary when flushing all pins Since outstanding journal buffers hold a journal pin, when flushing all pins we need to close the current journal entry if necessary so its pin can be released. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
006ccc30 |
|
04-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
92b63f5b |
|
15-Sep-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: refactor pin put helpers We have a couple journal pin put helpers to handle cases where the journal lock is already held or not. Refactor the helpers to lock and reclaim from the highest level and open code the reclaim from the one caller of the internal variant. The latter call will be moved into the journal buf release helper in a later patch. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e46c181a |
|
10-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert more code to bch_err_msg() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fb8e5b4c |
|
05-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: sb-members.c Split out a new file for bch_sb_field_members - we'll likely want to move more code here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1e81f89b |
|
06-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix assorted checkpatch nits Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9a644843 |
|
08-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix error path in bch2_journal_flush_device_pins() We need to always call bch2_replicas_gc_end() after we've called bch2_replicas_gc_start(), else we leave state around that needs to be cleaned up. Partial fix for: https://github.com/koverstreet/bcachefs/issues/560 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d14bfd10 |
|
30-Jun-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: mark active journal devices on journal replicas gc A simple device evacuate, remove, add test loop with concurrent shutdowns occasionally reproduces a problem where the filesystem fails to mount. The mount failure occurs because the filesystem was uncleanly shut down, yet no member device is marked for journal data in the superblock. An fsck detects the problem, restores the mark and allows the mount to proceed without further consistency issues. The reason for the lack of journal data marks is the gc mechanism invoked via bch2_journal_flush_device_pins() runs while the journal happens to be empty. This results in garbage collection of all journal replicas entries. Once the updated replicas table is written to the superblock, the filesystem is put in a transiently unrecoverable state until further journal data is written, because journal recovery expects to find at least one marked journal device whenever the filesystem is not otherwise marked clean (i.e. as on clean unmount). To fix this problem, update the journal replicas gc algorithm to always mark currently active journal replicas entries by writing to the journal. This ensures that only entries for devices that are no longer used for journaling are garbage collected, not just those that don't happen to currently hold journal data. This preserves the journal recovery invariant above and avoids putting the fs into a transiently unrecoverable state. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
19c304be |
|
28-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: GFP_NOIO -> GFP_NOFS GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
030e9f92 |
|
21-Mar-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: drop unnecessary journal stuck check from space calculation The journal stucking check in bch2_journal_space_available() is particularly aggressive and can lead to premature shutdown in some rare cases. This is difficult to reproduce, but also comes along with a fatal error and so is worthwhile to be cautious. For example, we've seen instances where the journal is under heavy reservation pressure, the journal allocation path transitions into the final available journal bucket, the journal write path immediately consumes that bucket and calls into bch2_journal_space_available(), which then in turn flags the journal as stuck because there is no available space and shuts down the filesystem instead of submitting the journal write (that would have otherwise succeeded). To avoid this problem, simplify the journal stuck checking by just relying on the higher level logic in the journal reservation path. This produces more useful debug output and is a more reliable indicator that things have bogged down. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
83ec519a |
|
07-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: When shutting down, flush btree node writes last Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
637de729 |
|
11-Nov-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure btree node cache is not more than half dirty Tweak journal reclaim to ensure the btree node cache isn't more than half dirty so that memory reclaim can always make progress - the same as we do for the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a2b9a5b2 |
|
06-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_journal_flush_device_pins() It's now legal for the pin fifo to be empty, which means this code needs to be updated in order to not hit an assert. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3e3e02e6 |
|
19-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
674cfc26 |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d4bf5eec |
|
18-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use bch2_err_str() in error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1f93726e |
|
17-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
31f63fd1 |
|
14-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Introduce a separate journal watermark for copygc Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
70a9953c |
|
05-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_journal_pin_set() When bch2_journal_pin_set() is updating an existing pin, we shouldn't call bch2_journal_reclaim_fast() after dropping the old pin and before dropping the new pin - that could reclaim the entry we're trying to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7fda0f08 |
|
28-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Work around a journal self-deadlock bch2_journal_space_available -> bch2_journal_halt() self deadlocks on journal lock; work around this by dropping/retaking journal lock before we call bch2_fatal_error(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
718ce1eb |
|
06-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Skip periodic wakeup of journal reclaim when journal empty Less system noise. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
30ef633a |
|
28-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor journal code to not use unwritten_idx It makes the code more readable if we work off of sequence numbers, instead of direct indexes into the array of journal buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f0a3a2cc |
|
28-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal seq now incremented at entry open, not close This patch changes journal_entry_open() to initialize the new journal entry, not __journal_entry_close(). This also means that journal_cur_seq() refers to the sequence number of the last journal entry when we don't have an open journal entry, not the next one. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2975cd47 |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't spin in journal reclaim If we're not able to flush anything, we shouldn't keep looping. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
cb598111 |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal_flush_done() journal_flush_done() was overwriting did_work, thus occasionally returning false when it did do work and occasional assertions in the shutdown sequence because we didn't completely flush the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fa8e94fa |
|
25-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Heap allocate printbufs This patch changes printbufs dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b66b2bc0 |
|
23-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Revert "Ensure journal doesn't get stuck in nochanges mode" This patch was originally to work around the journal geting stuck in nochanges mode - but that was just a hack, we needed to fix the actual bug. It should be fixed now, so revert it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3117db99 |
|
21-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't issue discards when in nochanges mode When the nochanges option is selected, we're supposed to never issue writes. Unfortunately, it seems discards were missed when implemnting this, leading to some painful filesystem corruption. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d8601afc |
|
27-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Simplify journal replay With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2430e72f |
|
04-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert journal sysfs params to regular options This converts journal_write_delay, journal_flush_disabled, and journal_reclaim_delay to normal filesystems options, and also adds them to the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
fae1157d |
|
28-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure journal doesn't get stuck in nochanges mode This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
6a0f414e |
|
16-Oct-2021 |
Brett Holman <bholman.devel@gmail.com> |
bcachefs: Fix compiler warnings Type size_t is architecture-specific. Fix warnings for some non-amd64 arches. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d7fc453b |
|
30-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal space calculation fix When devices have different bucket sizes, we may accumulate a journal write that doesn't fit on some of our devices - previously, we'd underflow when calculating space on that device and then everything would get weird. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
2ce867df |
|
28-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make sure to initialize j->last_flushed If the journal reclaim thread makes it to the timeout without ever initializing j->last_flushed, we could end up sleeping for a very long time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f09517fc |
|
20-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock on journal reclaim Flushing the btree key cache needs to use allocation reserves - journal reclaim depends on flushing the btree key cache for making forward progress, and the allocator and copygc depend on journal reclaim making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96f399d0 |
|
15-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal reclaim loop When dirty key cache keys were separated from other journal pins, we broke the loop conditional in __bch2_journal_reclaim() - it's supposed to keep looping as long as there's work to do. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
241e2636 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't flush btree writes more aggressively because of btree key cache We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2940295c |
|
03-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24db24c7 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't make foreground writes wait behind journal reclaim too long Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c5f51cdd |
|
28-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Have journal reclaim thread flush more aggressively This adds a new watermark for the journal reclaim when flushing btree key cache entries - it should try and stay ahead of where foreground threads doing transaction commits will enter direct journal reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
331194a2 |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree key cache locking improvements The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dab9ef0d |
|
23-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add error message for some allocation failures Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d483dd17 |
|
16-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix race between journal_seq_copy() and journal_seq_drop() In bch2_btree_interior_update_will_free_node, we copy the journal pins from outstanding writes on the btree node we're about to free. But, this can race with the writes completing, and dropping their journal pins. To guard against this, just use READ_ONCE() in bch2_journal_pin_copy(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b18df768 |
|
06-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Prevent journal reclaim from spinning Without checking if we actually flushed anything, journal reclaim could still go into an infinite loop while trying ot shut down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f51e84fe |
|
05-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix btree key cache dirty checks Had a type that meant we were triggering journal reclaim _much_ more aggressively than needed. Also, fix a potential integer overflow. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5d32c5bb |
|
05-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Be more conservation about journal pre-reservations - Try to always keep 1/8th of the journal free, on top of pre-reservations - Move the check for whether the journal is stuck to bch2_journal_space_available, and make it only fire when there aren't any journal writes in flight (that might free up space by updating last_seq) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
adbcada4 |
|
14-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't require flush/fua on every journal write This patch adds a flag to journal entries which, if set, indicates that they weren't done as flush/fua writes. - non flush/fua journal writes don't update last_seq (i.e. they don't free up space in the journal), thus the journal free space calculations now check whether nonflush journal writes are currently allowed (i.e. are we low on free space, or would doing a flush write free up a lot of space in the journal) - write_delay_ms, the user configurable option for when open journal entries are automatically written, is now interpreted as the max delay between flush journal writes (default 1 second). - bch2_journal_flush_seq_async is changed to ensure a flush write >= the requested sequence number has happened - journal read/replay must now ignore, and blacklist, any journal entries newer than the most recent flush entry in the journal. Also, the way the read_entire_journal option is handled has been improved; struct journal_replay now has an entry, 'ignore', for entries that were read but should not be used. - assorted refactoring and improvements related to journal read in journal_io.c and recovery.c Previously, we'd have to issue a flush/fua write every time we accumulated a full journal entry - typically the bucket size. Now we need to issue them much less frequently: when an fsync is requested, or it's been more than write_delay_ms since the last flush, or when we need to free up space in the journal. This is a significant performance improvement on many write heavy workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b6df4325 |
|
13-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve journal free space calculations Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ebb84d09 |
|
13-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Increase journal pipelining This patch increases the maximum journal buffers in flight from 2 to 4 - this will be particularly helpful when in the future we stop requiring flush+fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
afa7cb0c |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Check for errors in bch2_journal_reclaim() If the journal is halted, journal reclaim won't necessarily be able to make any forward progress, and won't accomplish anything anyways - we should bail out so that we don't get stuck looping in reclaim when the caches are too dirty and we should be shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
231db03c |
|
01-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal pin refactoring This deletes some duplicated code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5731cf01 |
|
29-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal reclaim spinning in recovery We can't run journal reclaim until we've finished replaying updates to interior btree nodes - the check for this was in the wrong place though, leading to journal reclaim spinning before it was allowed to proceed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b7a9bbfc |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move journal reclaim to a kthread This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9d4582ff |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal reclaim requires memalloc_noreclaim_save() Memory reclaim requires journal reclaim to make forward progress - it's what cleans our caches - thus, while we're in journal reclaim or holding the journal reclaim lock we can't recurse into memory reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8a92e545 |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure journal reclaim runs when btree key cache is too dirty Ensuring the key cache isn't too dirty is critical for ensuring that the shrinker can reclaim memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ed0e24c0 |
|
18-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Be more precise with journal error reporting We were incorrectly detecting a journal deadlock - the journal filling up - when only the journal pin fifo had filled up; if the journal pin fifo is full that just means we need to wait on reclaim. This plumbs through better error reporting so we can better discriminate in the journal_res_get path what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f526d26d |
|
11-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix btree key cache shutdown On emergency shutdown, we might still have dirty keys in the btree key cache that need to be cleaned up properly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6a747c46 |
|
09-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add accounting for dirty btree nodes/keys This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2f33ece9 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Minor journal reclaim improvement With the btree key cache code, journal reclaim now has a lot more work to do. It could be the case that after journal reclaim has finished one iteration there's already more work to do, so put it in a loop to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
89fd25be |
|
09-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for data types Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5d20ba48 |
|
04-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use cached iterators for alloc btree Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ca88e5a |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a27443bc |
|
03-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill old allocator startup code It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
039fc4c5 |
|
28-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fixes for going RO Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
94035eed |
|
10-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a locking bug in bch2_journal_pin_copy() There was a race where the src pin would be flushed - releasing the last pin on that sequence number - before adding the new journal pin. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3f58a197 |
|
27-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal pin cleanups Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b5d05635 |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: minor journal reclaim fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
68ef94a6 |
|
19-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a pre-reserve mechanism for the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9ace606e |
|
28-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't block on reclaim_lock from journal_res_get When we're doing btree updates from journal flush, this becomes a locking inversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
03d5eaed |
|
03-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_journal_space_available improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2384db8f |
|
03-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Separate discards from rest of journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0ce2dbbe |
|
03-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: ja->discard_idx, ja->dirty_idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dc9aa178 |
|
01-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop a faulty assertion the assertion was meant to check that bch2_journal_reclaim_fast() was always being called, but since the atomic dec can happen outside of j->lock the assertion itself can race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e5a66496 |
|
21-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal reclaim refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7ef2a73a |
|
21-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix check for if extent update is allocating Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0519b72d |
|
18-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a workqueue for journal reclaim journal reclaim writes btree nodes, which can end up waiting for in flight btree writes to complete, and btree write completions run out of workqueues - so we can't run out of the same workqueue or we risk deadlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
000de459 |
|
13-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fixes for getting stuck flushing journal pins Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3636ed48 |
|
17-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deferred btree updates Will be used in the future for inode updates, which will be very helpful for multithreaded workloads that have to update the inode with every extent update (appends, or updates that change i_sectors) Also will be used eventually for fully persistent alloc info However - we still need a mechanism for reserving space in the journal prior to getting a journal reservation, so it's not technically safe to make use of this just yet, we could deadlock with the journal full (although not likely to be an issue in practice) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a9ec3454 |
|
18-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4077991c |
|
16-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a use after free in the journal code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|