History log of /linux-master/fs/bcachefs/journal_reclaim.c
Revision	Date	Author	Comments
# 6088234c	05-Apr-2024	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: JOURNAL_SPACE_LOW "bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottlenecked on write buffer flushing - we don't want kicking off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# f1ca1abf	13-Mar-2024	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: pull out time_stats.[ch] prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 4f70176c	31-Jan-2024	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Kill unnecessary wakeups in journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 097471f9	17-Feb-2024	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Fix bch2_journal_flush_device_pins() If a journal write errored, the list of devices it was written to could be empty - we're not supposed to mark an empty replicas list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 4e074475	10-Feb-2024	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Clamp replicas_required to replicas This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 41b84fb4	17-Dec-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: for_each_member_device_rcu() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 9fea2274	16-Dec-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: for_each_member_device() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# cf904c8d	16-Dec-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: bch_err_(fn\|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 09caeabe	02-Nov-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: btree write buffer now slurps keys from journal Previously, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 0ba9375a	07-Nov-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Unwritten journal buffers are always dirty Ensure that journal bufs that haven't been written can't be reclaimed from the journal pin fifo, and can thus have new pins taken. Prep work for changing the btree write buffer to pull keys from the journal directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 066a2646	09-Nov-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: track_event_change() This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 3eedfe1a	09-Nov-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Journal pins must always have a flush_fn flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# df8e13cc	06-Nov-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Add an assertion in bch2_journal_pin_set() Previously, bch2_journal_pin_set() would silently ignore a request to pin a journal sequence number that was no longer dirty, because it was used internally by bch2_journal_pin_copy() which could race with the src pin being flushed. Split these apart so that we can properly assert that @seq is a currently dirty journal sequence number - this is almost always a bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# a66ff26b	10-Dec-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Close journal entry if necessary when flushing all pins Since outstanding journal buffers hold a journal pin, when flushing all pins we need to close the current journal entry if necessary so its pin can be released. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 006ccc30	04-Nov-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 92b63f5b	15-Sep-2023	Brian Foster <bfoster@redhat.com>	bcachefs: refactor pin put helpers We have a couple journal pin put helpers to handle cases where the journal lock is already held or not. Refactor the helpers to lock and reclaim from the highest level and open code the reclaim from the one caller of the internal variant. The latter call will be moved into the journal buf release helper in a later patch. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 96dea3d5	12-Sep-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# e46c181a	10-Sep-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Convert more code to bch_err_msg() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# fb8e5b4c	05-Aug-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: sb-members.c Split out a new file for bch_sb_field_members - we'll likely want to move more code here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 1e81f89b	06-Aug-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Fix assorted checkpatch nits Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 9a644843	08-Jul-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Fix error path in bch2_journal_flush_device_pins() We need to always call bch2_replicas_gc_end() after we've called bch2_replicas_gc_start(), else we leave state around that needs to be cleaned up. Partial fix for: https://github.com/koverstreet/bcachefs/issues/560 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 73bd774d	06-Jul-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# d14bfd10	30-Jun-2023	Brian Foster <bfoster@redhat.com>	bcachefs: mark active journal devices on journal replicas gc A simple device evacuate, remove, add test loop with concurrent shutdowns occasionally reproduces a problem where the filesystem fails to mount. The mount failure occurs because the filesystem was uncleanly shut down, yet no member device is marked for journal data in the superblock. An fsck detects the problem, restores the mark and allows the mount to proceed without further consistency issues. The reason for the lack of journal data marks is the gc mechanism invoked via bch2_journal_flush_device_pins() runs while the journal happens to be empty. This results in garbage collection of all journal replicas entries. Once the updated replicas table is written to the superblock, the filesystem is put in a transiently unrecoverable state until further journal data is written, because journal recovery expects to find at least one marked journal device whenever the filesystem is not otherwise marked clean (i.e. as on clean unmount). To fix this problem, update the journal replicas gc algorithm to always mark currently active journal replicas entries by writing to the journal. This ensures that only entries for devices that are no longer used for journaling are garbage collected, not just those that don't happen to currently hold journal data. This preserves the journal recovery invariant above and avoids putting the fs into a transiently unrecoverable state. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 19c304be	28-May-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: GFP_NOIO -> GFP_NOFS GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 030e9f92	21-Mar-2023	Brian Foster <bfoster@redhat.com>	bcachefs: drop unnecessary journal stuck check from space calculation The journal stuck check in bch2_journal_space_available() is particularly aggressive and can lead to premature shutdown in some rare cases. This is difficult to reproduce, but also comes along with a fatal error and so is worthwhile to be cautious. For example, we've seen instances where the journal is under heavy reservation pressure, the journal allocation path transitions into the final available journal bucket, the journal write path immediately consumes that bucket and calls into bch2_journal_space_available(), which then in turn flags the journal as stuck because there is no available space and shuts down the filesystem instead of submitting the journal write (that would have otherwise succeeded). To avoid this problem, simplify the journal stuck checking by just relying on the higher level logic in the journal reservation path. This produces more useful debug output and is a more reliable indicator that things have bogged down. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 83ec519a	07-Mar-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: When shutting down, flush btree node writes last Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 637de729	11-Nov-2021	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Ensure btree node cache is not more than half dirty Tweak journal reclaim to ensure the btree node cache isn't more than half dirty so that memory reclaim can always make progress - the same as we do for the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# a2b9a5b2	06-Mar-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Fix bch2_journal_flush_device_pins() It's now legal for the pin fifo to be empty, which means this code needs to be updated in order to not hit an assert. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# 3e3e02e6	19-Oct-2022	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 674cfc26	26-Aug-2022	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# d4bf5eec	18-Jul-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Use bch2_err_str() in error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# 1f93726e	17-Apr-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Tracepoint improvements Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# 31f63fd1	14-Mar-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Introduce a separate journal watermark for copygc Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 70a9953c	05-Jan-2023	Kent Overstreet <kent.overstreet@linux.dev>	bcachefs: Fix bch2_journal_pin_set() When bch2_journal_pin_set() is updating an existing pin, we shouldn't call bch2_journal_reclaim_fast() after dropping the old pin and before dropping the new pin - that could reclaim the entry we're trying to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
# 7fda0f08	28-Mar-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Work around a journal self-deadlock bch2_journal_space_available -> bch2_journal_halt() self deadlocks on journal lock; work around this by dropping/retaking journal lock before we call bch2_fatal_error(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# 718ce1eb	06-Mar-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Skip periodic wakeup of journal reclaim when journal empty Less system noise. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# 30ef633a	28-Feb-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Refactor journal code to not use unwritten_idx It makes the code more readable if we work off of sequence numbers, instead of direct indexes into the array of journal buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# f0a3a2cc	28-Feb-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Journal seq now incremented at entry open, not close This patch changes journal_entry_open() to initialize the new journal entry, not __journal_entry_close(). This also means that journal_cur_seq() refers to the sequence number of the last journal entry when we don't have an open journal entry, not the next one. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# 2975cd47	25-Feb-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Don't spin in journal reclaim If we're not able to flush anything, we shouldn't keep looping. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
# cb598111	25-Feb-2022	Kent Overstreet <kent.overstreet@gmail.com>	bcachefs: Fix journal_flush_done() journal_flush_done() was overwriting did_work, thus occasionally returning false when it did do work and occasional assertions in the shutdown sequence because we didn't completely flush the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
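
Illustration for revision 006ccc30 ("Kill journal pre-reservations"): that message describes a two-threshold policy, where reclaim runs more aggressively once the journal is more than half full, and only journal reclaim may take new reservations once it is more than 3/4 full. The following is a minimal sketch of that rule as stated in the message, not the code from journal_reclaim.c. Every identifier in it (the sketch_ prefix, the used/total parameters, the two-level enum) is hypothetical and chosen only for illustration; the real bcachefs watermark handling is not shown in this log and may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the bcachefs definitions. */
enum sketch_watermark {
	SKETCH_WATERMARK_normal,	/* any caller may take a reservation */
	SKETCH_WATERMARK_reclaim,	/* only journal reclaim may reserve */
};

struct sketch_policy {
	enum sketch_watermark	watermark;
	bool			reclaim_aggressively;
};

/*
 * Derive the policy from journal fullness. 'used' and 'total' are in
 * arbitrary but identical units (assumed small enough that used * 4
 * and total * 3 do not overflow).
 */
struct sketch_policy sketch_journal_policy(uint64_t used, uint64_t total)
{
	struct sketch_policy p = { SKETCH_WATERMARK_normal, false };

	if (!total)
		return p;

	/* More than half full: run journal reclaim more aggressively. */
	if (used * 2 > total)
		p.reclaim_aggressively = true;

	/* More than 3/4 full: only reclaim gets new reservations. */
	if (used * 4 > total * 3)
		p.watermark = SKETCH_WATERMARK_reclaim;

	return p;
}

/* A reservation succeeds if the watermark is normal or the caller is reclaim. */
bool sketch_may_reserve(struct sketch_policy p, bool caller_is_reclaim)
{
	return p.watermark == SKETCH_WATERMARK_normal || caller_is_reclaim;
}
```

Multiplying instead of dividing avoids integer truncation at the thresholds, so a journal that is exactly half or exactly 3/4 full does not yet trip either limit, matching the "more than" wording in the commit message.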