#
eb386617 |
|
21-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Errcode tracepoint, documentation Add a tracepoint for downcasting private errors to standard errors, so they can be recovered even when not logged; also, add some documentation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ec4edd7b |
|
16-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Prep work for variable size btree node buffers bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analogous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b97de453 |
|
15-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trace_trans_restart_relock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e6a2566f |
|
15-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Better journal tracepoints Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fa3185af |
|
15-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Re-add move_extent_write tracepoint It appears this was accidentally deleted at some point - also, do a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c13fbb7d |
|
04-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve would_deadlock trace event We now include backtraces for every thread involved in the cycle. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a83b6c89 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: kill btree_path->(alloc_seq|downgrade_seq) These were for extra info in tracepoints for debugging a specialized issue - we do not want to bloat btree_path for this, at least in release builds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a564c9fa |
|
02-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Include btree_trans in more tracepoints This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
33981244 |
|
26-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trace_trans_restart_would_deadlock In the CI, we're seeing tests failing due to excessive would_deadlock transaction restarts - the tracepoint now includes the lock cycle that occurred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e153a0d7 |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trace_trans_restart_too_many_iters() We now include the list of paths in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
56db2429 |
|
02-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree write buffer tracepoints - add a tracepoint for write_buffer_flush_sync; this is expensive - fix the write_buffer_flush_slowpath tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
25d1e39d |
|
24-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add a rebalance, data_update tracepoints Add a tracepoint for rebalance, printing out - the target option - the compression option - the key being rebalanced Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae0e6117 |
|
16-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add a tracepoint for journal entry close Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
eb54e81f |
|
12-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree_path_downgrade tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae4d612c |
|
26-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trace_move_extent_start_fail() now includes errcode Renamed from trace_move_extent_alloc_mem_fail, because there are other reasons we could fail (disk space allocation failure). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
006ccc30 |
|
04-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
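
The two thresholds described above can be sketched as a small classification function. This is only an illustration of the half-full and three-quarters-full rule from the commit message; the names `demo_watermark`, `demo_journal_watermark()` and `demo_may_reserve()` are hypothetical and are not taken from the bcachefs source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watermark levels, named for this sketch only. */
enum demo_watermark {
	DEMO_WM_normal,            /* journal at most half full */
	DEMO_WM_reclaim_harder,    /* more than 1/2 full: reclaim more aggressively */
	DEMO_WM_reclaim_only,      /* more than 3/4 full: only reclaim may reserve */
};

/* Classify journal fullness without floating point or division. */
enum demo_watermark demo_journal_watermark(uint64_t used, uint64_t total)
{
	if (used * 4 > total * 3)
		return DEMO_WM_reclaim_only;
	if (used * 2 > total)
		return DEMO_WM_reclaim_harder;
	return DEMO_WM_normal;
}

/* Above the 3/4 mark, only journal reclaim itself gets new reservations. */
bool demo_may_reserve(enum demo_watermark wm, bool is_reclaim)
{
	return is_reclaim || wm != DEMO_WM_reclaim_only;
}
```

Comparing `used * 4` against `total * 3` keeps the check in integer arithmetic; the sketch assumes the values are small enough that the multiplications do not overflow.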
|
#
be9e782d |
|
27-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't downgrade locks on transaction restart We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96a363a7 |
|
23-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: move: move_stats refactoring data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
88dfe193 |
|
19-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_id_str() Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
feb5cc39 |
|
11-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trace_read_nopromote() Add a tracepoint to print the reason a read wasn't promoted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
55d5276d |
|
17-Aug-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree_path_relock_fail tracepoint In https://github.com/koverstreet/bcachefs/issues/450, we're seeing unexplained btree_path_relock_fail events - according to the information currently in the tracepoint, it appears the relock should be succeeding. This adds lock counts to the tracepoint to help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
49c7cd9d |
|
30-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More drop_locks_do() conversions Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1fb4fe63 |
|
20-May-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
six locks: Kill six_lock_state union As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5a21764d |
|
20-Apr-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve move path tracepoints Move path tracepoints now include the key being moved. Also, add new tracepoints for the start of move_extent, and evacuate_bucket. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bb6c4b92 |
|
10-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trace_move_extent_fail() This greatly expands the move_extent_fail tracepoint - now it includes all the information we have available, including exactly why the extent wasn't updated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3d86f13d |
|
30-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans_restart_split_race tracepoint Seeing occasional test failures where we get stuck in a livelock that involves this event - this will help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7635e1a6 |
|
25-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Rework open bucket partial list allocation Now, any open_bucket can go on the partial list: allocating from the partial list has been moved to its own dedicated function, open_bucket_add_bucets() -> bucket_alloc_set_partial(). In particular, this means that erasure coded buckets can safely go on the partial list; the new location works with the "allocate an ec bucket first, then the rest" logic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1a14e255 |
|
24-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Make bucket_alloc tracepoint more readable Print bucket in dev:bucket notation, to be consistent with how we refer to buckets elsewhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e151580d |
|
20-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add tracepoint & counter for btree split race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
80c33085 |
|
05-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8b3a677 |
|
02-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Nocow support This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight), we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes; we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8e3f913e |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Copygc now uses backpointers Previously, copygc needed to walk the entire extents & reflink btrees to find extents that needed to be moved. Now that we have backpointers, this patch implements bch2_evacuate_bucket() in the move code, which copygc now uses for evacuating mostly empty buckets. Also, thanks to the new backpointers code, copygc can now move btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
920e69bc |
|
03-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global state and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
adf6360b |
|
01-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree_reserve_get_fail tracepoint Now we include the return code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a1019576 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: More style fixes Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae10fe01 |
|
04-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bucket_alloc_state This refactoring puts our various allocation path counters into a dedicated struct - the upcoming nocow patch is going to add another counter. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
68b6cd19 |
|
26-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve bucket_alloc tracepoint It now includes more info - whether the bucket was for metadata or data - and also call it in the same place as the bucket_alloc_fail tracepoint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0d7009d7 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete old deadlock avoidance code This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
33bd5d06 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deadlock cycle detector We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occurred. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totalling 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
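
The DFS described in this commit message can be illustrated with a much simplified wait-for graph. This is a sketch only: it assumes a fixed number of transactions and a plain adjacency matrix, whereas the real detector walks held locks and lock waitlists. `MAX_TRANS`, `struct wait_graph` and `would_deadlock()` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_TRANS 8

/* waits_on[a][b] is true when transaction a is blocked on a lock held by b. */
struct wait_graph {
	bool waits_on[MAX_TRANS][MAX_TRANS];
};

/* Depth-first search: is there a path from cur back to start? */
static bool dfs(const struct wait_graph *g, int cur, int start, bool *visited)
{
	for (int next = 0; next < MAX_TRANS; next++) {
		if (!g->waits_on[cur][next])
			continue;
		if (next == start)
			return true;	/* found a cycle through start */
		if (visited[next])
			continue;
		visited[next] = true;
		if (dfs(g, next, start, visited))
			return true;
	}
	return false;
}

/* Returns true if blocking would leave `start` inside a wait cycle. */
bool would_deadlock(const struct wait_graph *g, int start)
{
	bool visited[MAX_TRANS] = { false };

	visited[start] = true;
	return dfs(g, start, start, visited);
}
```

In this sketch a caller would record its new edge in `waits_on` before sleeping and restart the transaction if `would_deadlock()` returns true, mirroring the "check after joining the waitlist but before sleeping" order given in the message.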
|
#
367d72dd |
|
17-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_btree_path_upgrade() now emits transaction restart Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
674cfc26 |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ce56bf7f |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans_restart_journal_preres_get tracepoint It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5f1dd9a6 |
|
26-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve btree_node_relock_fail tracepoint It now prints the error name when the btree node is an error pointer; also, don't trace failures when the btree node is BCH_ERR_no_btree_node_up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
14599cce |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch btree locking code to struct btree_bkey_cached_common This is just some type safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e3738c69 |
|
21-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
six locks: Improve six_lock_count six_lock_count now counts up whether a write lock is held, and this patch now also correctly counts six_lock->intent_lock_recurse. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
12043cf1 |
|
18-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: fsck: Another transaction restart handling fix Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9f96568c |
|
09-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
49e401fa |
|
07-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements - use strlcpy(), not strncpy() - add tracepoints for btree_path alloc and free - give the tracepoint for key cache upgrade fail a proper name - add a tracepoint for btree_node_upgrade_fail Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a0cb8d78 |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Inject transaction restarts in debug mode In CONFIG_BCACHEFS_DEBUG mode, we'll now randomly issue transaction restarts - with a decaying probability based on the number of restarts we've already had, to ensure that transactions eventually make forward progress. This should help shake out some bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
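
One way to get a decaying injection probability is to halve the chance after every restart. The sketch below does that; the halving schedule, the function name `demo_should_inject_restart()` and passing the random value in as a parameter are all assumptions made for illustration, not the scheme used in bcachefs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether to inject a restart, given how many restarts this
 * transaction has already seen and a caller-supplied random value.
 * The chance is 1/2 on the first attempt, then 1/4, 1/8, and so on,
 * so repeated restarts become increasingly unlikely and the
 * transaction eventually makes forward progress.
 */
bool demo_should_inject_restart(unsigned restart_count, uint32_t rnd)
{
	/* Cap the shift so the interval still fits in 32 bits. */
	unsigned shift = restart_count < 30 ? restart_count : 30;
	uint32_t interval = UINT32_C(2) << shift;

	return rnd % interval == 0;
}
```

Taking the random value as an argument keeps the function deterministic and easy to test; a caller would supply whatever random source it has available.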
|
#
0990efae |
|
05-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_trans_too_many_iters() is now a transaction restart All transaction restarts need a tracepoint - this is essential for debugging Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
615f867c |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improved errcodes Instead of overloading standard error codes (EINTR/EAGAIN), and defining short lists of error codes in multiple places that potentially end up overlapping & conflicting, we're now going to have one master list of error codes. Error codes are defined with an x-macro: thus we also have bch2_err_str() now. Also, error codes have a class field. Now, instead of checking for errors with ==, code should use bch2_err_matches(), which returns true if the error is equal to or a sub-error of the error class. This means we can define unique errors for every source location where an error is generated, which will help improve our error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
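
A minimal sketch of the x-macro pattern described above, with one master list generating the enum, the string table and the class table. The `DEMO_` names and the four example errors are hypothetical, and the sketch uses positive codes for simplicity; it is not the real bcachefs error table.

```c
#include <stdbool.h>

/* Master list: x(parent class, name). A parent of 0 means "no class". */
#define DEMO_ERRCODES()                                           \
	x(0,                            transaction_restart)      \
	x(DEMO_ERR_transaction_restart, restart_relock)           \
	x(DEMO_ERR_transaction_restart, restart_would_deadlock)   \
	x(0,                            no_space)

enum demo_errcode {
	DEMO_ERR_START = 2048,	/* private codes live above this value */
#define x(parent, err) DEMO_ERR_##err,
	DEMO_ERRCODES()
#undef x
	DEMO_ERR_MAX
};

/* Name of each error, generated from the same list. */
static const char * const demo_err_strs[] = {
#define x(parent, err) [DEMO_ERR_##err - DEMO_ERR_START - 1] = #err,
	DEMO_ERRCODES()
#undef x
};

/* Parent class of each error, generated from the same list. */
static const int demo_err_parent[] = {
#define x(parent, err) [DEMO_ERR_##err - DEMO_ERR_START - 1] = parent,
	DEMO_ERRCODES()
#undef x
};

const char *demo_err_str(int err)
{
	if (err <= DEMO_ERR_START || err >= DEMO_ERR_MAX)
		return "(unknown error)";
	return demo_err_strs[err - DEMO_ERR_START - 1];
}

/* True if err equals class or is a sub-error of it. */
bool demo_err_matches(int err, int class)
{
	while (err) {
		if (err == class)
			return true;
		if (err <= DEMO_ERR_START || err >= DEMO_ERR_MAX)
			return false;
		err = demo_err_parent[err - DEMO_ERR_START - 1];
	}
	return false;
}
```

Because every table is expanded from `DEMO_ERRCODES()`, adding an error is a one-line change, and callers test for a whole class with `demo_err_matches()` instead of comparing with `==`.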
|
#
8ef98313 |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve bucket_alloc_fail tracepoint We should be printing the number of free buckets, not just the number of available buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
38585367 |
|
20-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Bucket invalidate path improvements - invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin - the tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1f93726e |
|
17-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
7c7e071d |
|
03-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't normalize to pages in btree cache shrinker This behavior dates from the early, early days of bcache, and upon further delving appears to not make any sense. The shrinker only works in terms of 'objects' of unknown size; normalizing to pages only had the effect of changing the batch size, which we could do directly - if we wanted; we probably don't. Normalizing to pages meant our batch size was very small, which seems to have been keeping us from doing as much shrinking as we should be under heavy memory pressure; this patch appears to alleviate some OOMs we've been seeing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
4254f5bf |
|
03-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a tracepoint for superblock writes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
59cc38b8 |
|
10-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New discard implementation In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f25d8215 |
|
09-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill allocator threads & freelists Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5f417394 |
|
11-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_btree_update_start() refactoring This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e154711 |
|
13-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: x-macroize alloc_reserve enum This makes an array of strings available, like our other enums. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f13fd87a |
|
30-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Run overwrite triggers before insert For backpointers, we'll need to delete old backpointers before adding new backpointers - otherwise we'll run into spurious duplicate backpointer errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3a306f3c |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix large key cache keys Previously, we'd go into an infinite loop when attempting to cache a bkey in the key cache larger than 128 u64s - since we were only using a u8 for the size field, it'd get rounded up to 256 then truncated to 0. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
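
The truncation described above is easy to reproduce in isolation. The sketch assumes the size is rounded up to a power of two before being stored, which matches the "rounded up to 256" wording but is not confirmed by the message; the helper names are hypothetical.

```c
#include <stdint.h>

/* Round v up to the next power of two (v >= 1). */
static unsigned demo_roundup_pow2(unsigned v)
{
	unsigned r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

/* Buggy variant: an 8-bit size field cannot hold 256, so it wraps to 0. */
uint8_t demo_alloc_u64s_buggy(unsigned wanted)
{
	return (uint8_t) demo_roundup_pow2(wanted);
}

/* Fixed variant: a 16-bit field holds the rounded-up size. */
uint16_t demo_alloc_u64s_fixed(unsigned wanted)
{
	return (uint16_t) demo_roundup_pow2(wanted);
}
```

With a request of 129 u64s the buggy variant reports a capacity of 0, so a caller that keeps growing the buffer until it fits would never terminate, which is consistent with the infinite loop the commit describes.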
|
#
ddf11d8c |
|
27-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a use after free This fixes a regression from "bcachefs: Stash a copy of key being overwritten in btree_insert_entry". In btree_key_can_insert_cached(), we may reallocate the key cache key, invalidating pointers previously returned by peek() - fix it by issuing a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8f9ad91a |
|
17-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix failure to allocate btree node in cache The error code when we fail to allocate a node in the btree node cache doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to stash a flag in btree_trans so we know we have to take the cannibalize lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
4b59a319 |
|
15-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix slow tracepoints Some of our tracepoints were calling snprintf("pS") - which does symbol table lookups - in TP_fast_assign(), which turns out to be a really bad idea. This was done because perf trace wasn't correctly printing tracepoints that use %pS anymore - but it turns out trace-cmd does handle it correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
12ce5b7d |
|
11-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache coherency - Updates to non key cache iterators will now be transparently redirected to the key cache for cached btrees. - Except when creating new keys: then the update goes to the underlying btree. For iterating over a cached btree to work, we need to ensure that if a key exists in the key cache, it also exists in the btree - otherwise the iterator code will skip past it and not check the key cache. Otherwise, for consistency, all updates should go to the same place - the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bc82d08b |
|
08-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Tracepoint improvements This improves the transaction restart tracepoints - adding distinct tracepoints for all the locations and reasons a transaction might have been restarted, and ensures that there's a tracepoint for every transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
669f87a5 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch to __func__ for recording where btree_trans was initialized Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
eacb2574 |
|
02-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch_dev->dev Add a field to bch_dev for the dev_t of the underlying block device - this fixes a null ptr deref in tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c7ce813f |
|
27-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a tracepoint for the btree cache shrinker This is to help with diagnosing why the btree node cache doesn't seem to be shrinking - we've had issues in the past with granularity/batch size, since btree nodes are so big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
caaa66aa |
|
07-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Better approach to write vs. read lock deadlocks Instead of unconditionally upgrading read locks to intent locks in do_bch2_trans_commit(), this patch changes the path that takes write locks to first trylock, and then if trylock fails check if we have a conflicting read lock, and restart the transaction if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f48361b0 |
|
04-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop some fast path tracepoints These haven't turned out to be useful Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible component, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2b4e4b8c |
|
24-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Minor tracepoint improvements Btree iterator tracepoints should print whether they're for the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
531a0095 |
|
04-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree iterator tracepoints This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
ddc7dd62 |
|
27-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use uuid in tracepoints %pU for printing out pointers to uuids doesn't work in perf trace. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
19d2819d |
|
25-May-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a tracepoint for copygc waiting Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
4f6dad46 |
|
28-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a tracepoint for when we block on journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3dea728c |
|
29-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: New tracepoint for bch2_trans_get_iter() Trying to debug an issue where after traverse_all() we shouldn't have to traverse any iterators... yet we are. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
89baec78 |
|
17-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocator refactoring This uses the kthread_wait_freezable() macro to simplify a lot of the allocator thread code, along with cleaning up bch2_invalidate_bucket2(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73a117d2 |
|
14-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve trans_restart_mem_realloced tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2527dd91 |
|
14-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve bch2_btree_iter_traverse_all() By changing it to upgrade iterators to intent locks to avoid lock restarts we can simplify __bch2_btree_node_lock() quite a bit - this fixes a probable bug where it could potentially drop a lock on an unrelated error but still succeed instead of causing a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d44a6e35 |
|
13-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop old style btree node coalescing We have foreground btree node merging now, and any future btree node merging improvements are going to be based off of that code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
241e2636 |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't flush btree writes more aggressively because of btree key cache We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d5425a3b |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Throttle updates when btree key cache is too dirty This is needed to ensure we don't deadlock because journal reclaim and thus memory reclaim isn't making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b3c2a06b |
|
20-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Simplify transaction commit error path The transaction restart path traverses all iterators, we don't need to do it here. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8a92e545 |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Ensure journal reclaim runs when btree key cache is too dirty Ensuring the key cache isn't too dirty is critical for ensuring that the shrinker can reclaim memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3dc5fcfc |
|
16-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert tracepoints to use %ps, not %pf Symbol decoding was changed from %pf to %ps. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a301dc38 |
|
28-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve tracing for transaction restarts We have a bug where we can get stuck with a process spinning in transaction restarts - need more information. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e6d11615 |
|
11-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make copygc thread global Per device copygc threads don't move data to different devices and they make fragmentation worse - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
297604c9 |
|
10-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a few tracepoints Transaction restart tracing should probably be overhauled at some point. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
20bceecb |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: More work to avoid transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ed8413fd |
|
14-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: improved btree locking tracepoints Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c43a6ef9 |
|
05-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_bkey_cached_common This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba5c6557 |
|
22-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add actual tracepoints for transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique This lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b2be7c8b |
|
22-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill bucket mark sector count saturation Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|