#
6e4d9bd1 |
|
20-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong this stores the SRCU sequence number, which we use to check if an SRCU barrier has elapsed; this is a partial fix for the key cache shrinker not actually freeing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
be42e4a6 |
|
04-Apr-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Bump limit in btree_trans_too_many_iters() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
91dcad18 |
|
22-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Pin btree cache in ram for random access in fsck Various phases of fsck involve checking references from one btree to another: this means doing a sequential scan of one btree, and then mostly random access into the second. This is particularly painful for checking extents <-> backpointers; we can prefetch btree node access on the sequential scan, but not on the random access portion, and this is particularly painful on spinning rust, where we'd like to keep the pipeline fairly full of btree node reads so that the elevator can reduce seeking. This patch implements prefetching and pinning of the portion of the btree that we'll be doing random access to. We already calculate how much of the random access btree will fit in memory so it's a fairly straightforward change. This will put more pressure on system memory usage, so we introduce a new option, fsck_memory_usage_percent, which is the percentage of total system ram that fsck is allowed to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b26d7914 |
|
21-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_ID_subvolume_children Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
52946d82 |
|
06-Feb-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill more -EIO error codes This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b97de453 |
|
15-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trace_trans_restart_relock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8e7834a8 |
|
16-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch_fs_usage_base Split out base filesystem usage into its own type; prep work for breaking up bch2_trans_fs_usage_apply(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
38c23fb8 |
|
07-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_TRIGGER_ATOMIC Add a new flag to be explicit about when we're running atomic triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0c99e17d |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: growable btree_paths XXX: we're allocating memory with btree locks held - bad We need to plumb through an error path so we can do allocate_dropping_locks() - but we're merging this now because it fixes a transaction path overflow caused by indirect extent fragmentation, and the resize path is rare. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c3b0fc3 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans->nr_paths Start to plumb through dynamically growable btree_paths; this patch replaces most BTREE_ITER_MAX references with trans->nr_paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5cc6daf7 |
|
12-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans->updates will also be resizable the reflink triggers are also bumping up against the maximum number of paths in a transaction - and generating proportional numbers of updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
31403dca |
|
11-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS - Some tweaks to greatly reduce locking overhead for the list of btree transactions, so that it can always be enabled: leave btree_trans objects on the list when they're on the percpu single item freelist, and only check for duplicates in the same process when CONFIG_BCACHEFS_DEBUG is enabled - don't zero out the full btree_trans() unless we allocated it from the mempool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fea153a8 |
|
12-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: rcu protect trans->paths Upcoming patches are going to be changing trans->paths to a reallocatable buffer. We need to guard against use after free when it's used by other threads; this introduces RCU protection to those paths and changes them to check for trans->paths == NULL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6474b706 |
|
11-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Clean up btree_trans Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
398c9834 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: kill btree_path.idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7f9821a7 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_insert_entry -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
07f383c7 |
|
03-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_iter -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
255ebbbf |
|
08-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: bch2_path_get() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
67997234 |
|
11-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: kill btree_trans->wb_updates the btree write buffer path now creates a journal entry directly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24de63da |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improve trans->extra_journal_entries Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a83b6c89 |
|
10-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: kill btree_path->(alloc_seq|downgrade_seq) These were for extra info in tracepoints for debugging a specialized issue - we do not want to bloat btree_path for this, at least in release builds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6e92d155 |
|
03-Dec-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Refactor trans->paths_allocated to be standard bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad9c7992 |
|
17-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill btree_iter->journal_pos For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to avoid the binary search overhead. Previously we stashed the pos of the last key returned from the journal, in order to force the lookup to be redone when rewinding. Now bch2_journal_keys_peek_upto() handles rewinding itself when necessary - so we can slim down btree_iter. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1ae8a090 |
|
16-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill memset() in bch2_btree_iter_init() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e56978c8 |
|
12-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill BTREE_ITER_ALL_LEVELS As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be racy with concurrent interior node updates - and perhaps it is fixable, but it's tricky and unnecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
43c7ede0 |
|
08-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill BTREE_UPDATE_PREJOURNAL With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can now switch the btree write buffer to use it for flushing. This has the advantage that transaction commits don't need to take a journal reservation at all. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7125063f |
|
02-Mar-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=y Running with fewer max btree paths doesn't work anymore when replication is enabled - as we've added e.g. the freespace and bucket gens btrees, we naturally end up needing more btree paths. This is an issue with lockdep, we end up taking more locks than lockdep will track (the MAX_LOCKD_DEPTH constant). But bcachefs as merged does not yet support lockdep anyways, so we can leave that for later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
006ccc30 |
|
04-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3b8c4507 |
|
06-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_trans->write_locked As prep work for the next patch to fix a key cache reclaim issue, we need to start tracking whether we're currently holding write locks - so that we can release and retake the before calling into memory reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1bd5bcc9 |
|
13-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split out btree_key_cache_types.h More consistent organization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9fcdd23b |
|
04-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage BTREE_INSERT_NOJOURNAL is primarily used for a performance optimization related to inode updates and fsync - document it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d3c7727b |
|
03-Nov-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: rebalance_work btree is not a snapshots btree rebalance_work entries may refer to entries in the extents btree, which is a snapshots btree, or they may also refer to entries in the reflink btree, which is not. Hence rebalance_work keys may use the snapshot field but it's not required to be nonzero - add a new btree flag to reflect this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c4accde4 |
|
29-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Ensure srcu lock is not held too long The SRCU read lock that btree_trans takes exists to make it safe for bch2_trans_relock() to deref pointers to btree nodes/key cache items we don't have locked, but as a side effect it blocks reclaim from freeing those items. Thus, it's important to not hold it for too long: we need to differentiate between bch2_trans_unlock() calls that will be only for a short duration, and ones that will be for an unbounded duration. This introduces bch2_trans_unlock_long(), to be used mainly by the data move paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
be9e782d |
|
27-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Don't downgrade locks on transaction restart We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
50a38ca1 |
|
19-Oct-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix btree_node_type enum More forwards compatibility fixups: having BKEY_TYPE_btree at the end of the enum conflicts with unnkown btree IDs, this shifts BKEY_TYPE_btree to slot 0 and fixes things up accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ecae0bd5 |
|
02-Nov-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
|
#
6bd68ec2 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e8d2fe3b |
|
21-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Consolidate btree id properties This refactoring centralizes defining per-btree properties. bch2_key_types_allowed was also about to overflow a u32, so expand that to a u64. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
eabb10dc |
|
19-Jul-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: support btree updates of prejournaled keys Introduce support for prejournaled key updates. This allows a transaction to commit an update for a key that already exists (and is pinned) in the journal. This is required for btree write buffer updates as the current scheme of journaling both on write buffer insertion and write buffer (slow path) flush is unsafe in certain crash recovery scenarios. Create a small trans update wrapper to pass along the seq where the key resides into the btree_insert_entry. From there, trans commit passes the seq into the btree insert path where it is used to manage the journal pin for the associated btree leaf. Note that this patch only introduces the underlying mechanism and otherwise includes no functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
95b595a5 |
|
30-Apr-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree iterator, update flags no longer conflict Change btree_update_flags to start after the last btree iterator flag, so that we can pass both in the same flags argument. This is needed for the upcoming bch2_bkey_get_mut() helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f3a65bb9 |
|
27-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert constants to consts Rust bindgen doesn't handle macros, but it does handle integer constants: this conversion aids in implementing safe Rust wrapper interfaces. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
5e2d8be8 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip These are two different things - this improves our debug assert messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
930c0c4c |
|
11-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add missing include Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
920e69bc |
|
03-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
30ca6ece |
|
09-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill trans->flags Recursive transaction commits are occasionally necessary - in particular, for the upcoming btree write buffer's flush path. This avoids bugs due to trans->flags being accidentally mutated mid-commit, which can cause c->writes refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
60b55388 |
|
09-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans->notrace_relock_fail When we unlock in order to submit IO, the next relock event is likely to fail if submit_bio() blocked - we shouldn't those events in our _fail stats, since those are expected events and shouldn't cause test failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
94c69faf |
|
04-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Use six_lock_ip() This uses the new _ip() interface to six locks and hooks it up to btree_path->ip_allocated, when available. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
464b4155 |
|
05-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_trans_reset_updates() This should have been resetting trans->fs_usage_deltas as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ee2c6ea7 |
|
08-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_iter->ip_allocated In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
5bbe3f2d |
|
14-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Log more messages in the journal This patch - Adds a mechanism for queuing up journal entries prior to the journal being started, which will be used for early journal log messages - Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now take format strings. bch2_fs_log_msg() can be used before or after the journal has been started, and will use the appropriate mechanism. - Deletes the now obsolete bch2_journal_log_msg() - And adds more log messages to the recovery path - messages for journal/filesystem started, journal entries being blacklisted, and journal replay starting/finishing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e242b92a |
|
15-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix for long running btree transactions & key cache While a btree transaction is running, we hold a SRCU read lock on the btree key cache that prevents btree key cache keys from being freed - this is so that relock() operations won't access freed memory. The downside of this is that long running btree transactions prevent memory from being freed from the key cache. This adds a check in bch2_trans_begin() - if the transaction has been running longer than 1 second, drop and retake the SRCU read lock and zero out pointers to unlock key cache paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
a16b19cd |
|
02-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Allow for more btrees Expand some bitfields so we can keep adding more btrees. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ac9fa4bd |
|
07-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill btree_insert_ret enum Replace with standard bcachefs-private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1617d56d |
|
22-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Key cache now works for snapshots btrees This switches btree_key_cache_fill() to use a btree iterator, not a btree path, so that it can search for keys in previous snapshots. We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid recursion back into the key cache. This will allow us to re-enable the key cache for inodes in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
087e53c2 |
|
20-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Bring back BTREE_ITER_CACHED_NOFILL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c96f108b |
|
24-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Optimize bch2_trans_iter_init() When flags & btree_id are constants, we can constant fold the entire calculation of the actual iterator flags - and the whole thing becomes small enough to inline. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
42af0ad5 |
|
17-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a race with b->write_type b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
46fee692 |
|
28-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved btree write statistics This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
fd0c7679 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert to __packed and __aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
77671e8f |
|
21-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Move bkey bkey_unpack_key() to bkey.h Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h because bkey.h didn't include btree_types.h, which it needs for the compiled unpack function. This patch finally moves it to the proper location. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
0d7009d7 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete old deadlock avoidance code This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
33bd5d06 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deadlock cycle detector We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
3d21d48e |
|
03-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix usage of six lock's percpu mode, key cache version Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks have a percpu mode, but we can't switch between percpu and non percpu modes while a lock is in use: threads attempting to take a read lock may race, and we'll end up with the read count permanently off. Fixing this the "correct" way, in six_lock_pcpu_(alloc|free) would require an RCU barrier, and we don't want to do that - instead, we have to permanently segragate percpu and non percpu objects, including when on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
4e6defd1 |
|
31-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_bkey_cached_common->cached Add a type descriptor to btree_bkey_cached_common - there's no reason not to since we've got padding that was otherwise unused, and this is a nice cleanup (and helpful in later patches). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
616928c3 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Track maximum transaction memory This patch - tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats - makes it available in debugfs - switches bch2_trans_init() to using that for the amount of memory to preallocate, instead of the parameter passed in This drastically reduces transaction restarts, and means we no longer need to track this in the source code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2e27f656 |
|
21-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill nodes_intent_locked Previously, we used two different bit arrays for tracking held btree node locks. This patch switches to an array of two bit integers, which will let us track, in a future patch, when we hold a write lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
cd5afabe |
|
19-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_locking.c Start to centralize some of the locking code in a new file; more locking code will be moving here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
5c0bb66a |
|
11-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track the maximum btree_paths ever allocated by each transaction We need a way to check if the machinery for handling btree_paths with in a transaction is behaving reasonably, as it often has not been - we've had bugs with transaction path overflows caused by duplicate paths and plenty of other things. This patch tracks, per transaction fn, the most btree paths ever allocated by that transaction and makes it available in debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
4aba7d45 |
|
11-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename lock_held_stats -> btree_transaction_stats Going to be adding more things to this in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
6fae65c1 |
|
10-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE) These were used more prior to getting rid of the in-memory bucket arrays - they don't serve much purpose anymore, and deleting them lets us write better assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
315c9ba6 |
|
10-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
86b74451 |
|
05-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_btree_trans_to_text() bch2_btree_trans_to_text() is used to print btree_transactions owned by other threads; thus, it needs to be particularly careful. This fixes a null ptr deref caused by racing with the owning thread changing path->l[].b. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
549d173c |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: EINTR -> BCH_ERR_transaction_restart Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
e941ae7d |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a counter for btree_trans restarts This will help us improve nested transactions - we need to add assertions that whenever an inner transaction handles a restart, it still returns -EINTR to the outer transaction. This also adds nested_lockrestart_do() and nested_commit_do() which use the new counters to correctly return -EINTR when the transaction was restarted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
89333156 |
|
16-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bch2_dev_usrdata_drop() to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c807ca95 |
|
14-Jul-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: added lock held time stats We now record the length of time btree locks are held and expose this in debugfs. Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
43de721a |
|
13-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Unlock in bch2_trans_begin() if we've held locks more than 10us We try to ensure we never hold btree locks for too long - bcachefs tries to be soft realtime. This adds a check when restarting a transaction, where a transaction restart is cheap - if we've been holding locks for too long, drop and retake them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
8f7f566f |
|
16-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree key cache pcpu freedlist Originally, the btree key cache code would always allocate new entries by reusing from the recently-freed list, if that list wasn't empty. But that behaviour was dropped, for lock contention reasons. But it seems that entries stranded on the freed list have been contributing to some of our oom issues, because long running btree transactions will prevent them from being freed. This patch re-adds allocating from the freed list, but it also adds percpu buffers to solve the lock contention issues - and the new percpu freed lists will improve the evict paths, too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
50b13bee |
|
17-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve an error message When inserting a key type that's not valid for a given btree, we should print out which btree we were inserting into. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
b7c11046 |
|
19-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Increase max size for btree_trans bump allocator With backpointers, alloc keys have gotten bigger, so we're needing more memory here. We're probably going to need to go with something more sophisticated than a bump allocator, but - let's see if we can avoid doing that just yet. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
30525f68 |
|
21-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal_keys_search() overhead Previously, on every btree_iter_peek() operation we were searching the journal keys, doing a full binary search - which was slow. This patch fixes that by saving our position in the journal keys, so that we only do a full binary search when moving our position backwards or a large jump forwards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
b0babf2a |
|
12-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_btree_iter_peek_all_levels() This adds bch2_btree_iter_peek_all_levels(), which returns keys from every level of the btree - interior nodes included - in monotonically increasing order, soon to be used by the backpointers check & repair code. - BTREE_ITER_ALL_LEVELS can now be passed to for_each_btree_key() to iterate thusly, much like BTREE_ITER_SLOTS - The existing algorithm in bch2_btree_iter_advance() doesn't work with peek_all_levels(): we have to defer the actual advancing until the next time we call peek, where we have the btree path traversed and uptodate. So, we add an advanced bit to btree_iter; when BTREE_ITER_ALL_LEVELS is set bch2_btree_iter_advanced() just marks the iterator as advanced. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e1b8f5f5 |
|
31-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Plumb btree_id & level to trans_mark For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
c6b2826c |
|
11-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Freespace, need_discard btrees This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
3d48a7f8 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: KEY_TYPE_alloc_v4 This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2a6870ad |
|
29-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use darray for extra_journal_entries Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
3a306f3c |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix large key cache keys Previously, we'd go into an infinite loop when attempting to cache a bkey in the key cache larger than 128 u64s - since we were only using a u8 for the size field, it'd get rounded up to 256 then truncated to 0. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
30985537 |
|
04-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix usage of six lock's percpu mode Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segragating them between two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
bf3efff5 |
|
27-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix race leading to btree node write getting stuck Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written. Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
de517c95 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for btree node flags This is for adding an array of strings for btree node flag names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
8322a937 |
|
04-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree key cache optimization This helps with lock contention in the journalling code: instead of updating our journal pin on every write, only get a journal pin if we don't have one. This means we can avoid hammering on journal locks nearly so much, at the cost of carrying around a journal pin for an older entry than the one we actually need. To handle that, if needed we update our journal pin to the correct one when flushed by journal reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
8f9ad91a |
|
17-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix failure to allocate btree node in cache The error code when we fail to allocate a node in the btree node cache doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to stash a flag in btree_trans so we know we have to take the cannibalize lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
c7ce2732 |
|
15-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Also show when blocked on write locks This consolidates some of the btree node lock path, so that when we're blocked taking a write lock on a node it shows up in bch2_btree_trans_to_text(), along with intent and read locks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
12ce5b7d |
|
11-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache coherency - Updates to non key cache iterators will now be transparently redirected to the key cache for cached btrees. - Except when creating new keys: then the update goes to underlying btree For for iterating over a cached btree to work, we need to ensure that if a key exists in the key cache, it also exists in the btree - otherwise the iterator code will skip past it and not check the key cache. Otherwise, for consistency, all updates should go to the same place - the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f7b6ca23 |
|
06-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_KEY_CACHE This is the start of cache coherency with the btree key cache - this adds a btree iterator flag that causes lookups to also check the key cache when we're iterating over the btree (not iterating over the key cache). Note that we could still race with another thread creating at item in the key cache and updating it, since we aren't holding the key cache locked if it wasn't found. The next patch for the update path will address this by causing the transaction to restart if the key cache is found to be dirty. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2e63e180 |
|
24-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Stash a copy of key being overwritten in btree_insert_entry We currently need to call bch2_btree_path_peek_slot() multiple times in the transaction commit path - and some of those need to be updated to also check the keys from journal replay, too. Let's consolidate this and stash the key being overwritten in btree_insert_entry. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
1f2d9192 |
|
08-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: iter->update_path With BTREE_ITER_FILTER_SNAPSHOTS, we have to distinguish between the path where the key was found, and the path for inserting into the current snapshot. This adds a new field to struct btree_iter for saving a path for the current snapshot, and plumbs it through bch2_trans_update(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
669f87a5 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch to __func__for recording where btree_trans was initialized Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
5222a460 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_JOURNAL This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
fb64f3fd |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_JSET_ENTRY_log Add a journal entry type for logging messages, and add an option to use it to log the transaction name - this makes for a very handy debugging tool, as with it we can use the 'bcachefs list_journal' command to see not only what updates were done, but what was doing them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
f3e1f444 |
|
21-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_NOPRESERVE This adds a flag to not mark the initial btree_path as preserve, for paths that we expect to be cheap to reconstitute if necessary - this solves a btree_path overflow caused by need_whiteout_for_snapshot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
b84d42c3 |
|
16-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out CONFIG_BCACHEFS_DEBUG_TRANSACTIONS This puts the btree_transactions sysfs/debugfs file behind a separate config option - it's highly useful, but not cheap enough to enable permenantly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f0c3f88b |
|
26-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Run insert triggers before overwrite triggers Currently, btree triggers are run in natural key order, which presents a problem for fallocate in INSERT_RANGE mode: since we're moving existing extents to higher offsets, the trigger for deleting the old extent runs before the trigger that adds the new extent, potentially leading to indirect extents being deleted that shouldn't be when the delete causes the refcount to hit 0. This changes the order we run triggers so that for a givin btree, we run all insert triggers before overwrite triggers, nicely sidestepping this issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
3e52c222 |
|
29-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add journal_seq to inode & alloc keys Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
34d74830 |
|
30-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_UPDATE_NOJOURNAL We're going to have btree updates that don't need to be journalled; add a flag for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
9a796fdb |
|
19-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_exit() no longer returns errors Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
c075ff70 |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_FILTER_SNAPSHOTS For snapshots, we need to implement btree lookups that return the first key that's an ancestor of the snapshot ID the lookup is being done in - and filter out keys in unrelated snapshots. This patch adds the btree iterator flag BTREE_ITER_FILTER_SNAPSHOTS which does that filtering. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
14b393ee |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
3074bc0f |
|
15-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
Revert "bcachefs: Add more assertions for locking btree iterators out of order" Figured out the bug we were chasing, and it had nothing to do with locking btree iterators/paths out of order. This reverts commit ff08733dd298c969aec7c7828095458f73fd5374. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
068bcaa5 |
|
03-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more assertions for locking btree iterators out of order btree_path_traverse_all() traverses btree iterators in sorted order, and thus shouldn't see transaction restarts due to potential deadlocks - but sometimes we do. This patch adds some more assertions and tracks some more state to help track this down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f21566f1 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_NODES We really only need to distinguish between btree iterators and btree key cache iterators - this is more prep work for btree_path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
deb0e573 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_NEED_PEEK This was used for an optimization that hasn't existing in quite awhile - iter->uptodate will probably be going away as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
6fba6b83 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Prefer using btree_insert_entry to btree_iter This moves some data dependencies forward, to improve pipelining. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
5f8077cc |
|
29-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_SET_POS_AFTER_COMMIT BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically advance extent btree iterators on sucessful commit. But with the upcomnig btree_path patch it's getting more awkward to support, and it adds overhead to core data structures that's only used in a few places, and can be easily done by the caller instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
84841b0d |
|
24-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_dump_trans_iters_updates() This factors out bch2_dump_trans_iters_updates() from the iter alloc overflow path, and makes some small improvements to what it prints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
0423fb71 |
|
12-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Keep a sorted list of btree iterators This will be used to make other operations on btree iterators within a transaction more efficient, and enable some other improvements to how we manage btree iterators. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e5af273f |
|
25-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans->restarted Start tracking when btree transactions have been restarted - and assert that we're always calling bch2_trans_begin() immediately after transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
b00fde8f |
|
05-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE Add a new flag to control assertions about updating to internal snapshot nodes, that normally should not be written to - to be used in an upcoming patch. Also do some renaming - trigger_flags is now update_flags. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
297d8934 |
|
10-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Extensive triggers cleanups - We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
a49e9a05 |
|
12-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix null ptr deref when splitting compressed extents Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
cd8319fd |
|
07-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill trans->updates2 Now that extent handling has been lifted to bch2_trans_update(), we don't need to keep two different lists of updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
5288e66a |
|
03-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_UPDATES This drops bch2_btree_iter_peek_with_updates() and replaces it with a new flag, BTREE_ITER_WITH_UPDATES, and also reworks bch2_btree_iter_peek_slot() to respect it too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
509d3e0a |
|
20-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Child btree iterators This adds the ability for btree iterators to own child iterators - to be used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can scan forwards while maintaining our current position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
66a0a497 |
|
04-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_iter->should_be_locked Add a field to struct btree_iter for tracking whether it should be locked - this fixes spurious transaction restarts in bch2_trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
531a0095 |
|
04-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree iterator tracepoints This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> |
#
d7f35163 |
|
09-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix BTREE_ITER_NOT_EXTENTS bch2_btree_iter_peek() wasn't properly checking for BTREE_ITER_IS_EXTENTS when updating iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
3ce8b463 |
|
03-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill bset_tree->max_key Since we now ensure a btree node's max key fits in its packed format, this isn't needed for the reasons it used to be - and, it was being used inconsistently. Also reorder struct btree a bit for performance, and kill some dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
5c1d808a |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop trans->nounlock Since we're no longer doing btree node merging post commit, we can now delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e751c01a |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
43d00243 |
|
03-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mechanism for running callbacks at trans commit time Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
331194a2 |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree key cache locking improvements The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c7bb769c |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1f7fdc0a |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_iter_live() New helper to clean things up a bit - also, improve iter->flags handling. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
6333bd2f |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve handling of extents in bch2_trans_update() The transaction update/commit path cares about whether it's inserting extents or regular keys; extents require extra passes (handling of overlapping extents) but sometimes we want to skip all that. This clarifies things by adding a new member to btree_insert_entry specifying whether the key being inserted is an extent, instead of overloading BTREE_ITER_IS_EXTENTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f2785955 |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE() bcachefs has been aggressively migrating filesystems and btree nodes to the new format for quite some time - this shouldn't affect anyone anymore, and lets us delete a _lot_ of code. Also, it frees up KEY_TYPE_discard for a new whiteout key type for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e131b6aa |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mempool for btree_trans bump allocator This allocation is required for filesystem operations to make forward progress, thus needs a mempool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1889ad5a |
|
14-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add code to scan for/rewite old btree nodes This adds a new data job type to scan for btree nodes in the old extent format, and rewrite them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
7e1a3aa9 |
|
11-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: iter->real_pos We need to differentiate between the search position of a btree iterator, vs. what it actually points at (what we found). This matters for extents, where iter->pos will typically be the start of the key we found and iter->real_pos will be the end of the key we found (which soon won't necessarily be in the same btree node!) and it will also matter for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
07a1006a |
|
17-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
5db43418 |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't issue btree writes that weren't journalled If we have an error in the btree interior update path that prevents us from journalling the update, we can't issue the corresponding btree node write - we didn't get a journal sequence number that would cause it to be ignored in recovery. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
3eb26d01 |
|
01-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_get_iter() no longer returns errors Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
d0022290 |
|
29-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix error in filesystem initialization The rhashtable code doesn't like when we destroy an rhashtable that was never initialized Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
d5425a3b |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Throttle updates when btree key cache is too dirty This is needed to ensure we don't deadlock because journal reclaim and thus memory reclaim isn't making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
12590720 |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree key cache shrinker The shrinker should start scanning for entries that can be freed oldest to newest - this way, we can avoid scanning a lot of entries that are too new to be freed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
628a3ad2 |
|
12-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a shrinker for the btree key cache Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
876c7af3 |
|
15-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Take a SRCU lock in btree transactions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
6a747c46 |
|
09-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add accounting for dirty btree nodes/keys This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ae1ede58 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't embed btree iters in btree_trans These haven't been in used since reallocing iterators has been disabled, and saves us a lot of stack if we get rid of it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
29364f34 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop sysfs interface to debug parameters It's not used much anymore, the module paramter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
dcf141b9 |
|
28-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix spurious transaction restarts The check for whether locking a btree node would deadlock was wrong - we have to check that interior nodes are locked before descendents, but this check was wrong when consider cached vs. non cached iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
39283c71 |
|
19-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for bad stripe pointers The allocator usually doesn't increment bucket gens right away on buckets that it's about to hand out (for reasons that need to be documented), instead deferring that to whatever extent update first references that bucket. But stripe pointers reference buckets without changing bucket sector counts, meaning we could end up with a pointer in a stripe with a gen newer than the bucket it points to. Fix this by adding a transactional trigger for KEY_TYPE_stripe that just writes out the keys in the alloc btree for the buckets it points to. Also - consolidate the code that checks pointer validity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f3721e12 |
|
16-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Perf improvements for bch_alloc_read() On large filesystems reading in the alloc info takes a significant amount of time. But we don't need to be calling into the fully general bch2_mark_key() path, just open code what we need in bch2_alloc_read_fn(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
4580baec |
|
25-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Remove some uses of PAGE_SIZE in the btree code For portability to userspace, we should try to avoid working in kernel pages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
697e45b2 |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_TRIGGER_NOOVERWRITES This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
fff899b1 |
|
03-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Mark btree nodes as needing rewrite when not all replicas are RW This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
64f2a880 |
|
28-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_extent_can_insert() not being called It's supposed to check whether we're splitting a compressed extent and if so get a bigger disk reservation - hence this fixes a "disk usage increased by x without a reservaiton" bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e27b03b3 |
|
15-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Give bkey_cached_key same attributes as bpos Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2ca88e5a |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
515282ac |
|
12-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock __bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for btree node lock ordering, this caused an off by one error that was triggered by bch2_btree_node_get_sibling() getting the previous node. This refactors the code to compare against btree node keys directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f96c0df4 |
|
02-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock in bch2_btree_node_get_sibling() There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(), which can leave a btree node locked that is just outside iter->pos, breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally we should get rid of this corner case, but for now fix it locally with verbose comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
495aabed |
|
02-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add debug code to print btree transactions Intented to help debug deadlocks, since we can't use lockdep to check btree node lock ordering. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
96e2aa1b |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mechanism for passing extra journal entries to bch2_trans_commit() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
0329b150 |
|
01-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Trace where btree iterators are allocated This will help with iterator overflow bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2c31e657 |
|
29-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce max nr of btree iters when lockdep is on This is so we don't overflow MAX_LOCK_DEPTH. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
6357d607 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to interior nodes Previously, the btree has always been self contained and internally consistent on disk without anything from the journal - the journal just contained pointers to the btree roots. However, this meant that btree node split or compact operations - i.e. anything that changes btree node topology and involves updates to interior nodes - would require that interior btree node to be written immediately, which means emitting a btree node write that's mostly empty (using 4k of space on disk if the filesystemm blocksize is 4k to only write perhaps ~100 bytes of new keys). More importantly, this meant most btree node writes had to be FUA, and consumer drives have a history of slow and/or buggy FUA support - other filesystes have been bit by this. This patch changes the interior btree update path to journal updates to interior nodes, after the writes for the new btree nodes have completed. Best of all, it turns out to simplify the interior node update path somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e62d65f2 |
|
15-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans_commit() path can now insert to interior nodes This will be needed for the upcoming patches to journal updates to interior btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e3e464ac |
|
30-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move extent overwrite handling out of core btree code Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splittin existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2dac0eae |
|
18-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Iterator debug code improvements More aggressively checking iterator invariants, and fixing the resulting bugs. Also greatly simplifying iter_next() and iter_next_slot() - they were hyper optimized before, but the optimizations were getting too brittle. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
237e8048 |
|
18-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: introduce b->hash_val This is partly prep work for introducing bch_btree_ptr_v2, but it'll also be a bit of a performance boost by moving the full key out of the hot part of struct btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
d98a5e39 |
|
09-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add some comments for btree iterator flags Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
24326cd1 |
|
31-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Sort & deduplicate updates in bch2_trans_update() Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2d594dfb |
|
31-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
54e86b58 |
|
30-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make btree_insert_entry more private to update path This should be private to btree_update_leaf.c, and we might end up removing it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
bcd6f3e0 |
|
26-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use KEY_TYPE_deleted whitouts for extents Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
8b3bbe2c |
|
24-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't reexecute triggers when retrying transaction commit This was causing a bug with transaction iterators overflowing; now, if triggers have to be reexecuted we always return -EINTR and retry from the start of the transaction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c297a763 |
|
13-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor whiteouts compaction The whiteout compaction path - as opposed to just dropping whiteouts - is now only needed for extents, and soon will only be needed for extent btree nodes in the old format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c9bebae6 |
|
29-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Whiteout changes More prep work for snapshots: extents will soon be using KEY_TYPE_deleted for whiteouts, with 0 size. But we wen't be able to keep these whiteouts with the rest of the extents in the btree node, due to sorting invariants breaking. We can deal with this by immediately moving the new whiteouts to the unwritten whiteouts area - this just means those whiteouts won't be sorted, so we need new code to sort them prior to merging them with the rest of the keys to be written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2a9101a9 |
|
19-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor bch2_trans_commit() path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
8f1965391 |
|
19-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make btree_node_type_needs_gc() cheaper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
64bc0011 |
|
26-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rework btree iterator lifetimes The btree_trans struct needs to memoize/cache btree iterators, so that on transaction restart we don't have to completely redo btree lookups, and so that we can do them all at once in the correct order when the transaction had to restart to avoid a deadlock. This switches the btree iterator lookups to work based on iterator position, instead of trying to match them up based on the stack trace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
a7199432 |
|
22-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill deferred btree updates Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
bbd8d203 |
|
22-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_SLOTS isn't a type of btree iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
36e9d698 |
|
07-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Do updates in order they were queued up in Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
76426098 |
|
16-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
88767d65 |
|
24-Jun-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update path now handles triggers that generate more triggers Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
3838be78 |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use a fixed size buffer for fs_usage_deltas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
61011ea2 |
|
14-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rip out old hacky transaction restart tracing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
87c3beb4 |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a bug with spinning on the journal Transactional triggers meant that when we failed to get a journal reservation, then bailed out into the error path to block on a journal reservation, the second blocking call into the journal code was asking for less space, which is not what we want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
60755344 |
|
10-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill BTREE_ITER_NOUNLOCK Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
932aa837 |
|
11-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_mark_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c43a6ef9 |
|
05-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_bkey_cached_common This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ba5c6557 |
|
22-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add actual tracepoints for transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ece254b2 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: don't lose errors from iterators that have been freed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e1120a4c |
|
27-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add iter->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ecc892e4 |
|
27-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill btree_iter->next Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
9e5e5b9e |
|
25-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree iterators now always have a btree_trans Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
5df4be3f |
|
25-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree iter improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
dc3b63dc |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add time stats for btree updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
9623ab27 |
|
15-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree update path cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
0dc17247 |
|
13-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill struct btree_insert Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
c93cead0 |
|
16-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Always use bch2_extent_trim_atomic() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
a8e00bd4 |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: increase BTREE_ITER_MAX Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
3e5d6c59 |
|
19-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use journal preres for deferred btree updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
8fe826f9 |
|
13-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bucket invalidation to key marking path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
39fbc5a4 |
|
11-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc lock no longer needed for disk reservations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
42b72e0b |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: journal_replay_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
3636ed48 |
|
17-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deferred btree updates Will be used in the future for inode updates, which will be very helpful for multithreaded workloads that have to update the inode with every extent update (appends, or updates that change i_sectors) Also will be used eventually for fully persistent alloc info However - we still need a mechanism for reserving space in the journal prior to getting a journal reservation, so it's not technically safe to make use of this just yet, we could deadlock with the journal full (although not likely to be an issue in practice) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
f0cfb963 |
|
29-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track nr_inodes with the key marking machinery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ad7ae8d6 |
|
23-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree locking fix, refactoring Hit an assertion, probably spurious, indicating an iterator was unlocked when it shouldn't have been (spurious because it wasn't locked at all when the caller called btree_insert_at()). Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1d25849c |
|
07-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Centralize marking of replicas in btree update path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
2252aa27 |
|
21-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree gc refactoring prep work for erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
ef337c54 |
|
06-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocation code refactoring bch2_alloc_sectors_start() was a nightmare to work with - it's got some tricky stuff to do, since it wants to use the buckets the writepoint already has, unless they're not in the target it wants to write to, unless it can't allocate from any other devices in which case it will use those buckets if it has to - et cetera. This restructures the code to start with a new empty list of open buckets we're going to use for the new allocation, pulling buckets from the write point's list as we decide that we really are going to use them - making the code somewhat more functional and drastically easier to understand. Also fixes a bug where we could end up waiting on c->freelist_wait (because allocating from one device failed) but return success from bch2_bucket_alloc(), because allocating from a different device succeeded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
fc3268c1 |
|
08-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill extent_insert_hook Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
cc1add4a |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_INSERT_JOURNAL_RES_FULL is no longer possible Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
e4ccb251 |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: make struct btree_iter a bit smaller Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
271a3d3a |
|
21-Jul-2016 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: lift ordering restriction on 0 size extents This lifts the restriction that 0 size extents must not overlap with other extents, which means we can now sort extents and non extents the same way, and will let us simplify a bunch of other stuff as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1fe08f31 |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bkey_written() also cleanups of btree node offsets Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
617391ba |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: improved rw_aux_tree_bsearch() shouldn't be any reason for an actual binary search here Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
b0004d8d |
|
03-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Factor out btree_key_can_insert() working on getting rid of all the reasons bch2_insert_fixup_extent() can fail/stop partway, which is needed for other refactorings. One of the reasons we could have to bail out is if we're splitting a compressed extent we might need to add to our disk reservation - but we can check that before actually starting the insert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1c7a0adf |
|
12-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trace transaction restarts exceptionally crappy "tracing", but it's a start at documenting the places restarts can be triggered Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
#
6bd68ec2 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96dea3d5 |
|
12-Sep-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e8d2fe3b |
|
21-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Consolidate btree id properties This refactoring centralizes defining per-btree properties. bch2_key_types_allowed was also about to overflow a u32, so expand that to a u64. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
eabb10dc |
|
19-Jul-2023 |
Brian Foster <bfoster@redhat.com> |
bcachefs: support btree updates of prejournaled keys Introduce support for prejournaled key updates. This allows a transaction to commit an update for a key that already exists (and is pinned) in the journal. This is required for btree write buffer updates as the current scheme of journaling both on write buffer insertion and write buffer (slow path) flush is unsafe in certain crash recovery scenarios. Create a small trans update wrapper to pass along the seq where the key resides into the btree_insert_entry. From there, trans commit passes the seq into the btree insert path where it is used to manage the journal pin for the associated btree leaf. Note that this patch only introduces the underlying mechanism and otherwise includes no functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
73bd774d |
|
06-Jul-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
95b595a5 |
|
30-Apr-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree iterator, update flags no longer conflict Change btree_update_flags to start after the last btree iterator flag, so that we can pass both in the same flags argument. This is needed for the upcoming bch2_bkey_get_mut() helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f3a65bb9 |
|
27-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert constants to consts Rust bindgen doesn't handle macros, but it does handle integer constants: this conversion aids in implementing safe Rust wrapper interfaces. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5e2d8be8 |
|
18-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip These are two different things - this improves our debug assert messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
930c0c4c |
|
11-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Add missing include Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
920e69bc |
|
03-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree write buffer This adds a new method of doing btree updates - a straight write buffer, implemented as a flat fixed size array. This is only useful when we don't need to read from the btree in order to do the update, and when reading is infrequent - perfect for the LRU btree. This will make LRU btree updates fast enough that we'll be able to use it for persistently indexing buckets by fragmentation, which will be a massive boost to copygc performance. Changes: - A new btree_insert_type enum, for btree_insert_entries. Specifies btree, btree key cache, or btree write buffer. - bch2_trans_update_buffered(): updates via the btree write buffer don't need a btree path, so we need a new update path. - Transaction commit path changes: The update to the btree write buffer both mutates global, and can fail if there isn't currently room. Therefore we do all write buffer updates in the transaction all at once, and also if it fails we have to revert filesystem usage counter changes. If there isn't room we flush the write buffer in the transaction commit error path and retry. - A new persistent option, for specifying the number of entries in the write buffer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
30ca6ece |
|
09-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill trans->flags Recursive transaction commits are occasionally necessary - in particular, for the upcoming btree write buffer's flush path. This avoids bugs due to trans->flags being accidentally mutated mid-commit, which can cause c->writes refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
60b55388 |
|
09-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: trans->notrace_relock_fail When we unlock in order to submit IO, the next relock event is likely to fail if submit_bio() blocked - we shouldn't those events in our _fail stats, since those are expected events and shouldn't cause test failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
94c69faf |
|
04-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Use six_lock_ip() This uses the new _ip() interface to six locks and hooks it up to btree_path->ip_allocated, when available. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
464b4155 |
|
05-Feb-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix bch2_trans_reset_updates() This should have been resetting trans->fs_usage_deltas as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ee2c6ea7 |
|
08-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_iter->ip_allocated In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5bbe3f2d |
|
14-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Log more messages in the journal This patch - Adds a mechanism for queuing up journal entries prior to the journal being started, which will be used for early journal log messages - Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now take format strings. bch2_fs_log_msg() can be used before or after the journal has been started, and will use the appropriate mechanism. - Deletes the now obsolete bch2_journal_log_msg() - And adds more log messages to the recovery path - messages for journal/filesystem started, journal entries being blacklisted, and journal replay starting/finishing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e242b92a |
|
15-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix for long running btree transactions & key cache While a btree transaction is running, we hold a SRCU read lock on the btree key cache that prevents btree key cache keys from being freed - this is so that relock() operations won't access freed memory. The downside of this is that long running btree transactions prevent memory from being freed from the key cache. This adds a check in bch2_trans_begin() - if the transaction has been running longer than 1 second, drop and retake the SRCU read lock and zero out pointers to unlock key cache paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a16b19cd |
|
02-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Allow for more btrees Expand some bitfields so we can keep adding more btrees. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ac9fa4bd |
|
07-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Kill btree_insert_ret enum Replace with standard bcachefs-private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1617d56d |
|
22-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Key cache now works for snapshots btrees This switches btree_key_cache_fill() to use a btree iterator, not a btree path, so that it can search for keys in previous snapshots. We also add another iterator flag, BTREE_ITER_KEY_CACHE_FILL, to avoid recursion back into the key cache. This will allow us to re-enable the key cache for inodes in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
087e53c2 |
|
20-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Bring back BTREE_ITER_CACHED_NOFILL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c96f108b |
|
24-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Optimize bch2_trans_iter_init() When flags & btree_id are constants, we can constant fold the entire calculation of the actual iterator flags - and the whole thing becomes small enough to inline. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
42af0ad5 |
|
17-Nov-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix a race with b->write_type b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
46fee692 |
|
28-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Improved btree write statistics This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fd0c7679 |
|
22-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Convert to __packed and __aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
77671e8f |
|
21-Oct-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Move bkey bkey_unpack_key() to bkey.h Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h because bkey.h didn't include btree_types.h, which it needs for the compiled unpack function. This patch finally moves it to the proper location. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0d7009d7 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Delete old deadlock avoidance code This deletes our old lock ordering based deadlock avoidance code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
33bd5d06 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deadlock cycle detector We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3d21d48e |
|
03-Sep-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Fix usage of six lock's percpu mode, key cache version Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks have a percpu mode, but we can't switch between percpu and non percpu modes while a lock is in use: threads attempting to take a read lock may race, and we'll end up with the read count permanently off. Fixing this the "correct" way, in six_lock_pcpu_(alloc|free) would require an RCU barrier, and we don't want to do that - instead, we have to permanently segragate percpu and non percpu objects, including when on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4e6defd1 |
|
31-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: btree_bkey_cached_common->cached Add a type descriptor to btree_bkey_cached_common - there's no reason not to since we've got padding that was otherwise unused, and this is a nice cleanup (and helpful in later patches). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
616928c3 |
|
22-Aug-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Track maximum transaction memory This patch - tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats - makes it available in debugfs - switches bch2_trans_init() to using that for the amount of memory to preallocate, instead of the parameter passed in This drastically reduces transaction restarts, and means we no longer need to track this in the source code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2e27f656 |
|
21-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill nodes_intent_locked Previously, we used two different bit arrays for tracking held btree node locks. This patch switches to an array of two bit integers, which will let us track, in a future patch, when we hold a write lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
cd5afabe |
|
19-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_locking.c Start to centralize some of the locking code in a new file; more locking code will be moving here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5c0bb66a |
|
11-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track the maximum btree_paths ever allocated by each transaction We need a way to check if the machinery for handling btree_paths with in a transaction is behaving reasonably, as it often has not been - we've had bugs with transaction path overflows caused by duplicate paths and plenty of other things. This patch tracks, per transaction fn, the most btree paths ever allocated by that transaction and makes it available in debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4aba7d45 |
|
11-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename lock_held_stats -> btree_transaction_stats Going to be adding more things to this in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6fae65c1 |
|
10-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE) These were used more prior to getting rid of the in-memory bucket arrays - they don't serve much purpose anymore, and deleting them lets us write better assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
315c9ba6 |
|
10-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
86b74451 |
|
05-Aug-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_btree_trans_to_text() bch2_btree_trans_to_text() is used to print btree_transactions owned by other threads; thus, it needs to be particularly careful. This fixes a null ptr deref caused by racing with the owning thread changing path->l[].b. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
549d173c |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: EINTR -> BCH_ERR_transaction_restart Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
e941ae7d |
|
17-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a counter for btree_trans restarts This will help us improve nested transactions - we need to add assertions that whenever an inner transaction handles a restart, it still returns -EINTR to the outer transaction. This also adds nested_lockrestart_do() and nested_commit_do() which use the new counters to correctly return -EINTR when the transaction was restarted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
89333156 |
|
16-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bch2_dev_usrdata_drop() to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c807ca95 |
|
14-Jul-2022 |
Daniel Hill <daniel@gluo.nz> |
bcachefs: added lock held time stats We now record the length of time btree locks are held and expose this in debugfs. Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
43de721a |
|
13-Jul-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Unlock in bch2_trans_begin() if we've held locks more than 10us We try to ensure we never hold btree locks for too long - bcachefs tries to be soft realtime. This adds a check when restarting a transaction, where a transaction restart is cheap - if we've been holding locks for too long, drop and retake them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8f7f566f |
|
16-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree key cache pcpu freedlist Originally, the btree key cache code would always allocate new entries by reusing from the recently-freed list, if that list wasn't empty. But that behaviour was dropped, for lock contention reasons. But it seems that entries stranded on the freed list have been contributing to some of our oom issues, because long running btree transactions will prevent them from being freed. This patch re-adds allocating from the freed list, but it also adds percpu buffers to solve the lock contention issues - and the new percpu freed lists will improve the evict paths, too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
50b13bee |
|
17-Jun-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve an error message When inserting a key type that's not valid for a given btree, we should print out which btree we were inserting into. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b7c11046 |
|
19-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Increase max size for btree_trans bump allocator With backpointers, alloc keys have gotten bigger, so we're needing more memory here. We're probably going to need to go with something more sophisticated than a bump allocator, but - let's see if we can avoid doing that just yet. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
30525f68 |
|
21-May-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix journal_keys_search() overhead Previously, on every btree_iter_peek() operation we were searching the journal keys, doing a full binary search - which was slow. This patch fixes that by saving our position in the journal keys, so that we only do a full binary search when moving our position backwards or a large jump forwards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
b0babf2a |
|
12-Apr-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_btree_iter_peek_all_levels() This adds bch2_btree_iter_peek_all_levels(), which returns keys from every level of the btree - interior nodes included - in monotonically increasing order, soon to be used by the backpointers check & repair code. - BTREE_ITER_ALL_LEVELS can now be passed to for_each_btree_key() to iterate thusly, much like BTREE_ITER_SLOTS - The existing algorithm in bch2_btree_iter_advance() doesn't work with peek_all_levels(): we have to defer the actual advancing until the next time we call peek, where we have the btree path traversed and uptodate. So, we add an advanced bit to btree_iter; when BTREE_ITER_ALL_LEVELS is set bch2_btree_iter_advanced() just marks the iterator as advanced. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e1b8f5f5 |
|
31-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Plumb btree_id & level to trans_mark For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c6b2826c |
|
11-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Freespace, need_discard btrees This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3d48a7f8 |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: KEY_TYPE_alloc_v4 This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2a6870ad |
|
29-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use darray for extra_journal_entries Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3a306f3c |
|
17-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix large key cache keys Previously, we'd go into an infinite loop when attempting to cache a bkey in the key cache larger than 128 u64s - since we were only using a u8 for the size field, it'd get rounded up to 256 then truncated to 0. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
30985537 |
|
04-Mar-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix usage of six lock's percpu mode Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes. This patch fixes this by never switching 'struct btree' between the two modes, and instead segragating them between two different freed lists. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bf3efff5 |
|
27-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix race leading to btree node write getting stuck Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written. Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
de517c95 |
|
26-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use x-macros for btree node flags This is for adding an array of strings for btree node flag names. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
8322a937 |
|
04-Jan-2023 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: Btree key cache optimization This helps with lock contention in the journalling code: instead of updating our journal pin on every write, only get a journal pin if we don't have one. This means we can avoid hammering on journal locks nearly so much, at the cost of carrying around a journal pin for an older entry than the one we actually need. To handle that, if needed we update our journal pin to the correct one when flushed by journal reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8f9ad91a |
|
17-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix failure to allocate btree node in cache The error code when we fail to allocate a node in the btree node cache doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to stash a flag in btree_trans so we know we have to take the cannibalize lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c7ce2732 |
|
15-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Also show when blocked on write locks This consolidates some of the btree node lock path, so that when we're blocked taking a write lock on a node it shows up in bch2_btree_trans_to_text(), along with intent and read locks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
12ce5b7d |
|
11-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache coherency - Updates to non key cache iterators will now be transparently redirected to the key cache for cached btrees. - Except when creating new keys: then the update goes to underlying btree For for iterating over a cached btree to work, we need to ensure that if a key exists in the key cache, it also exists in the btree - otherwise the iterator code will skip past it and not check the key cache. Otherwise, for consistency, all updates should go to the same place - the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f7b6ca23 |
|
06-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_KEY_CACHE This is the start of cache coherency with the btree key cache - this adds a btree iterator flag that causes lookups to also check the key cache when we're iterating over the btree (not iterating over the key cache). Note that we could still race with another thread creating at item in the key cache and updating it, since we aren't holding the key cache locked if it wasn't found. The next patch for the update path will address this by causing the transaction to restart if the key cache is found to be dirty. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2e63e180 |
|
24-Feb-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Stash a copy of key being overwritten in btree_insert_entry We currently need to call bch2_btree_path_peek_slot() multiple times in the transaction commit path - and some of those need to be updated to also check the keys from journal replay, too. Let's consolidate this and stash the key being overwritten in btree_insert_entry. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
1f2d9192 |
|
08-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: iter->update_path With BTREE_ITER_FILTER_SNAPSHOTS, we have to distinguish between the path where the key was found, and the path for inserting into the current snapshot. This adds a new field to struct btree_iter for saving a path for the current snapshot, and plumbs it through bch2_trans_update(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
669f87a5 |
|
03-Jan-2022 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Switch to __func__for recording where btree_trans was initialized Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5222a460 |
|
25-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_JOURNAL This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fb64f3fd |
|
31-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BCH_JSET_ENTRY_log Add a journal entry type for logging messages, and add an option to use it to log the transaction name - this makes for a very handy debugging tool, as with it we can use the 'bcachefs list_journal' command to see not only what updates were done, but what was doing them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
f3e1f444 |
|
21-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_NOPRESERVE This adds a flag to not mark the initial btree_path as preserve, for paths that we expect to be cheap to reconstitute if necessary - this solves a btree_path overflow caused by need_whiteout_for_snapshot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
b84d42c3 |
|
16-Dec-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out CONFIG_BCACHEFS_DEBUG_TRANSACTIONS This puts the btree_transactions sysfs/debugfs file behind a separate config option - it's highly useful, but not cheap enough to enable permenantly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f0c3f88b |
|
26-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Run insert triggers before overwrite triggers Currently, btree triggers are run in natural key order, which presents a problem for fallocate in INSERT_RANGE mode: since we're moving existing extents to higher offsets, the trigger for deleting the old extent runs before the trigger that adds the new extent, potentially leading to indirect extents being deleted that shouldn't be when the delete causes the refcount to hit 0. This changes the order we run triggers so that for a givin btree, we run all insert triggers before overwrite triggers, nicely sidestepping this issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3e52c222 |
|
29-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add journal_seq to inode & alloc keys Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
34d74830 |
|
30-Dec-2022 |
Kent Overstreet <kent.overstreet@linux.dev> |
bcachefs: BTREE_UPDATE_NOJOURNAL We're going to have btree updates that don't need to be journalled; add a flag for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9a796fdb |
|
19-Oct-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_exit() no longer returns errors Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
c075ff70 |
|
04-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_FILTER_SNAPSHOTS For snapshots, we need to implement btree lookups that return the first key that's an ancestor of the snapshot ID the lookup is being done in - and filter out keys in unrelated snapshots. This patch adds the btree iterator flag BTREE_ITER_FILTER_SNAPSHOTS which does that filtering. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
14b393ee |
|
15-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
3074bc0f |
|
15-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
Revert "bcachefs: Add more assertions for locking btree iterators out of order" Figured out the bug we were chasing, and it had nothing to do with locking btree iterators/paths out of order. This reverts commit ff08733dd298c969aec7c7828095458f73fd5374. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
068bcaa5 |
|
03-Sep-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add more assertions for locking btree iterators out of order btree_path_traverse_all() traverses btree iterators in sorted order, and thus shouldn't see transaction restarts due to potential deadlocks - but sometimes we do. This patch adds some more assertions and tracks some more state to help track this down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
67e0dd8f |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f21566f1 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_NODES We really only need to distinguish between btree iterators and btree key cache iterators - this is more prep work for btree_path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
deb0e573 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_NEED_PEEK This was used for an optimization that hasn't existing in quite awhile - iter->uptodate will probably be going away as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
6fba6b83 |
|
30-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Prefer using btree_insert_entry to btree_iter This moves some data dependencies forward, to improve pipelining. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5f8077cc |
|
29-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_ITER_SET_POS_AFTER_COMMIT BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically advance extent btree iterators on sucessful commit. But with the upcomnig btree_path patch it's getting more awkward to support, and it adds overhead to core data structures that's only used in a few places, and can be easily done by the caller instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
84841b0d |
|
24-Aug-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_dump_trans_iters_updates() This factors out bch2_dump_trans_iters_updates() from the iter alloc overflow path, and makes some small improvements to what it prints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
0423fb71 |
|
12-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Keep a sorted list of btree iterators This will be used to make other operations on btree iterators within a transaction more efficient, and enable some other improvements to how we manage btree iterators. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e5af273f |
|
25-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans->restarted Start tracking when btree transactions have been restarted - and assert that we're always calling bch2_trans_begin() immediately after transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
9f1833ca |
|
10-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
b00fde8f |
|
05-Jul-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE Add a new flag to control assertions about updating to internal snapshot nodes, that normally should not be written to - to be used in an upcoming patch. Also do some renaming - trigger_flags is now update_flags. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
297d8934 |
|
10-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Extensive triggers cleanups - We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
a49e9a05 |
|
12-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix null ptr deref when splitting compressed extents Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
cd8319fd |
|
07-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill trans->updates2 Now that extent handling has been lifted to bch2_trans_update(), we don't need to keep two different lists of updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
5288e66a |
|
03-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_WITH_UPDATES This drops bch2_btree_iter_peek_with_updates() and replaces it with a new flag, BTREE_ITER_WITH_UPDATES, and also reworks bch2_btree_iter_peek_slot() to respect it too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
509d3e0a |
|
20-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Child btree iterators This adds the ability for btree iterators to own child iterators - to be used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can scan forwards while maintaining our current position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
66a0a497 |
|
04-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_iter->should_be_locked Add a field to struct btree_iter for tracking whether it should be locked - this fixes spurious transaction restarts in bch2_trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
531a0095 |
|
04-Jun-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree iterator tracepoints This patch adds some new tracepoints to the btree iterator code, and adds new fields to the existing tracepoints - primarily for the iterator position. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
|
#
d7f35163 |
|
09-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix BTREE_ITER_NOT_EXTENTS bch2_btree_iter_peek() wasn't properly checking for BTREE_ITER_IS_EXTENTS when updating iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3ce8b463 |
|
03-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill bset_tree->max_key Since we now ensure a btree node's max key fits in its packed format, this isn't needed for the reasons it used to be - and, it was being used inconsistently. Also reorder struct btree a bit for performance, and kill some dead code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5c1d808a |
|
31-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop trans->nounlock Since we're no longer doing btree node merging post commit, we can now delete a bunch of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e751c01a |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
43d00243 |
|
03-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mechanism for running callbacks at trans commit time Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
331194a2 |
|
24-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree key cache locking improvements The btree key cache mutex was becoming a significant bottleneck - it was mainly used to protect the lists of dirty, clean and freed cached keys. This patch eliminates the dirty and clean lists - instead, when we need to scan for keys to drop from the cache we iterate over the rhashtable, and thus we're able to remove most uses of that lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c7bb769c |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1f7fdc0a |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_iter_live() New helper to clean things up a bit - also, improve iter->flags handling. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6333bd2f |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve handling of extents in bch2_trans_update() The transaction update/commit path cares about whether it's inserting extents or regular keys; extents require extra passes (handling of overlapping extents) but sometimes we want to skip all that. This clarifies things by adding a new member to btree_insert_entry specifying whether the key being inserted is an extent, instead of overloading BTREE_ITER_IS_EXTENTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
41f8b09e |
|
20-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f2785955 |
|
19-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE() bcachefs has been aggressively migrating filesystems and btree nodes to the new format for quite some time - this shouldn't affect anyone anymore, and lets us delete a _lot_ of code. Also, it frees up KEY_TYPE_discard for a new whiteout key type for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e131b6aa |
|
23-Apr-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mempool for btree_trans bump allocator This allocation is required for filesystem operations to make forward progress, thus needs a mempool. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1889ad5a |
|
14-Mar-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add code to scan for/rewite old btree nodes This adds a new data job type to scan for btree nodes in the old extent format, and rewrite them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
7e1a3aa9 |
|
11-Feb-2021 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: iter->real_pos We need to differentiate between the search position of a btree iterator, vs. what it actually points at (what we found). This matters for extents, where iter->pos will typically be the start of the key we found and iter->real_pos will be the end of the key we found (which soon won't necessarily be in the same btree node!) and it will also matter for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
07a1006a |
|
17-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5db43418 |
|
03-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't issue btree writes that weren't journalled If we have an error in the btree interior update path that prevents us from journalling the update, we can't issue the corresponding btree node write - we didn't get a journal sequence number that would cause it to be ignored in recovery. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3eb26d01 |
|
01-Dec-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_get_iter() no longer returns errors Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d0022290 |
|
29-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix error in filesystem initialization The rhashtable code doesn't like when we destroy an rhashtable that was never initialized Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d5425a3b |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Throttle updates when btree key cache is too dirty This is needed to ensure we don't deadlock because journal reclaim and thus memory reclaim isn't making forward progress. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
12590720 |
|
19-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Improve btree key cache shrinker The shrinker should start scanning for entries that can be freed oldest to newest - this way, we can avoid scanning a lot of entries that are too new to be freed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
628a3ad2 |
|
12-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a shrinker for the btree key cache Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
876c7af3 |
|
15-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Take a SRCU lock in btree transactions Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6a747c46 |
|
09-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add accounting for dirty btree nodes/keys This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ae1ede58 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't embed btree iters in btree_trans These haven't been in used since reallocing iterators has been disabled, and saves us a lot of stack if we get rid of it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
29364f34 |
|
02-Nov-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Drop sysfs interface to debug parameters It's not used much anymore, the module paramter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dcf141b9 |
|
28-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix spurious transaction restarts The check for whether locking a btree node would deadlock was wrong - we have to check that interior nodes are locked before descendents, but this check was wrong when consider cached vs. non cached iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39283c71 |
|
19-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix for bad stripe pointers The allocator usually doesn't increment bucket gens right away on buckets that it's about to hand out (for reasons that need to be documented), instead deferring that to whatever extent update first references that bucket. But stripe pointers reference buckets without changing bucket sector counts, meaning we could end up with a pointer in a stripe with a gen newer than the bucket it points to. Fix this by adding a transactional trigger for KEY_TYPE_stripe that just writes out the keys in the alloc btree for the buckets it points to. Also - consolidate the code that checks pointer validity. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f3721e12 |
|
16-Oct-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Perf improvements for bch_alloc_read() On large filesystems reading in the alloc info takes a significant amount of time. But we don't need to be calling into the fully general bch2_mark_key() path, just open code what we need in bch2_alloc_read_fn(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
4580baec |
|
25-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Remove some uses of PAGE_SIZE in the btree code For portability to userspace, we should try to avoid working in kernel pages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
697e45b2 |
|
06-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill BTREE_TRIGGER_NOOVERWRITES This is prep work for reworking the triggers machinery - we have triggers that need to know both the old and the new key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fff899b1 |
|
03-Jul-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Mark btree nodes as needing rewrite when not all replicas are RW This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
64f2a880 |
|
28-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix bch2_extent_can_insert() not being called It's supposed to check whether we're splitting a compressed extent and if so get a bigger disk reservation - hence this fixes a "disk usage increased by x without a reservaiton" bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e27b03b3 |
|
15-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Give bkey_cached_key same attributes as bpos Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2ca88e5a |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
515282ac |
|
12-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock __bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for btree node lock ordering, this caused an off by one error that was triggered by bch2_btree_node_get_sibling() getting the previous node. This refactors the code to compare against btree node keys directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f96c0df4 |
|
02-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a deadlock in bch2_btree_node_get_sibling() There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(), which can leave a btree node locked that is just outside iter->pos, breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally we should get rid of this corner case, but for now fix it locally with verbose comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
495aabed |
|
02-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add debug code to print btree transactions Intented to help debug deadlocks, since we can't use lockdep to check btree node lock ordering. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
00b8ccf7 |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
96e2aa1b |
|
25-May-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add a mechanism for passing extra journal entries to bch2_trans_commit() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0329b150 |
|
01-Apr-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Trace where btree iterators are allocated This will help with iterator overflow bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2c31e657 |
|
29-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reduce max nr of btree iters when lockdep is on This is so we don't overflow MAX_LOCK_DEPTH. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
6357d607 |
|
08-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Journal updates to interior nodes Previously, the btree has always been self contained and internally consistent on disk without anything from the journal - the journal just contained pointers to the btree roots. However, this meant that btree node split or compact operations - i.e. anything that changes btree node topology and involves updates to interior nodes - would require that interior btree node to be written immediately, which means emitting a btree node write that's mostly empty (using 4k of space on disk if the filesystemm blocksize is 4k to only write perhaps ~100 bytes of new keys). More importantly, this meant most btree node writes had to be FUA, and consumer drives have a history of slow and/or buggy FUA support - other filesystes have been bit by this. This patch changes the interior btree update path to journal updates to interior nodes, after the writes for the new btree nodes have completed. Best of all, it turns out to simplify the interior node update path somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e62d65f2 |
|
15-Mar-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trans_commit() path can now insert to interior nodes This will be needed for the upcoming patches to journal updates to interior btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e3e464ac |
|
30-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Move extent overwrite handling out of core btree code Ever since the btree code was first written, handling of overwriting existing extents - including partially overwriting and splittin existing extents - was handled as part of the core btree insert path. The modern transaction and iterator infrastructure didn't exist then, so that was the only way for it to be done. This patch moves that outside of the core btree code to a pass that runs at transaction commit time. This is a significant simplification to the btree code and overall reduction in code size, but more importantly it gets us much closer to the core btree code being completely independent of extents and is important prep work for snapshots. This introduces a new feature bit; the old and new extent update models are incompatible when the filesystem needs journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2dac0eae |
|
18-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Iterator debug code improvements More aggressively checking iterator invariants, and fixing the resulting bugs. Also greatly simplifying iter_next() and iter_next_slot() - they were hyper optimized before, but the optimizations were getting too brittle. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
237e8048 |
|
18-Feb-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: introduce b->hash_val This is partly prep work for introducing bch_btree_ptr_v2, but it'll also be a bit of a performance boost by moving the full key out of the hot part of struct btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
d98a5e39 |
|
09-Jan-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add some comments for btree iterator flags Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
24326cd1 |
|
31-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Sort & deduplicate updates in bch2_trans_update() Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2d594dfb |
|
31-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
54e86b58 |
|
30-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make btree_insert_entry more private to update path This should be private to btree_update_leaf.c, and we might end up removing it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bcd6f3e0 |
|
26-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use KEY_TYPE_deleted whitouts for extents Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8b3bbe2c |
|
24-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't reexecute triggers when retrying transaction commit This was causing a bug with transaction iterators overflowing; now, if triggers have to be reexecuted we always return -EINTR and retry from the start of the transaction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c297a763 |
|
13-Dec-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor whiteouts compaction The whiteout compaction path - as opposed to just dropping whiteouts - is now only needed for extents, and soon will only be needed for extent btree nodes in the old format. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c9bebae6 |
|
29-Nov-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Whiteout changes More prep work for snapshots: extents will soon be using KEY_TYPE_deleted for whiteouts, with 0 size. But we wen't be able to keep these whiteouts with the rest of the extents in the btree node, due to sorting invariants breaking. We can deal with this by immediately moving the new whiteouts to the unwritten whiteouts area - this just means those whiteouts won't be sorted, so we need new code to sort them prior to merging them with the rest of the keys to be written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2a9101a9 |
|
19-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Refactor bch2_trans_commit() path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8f1965391 |
|
19-Oct-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make btree_node_type_needs_gc() cheaper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
64bc0011 |
|
26-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rework btree iterator lifetimes The btree_trans struct needs to memoize/cache btree iterators, so that on transaction restart we don't have to completely redo btree lookups, and so that we can do them all at once in the correct order when the transaction had to restart to avoid a deadlock. This switches the btree iterator lookups to work based on iterator position, instead of trying to match them up based on the stack trace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a7199432 |
|
22-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill deferred btree updates Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
bbd8d203 |
|
22-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_ITER_SLOTS isn't a type of btree iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
36e9d698 |
|
07-Sep-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Do updates in order they were queued up in Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
76426098 |
|
16-Aug-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Reflink Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
88767d65 |
|
24-Jun-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Update path now handles triggers that generate more triggers Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3838be78 |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Don't use a fixed size buffer for fs_usage_deltas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
61011ea2 |
|
14-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Rip out old hacky transaction restart tracing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
87c3beb4 |
|
15-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Fix a bug with spinning on the journal Transactional triggers meant that when we failed to get a journal reservation, then bailed out into the error path to block on a journal reservation, the second blocking call into the journal code was asking for less space, which is not what we want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
60755344 |
|
10-May-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill BTREE_ITER_NOUNLOCK Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
932aa837 |
|
11-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bch2_trans_mark_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c43a6ef9 |
|
05-Jun-2020 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree_bkey_cached_common This is prep work for the btree key cache: btree iterators will point to either struct btree, or a new struct bkey_cached. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ba5c6557 |
|
22-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add actual tracepoints for transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ece254b2 |
|
04-Apr-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: don't lose errors from iterators that have been freed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e1120a4c |
|
27-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add iter->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ecc892e4 |
|
27-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Kill btree_iter->next Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9e5e5b9e |
|
25-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree iterators now always have a btree_trans Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
5df4be3f |
|
25-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree iter improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
dc3b63dc |
|
21-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Add time stats for btree updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
9623ab27 |
|
15-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree update path cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
0dc17247 |
|
13-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill struct btree_insert Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
c93cead0 |
|
16-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Always use bch2_extent_trim_atomic() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
a8e00bd4 |
|
07-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: increase BTREE_ITER_MAX Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3e5d6c59 |
|
19-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Use journal preres for deferred btree updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
8fe826f9 |
|
13-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Convert bucket invalidation to key marking path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
39fbc5a4 |
|
11-Feb-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: gc lock no longer needed for disk reservations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
42b72e0b |
|
24-Jan-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: journal_replay_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
3636ed48 |
|
17-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Deferred btree updates Will be used in the future for inode updates, which will be very helpful for multithreaded workloads that have to update the inode with every extent update (appends, or updates that change i_sectors) Also will be used eventually for fully persistent alloc info However - we still need a mechanism for reserving space in the journal prior to getting a journal reservation, so it's not technically safe to make use of this just yet, we could deadlock with the journal full (although not likely to be an issue in practice) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
f0cfb963 |
|
29-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Track nr_inodes with the key marking machinery Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
26609b61 |
|
01-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Make bkey types globally unique this lets us get rid of a lot of extra switch statements - in a lot of places we dispatch on the btree node type, and then the key type, so this is a nice cleanup across a lot of code. Also improve the on disk format versioning stuff. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ad7ae8d6 |
|
23-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Btree locking fix, refactoring Hit an assertion, probably spurious, indicating an iterator was unlocked when it shouldn't have been (spurious because it wasn't locked at all when the caller called btree_insert_at()). Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1d25849c |
|
07-Nov-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Centralize marking of replicas in btree update path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
2252aa27 |
|
21-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: btree gc refactoring prep work for erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
ef337c54 |
|
06-Oct-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Allocation code refactoring bch2_alloc_sectors_start() was a nightmare to work with - it's got some tricky stuff to do, since it wants to use the buckets the writepoint already has, unless they're not in the target it wants to write to, unless it can't allocate from any other devices in which case it will use those buckets if it has to - et cetera. This restructures the code to start with a new empty list of open buckets we're going to use for the new allocation, pulling buckets from the write point's list as we decide that we really are going to use them - making the code somewhat more functional and drastically easier to understand. Also fixes a bug where we could end up waiting on c->freelist_wait (because allocating from one device failed) but return success from bch2_bucket_alloc(), because allocating from a different device succeeded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
fc3268c1 |
|
08-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: kill extent_insert_hook Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
cc1add4a |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: BTREE_INSERT_JOURNAL_RES_FULL is no longer possible Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
e4ccb251 |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: make struct btree_iter a bit smaller Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
271a3d3a |
|
21-Jul-2016 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: lift ordering restriction on 0 size extents This lifts the restriction that 0 size extents must not overlap with other extents, which means we can now sort extents and non extents the same way, and will let us simplify a bunch of other stuff as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1fe08f31 |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: bkey_written() also cleanups of btree node offsets Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
617391ba |
|
05-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: improved rw_aux_tree_bsearch() shouldn't be any reason for an actual binary search here Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
b0004d8d |
|
03-Aug-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Factor out btree_key_can_insert() working on getting rid of all the reasons bch2_insert_fixup_extent() can fail/stop partway, which is needed for other refactorings. One of the reasons we could have to bail out is if we're splitting a compressed extent we might need to add to our disk reservation - but we can check that before actually starting the insert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c7a0adf |
|
12-Jul-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: trace transaction restarts exceptionally crappy "tracing", but it's a start at documenting the places restarts can be triggered Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
#
1c6fdbd8 |
|
17-Mar-2017 |
Kent Overstreet <kent.overstreet@gmail.com> |
bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|