History log of /linux-master/fs/bcachefs/fs-io.c
Revision Date Author Comments
# 9e203c43 12-Apr-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix missing write refs in fs fio paths

bch2_journal_flush_seq requires us to have a write ref

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e44f325 11-Jan-2024 Christoph Hellwig <hch@lst.de>

bcachefs: fix incorrect usage of REQ_OP_FLUSH

REQ_OP_FLUSH is only for internal use in the blk-mq and request based
drivers. File systems and other block layer consumers must use
REQ_OP_WRITE | REQ_PREFLUSH as documented in
Documentation/block/writeback_cache_control.rst.

While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
get the proper flush state machine handling, and completely fails
for any bio based drivers, including all the stacking drivers. The
block layer will also get a check in 6.8 to reject this use case
entirely.

[Note: completely untested, but as this never got fixed since the
original bug report in November:

https://bugzilla.kernel.org/show_bug.cgi?id=218184

and the discussion in December:

https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/

this seems to be the best way to force it]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 46bf2e9c 15-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix excess transaction restarts in __bchfs_fallocate()

drop_locks_do() should not be used in a fastpath without first trying
the operation in nonblocking mode - the unlock and relock will cause
excessive transaction restarts and can potentially livelock with other
threads that are contending for the same locks.
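The rule above can be sketched in single-threaded userspace C: try the operation without blocking while btree locks are still held, and drop them only on real contention. Everything in it is hypothetical. `struct sketch_lock`, `struct sketch_trans` and the helpers are stand-ins, not the bcachefs drop_locks_do() macro or the kernel folio lock API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a lock that another thread may hold. */
struct sketch_lock {
	bool		held_by_other;
};

/* Hypothetical stand-in for a btree transaction. */
struct sketch_trans {
	bool		btree_locked;
	unsigned	restarts;
};

#define SKETCH_RESTART	1	/* hypothetical "transaction restart" result */

/* Nonblocking attempt: succeeds only if nobody else holds the lock. */
static bool sketch_trylock(struct sketch_lock *l)
{
	return !l->held_by_other;
}

/*
 * Blocking acquire. In this single-threaded model the other holder is
 * simply assumed to have released the lock by the time we get it.
 */
static void sketch_lock_blocking(struct sketch_lock *l)
{
	l->held_by_other = false;
}

/*
 * Try the nonblocking path first, with btree locks still held. Only on
 * contention do we drop btree locks, block, relock, and report a
 * restart to the caller: the expensive path that should not be taken
 * unconditionally.
 */
static int sketch_check_pagecache(struct sketch_trans *trans,
				  struct sketch_lock *l)
{
	if (sketch_trylock(l))
		return 0;

	trans->btree_locked = false;	/* drop btree locks before sleeping */
	sketch_lock_blocking(l);
	trans->btree_locked = true;	/* relock */
	trans->restarts++;
	return SKETCH_RESTART;
}
```

In the uncontended case the transaction never unlocks, which is what avoids the excess restarts described above.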

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5a11b5fe 05-Dec-2023 Brian Foster <bfoster@redhat.com>

bcachefs: return from fsync on writeback error to avoid early shutdown

When investigating transient failures of generic/441 on bcachefs, it
was determined that the cause of the failure was a combination of
unconditional emergency shutdown and racing between background
journal activity and the test switchover from a working device
mapper table to an error injecting table.

Part of the reason for this sequence of events is that bcachefs
aggressively flushes as much as possible during fsync(), regardless
of errors. While this is reasonable behavior, it is technically
unnecessary because once an error is returned from fsync(), the
caller cannot make any assumptions about the resilience of data.

Tweak the bch2_fsync() logic to return an error on failure of any of
the steps involved in the flush. Note that this change alone does
not prevent generic/441 failure, but in combination with a test
tweak to avoid racing during the dm-error table switchover it avoids
the unnecessary shutdowns and allows the test to pass reliably on
bcachefs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ecf8a74d 16-Nov-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kill INODE_LOCK, use lock_two_nondirectories()

In an ideal world, we'd have a common helper that could be used for
sorting a list of inodes into the correct lock order, and then the same
lock ordering could be used for any type of inode lock, not just
i_rwsem.

But the lock ordering rules for i_rwsem are a bit complicated, so -
abandon that dream for now and do it the more standard way.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6bd68ec2 12-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Heap allocate btree_trans

We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c04cbc0d 12-Sep-2023 Colin Ian King <colin.i.king@gmail.com>

bcachefs: remove redundant initializations of variables start_offset and end_offset

The variables start_offset and end_offset are being initialized with
values that are never read; they are re-assigned later on. The
initializations are redundant and can be removed.

Cleans up clang scan-build warnings:
fs/bcachefs/fs-io.c:243:11: warning: Value stored to 'start_offset' during
its initialization is never read [deadcode.DeadStores]
fs/bcachefs/fs-io.c:244:11: warning: Value stored to 'end_offset' during
its initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5902cc28 04-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New io_misc.c helpers

This pulls the non vfs specific parts of truncate and finsert/fcollapse
out of fs-io.c, and moves them to io_misc.c.

This is prep work for logging these operations, to make them atomic in
the event of a crash.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1809b8cb 10-Sep-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Break up io.c

More reorganization: this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 791236b8 12-Aug-2023 Joshua Ashton <joshua@froggi.es>

bcachefs: Add btree_trans* to inode_set_fn

This will be used when we need to re-hash a directory tree when setting
flags.

It is not possible to have concurrent btree_trans on a thread.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# dbbfca9f 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split up fs-io.[ch]

fs-io.c is too big - time for some reorganization
- fs-dio.c: direct io
- fs-pagecache.c: pagecache data structures (bch_folio), utility code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1e81f89b 06-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix assorted checkpatch nits

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4198bf03 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix lock thrashing in __bchfs_fallocate()

We've observed significant lock thrashing on fstests generic/083 in
fallocate, due to dropping and retaking btree locks when checking the
pagecache for data.

This adds a nonblocking mode to bch2_clamp_data_hole(), where we only
use folio_trylock(), and can thus be used safely while btree locks are
held - thus we only have to drop btree locks as a fallback, on actual
lock contention.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0a6d6945 03-Aug-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix folio leak in folio_hole_offset()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a09818c7 09-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fallocate now checks page cache

Previously, fallocate would only check the state of the extents btree
when determining if we need to create a reservation.

But the page cache might already have dirty data or a disk reservation.
This changes __bchfs_fallocate() to call bch2_seek_pagecache_hole() to
check for this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c8b4534d 07-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete redundant log messages

Now that we have distinct error codes for different memory allocation
failures, the early init log messages are no longer needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 73bd774d 06-Jul-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted sparse fixes

- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b6898917 20-Jun-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for ERR_PTR() from filemap_lock_folio()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5718fda0 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: fs-io: Eliminate GFP_NOFS usage

GFP_NOFS doesn't ever make sense. If we're allocating memory, it should
be GFP_NOWAIT if btree locks are held, GFP_KERNEL otherwise.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 70d41c9e 27-May-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Avoid __GFP_NOFAIL

We've been using __GFP_NOFAIL for allocating struct bch_folio, our
private per-folio state.

However, that struct is variable size - it holds state for each sector
in the folio, and folios can be quite large now, which means it's
possible for bch_folio to be larger than PAGE_SIZE.

__GFP_NOFAIL allocations are undesirable in normal circumstances, but
particularly so at >= PAGE_SIZE, and warnings are emitted for that.

So, this patch adds proper error paths and eliminates most uses of
__GFP_NOFAIL. Also, do some more cleanup of gfp flags w.r.t. btree node
locks: we can use GFP_KERNEL, but only if we're not holding btree locks,
and if we are holding btree locks we should be using GFP_NOWAIT.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# cb1b479d 28-Apr-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix quotas + snapshots

Now that we can reliably designate and find the master subvolume out of
a tree of snapshots, we can finally make quotas work with snapshots:

That is - quotas will now _ignore_ snapshot subvolumes, and only be in
effect for the master (non snapshot) subvolume.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bf98ee10 03-Apr-2023 Brian Foster <bfoster@redhat.com>

bcachefs: folio pos to bch_folio_sector index helper

Create a small helper to translate from file offset to the
associated bch_folio_sector index in the underlying bch_folio. The
helper assumes the file offset is covered by the passed folio.
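A minimal sketch of such a helper follows. It assumes 512-byte sectors and passes the folio's start offset and size as plain integers. `sketch_folio_pos_to_s()` is a hypothetical name, not the bcachefs helper.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT	9	/* 512-byte sectors */

/*
 * Hypothetical helper: map a file offset to the index of the per-sector
 * state entry in the folio that covers it. Like the helper described
 * above, it assumes (and here asserts) that @pos lies inside the folio.
 */
static inline unsigned sketch_folio_pos_to_s(uint64_t folio_start,
					     uint64_t folio_size,
					     uint64_t pos)
{
	assert(pos >= folio_start && pos - folio_start < folio_size);
	return (unsigned) ((pos - folio_start) >> SKETCH_SECTOR_SHIFT);
}
```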

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6b9857b2 29-Mar-2023 Brian Foster <bfoster@redhat.com>

bcachefs: use u64 for folio end pos to avoid overflows

Some of the folio_end_*() helpers are prone to overflow of signed
64-bit types because the mapping is only limited by the max value of
loff_t and the associated helpers return the start offset of the
next folio. Therefore, a folio_end_pos() of the max allowable folio in a
mapping returns a value that overflows loff_t.

This makes it hard to rely on such values when doing folio
processing across a range of a file, as bcachefs attempts to do with
the recent folio changes. For example, generic/564 causes problems
in the buffered write path when testing writes at max boundary
conditions.

The current understanding is that the pagecache historically limited
the mapping to one less page to avoid this problem and this was
dropped with some of the folio conversions, but may be reinstated to
properly address the problem. In the meantime, update the internal
folio_end_*() helpers in bcachefs to return a u64, and all of the
associated code to use or cast to u64 to avoid overflow problems.
This allows generic/564 to pass and can be reverted back to using
loff_t if at any point the pagecache subsystem can guarantee these
boundary conditions will not overflow.
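The boundary case can be checked with plain integer arithmetic. This sketch assumes 4096-byte folios and uses a hypothetical `sketch_folio_end_pos()` in place of the bcachefs folio_end_*() helpers. The last folio below the maximum loff_t starts at 2^63 - 4096, so its end position is 2^63, one past INT64_MAX. That value is representable in u64, but the same addition in a signed 64-bit type would overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* loff_t is a signed 64-bit type; this is its maximum. */
#define SKETCH_LOFF_MAX	INT64_MAX

/*
 * End of a folio, i.e. the start offset of the next folio. Computing it
 * in u64 keeps the result well defined even for the last folio of a
 * maximally sized mapping, whose end is one past SKETCH_LOFF_MAX.
 */
static inline uint64_t sketch_folio_end_pos(uint64_t folio_pos,
					    uint64_t folio_size)
{
	return folio_pos + folio_size;
}

static inline bool sketch_fits_in_loff_t(uint64_t v)
{
	return v <= (uint64_t) SKETCH_LOFF_MAX;
}
```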

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 335f7d4f 29-Mar-2023 Brian Foster <bfoster@redhat.com>

bcachefs: clean up post-eof folios on -ENOSPC

The buffered write path batches folio creations in the file mapping
based on the requested size of the write. Under low free space
conditions, it is possible to add a bunch of folios to the mapping
and then return a short write or -ENOSPC due to lack of space. If
this occurs on an extending write, the file size is updated based on
the amount of data successfully written to the file. If folios were
added beyond the final i_size, they may hang around until reclaimed,
truncated or encountered unexpectedly by another operation.

For example, generic/083 reproduces a sequence of events where a
short write leaves around one or more post-EOF folios on an inode, a
subsequent zero range request extends beyond i_size and overlaps
with an aforementioned folio, and __bch2_truncate_folio() happens
across it and complains.

Update __bch2_buffered_write() to keep track of the start offset of
the last folio added to the mapping for a prospective write. After
i_size is updated, check whether this offset starts beyond EOF. If
so, truncate pagecache beyond the latest EOF to clean up any folios
that don't reside at least partially within EOF upon completion of
the write.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4ad6aa46 29-Mar-2023 Brian Foster <bfoster@redhat.com>

bcachefs: fix truncate overflow if folio is beyond EOF

generic/083 occasionally reproduces a panic caused by an overflow
when accessing the bch_folio_sector array of the folio being
processed by __bch2_truncate_folio(). The immediate cause of the
overflow is that the folio offset is beyond i_size, and therefore
the sector index calculation underflows on subtraction of the folio
offset.

One cause of this is mainly observed on nocow mounts. When nocow is
enabled, fallocate performs physical block allocation (as opposed to
block reservation in cow mode), which range_has_data() then
interprets as valid data that requires partial zeroing on truncate.
Therefore, if a post-eof zero range request lands across post-eof
preallocated blocks, __bch2_truncate_folio() may actually create a
post-eof folio in order to perform zeroing. To avoid this problem,
update range_has_data() to filter out unwritten blocks from folio
creation and partial zeroing.

Even though we should never create folios beyond EOF like this, the
mere existence of such folios is not necessarily a fatal error. Fix
up the truncate code to warn about this condition and not overflow
the sector array and possibly crash the system. The addition of this
warning without the corresponding unwritten extent fix has shown
that various other fstests are able to reproduce this problem fairly
frequently, but often in ways that don't necessarily result in a
kernel panic or a change in user observable behavior, and therefore
the problem goes undetected.
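The guard can be sketched as follows, assuming 512-byte sectors. `sketch_eof_sector_idx()` is a hypothetical helper, not the actual __bch2_truncate_folio() code. Without the check, `i_size - folio_pos` wraps to a huge unsigned value when the folio starts past i_size, which produces the out-of-bounds index described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_SECTOR_SHIFT	9	/* 512-byte sectors */

/*
 * Compute the index of the sector within the folio that contains
 * i_size, clamped to the folio's sector count. If the folio starts
 * beyond i_size, the subtraction would underflow, so warn and return
 * false instead of producing an index past the per-sector array.
 */
static bool sketch_eof_sector_idx(uint64_t folio_pos, uint64_t folio_size,
				  uint64_t i_size, unsigned *idx)
{
	uint64_t nr_sectors = folio_size >> SKETCH_SECTOR_SHIFT;
	uint64_t s;

	if (folio_pos > i_size) {
		fprintf(stderr, "folio at %llu beyond EOF %llu\n",
			(unsigned long long) folio_pos,
			(unsigned long long) i_size);
		return false;
	}

	s = (i_size - folio_pos) >> SKETCH_SECTOR_SHIFT;
	*idx = (unsigned) (s < nr_sectors ? s : nr_sectors);
	return true;
}
```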

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 34fdcf06 27-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for folios that don't have bch_folio attached

With large folios, it's now incidentally possible to end up with a
clean, uptodate folio in the page cache that doesn't have a bch_folio
attached, if a folio has to be split.

This patch fixes __bch2_truncate_folio() to check for this; other code
paths appear to handle it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9567413c 17-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_readahead() large folio conversion

Readahead now uses the new filemap_get_contig_folios_d() helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 40022c01 22-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: filemap_get_contig_folios_d()

Add a new helper for getting a range of contiguous folios and returning
them in a darray.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1774a05 23-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_folio_sector_state improvements

- X-macro-ize the bch_folio_sector_state enum: this means we can easily
generate strings, which is helpful for debugging.

- Add helpers for state transitions: folio_sector_dirty(),
folio_sector_undirty(), folio_sector_reserve()

- Add folio_sector_set(), a single helper for changing folio sector
state just so that we have a single place to instrument when we're
debugging.
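The following is a sketch of the X-macro technique in plain C. The state names are an assumption for illustration and may not match bcachefs exactly. The point is that the enum and the string table are generated from one list, so they cannot drift apart, and that every state change funnels through one setter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Single list of states; each use below expands it with a different
 * definition of x(). The state names are illustrative.
 */
#define SKETCH_FOLIO_SECTOR_STATE()	\
	x(unallocated)			\
	x(reserved)			\
	x(dirty)			\
	x(dirty_reserved)		\
	x(allocated)

enum sketch_folio_sector_state {
#define x(n)	SECTOR_##n,
	SKETCH_FOLIO_SECTOR_STATE()
#undef x
	SECTOR_NR
};

/* Strings generated from the same list, for debug output. */
static const char * const sketch_folio_sector_states[] = {
#define x(n)	#n,
	SKETCH_FOLIO_SECTOR_STATE()
#undef x
	NULL
};

struct sketch_sector {
	enum sketch_folio_sector_state	state;
};

/* The one place every state change goes through: add tracing here. */
static void sketch_folio_sector_set(struct sketch_sector *s,
				    enum sketch_folio_sector_state n)
{
	s->state = n;
}

/* Reverse lookup, e.g. for parsing debug input; -1 if unknown. */
static int sketch_sector_state_from_name(const char *name)
{
	for (int i = 0; sketch_folio_sector_states[i]; i++)
		if (!strcmp(sketch_folio_sector_states[i], name))
			return i;
	return -1;
}
```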

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 959f7368 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_truncate_page() large folio conversion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c42b57c4 18-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_buffered_write large folio conversion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 49fe78ff 17-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_folio can now handle multi-order folios

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33e2eb96 17-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More assorted large folio conversion

Various misc small conversions in fs-io.c for large folios.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a86a92cb 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_seek_pagecache_data() folio conversion

This converts bch2_seek_pagecache_data() to handle large folios.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e8d28c3e 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_seek_pagecache_hole() folio conversion

This converts bch2_seek_pagecache_hole() to handle large folios.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ff9c301f 19-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bio_for_each_segment_all() -> bio_for_each_folio_all()

This converts the writepage end_io path to folios.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 30bff594 17-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Initial folio conversion

This converts fs-io.c to pass folios, not pages. We're not handling
large folios yet, there's no functional changes in this patch - just a
lot of churn doing the initial type conversions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3342ac13 16-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename bch_page_state -> bch_folio

Start of the large folio conversion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c437e153 27-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a bch_page_state assert

Seeing an odd bug with page/folio state not being properly initialized,
this is to help track it down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 65d48e35 14-Mar-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Private error codes: ENOMEM

This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8b3a677 02-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Nocow support

This adds support for nocow mode, where we do writes in-place when
possible. Patch components:

- New boolean filesystem and inode option, nocow: note that when nocow
is enabled, data checksumming and compression are implicitly disabled

- To prevent in-place writes from racing with data moves
(data_update.c) or bucket reuse (i.e. a bucket being reused and
re-allocated while a nocow write is in flight), we have a new locking
mechanism.

Buckets can be locked for either data update or data move, using a
fixed size hash table of two_state_shared locks. We don't have any
chaining, meaning updates and moves to different buckets that hash to
the same lock will wait unnecessarily - we'll want to watch for this
becoming an issue.

- The allocator path also needs to check for in-place writes in flight
to a given bucket before giving it out: thus we add another counter
to bucket_alloc_state so we can track this.

- Fsync now may need to issue cache flushes to block devices instead of
flushing the journal. We add a device bitmask to bch_inode_info,
ei_devs_need_flush, which tracks devices that need to have flushes
issued - note that this will lead to unnecessary flushes when other
codepaths have already issued flushes, we may want to replace this with
a sequence number.

- New nocow write path: look up extents, and if they're writable write
to them - otherwise fall back to the normal COW write path.
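The ei_devs_need_flush idea from the fsync item above can be sketched as a 64-bit device mask. The struct and function names are hypothetical, and device indices are assumed to be below 64. A real implementation would need atomic bit operations; this version is single-threaded.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS	64

/* Hypothetical slice of per-inode state: one bit per device. */
struct sketch_inode_info {
	uint64_t	devs_need_flush;
};

/* A nocow write landed in place on @dev (< 64): flush its cache later. */
static void sketch_nocow_write_done(struct sketch_inode_info *ei,
				    unsigned dev)
{
	ei->devs_need_flush |= UINT64_C(1) << dev;
}

/*
 * On fsync, issue a cache flush to every device whose bit is set, then
 * clear the mask. Returns the number of flushes issued; @issue_flush
 * may be NULL when only the count is wanted.
 */
static unsigned sketch_fsync_flush_devs(struct sketch_inode_info *ei,
					void (*issue_flush)(unsigned dev))
{
	uint64_t mask = ei->devs_need_flush;
	unsigned nr = 0;

	ei->devs_need_flush = 0;
	for (unsigned dev = 0; dev < SKETCH_MAX_DEVS; dev++) {
		if (!(mask & (UINT64_C(1) << dev)))
			continue;
		if (issue_flush)
			issue_flush(dev);
		nr++;
	}
	return nr;
}
```

As the patch notes, a mask cannot tell whether another code path already flushed a device, so this may issue redundant flushes; a sequence number would avoid that.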

XXX: switch to sequence numbers instead of bitmask for devs needing
journal flush

XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to
run in process context - see if we can improve this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 79203111 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Unwritten extents support

- bch2_extent_merge checks unwritten bit
- read path returns 0s for unwritten extents without actually reading
- reflink path skips over unwritten extents
- bch2_bkey_ptrs_invalid() checks for extents with both written and
unwritten ptrs, and non-normal extents (stripes, btree ptrs) with
unwritten ptrs
- fiemap checks for unwritten extents and returns
FIEMAP_EXTENT_UNWRITTEN

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 70de7a47 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_extent_fallocate()

This factors out part of __bchfs_fallocate() in fs-io.c into a new,
lower level io.c helper, which creates a single extent reservation.

This is prep work for nocow support - the new helper will shortly gain
the ability to create unwritten extents.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d94189ad 08-Feb-2023 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Debug mode for c->writes references

This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.

This will make it easier to debug shutdown hangs due to refcount leaks.
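The following is a userspace sketch of the split-refcount debug mode. The list of code paths and all names are hypothetical, and plain counters stand in for the kernel refcount types. In debug mode each code path gets its own counter, so a leaked reference can be attributed to the path that took it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list of code paths that take a write reference. */
#define SKETCH_WRITE_REFS()	\
	x(fsync)		\
	x(dio_write)		\
	x(fallocate)

enum sketch_write_ref {
#define x(n)	SKETCH_WRITE_REF_##n,
	SKETCH_WRITE_REFS()
#undef x
	SKETCH_WRITE_REF_NR
};

static const char * const sketch_write_ref_names[] = {
#define x(n)	#n,
	SKETCH_WRITE_REFS()
#undef x
};

struct sketch_fs {
	bool	debug;				/* split refcount mode */
	long	writes;				/* normal single refcount */
	long	writes_debug[SKETCH_WRITE_REF_NR];
};

static void sketch_write_ref_get(struct sketch_fs *c, enum sketch_write_ref r)
{
	if (c->debug)
		c->writes_debug[r]++;
	else
		c->writes++;
}

static void sketch_write_ref_put(struct sketch_fs *c, enum sketch_write_ref r)
{
	if (c->debug)
		c->writes_debug[r]--;
	else
		c->writes--;
}

/*
 * What a per-ref dump would point at on a shutdown hang: the first code
 * path still holding a reference, or NULL if none are held.
 */
static const char *sketch_first_leaked_ref(const struct sketch_fs *c)
{
	for (int i = 0; i < SKETCH_WRITE_REF_NR; i++)
		if (c->writes_debug[i])
			return sketch_write_ref_names[i];
	return NULL;
}
```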

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c72f687a 11-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use for_each_btree_key_upto() more consistently

It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use
peek_upto() and provide an end for the interval we're searching for -
otherwise, when we hit the end of the inode the next inode be in a
different subvolume and not have any keys in the current snapshot, and
we'd iterate over arbitrarily many keys before returning one.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 01ad6737 23-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_inode_opts_get()

This improves io_opts() and makes it a non-inline function - it's big
enough that it probably shouldn't be.

Also, bch_io_opts no longer needs fields for whether options are
defined, so we can slim it down a bit.

We'd like to stop passing around the full bch_io_opts, but that'll be
tricky because of bch2_rebalance_add_key().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e88a75eb 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: New bpos_cmp(), bkey_cmp() replacements

This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4d868d18 24-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More dio inlining

Eliminate another function call in the O_DIRECT write path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7fec8266 15-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Error message improvement

- Centralize format strings in bcachefs.h
- Add bch2_fmt_inum_offset() and related helpers
- Switch error messages for inodes to also print out the offset, in
bytes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8eb71e9e 15-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve a few warnings

Warnings ought to always have a format string/log message - makes them
considerably more useful.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6b1b186a 13-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Minor dio write path improvements

This switches where we take quota reservations to be per bch_write_op
instead of per dio_write, so we can drop the quota reservation in the
same place as we call i_sectors_acct(), and only take/release
ei_quota_lock once.

In the future we'd like ei_quota_lock to not be a mutex, so that we can
avoid punting to process context before delivering write completions in
nocow mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7ecd30c 04-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Factor out two_state_shared_lock

We have a unique lock used for controlling adding to the pagecache: the
lock has two states, where both states are shared - the lock may be held
multiple times for either state - but not both states at the same time.

This is exactly what we need for nocow mode locking, so this patch pulls
it out of fs.c into its own file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a1ee777b 02-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill BCH_WRITE_FLUSH

BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only
used in the direct IO path, and this will allow for some consolidation
with the regular fsync path, which will help with the upcoming nocow
mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 182c7bbf 31-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: DIO write path optimization

- With BCH_WRITE_SYNC, we no longer need the completion in struct
dio_write
- Pull out bch2_dio_write_copy_iov() into a separate non-inline
function, it's code that doesn't run in the common case
- Copy mapping and inode pointers into dio_write, avoiding pointer
chasing at the start of bch2_dio_write_loop()
- kthread_use_mm() is not needed in the common case; move it into
bch2_dio_write_loop_async()
- factor out various helpers from bch2_dio_write_loop() and rework
control flow for better icache utilization

Other small optimizations:

- bch2_keylist_free() is only used in one place, at the end of the
bch2_write() path - drop the reinit
- in bch2_disk_reservation_put(), check if res->sectors is nonzero
before touching c->online_reserved, since that will likely be a cache
miss

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More DIO write path optimization

Better code prefetching (?)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1df3e199 29-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: BCH_WRITE_SYNC

This adds a new flag for the write path, BCH_WRITE_SYNC, and switches
the O_DIRECT write path to use it when we're not running asynchronously.

It runs the btree update after the write in the original thread's
context instead of a kworker, cutting context switches in half.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 80fe580c 24-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a spurious warning

Fixes fstests generic/648

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 353448f3 23-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix buffered write path for generic/275

Per fstests generic/275, on -ENOSPC we're supposed to write until the
filesystem is full - i.e. do a partial write instead of failing the full
write.

This is a partial fix for the buffered write path: we'll still fail on a
page boundary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3e3e02e6 19-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted checkpatch fixes

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bd954215 15-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Quota fixes

- We now correctly allow soft limits to be exceeded, instead of always
returning -EDQUOT
- Disk quota grace times/warnings can now be set, not just the
systemwide defaults

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07bfcc0b 12-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for not dropping privs in fallocate

When modifying a file, we may be required to drop the suid/sgid bits -
we were missing a file_modified() call to do this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3a4d3656 12-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_write_begin()

An error case was jumping to the wrong label, creating an infinite loop
- oops.

This fixes fstests generic/648.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e8540e56 11-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Reflink now respects quotas

This adds a new helper, quota_reserve_range(), which takes a quota
reservation for unallocated blocks in a given file range, and uses it in
bch2_remap_file_range().
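Reserving quota for unallocated blocks requires counting the sectors in a range that no existing extent covers. This sketch does that count over a sorted array of hypothetical `struct sketch_extent` ranges. The real helper walks the extents btree, which is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocated extent [start, end), in sectors. */
struct sketch_extent {
	uint64_t	start;
	uint64_t	end;
};

/*
 * Count sectors in [start, end) not covered by any extent. @extents
 * must be sorted and non-overlapping.
 */
static uint64_t sketch_unallocated_sectors(const struct sketch_extent *extents,
					   size_t nr, uint64_t start, uint64_t end)
{
	uint64_t pos = start, holes = 0;

	for (size_t i = 0; i < nr && pos < end; i++) {
		uint64_t e_start = extents[i].start > pos ? extents[i].start : pos;
		uint64_t e_end   = extents[i].end   < end ? extents[i].end   : end;

		if (e_end <= pos)
			continue;	/* extent lies entirely before pos */
		if (e_start > pos)	/* gap before this extent */
			holes += (e_start < end ? e_start : end) - pos;
		pos = e_end;
	}

	if (pos < end)			/* gap after the last extent */
		holes += end - pos;
	return holes;
}
```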

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d848dac 26-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill io_in_flight semaphore

This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 098ef98d 18-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add private error codes for ENOSPC

Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5c1ef830 18-Sep-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Errcodes can now subtype standard error codes

The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 549d173c 17-Jul-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# a3d7afa5 18-Jun-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always use percpu_ref_tryget_live() on c->writes

If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# facc8147 05-May-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete bch_writepage

Per Dave Chinner and the xfs folks, .writepage is no longer needed, and
it's better not to define it if .writepages is the intended path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b33bf1bc 16-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Go emergency RO when i_blocks underflows

This improves some of our warnings and assertions - they imply possible
filesystem inconsistencies, so they should be calling
bch2_fs_inconsistent().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7c4ca54a 08-Apr-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't skip triggers in fcollapse()

With backpointers this doesn't work anymore - backpointers always need
to be updated to point to the new extent position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f8494d25 16-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert some WARN_ONs to WARN_ON_ONCE

These warnings are symptomatic of something else going wrong; we don't
want them spamming up the logs, as that would make it harder to find the
real issue.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9552e19f 09-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix dio write path with loopback dio mode

When the iov_iter is a bvec iter, it's possible the IO was submitted
from a kthread that didn't have an mm to switch to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 4d126dc8 08-Mar-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bio_iov_vecs_to_alloc()

This fixes a bug in the DIO read path where, when using a loopback
device in DIO mode, we'd allocate a biovec that would get overwritten
and leaked in bio_iov_iter_get_pages() -> bio_iov_bvec_set().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# eb331fe5 14-Feb-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check for stale dirty pointer before reads

Since we retry reads when we discover we read from a pointer that went
stale, if a dirty pointer is erroneously stale it would cause us to loop
retrying that read forever - unless we check before issuing the read,
while the btree is still locked, when we know that a dirty pointer
should never be stale.

This patch adds that check, along with printing some helpful debug info.
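
A minimal sketch of the kind of check described above, assuming u8 bucket generations that wrap around. `struct ptr`, `gen_after()` and `check_ptr_before_read()` are hypothetical names chosen for illustration, not the bcachefs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent pointer: bucket generation at write time + cached flag. */
struct ptr {
	uint8_t gen;
	bool cached;
};

/* How far generation a is ahead of b, with u8 wraparound (0 if not ahead). */
static unsigned gen_after(uint8_t a, uint8_t b)
{
	uint8_t d = (uint8_t)(a - b);

	return d < 128 ? d : 0;
}

/*
 * Check a pointer while the btree is still locked, before issuing the read.
 * Returns 0 to go ahead, 1 if a cached pointer went stale (skip it), or
 * -EIO if a dirty pointer is stale, which should never happen: report an
 * inconsistency instead of retrying the read forever.
 */
static int check_ptr_before_read(const struct ptr *p, uint8_t bucket_gen)
{
	if (!gen_after(bucket_gen, p->gen))
		return 0;
	return p->cached ? 1 : -EIO;
}
```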

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 57cfdd8b 04-Jan-2022 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_FILTER_SNAPSHOTS is selected automatically

It doesn't have to be specified - this patch deletes the two instances
where it was.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 51c4e406 15-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an assertion in bch2_truncate()

We recently added an assertion that when we truncate a file to 0,
i_blocks should also go to 0 - but that's not necessarily true if we're
doing an emergency shutdown, since lots of invariants no longer hold in
that case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f54788cc 08-Dec-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert a BUG_ON() to a warning

A user reported hitting this assertion, and we can't reproduce it yet,
but it shouldn't be fatal - so convert it to a warning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# dcfc593f 23-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix page state after fallocate

This tweaks the fallocate code to also update the page cache to reflect
the new on disk reservations, giving us better i_sectors consistency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e6ec361f 23-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix page state when reading into !PageUptodate pages

This patch adds code to read page state before writing to pages that
aren't uptodate, which corrects i_sectors being temporarily too large
and means we may not need to get a disk reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 7279c1a2 23-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill PAGE_SECTOR_SHIFT

Replace it with the new, standard PAGE_SECTORS_SHIFT

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 084d42bb 23-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Apply workaround for too many btree iters to read path

Reading from cached data, which calls bch2_bucket_io_time_reset(), is
leading to transaction iterator overflows - this standardizes the
workaround.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b44a66a6 23-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: SECTOR_DIRTY_RESERVED

This fixes another i_sectors accounting bug - we need to differentiate
between dirty writes that overwrite a reservation and dirty writes to
unallocated space - dirty writes to unallocated space increase
i_sectors, dirty writes over a reservation do not.
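
A minimal sketch of the accounting rule, assuming per-sector states like the ones named in this commit. The enum values and `sector_mark_dirty()` are hypothetical, and only the two transitions the commit describes are modeled. Presumably dirtying a reservation leaves i_sectors alone because the reservation was already counted when it was made.

```c
#include <assert.h>

/* Hypothetical per-sector page-cache states. */
enum sector_state {
	SECTOR_UNALLOCATED,    /* nothing on disk, no reservation */
	SECTOR_RESERVED,       /* reservation on disk */
	SECTOR_DIRTY,          /* dirty data over unallocated space */
	SECTOR_DIRTY_RESERVED, /* dirty data over a reservation */
	SECTOR_ALLOCATED,      /* clean, backed by written data */
};

/*
 * Mark one sector dirty and return the change to i_sectors: dirtying
 * unallocated space adds a sector, dirtying a reservation adds nothing.
 * Other states are left unchanged in this sketch.
 */
static int sector_mark_dirty(enum sector_state *s)
{
	assert(*s <= SECTOR_ALLOCATED);

	switch (*s) {
	case SECTOR_UNALLOCATED:
		*s = SECTOR_DIRTY;
		return 1;
	case SECTOR_RESERVED:
		*s = SECTOR_DIRTY_RESERVED;
		return 0;
	default:
		return 0;
	}
}
```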

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b19d307d 21-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix i_sectors_leak in bch2_truncate_page

When bch2_truncate_page() discards dirty sectors in the page cache, we
need to account for that - we don't need to account for allocated
sectors because that'll be done by the bch2_fpunch() call when it
updates the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8810386f 21-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an i_sectors accounting bug

We weren't checking for errors before calling i_sectors_acct()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f74a5051 11-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't check for -ENOSPC in page writeback

If at all possible we'd prefer to not fail page writeback unless the
filesystem has been shutdown; allowing errors in page writeback means
things we'd like to assert about i_size consistency between the VFS and
the btree go out the window.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 74163da7 06-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fallocate fixes

- fpunch wasn't always correctly updating i_size - when we drop buffered
writes that were extending a file, we become responsible for writing
i_size.

- fzero was sometimes zeroing out more data than it should have -
block_start and block_end were being rounded in the wrong directions
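
A minimal sketch of the corrected rounding, assuming a power-of-two block size. `zero_range_blocks()` is a hypothetical helper: rounding the start up and the end down keeps the block-level operation inside the requested range, whereas rounding outward would zero bytes the caller never asked to zero. Partial blocks at the edges presumably have to be zeroed by other means, e.g. in the page cache.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t round_down_pow2(uint64_t x, uint64_t bs)
{
	return x & ~(bs - 1);
}

static uint64_t round_up_pow2(uint64_t x, uint64_t bs)
{
	return (x + bs - 1) & ~(bs - 1);
}

/*
 * Whole blocks fully inside [offset, end): start rounds up, end rounds down.
 * If the range lies within a single block, the result is empty.
 */
static void zero_range_blocks(uint64_t offset, uint64_t end, uint64_t bs,
			      uint64_t *block_start, uint64_t *block_end)
{
	assert(bs && !(bs & (bs - 1))); /* power of two */
	*block_start = round_up_pow2(offset, bs);
	*block_end = round_down_pow2(end, bs);
	if (*block_end < *block_start)
		*block_end = *block_start;
}
```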

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 68a2054d 05-Nov-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Switch fsync to use bi_journal_seq

Now that we're recording in each inode the journal sequence number of
the most recent update, fsync becomes a lot simpler and we can delete
all the plumbing for ei_journal_seq.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e5fa91d7 20-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix restart handling in for_each_btree_key()

Code that uses for_each_btree_key often wants transaction restarts to be
handled locally and not returned. Originally, we wouldn't return
transaction restarts if there was a single iterator in the transaction -
the reasoning being if there weren't other iterators being invalidated,
and the current iterator was being advanced/retraversed, there weren't
any locks or iterators we were required to preserve.

But with the btree_path conversion that approach doesn't work anymore -
even when we're using for_each_btree_key() with a single iterator there
will still be two paths in the transaction, since we now always preserve
the path at the pos the iterator was initialized at - the reason being
that on restart we often restart from the same place.

And it turns out there's now a lot of for_each_btree_key() uses that _do
not_ want transaction restarts handled locally, and should be returning
them.

This patch splits out for_each_btree_key_norestart() and
for_each_btree_key_continue_norestart(), and converts existing users as
appropriate. for_each_btree_key(), for_each_btree_key_continue(), and
for_each_btree_node() now handle transaction restarts themselves by
calling bch2_trans_begin() when necessary - and the old hack to not
return transaction restarts when there's a single path in the
transaction has been deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9a796fdb 19-Oct-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_exit() no longer returns errors

Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8c6d298a 12-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert io paths for snapshots

This plumbs around the subvolume ID as was done previously for other
filesystem code, but now for the IO paths - the control flow in the IO
paths is trickier so the changes in this patch are more involved.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 6fed42bb 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Plumb through subvolume id

To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 67e0dd8f 30-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9f6bd307 24-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce iter->trans usage

Using iter->trans is disfavoured and should go away.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 3737e0dd 05-Aug-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an unhandled transaction restart

__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may
cause a transaction restart, which we don't return an error for because
it doesn't prevent us from making forward progress on the read we're
submitting.

Instead, change __bch2_read() and bchfs_read() to check for transaction
restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 700c25b3 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_trans_begin() more consistently

Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8b3e9bd6 24-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always check for transaction restarts

On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# b97bbd4e 20-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use bch2_inode_find_by_inum() in truncate

This is needed for snapshots because we need to start handling lock
restarts even when just calling bch2_inode_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 5468f119 13-Jul-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a memory leak in the dio write path

There were some error paths where we were leaking page refs - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 78d66ab1 27-Jun-2021 Dan Robertson <dan@dlrobertson.com>

bcachefs: fix truncate without a size change

Do not attempt to shortcut a truncate when the given new size is
the same as the current size. There may be blocks allocated to the
file that extend beyond the i_size. The ctime and mtime should
not be updated in this case.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 68a507a2 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix truncate with ATTR_MODE

After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 8c3f6da9 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve iter->should_be_locked

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 2ed5cd50 14-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a memory leak in dio write path

Commit c42bca92be928ce7dece5fc04cf68d0e37ee6718 "bio: don't copy bvec
for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_io_vec
at the incoming biovec, meaning if we already allocated one, it'll be
leaked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# f7beb4ca 02-Jun-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Preallocate transaction mem

This helps avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# 9f311f21 29-Oct-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use bch_write_op->cl for delivering completions

We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.

Two reasons:
- op->end_io is more efficient, due to fewer atomic ops; this completes
the conversion that was originally only done for the direct IO path.
- We'll be restructuring the write path to use a different mechanism for
punting to process context; refactoring to not use op->cl will make
that easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a6336910 20-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for buffered writes getting -ENOSPC

Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# e7084c9c 19-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bch2_remap_range respect O_SYNC

Caught by xfstest generic/628

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>


# ef1b2092 18-May-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ratelimiting for writeback IOs

Writeback throttling is a kernel config option and not always enabled.
When it's not enabled we need a fallback, to avoid unbounded memory
pinning and work item backlogs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 050197b1 28-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Ensure that fpunch updates inode timestamps

Fixes xfstests generic/059

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 694015c2 16-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack

Upcoming patch is going to disallow multiple btree_trans on the stack.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50dc0f69 19-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Require all btree iterators to be freed

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 87a432f5 15-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill reflink option

An option was added to control whether reflink support was on or off
because for a long time, reflink + inline data extent support was
missing - but that's since been fixed, so we can drop the option now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5ff75ccb 14-Mar-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix read retry path for indirect extents

In the read path, for retry of indirect extents to work we need to
differentiate between the location in the btree the read was for, vs.
the location where we found the data. This patch adds that plumbing to
bch_read_bio.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 41f8b09e 20-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rename BTREE_ID enums for consistency with other enums

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3d495595 07-Feb-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_btree_iter_peek_prev()

This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev()
consistent with peek() and next(), w.r.t. iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b4725cc1 21-Jan-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix loopback in dio mode

We had a deadlock on page_lock, because buffered reads signal completion
by unlocking the page, but the dio read path normally dirties the pages
it's reading into with set_page_dirty_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 032ac32c 27-Apr-2021 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix .splice_write

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 07a1006a 17-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reduce/kill BKEY_PADDED use

With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.

In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 35a067b4 14-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change when we allow overwrites

Originally, we'd check for -ENOSPC when getting a disk reservation
whenever the new extent took up more space on disk than the old extent.

Erasure coding screwed this up, because with erasure coding writes are
initially replicated, and then in the background the extra replicas are
dropped when the stripe is created. This means that with erasure coding
enabled, writes will always take up more space on disk than the data
they're overwriting - but, according to POSIX, overwrites aren't
supposed to return ENOSPC.

So, in this patch we fudge things: if the new extent has more replicas
than the _effective_ replicas of the old extent, or if the old extent is
compressed and the new one isn't, we check for ENOSPC when getting the
disk reservation - otherwise, we don't.
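
The rule above can be written as a small predicate. This is a hypothetical sketch: `struct extent_info` and `overwrite_checks_enospc()` are illustrative, not the real bcachefs types or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of an extent's on-disk footprint. */
struct extent_info {
	unsigned replicas;           /* copies the extent is written with */
	unsigned effective_replicas; /* durability once e.g. erasure coding applies */
	bool compressed;
};

/* May taking the disk reservation for this overwrite fail with ENOSPC? */
static bool overwrite_checks_enospc(const struct extent_info *old_e,
				    const struct extent_info *new_e)
{
	return new_e->replicas > old_e->effective_replicas ||
	       (old_e->compressed && !new_e->compressed);
}
```

For example, an old striped extent with one data pointer plus parity might have an effective replica count of 2, so a new write replicated twice over it would not trigger the ENOSPC check, even though it temporarily takes more space.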

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f30dd860 16-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't write bucket IO time lazily

With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.

Future work: we really need to expand the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 33c74e41 03-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Flag inodes that had btree update errors

On write error, the vfs inode's i_size may be inconsistent with the
btree inode's i_size - flag this so we don't have spurious assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0fefe8d8 03-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve some IO error messages

It's useful to know whether an error was for a read or a write - this
also standardizes error messages a bit more.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3eb26d01 01-Dec-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_get_iter() no longer returns errors

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 89931472 29-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for __readahead_batch getting partial batch

We were incorrectly ignoring the return value of __readahead_batch,
leading to a null ptr deref in __bch2_page_state_create().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# eb8e6e9c 10-Nov-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Deadlock prevention for ei_pagecache_lock

In the dio write path, when get_user_pages() invokes the fault handler
we have a recursive locking situation - we have to handle the lock
ordering ourselves or we have a deadlock: this patch addresses that by
checking for lock ordering violations and doing the unlock/relock
dance if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 00276f9f 05-Nov-2020 Matthew Wilcox (Oracle) <willy@infradead.org>

bcachefs: Use attach_page_private and detach_page_private

These recently added helpers simplify the code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 96fee47e 05-Nov-2020 Matthew Wilcox (Oracle) <willy@infradead.org>

bcachefs: Remove page_state_init_for_read

This is dead code; delete the function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 13dcd4ab 24-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix rare use after free in read path

If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent()
reallocates the buffer, k in bch2_read - which we pointed at the
bkey_on_stack buffer - will now point to a stale buffer. Whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9ba2eb25 08-Oct-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __bch2_truncate_page()

__bch2_truncate_page() will mark some of the blocks in a page as
unallocated. But, if the page is mmapped (and writable), every block in
the page needs to be marked dirty, else those blocks won't be written by
__bch2_writepage().

The solution is to change those userspace mappings to RO, so that we
force bch2_page_mkwrite() to be called again.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 912bdf17 09-Jul-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix short buffered writes

In the buffered write path, we have to check for short writes that write
to the full page, where the page wasn't UpToDate; when this happens, the
page is partly garbage, so we have to zero it out and revert that part
of the write.

This check was wrong - we reverted total from copied, but didn't revert
the iov_iter, probably also leading to corrupted writes.
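
A minimal sketch of the corrected handling, assuming the page read was skipped because the write was expected to cover the full range. `struct src_iter` is a hypothetical stand-in for the kernel's iov_iter (where iov_iter_revert() plays the role of src_iter_revert()), and `finish_short_copy()` is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an iov_iter: a position in the source buffer. */
struct src_iter {
	size_t pos;
};

static void src_iter_revert(struct src_iter *it, size_t bytes)
{
	assert(bytes <= it->pos);
	it->pos -= bytes;
}

/*
 * We tried to copy len bytes into page at offset and got copied bytes.
 * If the page isn't uptodate and the copy came up short, the rest of the
 * range is garbage: zero what we copied and revert both the byte count
 * and the source iterator, so the caller retries those bytes later.
 */
static size_t finish_short_copy(unsigned char *page, size_t offset, size_t len,
				size_t copied, bool uptodate, struct src_iter *it)
{
	if (!uptodate && copied < len) {
		memset(page + offset, 0, copied);
		src_iter_revert(it, copied);
		copied = 0;
	}
	return copied;
}
```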

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 52fbb7c8 30-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't cap ios in dio write path at 2 MB

It appears this cap was erroneous; a different bug was responsible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 042a1f26 29-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor dio write code to reinit bch_write_op

This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT flag was set
incorrectly, causing the completion to be delivered multiple times.
Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 36b8372b 02-Jun-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add an option to disable reflink support

Reflink might be buggy, so we're adding an option so users can help
bisect what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 22d8a33d 22-May-2020 Yuxuan Shui <yshuiv7@gmail.com>

bcachefs: fix stack corruption

When a bkey_on_stack is passed to bch_read_indirect_extent, there is no
guarantee that it will be big enough to hold the bkey. And
bch_read_indirect_extent is not aware of bkey_on_stack to call realloc
on it. This caused stack corruption.

This commit makes bch_read_indirect_extent aware of bkey_on_stack so it
can call realloc when appropriate.

Tested-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f59b3464 29-Apr-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't issue writes that are more than 1 MB

The bcachefs IO path in io.c can't bounce writes larger than that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 283eda57 01-Apr-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix fallocate FL_INSERT_RANGE

This was another bug because of bch2_btree_iter_set_pos() invalidating
iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 286d8ad0 16-Mar-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a use after free in dio write path

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 163e885a 26-Feb-2020 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERS

All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.

Also convert more code to __bch2_trans_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 24326cd1 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Sort & deduplicate updates in bch2_trans_update()

Previously, when doing multiple updates in the same transaction commit
that overwrote each other, we relied on doing the updates in the same
order as the bch2_trans_update() calls in order to get the correct
result. But that wasn't correct for triggers; bch2_trans_mark_update()
when marking overwrites would do the wrong thing because it hadn't seen
the update that was being overwritten.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2d594dfb 31-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out btree_trigger_flags

The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58e2388f 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill BTREE_INSERT_ATOMIC

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b1fd23df 22-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMIC

BTREE_INSERT_ATOMIC should really be the default mode, and there's not
that much code that doesn't need it - so this is prep work for getting
rid of the flag.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a8abd3a7 20-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_reset() calls should be at the tops of loops

It needs to be called when we get -EINTR due to e.g. lock restart - this
fixes a transaction iterators overflow bug.
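
A minimal sketch of the loop shape this commit asks for, with the per-attempt reset at the top of the loop. Everything here is hypothetical: `struct trans`, `trans_begin()` (a stand-in for bch2_trans_reset()/bch2_trans_begin()), `do_update()` and the restart error value. The assert models the iterator overflow that results from skipping the reset.

```c
#include <assert.h>

/* Hypothetical private error code for "transaction restart". */
#define ERR_TRANSACTION_RESTART 2049

struct trans {
	unsigned restarts;     /* how many times we've restarted */
	unsigned iters_in_use; /* per-attempt state that must be reset */
};

/* Drop per-attempt state left over from a failed attempt. */
static void trans_begin(struct trans *t)
{
	t->iters_in_use = 0;
}

/* Hypothetical operation that hits a lock restart the first fail_times times. */
static int do_update(struct trans *t, unsigned fail_times)
{
	t->iters_in_use++;
	/* Without trans_begin() on every attempt, iterators would pile up. */
	assert(t->iters_in_use <= 1);
	if (t->restarts < fail_times) {
		t->restarts++;
		return -ERR_TRANSACTION_RESTART;
	}
	return 0;
}

static int update_with_retry(struct trans *t, unsigned fail_times)
{
	int ret;

	do {
		trans_begin(t); /* at the top of the loop, every iteration */
		ret = do_update(t, fail_times);
	} while (ret == -ERR_TRANSACTION_RESTART);
	return ret;
}
```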

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# c45d473d 18-Dec-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for an assertion on filesystem error

Normally the in memory i_size is always greater than or equal to i_size
on disk; this doesn't hold on filesystem error.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5934a0ca 20-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey_on_stack_reassemble()

Small helper function.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4de77495 16-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reorganize extents.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4be1a412 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inline data extents

This implements extents that have their data inline, in the value,
instead of the bkey value being pointers to the data - and the read and
write paths are updated to read from these new extent types and write
them out, when the write size is small enough.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 08c07fea 15-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out extent_update.c

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 085ab693 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework of cut_front & cut_back

This changes bch2_cut_front and bch2_cut_back so that they're able to
shorten the size of the value, and it also changes the extent update
path to update the accounting in the btree node when this happens.

When the size of the value is shortened, they zero out the space that's
no longer used, so it's interpreted as noops (as implemented in the last
patch).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 35189e09 09-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bkey_on_stack

This implements code for storing small bkeys on the stack and allocating
out of a mempool if they're too big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 50fe5bd6 13-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use wbc_to_write_flags()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 677fc056 04-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Some reflink fixes

len might fit into a loff_t when aligned_len does not - make sure we use
a u64 for aligned_len. Also, we weren't always extending the inode
correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a023127a 02-Nov-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Eliminate function calls in DIO fastpaths

We can assume that usually buffered and O_DIRECT IO won't be mixed, and
the calls to flush the page cache won't be needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 54847d25 04-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: DIO write path only needs to shoot down pagecache once, not twice

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1b783a69 18-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add pagecache_add lock to buffered IO path, fault path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7edcfbfe 01-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't hold inode lock longer than necessary in dio write path

In theory we should be able to do (non appending/extending) dio writes
without taking the inode lock at all - but this gets us most of the way
there.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f8f30863 01-Nov-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Avoid atomics in write fast path

This adds some horrible hacks, but the atomic ops for closures were
getting to be a pretty expensive part of the write path. We don't want
to rip out closures entirely from the write path, because they're used
for e.g. waiting on the allocator, or waiting on the journal flush, and
that stuff would get really ugly without closures.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 406d6d5a 25-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an error path race

On IO error, bch2_writepages_io_done() will set the page state to
indicate nothing's already reserved (since the write didn't happen, we
don't know what's already reserved). This can race with the buffered IO
path, in between getting a disk reservation and calling
bch2_set_page_dirty().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2a9101a9 19-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_trans_commit() path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a9440743 19-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Limit bios in writepages path to 256M

This works around a bug where bio_full() doesn't check for
bio->bi_iter.bi_size overflowing - and, we don't really want to build
bios that are that big anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 9a3df993 08-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bchfs_extent_update()

The generic IO path now handles inode updates for i_size and i_sectors -
this means we can drop a fair amount of code from fs-io.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2e87eae1 09-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_fpunch to bch2_extent_update()

As before - we're moving non Linux specific code out of fs-io.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2925fc49 08-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out bchfs_extent_update()

The next few patches are going to be more moving the logic around
i_size/i_sectors updates to io.c, and better separating the Linux VFS
specific code from core bcachefs code, to better support the fuse port.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e0541a93 09-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill some dependencies on ei_inode

Moving bch2_extent_update() to io.c will be greatly simplified if we
no longer have to keep ei_inode.bi_size/bi_sectors up to date.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# daf3fe50 09-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check if extending inode differently

In bch2_extent_update(), we have to update the inode if i_size is
changing (the file is being extended) or if i_sectors is changing, but we
want to avoid touching the inode if it's not necessary.

Change sum_sector_overwrites() to also check if there's already data
above where we're writing to - this means we're definitely not extending
the file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3826ee0b 09-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add a lock to bch_page_state

We can't use the page lock to protect it, because on writeback IO error
we need to access the page state before calling end_page_writeback(), and
the page lock semantics are completely insane, so doing that deadlocks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 137b0ed9 04-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_extent_atomic_end() now traverses iter

This fixes a bug in io.c bch2_write_index_default() - it was missing the
traverse call, but bch2_extent_atomic_end returns an error now and can
just call it itself.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58677a1d 01-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_inode_peek()/bch2_inode_write()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8de819f8 01-Oct-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix __bch2_buffered_write() returning -ENOMEM

When grab_cache_page_write_begin() fails but we did pin some pages, we
shouldn't return -ENOMEM, we should do a partial write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 64bc0011 26-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Rework btree iterator lifetimes

The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.

This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a7199432 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill deferred btree updates

These will be replaced by cached btree iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 877dfb34 26-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix for partial buffered writes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bbd8d203 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_ITER_SLOTS isn't a type of btree iter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d55460bb 25-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Trivial cleanup

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fb472ac5 24-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert a BUG_ON() to a warning

We shouldn't ever be writing past i_size - but, apparently there's still
a bug to track down.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0a426c32 22-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle bio_iov_iter_get_pages() returning unaligned bio

If the user buffer isn't aligned to the filesystem block size, on a
large enough IO - where it won't fit into a single bio -
bio_iov_iter_get_pages() won't necessarily return a bio with the proper
alignment.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f786787 07-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add support for FALLOC_FL_INSERT_RANGE

Somewhat tricky and ugly, because iterating over extents backwards is a
pain.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 6cc3535d 19-Sep-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't write past eof

When converting from PAGE_SIZE to block_size, the .mkwrite path was
missed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 63095894 22-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improved bch2_fcollapse()

Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3fb5ebcd 22-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Inline some fast paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4b0a66d5 21-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Check alignment in write path

Also - fix alignment in bch2_set_page_dirty()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 76426098 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 3c7f3b7a 16-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor bch2_extent_trim_atomic() for reflink

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b3fce09c 13-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Mark space as unallocated on write failure

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a99b1caf 06-Aug-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Truncate/fpunch now works on block boundaries, not page

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ba5d38b 30-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Count reserved extents as holes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 543ef2eb 30-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle partial pages in seek data/hole

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d1542e03 29-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change buffered write path to write to partial pages

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7f5e31e1 28-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Change __bch2_writepage() to not write to holes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e10d3094 29-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_seek_data()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 99aaf570 25-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Refactor various code to not be extent specific

With reflink, various code now has to handle both KEY_TYPE_extent
and KEY_TYPE_reflink_v - so, convert it to be generic across all keys
with pointers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b17657d0 18-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't call bch2_trans_begin_updates() in bch2_extent_update()

Prep work for reflink - for reflink, we're going to be using
bch2_extent_update() with other updates in the same transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 06ed8558 08-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add offset_into_extent param to bch2_read_extent()

With reflink, we'll no longer be able to calculate the offset of the
data we want within the extent we're reading from using only the extent
pos and the iter pos - we'll have to pass it in separately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f57a6a5d 02-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Track dirtiness at sector level, not page

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# adfcfaf0 02-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill page_state_cmpxchg

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e1036a2a 02-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Always touch page state with page locked

This will mean we don't have to use cmpxchg for modifying page state,
which will simplify a fair amount of code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 885678f6 03-Jul-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill direct access to bi_io_vec

Switch to always using bio_add_page(), which merges contiguous pages now
that we have multipage bvecs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 20bceecb 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: More work to avoid transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 58fbf808 15-May-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Delete duplicate code

Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 75812e70 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix fsync error reporting

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 94f651e2 17-Apr-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Return errors from for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# a6d90385 24-Dec-2022 Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: (invalidate|release)_folio fixes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 0f238367 27-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: trans_for_each_iter()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 424eb881 25-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Only get btree iters from btree transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 61f321fc 13-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make deferred inode updates a mount option

Journal reclaim may still need performance tuning

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 446c562c 04-Mar-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Remove direct use of bch2_btree_iter_link()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5154704b 20-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Use deferred btree updates for inode updates

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7ef2a73a 21-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix check for if extent update is allocating

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 919dbbd1 19-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: dio arithmetic improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# ed484030 13-Jan-2019 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a dio bug

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 26609b61 01-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make bkey types globally unique

This lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 01a0108f 29-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix a btree iter usage error

Previously, if the code traversed to the next btree node, that could
return an error (due to lock restarts), which was not being checked
for.

The fix is to rework it so it never iterates past the current leaf node,
and pops an assertion if it ever sees an error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# f81b648d 14-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Clean up, possibly fix page disk reservation accounting

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# b1ba2359 14-Nov-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix an error path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1742237b 27-Sep-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: extent_for_each_ptr_decode()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 7b3f84ea 05-Oct-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# fc3268c1 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill extent_insert_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 190fa7af 05-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: kill i_sectors_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 8ef231bd 11-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert fcollapse to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 5f461e01 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert fpunch to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 54e2264e 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert truncate to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 08af47df 08-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: convert bchfs_write_index_update() to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# e2d9912c 05-Aug-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_extent_trim_atomic()

Prep work for extents insert hook removal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# bb1b3658 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: minor fsync fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 658971f2 23-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: fix mtime/ctime update on truncate

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 2ea90048 17-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix mtime/ctime updates

Also make inode flags consistent with how the rest of the inode is
updated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 4e1ec2cc 17-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Simplify bch2_write_inode_trans, fix lockdep splat

ei_update_lock isn't currently needed for write inode (but it will be
needed again when deferred btree updates are used for inode updates)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# d69f41d6 12-Jul-2018 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert raw uses of bch2_btree_iter_link() to new transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>


# 1c6fdbd8 17-Mar-2017 Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>