History log of /linux-master/fs/btrfs/bio.h
Revision Date Author Comments
# 602035d7 26-Jan-2024 David Sterba <dsterba@suse.com>

btrfs: add forward declarations and headers, part 2

Do a cleanup in more headers:

- add forward declarations for types referenced by pointers
- add includes when types need them

This fixes potential compilation problems if the headers are reordered
or the missing includes are not provided indirectly.

Signed-off-by: David Sterba <dsterba@suse.com>


# 96c36eaa 11-Dec-2023 Qu Wenruo <wqu@suse.com>

btrfs: migrate btrfs_repair_io_failure() to folio interfaces

[BUG]
Test case btrfs/124 failed if larger metadata folio is enabled, the
dying message looks like this:

BTRFS error (device dm-2): bad tree block start, mirror 2 want 31686656 have 0
BTRFS info (device dm-2): read error corrected: ino 0 off 31686656 (dev /dev/mapper/test-scratch2 sector 20928)
BUG: kernel NULL pointer dereference, address: 0000000000000020
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 6 PID: 350881 Comm: btrfs Tainted: G OE 6.7.0-rc3-custom+ #128
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
RIP: 0010:btrfs_read_extent_buffer+0x106/0x180 [btrfs]
PKRU: 55555554
Call Trace:
<TASK>
read_tree_block+0x33/0xb0 [btrfs]
read_block_for_search+0x23e/0x340 [btrfs]
btrfs_search_slot+0x2f9/0xe60 [btrfs]
btrfs_lookup_csum+0x75/0x160 [btrfs]
btrfs_lookup_bio_sums+0x21a/0x560 [btrfs]
btrfs_submit_chunk+0x152/0x680 [btrfs]
btrfs_submit_bio+0x1c/0x50 [btrfs]
submit_one_bio+0x40/0x80 [btrfs]
submit_extent_page+0x158/0x390 [btrfs]
btrfs_do_readpage+0x330/0x740 [btrfs]
extent_readahead+0x38d/0x6c0 [btrfs]
read_pages+0x94/0x2c0
page_cache_ra_unbounded+0x12d/0x190
relocate_file_extent_cluster+0x7c1/0x9d0 [btrfs]
relocate_block_group+0x2d3/0x560 [btrfs]
btrfs_relocate_block_group+0x2c7/0x4b0 [btrfs]
btrfs_relocate_chunk+0x4c/0x1a0 [btrfs]
btrfs_balance+0x925/0x13c0 [btrfs]
btrfs_ioctl+0x19f1/0x25d0 [btrfs]
__x64_sys_ioctl+0x90/0xd0
do_syscall_64+0x3f/0xf0
entry_SYSCALL_64_after_hwframe+0x6e/0x76

[CAUSE]
The dying line is at btrfs_repair_io_failure() call inside
btrfs_repair_eb_io_failure().

The function is still relying on the extent buffer using page sized
folios.
When the extent buffer is using larger folio, we go into the 2nd slot of
folios[], and triggered the NULL pointer dereference.

[FIX]
Migrate btrfs_repair_io_failure() to folio interfaces.

So that when we hit a larger folio, we just submit the whole folio in
one go.

This also affects data repair path through btrfs_end_repair_bio(),
thankfully data is still fully page based, we can just add an
ASSERT(), and use page_folio() to convert the page to folio.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# ec63b84d 31-May-2023 Christoph Hellwig <hch@lst.de>

btrfs: add an ordered_extent pointer to struct btrfs_bio

Add a pointer to the ordered_extent to the existing union in struct
btrfs_bio, so all code dealing with data write bios can just use a
pointer dereference to retrieve the ordered_extent instead of doing
multiple rbtree lookups per I/O.

The reference to this ordered_extent is dropped at end I/O time,
which implies that an extra one must be acquired when the bio is split.
This also requires moving the btrfs_extract_ordered_extent call into
btrfs_split_bio so that the invariant of always having a valid
ordered_extent reference for the btrfs_bio is kept.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# a39da514 31-May-2023 Christoph Hellwig <hch@lst.de>

btrfs: limit write bios to a single ordered extent

Currently buffered writeback bios are allowed to span multiple
ordered_extents, although that basically never actually happens since
commit 4a445b7b6178 ("btrfs: don't merge pages into bio if their page
offset is not contiguous").

Supporting bios than span ordered_extents complicates the file
checksumming code, and prevents us from adding an ordered_extent pointer
to the btrfs_bio structure. Use the existing code to limit a bio to
single ordered_extent for zoned device writes for all writes.

This allows to remove the REQ_BTRFS_ONE_ORDERED flags, and the
handling of multiple ordered_extents in btrfs_csum_one_bio.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# cbfce4c7 24-May-2023 Christoph Hellwig <hch@lst.de>

btrfs: optimize the logical to physical mapping for zoned writes

The current code to store the final logical to physical mapping for a
zone append write in the extent tree is rather inefficient. It first has
to split the ordered extent so that there is one ordered extent per bio,
so that it can look up the ordered extent on I/O completion in
btrfs_record_physical_zoned and store the physical LBA returned by the
block driver in the ordered extent.

btrfs_rewrite_logical_zoned then has to do a lookup in the chunk tree to
see what physical address the logical address for this bio / ordered
extent is mapped to, and then rewrite it in the extent tree.

To optimize this process, we can store the physical address assigned in
the chunk tree to the original logical address and a pointer to
btrfs_ordered_sum structure the in the btrfs_bio structure, and then use
this information to rewrite the logical address in the btrfs_ordered_sum
structure directly at I/O completion time in btrfs_record_physical_zoned.
btrfs_rewrite_logical_zoned then simply updates the logical address in
the extent tree and the ordered_extent itself.

The code in btrfs_rewrite_logical_zoned now runs for all data I/O
completions in zoned file systems, which is fine as there is no remapping
to do for non-append writes to conventional zones or for relocation, and
the overhead for quickly breaking out of the loop is very low.

Because zoned file systems now need the ordered_sums structure to
record the actual write location returned by zone append, allocate dummy
structures without the csum array for them when the I/O doesn't use
checksums, and free them when completing the ordered_extent.

Note that the btrfs_bio doesn't grow as the new field are places into
a union that is so far not used for data writes and has plenty of space
left in it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 4886ff7b 19-Mar-2023 Qu Wenruo <wqu@suse.com>

btrfs: introduce a new helper to submit write bio for repair

Both scrub and read-repair are utilizing a special repair writes that:

- Only writes back to a single device
Even for read-repair on RAID56, we only update the corrupted data
stripe itself, not triggering the full RMW path.

- Requires a valid @mirror_num
For RAID56 case, only @mirror_num == 1 is valid.
For non-RAID56 cases, we need @mirror_num to locate our stripe.

- No data csum generation needed

These two call sites still have some differences though:

- Read-repair goes plain bio
It doesn't need a full btrfs_bio, and goes submit_bio_wait().

- New scrub repair would go btrfs_bio
To simplify both read and write path.

So here this patch would:

- Introduce a common helper, btrfs_map_repair_block()
Due to the single device nature, we can use an on-stack
btrfs_io_stripe to pass device and its physical bytenr.

- Introduce a new interface, btrfs_submit_repair_bio(), for later scrub
code
This is for the incoming scrub code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 4317ff00 23-Mar-2023 Qu Wenruo <wqu@suse.com>

btrfs: introduce btrfs_bio::fs_info member

Currently we're doing a lot of work for btrfs_bio:

- Checksum verification for data read bios
- Bio splits if it crosses stripe boundary
- Read repair for data read bios

However for the incoming scrub patches, we don't want this extra
functionality at all, just plain logical + mirror -> physical mapping
ability.

Thus here we do the following changes:

- Introduce btrfs_bio::fs_info
This is for the new scrub specific btrfs_bio, which would not populate
btrfs_bio::inode.
Thus we need such new member to grab a fs_info

This new member will always be populated.

- Replace @inode argument with @fs_info for btrfs_bio_init() and its
caller
Since @inode is no longer a mandatory member, replace it with
@fs_info, and let involved users populate @inode.

- Skip checksum verification and generation if @bbio->inode is NULL

- Add extra ASSERT()s
To make sure:

* bbio->inode is properly set for involved read repair path
* if @file_offset is set, bbio->inode is also populated

- Grab @fs_info from @bbio directly
We can no longer go @bbio->inode->root->fs_info, as bbio->inode can be
NULL. This involves:

* btrfs_simple_end_io()
* should_async_write()
* btrfs_wq_submit_bio()
* btrfs_use_zone_append()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 3480373e 26-Mar-2023 Christoph Hellwig <hch@lst.de>

btrfs, block: move REQ_CGROUP_PUNT to btrfs

REQ_CGROUP_PUNT is a bit annoying as it is hard to follow and adds
a branch to the bio submission hot path. To fix this, export
blkcg_punt_bio_submit and let btrfs call it directly. Add a new
REQ_FS_PRIVATE flag for btrfs to indicate to it's own low-level
bio submission code that a punt to the cgroup submission helper
is required.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# b41bbd29 07-Mar-2023 Christoph Hellwig <hch@lst.de>

btrfs: return a btrfs_bio from btrfs_bio_alloc

Return the containing struct btrfs_bio instead of the less type safe
struct bio from btrfs_bio_alloc.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>


# ae42a154 07-Mar-2023 Christoph Hellwig <hch@lst.de>

btrfs: pass a btrfs_bio to btrfs_submit_bio

btrfs_submit_bio expects the bio passed to it to be embedded into a
btrfs_bio structure. Pass the btrfs_bio directly to increase type
safety and make the code self-documenting.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>


# 285599b6 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: remove the fs_info argument to btrfs_submit_bio

btrfs_submit_bio can derive it trivially from bbio->inode, so stop
bothering in the callers.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 67d66982 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: pass the iomap bio to btrfs_submit_bio

Now that btrfs_submit_bio splits the bio when crossing stripe boundaries,
there is no need for the higher level code to do that manually.

For direct I/O this is really helpful, as btrfs_submit_io can now simply
take the bio allocated by iomap and send it on to btrfs_submit_bio
instead of allocating clones.

For that to work, the bio embedded into struct btrfs_dio_private needs to
become a full btrfs_bio as expected by btrfs_submit_bio.

With this change there is a single work item to offload the entire iomap
bio so the heuristics to skip async processing for bios that were split
isn't needed anymore either.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>


# 852eee62 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: allow btrfs_submit_bio to split bios

Currently the I/O submitters have to split bios according to the chunk
stripe boundaries. This leads to extra lookups in the extent trees and
a lot of boilerplate code.

To drop this requirement, split the bio when __btrfs_map_block returns a
mapping that is smaller than the requested size and keep a count of
pending bios in the original btrfs_bio so that the upper level
completion is only invoked when all clones have completed.

Based on a patch from Qu Wenruo.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>


# f8c44673 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: simplify the btrfs_csum_one_bio calling convention

To prepare for further bio submission changes btrfs_csum_one_bio
should be able to take all it's arguments from the btrfs_bio structure.
It can always use the bbio->inode already, and once the compression code
is updated to set ->file_offset that one can be used unconditionally
as well instead of looking at the page mapping now that btrfs doesn't
allow ordered extents to span discontiguous data ranges.

The only slightly tricky bit is the one_ordered flag set by the
compressed writes. Replace that one with the driver private bio
flag, which gets cleared before the bio is handed off to the block layer
so that we don't get in the way of driver use.

Note: this leaves an argument and a flag to btrfs_wq_submit_bio unused.
But that whole mechanism will be removed in its current form in the
next patch.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 295fe46f 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: remove struct btrfs_bio::is_metadata flag

This flag is unused now, so remove it. Re-expand the mirror_num field
to 8 bits, and move it to the I/O completion internal section of the
structure.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 0d3acb25 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: rename btrfs_bio::iter field

Rename iter to saved_iter and move it next to the repair internals
and nothing outside of bio.c should be touching it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 860c8c45 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: remove struct btrfs_bio::device field

The device field is only used by the simple end I/O handler, and for
that it can simply be stored in the bi_private field of the bio,
which is currently used for the fs_info that can be retrieved through
bbio->inode as well.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# ac9f942e 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: remove btrfs_bio_for_each_sector

btrfs_bio_for_each_sector is unused now, so remove it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 7ab0fdfc 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: open code btrfs_bio_free_csum

btrfs_bio_free_csum has only one caller left, and that caller is always
for an data inode and doesn't need zeroing of the csum pointer as that
pointer will never be touched again. Just open code the conditional
kfree there.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# d0e5cb2b 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: add a btrfs_inode pointer to struct btrfs_bio

All btrfs_bio I/Os are associated with an inode. Add a pointer to that
inode, which will allow to simplify a lot of calling conventions, and
which will be needed in the I/O completion path in the future.

This grow the btrfs_bio structure by a pointer, but that grows will
be offset by the removal of the device pointer soon.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# e0cfbb2c 20-Jan-2023 Christoph Hellwig <hch@lst.de>

btrfs: better document struct btrfs_bio

Update the comments on btrfs_bio to better describe the structure.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# 103c1972 15-Nov-2022 Christoph Hellwig <hch@lst.de>

btrfs: split the bio submission path into a separate file

The code used by btrfs_submit_bio only interacts with the rest of
volumes.c through __btrfs_map_block (which itself is a more generic
version of two exported helpers) and does not really have anything
to do with volumes.c. Create a new bio.c file and a bio.h header
going along with it for the btrfs_bio-based storage layer, which
will grow even more going forward.

Also update the file with my copyright notice given that a large
part of the moved code was written or rewritten by me.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>