Cross Reference: /linux-master/fs/btrfs/volumes.c

History log of /linux-master/fs/btrfs/volumes.c
Revision	Date	Author	Comments
# 2f1aeab9	18-Mar-2024	Anand Jain <anand.jain@oracle.com>	btrfs: return accurate error code on open failure in open_fs_devices() When attempting to exclusive open a device which has no exclusive open permission, such as a physical device associated with the flakey dm device, the open operation will fail, resulting in a mount failure. In this particular scenario, we erroneously return -EINVAL instead of the correct error code provided by the bdev_open_by_path() function, which is -EBUSY. Fix this, by returning error code from the bdev_open_by_path() function. With this correction, the mount error message will align with that of ext4 and xfs. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 9f7eb840	29-Feb-2024	Anand Jain <anand.jain@oracle.com>	btrfs: validate device maj:min during open Boris managed to create a device capable of changing its maj:min without altering its device path. Only multi-devices can be scanned. A device that gets scanned and remains in the btrfs kernel cache might end up with an incorrect maj:min. Despite the temp-fsid feature patch did not introduce this bug, it could lead to issues if the above multi-device is converted to a single device with a stale maj:min. Subsequently, attempting to mount the same device with the correct maj:min might mistake it for another device with the same fsid, potentially resulting in wrongly auto-enabling the temp-fsid feature. To address this, this patch validates the device's maj:min at the time of device open and updates it if it has changed since the last scan. CC: stable@vger.kernel.org # 6.7+ Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability") Reported-by: Boris Burkov <boris@bur.io> Co-developed-by: Boris Burkov <boris@bur.io> Reviewed-by: Boris Burkov <boris@bur.io># Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# d565fffa	12-Feb-2024	Anand Jain <anand.jain@oracle.com>	btrfs: do not skip re-registration for the mounted device There are reports that since version 6.7 update-grub fails to find the device of the root on systems without initrd and on a single device. This looks like the device name changed in the output of /proc/self/mountinfo: 6.5-rc5 working 18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ... 6.7 not working: 17 1 0:15 / / rw,noatime - btrfs /dev/root ... and "update-grub" shows this error: /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?) This looks like it's related to the device name, but grub-probe recognizes the "/dev/root" path and tries to find the underlying device. However there's a special case for some filesystems, for btrfs in particular. The generic root device detection heuristic is not done and it all relies on reading the device infos by a btrfs specific ioctl. This ioctl returns the device name as it was saved at the time of device scan (in this case it's /dev/root). The change in 6.7 for temp_fsid to allow several single device filesystem to exist with the same fsid (and transparently generate a new UUID at mount time) was to skip caching/registering such devices. This also skipped mounted device. One step of scanning is to check if the device name hasn't changed, and if yes then update the cached value. This broke the grub-probe as it always read the device /dev/root and couldn't find it in the system. A temporary workaround is to create a symlink but this does not survive reboot. The right fix is to allow updating the device path of a mounted filesystem even if this is a single device one. In the fix, check if the device's major:minor number matches with the cached device. If they do, then we can allow the scan to happen so that device_list_add() can take care of updating the device path. The file descriptor remains unchanged. This does not affect the temp_fsid feature, the UUID of the mounted filesystem remains the same and the matching is based on device major:minor which is unique per mounted filesystem. This covers the path when the device (that exists for all mounted devices) name changes, updating /dev/root to /dev/sdx. Any other single device with filesystem and is not mounted is still skipped. Note that if a system is booted and initial mount is done on the /dev/root device, this will be the cached name of the device. Only after the command "btrfs device scan" it will change as it triggers the rename. The fix was verified by users whose systems were affected. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353 Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/ Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem") CC: stable@vger.kernel.org # 6.7+ Tested-by: Alex Romosan <aromosan@gmail.com> Tested-by: CHECK_1234543212345@protonmail.com Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# ae6bd7f9	29-Feb-2024	Filipe Manana <fdmanana@suse.com>	btrfs: fix off-by-one chunk length calculation at contains_pending_extent() At contains_pending_extent() the value of the end offset of a chunk we found in the device's allocation state io tree is inclusive, so when we calculate the length we pass to the in_range() macro, we must sum 1 to the expression "physical_end - physical_offset". In practice the wrong calculation should be harmless as chunks sizes are never 1 byte and we should never have 1 byte ranges of unallocated space. Nevertheless fix the wrong calculation. Reported-by: Alex Lyakas <alex.lyakas@zadara.com> Link: https://lore.kernel.org/linux-btrfs/CAOcd+r30e-f4R-5x-S7sV22RJPe7+pgwherA6xqN2_qe7o4XTg@mail.gmail.com/ Fixes: 1c11b63eff2a ("btrfs: replace pending/pinned chunks lists with io tree") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 0782303a	24-Feb-2024	Anand Jain <anand.jain@oracle.com>	btrfs: include device major and minor numbers in the device scan notice To better debug issues surrounding device scans, include the device's major and minor numbers in the device scan notice for btrfs. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 1cdeac6d	22-Feb-2024	David Sterba <dsterba@suse.com>	btrfs: pass btrfs_device to btrfs_scratch_superblocks() Replace the two parameters bdev and name by one that can be used to get them both. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# e6052347	16-Feb-2024	David Sterba <dsterba@suse.com>	btrfs: move balance args conversion helpers to volumes.c The from/to CPU/disk helpers for balance args are used only in volumes, no need to define them in accessors.h. Signed-off-by: David Sterba <dsterba@suse.com>
# 53e4d8c2	24-Jan-2024	David Sterba <dsterba@suse.com>	btrfs: change BUG_ON to assertion in reset_balance_state() The balance state machine is complex so it's good to verify the assumptions in helpers, however reset_balance_state() is used at the end of balance and fs_info::balance_ctl is properly set up before and protected by the exclusive op ownership in btrfs_balance(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 7411055d	23-Jan-2024	David Sterba <dsterba@suse.com>	btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() The unhandled case in btrfs_relocate_sys_chunks() loop is a corruption, as it could be caused only by two impossible conditions: - at first the search key is set up to look for a chunk tree item, with offset -1, this is an inexact search and the key->offset will contain the correct offset upon a successful search, a valid chunk tree item cannot have an offset -1 - after first successful search, the found_key corresponds to a chunk item, the offset is decremented by 1 before the next loop, it's impossible to find a chunk item there due to alignment and size constraints Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 4dc4a3be	01-Feb-2024	Naohiro Aota <naohiro.aota@wdc.com>	btrfs: use READ/WRITE_ONCE for fs_devices->read_policy Since we can read/modify the value from the sysfs interface concurrently, it would be better to protect it from compiler optimizations. Currently, there is only one read policy BTRFS_READ_POLICY_PID available, so no actual problem can happen now. This is a preparation for the future expansion. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 2b712e3b	25-Jan-2024	David Sterba <dsterba@suse.com>	btrfs: remove unused included headers With help of neovim, LSP and clangd we can identify header files that are not actually needed to be included in the .c files. This is focused only on removal (with minor fixups), further cleanups are possible but will require doing the header files properly with forward declarations, minimized includes and include-what-you-use care. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 9ae061cf	23-Jan-2024	Christian Brauner <brauner@kernel.org>	btrfs: port device access to file Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-19-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
# d967c914	21-Dec-2023	Naohiro Aota <naohiro.aota@wdc.com>	btrfs: fix unbalanced unlock of mapping_tree_lock The error path of btrfs_get_chunk_map() releases fs_info->mapping_tree_lock. But, it is taken and released in btrfs_find_chunk_map(). So, there is no need to do so. Fixes: 7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
# e94dfb7a	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: pass btrfs_io_geometry into btrfs_max_io_len Instead of passing three individual members of 'struct btrfs_io_geometry' into btrfs_max_io_len(), pass a pointer to btrfs_io_geometry. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 6edf6822	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: pass struct btrfs_io_geometry to set_io_stripe Instead of passing three members of 'struct btrfs_io_geometry' into set_io_stripe() pass a pointer to the whole structure and then get the needed members out of btrfs_io_geometry. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 89f547c6	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: open code set_io_stripe for RAID56 Open code set_io_stripe() for RAID56, as it a) uses a different method to calculate the stripe_index b) doesn't need to go through raid-stripe-tree mapping code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# b55b3077	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: change block mapping to switch/case in btrfs_map_block Now that all the per-profile if/else statement blocks have been converted to calls to helper the conversion to switch/case is straightforward. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# a16fb8c6	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out block mapping for single profiles Now that we have a container for the I/O geometry that has all the needed information for the block mappings of SINGLE profiles, factor out a helper calculating this information. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 089221d3	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out block mapping for RAID5/6 Now that we have a container for the I/O geometry that has all the needed information for the block mappings of RAID5 and RAID6, factor out a helper calculating this information. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# d9d4ce9f	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: reduce scope of data_stripes in btrfs_map_block Reduce the scope of 'data_stripes' in btrfs_map_block(). While the change alone may not make too much sense, it helps us factoring out a helper function for the block mapping of RAID56 I/O. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 8938f112	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out block mapping for RAID10 Now that we have a container for the I/O geometry that has all the needed information for the block mappings of RAID10, factor out a helper calculating this information. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 5aeb15c8	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out block mapping for DUP profiles Now that we have a container for the I/O geometry that has all the needed information for the block mappings of DUP, factor out a helper calculating this information. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 5e36aba8	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out RAID1 block mapping Now that we have a container for the I/O geometry that has all the needed information for the block mappings of RAID1, factor out a helper calculating this information. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 30e8534b	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out block-mapping for RAID0 Now that we have a container for the I/O geometry that has all the needed information for the block mappings of RAID0, factor out a helper calculating this information. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# fd747f2d	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: re-introduce struct btrfs_io_geometry Re-introduce struct btrfs_io_geometry, holding the necessary bits and pieces needed in btrfs_map_block() to decide the I/O geometry of a specific block mapping. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 02d05b64	13-Dec-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: factor out helper for single device IO check The check in btrfs_map_block() deciding if a particular I/O is targeting a single device is getting more and more convoluted. Factor out the check conditions into a helper function, with no functional change otherwise. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 7dc66abb	21-Nov-2023	Filipe Manana <fdmanana@suse.com>	btrfs: use a dedicated data structure for chunk maps Currently we abuse the extent_map structure for two purposes: 1) To actually represent extents for inodes; 2) To represent chunk mappings. This is odd and has several disadvantages: 1) To create a chunk map, we need to do two memory allocations: one for an extent_map structure and another one for a map_lookup structure, so more potential for an allocation failure and more complicated code to manage and link two structures; 2) For a chunk map we actually only use 3 fields (24 bytes) of the respective extent map structure: the 'start' field to have the logical start address of the chunk, the 'len' field to have the chunk's size, and the 'orig_block_len' field to contain the chunk's stripe size. Besides wasting a memory, it's also odd and not intuitive at all to have the stripe size in a field named 'orig_block_len'. We are also using 'block_len' of the extent_map structure to contain the chunk size, so we have 2 fields for the same value, 'len' and 'block_len', which is pointless; 3) When an extent map is associated to a chunk mapping, we set the bit EXTENT_FLAG_FS_MAPPING on its flags and then make its member named 'map_lookup' point to the associated map_lookup structure. This means that for an extent map associated to an inode extent, we are not using this 'map_lookup' pointer, so wasting 8 bytes (on a 64 bits platform); 4) Extent maps associated to a chunk mapping are never merged or split so it's pointless to use the existing extent map infrastructure. So add a dedicated data structure named 'btrfs_chunk_map' to represent chunk mappings, this is basically the existing map_lookup structure with some extra fields: 1) 'start' to contain the chunk logical address; 2) 'chunk_len' to contain the chunk's length; 3) 'stripe_size' for the stripe size; 4) 'rb_node' for insertion into a rb tree; 5) 'refs' for reference counting. This way we do a single memory allocation for chunk mappings and we don't waste memory for them with unused/unnecessary fields from an extent_map. We also save 8 bytes from the extent_map structure by removing the 'map_lookup' pointer, so the size of struct extent_map is reduced from 144 bytes down to 136 bytes, and we can now have 30 extents map per 4K page instead of 28. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 5031660a	21-Nov-2023	Filipe Manana <fdmanana@suse.com>	btrfs: mark sanity checks when getting chunk map as unlikely When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity checks to verify that we found an extent map and that it includes the requested logical address. These are never expected to fail, so mark them as unlikely to make it more clear as well as to allow a compiler to generate more efficient code. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 7d410d5e	21-Nov-2023	Filipe Manana <fdmanana@suse.com>	btrfs: make error messages more clear when getting a chunk map When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity checks to verify we found a chunk map and that map found covers the logical address the caller passed in. However the messages aren't very clear in the sense that don't mention the issue is with a chunk map and one of them prints the 'length' argument as if it were the end offset of the requested range (while the in the string format we use %llu-%llu which suggests a range, and the second %llu-%llu is actually a range for the chunk map). So improve these two details in the error messages. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 5fba5a57	21-Nov-2023	Filipe Manana <fdmanana@suse.com>	btrfs: fix off-by-one when checking chunk map includes logical address At btrfs_get_chunk_map() we get the extent map for the chunk that contains the given logical address stored in the 'logical' argument. Then we do sanity checks to verify the extent map contains the logical address. One of these checks verifies if the extent map covers a range with an end offset behind the target logical address - however this check has an off-by-one error since it will consider an extent map whose start offset plus its length matches the target logical address as inclusive, while the fact is that the last byte it covers is behind the target logical address (by 1). So fix this condition by using '<=' rather than '<' when comparing the extent map's "start + length" against the target logical address. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# cd63ffbd	30-Oct-2023	Filipe Manana <fdmanana@suse.com>	btrfs: fix error pointer dereference after failure to allocate fs devices At device_list_add() we allocate a btrfs_fs_devices structure and then before checking if the allocation failed (pointer is ERR_PTR(-ENOMEM)), we dereference the error pointer in a memcpy() argument if the feature BTRFS_FEATURE_INCOMPAT_METADATA_UUID is enabled. Fix this by checking for an allocation error before trying the memcpy(). Fixes: f7361d8c3fc3 ("btrfs: sipmlify uuid parameters of alloc_fs_devices()") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 86ec15d0	27-Sep-2023	Jan Kara <jack@suse.cz>	btrfs: Convert to bdev_open_by_path() Convert btrfs to use bdev_open_by_path() and pass the handle around. We also drop the holder from struct btrfs_device as it is now not needed anymore. CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-20-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
# c47b02c1	04-Oct-2023	Anand Jain <anand.jain@oracle.com>	btrfs: disable the seed feature for temp-fsid A seed device is an integral component of the sprout device, which functions as a multi-device filesystem. Therefore, temp-fsid feature is not supported. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
# a5b8a5f9	27-Sep-2023	Anand Jain <anand.jain@oracle.com>	btrfs: support cloned-device mount capability Guilherme's previous work [1] aimed at the mounting of cloned devices using a superblock flag SINGLE_DEV during mkfs. [1] https://lore.kernel.org/linux-btrfs/20230831001544.3379273-1-gpiccoli@igalia.com/ Building upon this work, here is in memory only approach. As it mounts we determine if the same fsid is already mounted if then we generate a random temp fsid which shall be used the mount, in memory only not written to the disk. We distinguish devices by devt. Example: $ fallocate -l 300m ./disk1.img $ mkfs.btrfs -f ./disk1.img $ cp ./disk1.img ./disk2.img $ cp ./disk1.img ./disk3.img $ mount -o loop ./disk1.img /btrfs $ mount -o ./disk2.img /btrfs1 $ mount -o ./disk3.img /btrfs2 $ btrfs fi show -m Label: none uuid: 4a212b48-1bec-46a5-938a-783c8c1f0b02 Total devices 1 FS bytes used 144.00KiB devid 1 size 300.00MiB used 88.00MiB path /dev/loop0 Label: none uuid: adabf2fe-5515-4ad0-95b4-7b1609218c16 Total devices 1 FS bytes used 144.00KiB devid 1 size 300.00MiB used 88.00MiB path /dev/loop1 Label: none uuid: 1d77d0df-7d92-439e-adbd-20b9b86fdedb Total devices 1 FS bytes used 144.00KiB devid 1 size 300.00MiB used 88.00MiB path /dev/loop2 Co-developed-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 69d427f3	27-Sep-2023	Anand Jain <anand.jain@oracle.com>	btrfs: add helper function find_fsid_by_disk In preparation for adding support to mount multiple single-disk btrfs filesystems with the same FSID, wrap find_fsid() into find_fsid_by_disk(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 6f2d3c01	27-Sep-2023	Josef Bacik <josef@toxicpanda.com>	btrfs: increase ->free_chunk_space in btrfs_grow_device My overcommit patch exposed a bug with btrfs/177 [1]. The problem here is that when we grow the device we're not adding to ->free_chunk_space, so subsequent allocations can cause ->free_chunk_space to wrap, which causes problems in can_overcommit because we add this to ->total_bytes, which causes the counter to wrap and gives us an unexpected ENOSPC. Fix this by properly updating ->free_chunk_space with the new available space in btrfs_grow_device. [1] First version of the fix: https://lore.kernel.org/linux-btrfs/b97e47ce0ce1d41d221878de7d6090b90aa7a597.1695065233.git.josef@toxicpanda.com/ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
# e9fd2c05	27-Sep-2023	Josef Bacik <josef@toxicpanda.com>	btrfs: fix ->free_chunk_space math in btrfs_shrink_device There are two bugs in how we adjust ->free_chunk_space in btrfs_shrink_device. First we're removing the entire diff between new_size and old_size from ->free_chunk_space. This only works if we're reducing the free area, which we could potentially not be. So adjust the math to only subtract the diff in the free space from ->free_chunk_space. Additionally in the error case we're unconditionally adding the diff back into ->free_chunk_space, which we need to only do if this device is writeable. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 5966930d	20-Sep-2023	Anand Jain <anand.jain@oracle.com>	btrfs: remove incomplete metadata_uuid conversion fixup logic Previous commit ("btrfs: reject devices with CHANGING_FSID_V2") has stopped the assembly of devices with the CHANGING_FSID_V2 flag in the kernel. Such devices can be scanned but will not be registered and can't be mounted without a manual fix by btrfstune. Remove the related logic and now unused code. The original motivation was to allow an interrupted partial conversion fix itself on next mount, in case the system has to be rebooted. This is a convenience but brings a lot of complexity the device scanning and handling the partial states. It's hard to estimate if this was ever needed in practice, expecting the typical use case like a manual conversion of an unmounted filesystem where the user can verify the success and rerun it eventually. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add historical context ] Signed-off-by: David Sterba <dsterba@suse.com>
# 197a9ece	20-Sep-2023	Anand Jain <anand.jain@oracle.com>	btrfs: reject devices with CHANGING_FSID_V2 The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state where the device in the userspace btrfstune -m\|-M operation failed to complete changing the fsid. This flag makes the kernel to automatically determine the other partner devices to which a given device can be associated, based on the fsid, metadata_uuid and generation values. btrfstune -m\|M feature is especially useful in virtual cloud setups, where compute instances (disk images) are quickly copied, fsid changed, and launched. Given numerous disk images with the same metadata_uuid but different fsid, there's no clear way a device can be correctly assembled with the proper partners when the CHANGING_FSID_V2 flag is set. So, the disk could be assembled incorrectly, as in the example below: Before this patch: Consider the following two filesystems: /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m operation fails. In this scenario, as the /dev/loop0's fsid change is interrupted, and the CHANGING_FSID_V2 flag is set as shown below. $ p="device\|devid\|^metadata_uuid\|^fsid\|^incom\|^generation\|^flags" $ btrfs inspect dump-super /dev/loop0 \| egrep '$p' superblock: bytenr=65536, device=/dev/loop0 flags 0x1000000001 fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 9 num_devices 2 incompat_flags 0x741 dev_item.devid 1 $ btrfs inspect dump-super /dev/loop1 \| egrep '$p' superblock: bytenr=65536, device=/dev/loop1 flags 0x1 fsid 11d2af4d-1b71-45a9-83f6-f2100766939d metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 10 num_devices 2 incompat_flags 0x741 dev_item.devid 2 $ btrfs inspect dump-super /dev/loop2 \| egrep '$p' superblock: bytenr=65536, device=/dev/loop2 flags 0x1 fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 8 num_devices 2 incompat_flags 0x741 dev_item.devid 1 $ btrfs inspect dump-super /dev/loop3 \| egrep '$p' superblock: bytenr=65536, device=/dev/loop3 flags 0x1 fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 8 num_devices 2 incompat_flags 0x741 dev_item.devid 2 It is normal that some devices aren't instantly discovered during system boot or iSCSI discovery. The controlled scan below demonstrates this. $ btrfs device scan --forget $ btrfs device scan /dev/loop0 Scanning for btrfs filesystems on '/dev/loop0' $ mount /dev/loop3 /btrfs $ btrfs filesystem show -m Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 Total devices 2 FS bytes used 144.00KiB devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 /dev/loop0 and /dev/loop3 are incorrectly partnered. This kernel patch removes functions and code connected to the CHANGING_FSID_V2 flag. With this patch, now devices with the CHANGING_FSID_V2 flag are rejected. And its partner will fail to mount with the extra -o degraded option. The check is removed from open_ctree(), devices are rejected during scanning which in turn fails the mount. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
# 568220fa	14-Sep-2023	Johannes Thumshirn <johannes.thumshirn@wdc.com>	btrfs: zoned: support RAID0/1/10 on top of raid stripe tree When we have a raid-stripe-tree, we can do RAID0/1/10 on zoned devices for data block groups. For metadata block groups, we don't actually need anything special, as all metadata I/O is protected by the btrfs_zoned_meta_io_lock() already. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>