#
5ab2b180 |
|
14-Feb-2024 |
David Sterba <dsterba@suse.com> |
btrfs: factor out validation of btrfs_ioctl_vol_args::name The validation of vol args name in several ioctls is not done properly. a terminating NUL is written to the end of the buffer unconditionally, assuming that this would be the last place in case the buffer is used completely. This does not communicate back the actual error (either an invalid or too long path). Factor out all such cases and use a helper to do the verification, simply look for NUL in the buffer. There's no expected practical change, the size of buffer is 4088, this is enough for most paths or names. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2b712e3b |
|
25-Jan-2024 |
David Sterba <dsterba@suse.com> |
btrfs: remove unused included headers With help of neovim, LSP and clangd we can identify header files that are not actually needed to be included in the .c files. This is focused only on removal (with minor fixups), further cleanups are possible but will require doing the header files properly with forward declarations, minimized includes and include-what-you-use care. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4e00422e |
|
16-Jan-2024 |
David Sterba <dsterba@suse.com> |
btrfs: replace sb::s_blocksize by fs_info::sectorsize The block size stored in the super block is used by subsystems outside of btrfs and it's a copy of fs_info::sectorsize. Unify that to always use our sectorsize, with the exception of mount where we first need to use fixed values (4K) until we read the super block and can set the sectorsize. Replace all uses, in most cases it's fewer pointer indirections. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2018ef1d |
|
09-Jan-2024 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: use the original mount's mount options for the legacy reconfigure btrfs/330, which tests our old trick to allow mount -o ro,subvol=/x /dev/sda1 /foo mount -o rw,subvol=/y /dev/sda1 /bar fails on the block group tree. This is because we aren't preserving the mount options for what is essentially a remount, and thus we're ending up without the FREE_SPACE_TREE mount option, which triggers our free space tree delete codepath. This isn't possible with the block group tree and thus it falls over. Fix this by making sure we copy the existing mount options for the existing fs mount over in this case. Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes") Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a1912f71 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: remove code for inode_cache and recovery mount options We've deprecated these a while ago in 5.11, go ahead and remove the code for them. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9fb3b1a7 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: set clear_cache if we use usebackuproot We're currently setting this when we try to load the roots and we see that usebackuproot is set. Instead set this at mount option parsing time. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
83e3a40a |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move one shot mount option clearing to super.c There's no reason this has to happen in open_ctree, and in fact in the old mount API we had to call this from remount. Move this to super.c, unexport it, and call it from both mount and reconfigure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6941823c |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: remove old mount API code Now that we've switched to the new mount API, remove the old stuff. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
41d46b29 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move the device specific mount options to super.c We add these mount options based on the fs_devices settings, which can be set once we've opened the fs_devices. Move these into their own helper and call it from get_tree_super. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ad21f15b |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: switch to the new mount API Now that we have all of the parts in place to use the new mount API, switch our fs_type to use the new callbacks. There are a few things that have to be done at the same time because of the order of operations changes that come along with the new mount API. These must be done in the same patch otherwise things will go wrong. 1. Export and use btrfs_check_options in open_ctree(). This is because the options are done ahead of time, and we need to check them once we have the feature flags loaded. 2. Update the free space cache settings. Since we're coming in with the options already set we need to make sure we don't undo what the user has asked for. 3. Set our sb_flags at init_fs_context time, the fs_context stuff is trying to manage the sb_flagss itself, so move that into init_fs_context and out of the fill super part. Additionally I've marked the unused functions with __maybe_unused and will remove them in a future patch. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f044b318 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: handle the ro->rw transition for mounting different subvolumes This is a special case that we've carried around since 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options") where we'll under the covers flip the file system to RW if you're mixing and matching ro/rw options with different subvol mounts. The first mount is what the super gets setup as, so we'd handle this by remount the super as rw under the covers to facilitate this behavior. With the new mount API we can't really allow this, because user space has the ability to specify the super block settings, and the mount settings. So if the user explicitly sets the super block as read only, and then tried to mount a rw mount with the super block we'll reject this. However the old API was less descriptive and thus we allowed this kind of behavior. This patch preserves this behavior for the old API calls. This is inspired by Christians work [1], and includes his comment in btrfs_get_tree_super() explaining the history and how it all works in the old and new APIs. Link: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/ Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3bb17a25 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add get_tree callback for new mount API This is the actual mounting callback for the new mount API. Implement this using our current fill super as a guideline, making the appropriate adjustments for the new mount API. Our old mount operation had two fs_types, one to handle the actual opening, and the one that we called to handle the actual opening and then did the subvol lookup for returning the actual root dentry. This is mirrored here, but simply with different behaviors for ->get_tree. We use the existence of ->s_fs_info to tell which part we're in. The initial call allocates the fs_info, then call mount_fc() with a duplicated fc to do the actual open_ctree part. Then we take that vfsmount and use it to look up our subvolume that we're mounting and return that as our s_root. This idea was taken from Christians attempt to convert us to the new mount API [1]. In btrfs_get_tree_super() the mount device is scanned and opened in one go under uuid_mutex we expect that all related devices have been already scanned, either by mount or from the outside. A device forget can be called on some of the devices as the whole context is not protected but it's an unlikely event, though it's a minor behaviour change. References: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/ Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note about device scanning ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
eddb1a43 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add reconfigure callback for fs_context This is what is used to remount the file system with the new mount API. Because the mount options are parsed separately and one at a time I've added a helper to emit the mount options after the fact once the mount is configured, this matches the dmesg output for what happens with the old mount API. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0f85e244 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add fs context handling functions We are going to use the fs context to hold the mount options, so allocate the btrfs_fs_context when we're asked to init the fs context, and free it in the free callback. Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
17b36120 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add parse_param callback for the new mount API The parse_param callback handles one parameter at a time, take our existing mount option parsing loop and adjust it to handle one parameter at a time, and tie it into the fs_context_operations. Create a btrfs_fs_context object that will store the various mount properties, we'll house this in fc->fs_private. This is necessary to separate because remounting will use ->reconfigure, and we'll get a new copy of the parsed parameters, so we can no longer directly mess with the fs_info in this stage. In the future we'll add this to the btrfs_fs_info and update the users to use the new context object instead. There's a change how the option device= is processed. Previously all mount options were parsed in one go under uuid_mutex and the devices opened. This prevented a concurrent scan to happen during mount. Now we could see a device scan happen (e.g. by udev) but this should not affect the end result, mount will either see the populated fs_devices or will scan the device by itself. Alternatively we could save all the device paths first and then process them in one go as before but this does not seem to be necessary. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note about device scanning ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
15ddcdd3 |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add fs_parameter definitions In order to convert to the new mount API we have to change how we do the mount option parsing. For now we're going to duplicate these helpers to make it easier to follow, and then remove the old code once everything is in place. This patch contains the re-definition of all of our mount options into the new fs_parameter_spec format. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9ef40c2e |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: split out ro->rw and rw->ro helpers into their own functions When we remount ro->rw or rw->ro we have some cleanup tasks that have to be managed. Split these out into their own function to make btrfs_remount smaller. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a6a8f22a |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move space cache settings into open_ctree Currently we pre-load the space cache settings in btrfs_parse_options, however when we switch to the new mount API the mount option parsing will happen before we have the super block loaded. Add a helper to set the appropriate options based on the fs settings, this will allow us to have consistent free space cache settings. This also folds in the space cache related decisions we make for subpage sectorsize support, so all of this is done in one place. Since this was being called by parse options it looks like we're changing the behavior of remount, but in fact we aren't. The pre-loading of the free space cache settings is done because we want to handle the case of users not using any space_cache options, we'll derive the appropriate mount option based on the on disk state. On remount this wouldn't reset anything as we'll have cleared the v1 cache generation if we mounted -o nospace_cache. Similarly it's impossible to turn off the free space tree without specifically saying -o nospace_cache,clear_cache, which will delete the free space tree and clear the compat_ro option. Again in this case calling this code in remount wouldn't result in any change. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2b41b19d |
|
21-Nov-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: split out the mount option validation code into its own helper We're going to need to validate mount options after they're all parsed with the new mount API, split this code out into its own helper so we can use it when we swap over to the new mount API. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor adjustments in the messages ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ead62267 |
|
01-Nov-2023 |
Jan Kara <jack@suse.cz> |
btrfs: Do not restrict writes to btrfs devices Btrfs device probing code needs adaptation so that it works when writes are restricted to its mounted devices. Since btrfs maintainer wants to merge these changes through btrfs tree and there are review bandwidth issues with that, let's not block all other filesystems and just not restrict writes to btrfs devices for now. CC: <linux-btrfs@vger.kernel.org> CC: David Sterba <dsterba@suse.com> CC: Josef Bacik <josef@toxicpanda.com> CC: Chris Mason <clm@fb.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231101174325.10596-4-jack@suse.cz Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
2db31320 |
|
01-Nov-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: add dmesg output for first mount and last unmount of a filesystem There is a feature request to add dmesg output when unmounting a btrfs. There are several alternative methods to do the same thing, but with their own problems: - Use eBPF to watch btrfs_put_super()/open_ctree() Not end user friendly, they have to dip their head into the source code. - Watch for directory /sys/fs/<uuid>/ This is way more simple, but still requires some simple device -> uuid lookups. And a script needs to use inotify to watch /sys/fs/. Compared to all these, directly outputting the information into dmesg would be the most simple one, with both device and UUID included. And since we're here, also add the output when mounting a filesystem for the first time for parity. A more fine grained monitoring of subvolume mounts should be done by another layer, like audit. Now mounting a btrfs with all default mkfs options would look like this: [81.906566] BTRFS info (device dm-8): first mount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 [81.907494] BTRFS info (device dm-8): using crc32c (crc32c-intel) checksum algorithm [81.908258] BTRFS info (device dm-8): using free space tree [81.912644] BTRFS info (device dm-8): auto enabling async discard [81.913277] BTRFS info (device dm-8): checking UUID tree [91.668256] BTRFS info (device dm-8): last unmount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 CC: stable@vger.kernel.org # 5.4+ Link: https://github.com/kdave/btrfs-progs/issues/689 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
1720f5dd |
|
11-Sep-2023 |
Qi Zheng <zhengqi.arch@bytedance.com> |
fs: super: dynamically allocate the s_shrink In preparation for implementing lockless slab shrink, use new APIs to dynamically allocate the s_shrink, so that it can be freed asynchronously via RCU. Then it doesn't need to wait for RCU read-side critical section when releasing the struct super_block. Link: https://lkml.kernel.org/r/20230911094444.68966-39-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
0124855f |
|
04-Oct-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: add and use helpers for reading and writing last_trans_committed Currently the last_trans_committed field of struct btrfs_fs_info is modified and read without any locking or other protection. For example early in the fsync path, skip_inode_logging() is called which reads fs_info->last_trans_committed, but at the same time we can have a transaction commit completing and updating that field. In the case of an fsync this is harmless and any data race should be rare and at most cause an unnecessary logging of an inode. To avoid data race warnings from tools like KCSAN and other issues such as load and store tearing (amongst others, see [1]), create helpers to access the last_trans_committed field of struct btrfs_fs_info using READ_ONCE() and WRITE_ONCE(), and use these helpers everywhere. [1] https://lwn.net/Articles/793253/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
bc27d6f0 |
|
08-Sep-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: scan but don't register device on single device filesystem After the commit 5f58d783fd78 ("btrfs: free device in btrfs_close_devices for a single device filesystem") we unregister the device from the kernel memory upon unmounting for a single device. So, device registration that was performed before mounting if any is no longer in the kernel memory. However, in fact, note that device registration is unnecessary for a single-device btrfs filesystem unless it's a seed device. So for commands like 'btrfs device scan' or 'btrfs device ready' with a non-seed single-device btrfs filesystem, they can return success just after superblock verification and without the actual device scan. When 'device scan --forget' is called on such device no error is returned. The seed device must remain in the kernel memory to allow the sprout device to mount without the need to specify the seed device explicitly. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
732fab95 |
|
08-Sep-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: check-integrity: remove CONFIG_BTRFS_FS_CHECK_INTEGRITY option Since all check-integrity entry points have been removed, let's also remove the config and all related code relying on that. And since we have removed the mount option for check-integrity, we also need to re-number all the BTRFS_MOUNT_* enums. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c60a2880 |
|
25-Aug-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: include linux/security.h in super.c We use some of the security related code in here, include it in super.c so we can remove the include from ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
54f67dec |
|
10-Oct-2023 |
David Sterba <dsterba@suse.com> |
Revert "btrfs: reject unknown mount options early" This reverts commit 5f521494cc73520ffac18ede0758883b9aedd018. The patch breaks mounts with security mount options like $ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ... We cannot reject all unknown options in btrfs_parse_subvol_options() as intended, the security options can be present at this point and it's not possible to enumerate them in a future proof way. This means unknown mount options are silently accepted like before when the filesystem is mounted with either -o subvol=/path or as followup mounts of the same device. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5f521494 |
|
26-Sep-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: reject unknown mount options early [BUG] The following script would allow invalid mount options to be specified (although such invalid options would just be ignored): # mkfs.btrfs -f $dev # mount $dev $mnt1 <<< Successful mount expected # mount $dev $mnt2 -o junk <<< Failed mount expected # echo $? 0 [CAUSE] For the 2nd mount, since the fs is already mounted, we won't go through open_ctree() thus no btrfs_parse_options(), but only through btrfs_parse_subvol_options(). However we do not treat unrecognized options from valid but irrelevant options, thus those invalid options would just be ignored by btrfs_parse_subvol_options(). [FIX] Add the handling for Opt_err to handle invalid options and error out, while still ignore other valid options inside btrfs_parse_subvol_options(). Reported-by: Anand Jain <anand.jain@oracle.com> CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
58bfe2cc |
|
18-Sep-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: properly report 0 avail for very full file systems A user reported some issues with smaller file systems that get very full. While investigating this issue I noticed that df wasn't showing 100% full, despite having 0 chunk space and having < 1MiB of available metadata space. This turns out to be an overflow issue, we're doing: total_available_metadata_space - SZ_4M < global_block_rsv_size to determine if there's not enough space to make metadata allocations, which overflows if total_available_metadata_space is < 4M. Fix this by checking to see if our available space is greater than the 4M threshold. This makes df properly report 100% usage on the file system. CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
efd34f03 |
|
20-Sep-2023 |
Christian Brauner <brauner@kernel.org> |
Revert "btrfs: convert to multigrain timestamps" This reverts commit 50e9ceef1d4f644ee0049e82e360058a64ec284c. Users reported regressions due to enabling multi-grained timestamps unconditionally. As no clear consensus on a solution has come up and the discussion has gone back to the drawing board revert the infrastructure changes for. If it isn't code that's here to stay, make it go away. Message-ID: <20230920-keine-eile-c9755b5825db@brauner> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
16c3a476 |
|
26-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: deprecate integrity checker feature The integrity checker feature needs to be enabled at compile time (BTRFS_FS_CHECK_INTEGRITY) and then enabled by mount options check_int*. Although it provides some unique features which can not be provided by any other sanity checks like tree-checker, it does not only have high CPU and memory overhead, but is also a maintenance burden. For example, it's the only caller of btrfs_map_block() with @need_raid_map = 0. Considering most btrfs developers are not even testing this feature, I'm here to propose deprecation of this feature. For now only warning messages will be printed, the feature itself would still work. Removal time has been set to 6.7 release. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
50e9ceef |
|
07-Aug-2023 |
Jeff Layton <jlayton@kernel.org> |
btrfs: convert to multigrain timestamps Enable multigrain timestamps, which should ensure that there is an apparent change to the timestamp whenever it has been written after being actively observed via getattr. Beyond enabling the FS_MGTIME flag, this patch eliminates update_time_for_write, which goes to great pains to avoid in-memory stores. Just have it overwrite the timestamps unconditionally. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-13-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
8bfec2e4 |
|
03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove hipri_workers workqueue Now that btrfs_wq_submit_bio is never called for synchronous I/O, the hipri_workers workqueue is not used anymore and can be removed. Reviewed-by: Chris Mason <clm@fb.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
05bdb996 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
block: replace fmode_t with a block-specific type for block open flags The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3f0b3e78 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
block: add a sb_open_mode helper Add a helper to return the open flags for blkdev_get_by* for passed in super block flags instead of open coding the logic in many places. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2ef78928 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: don't pass a holder for non-exclusive blkdev_get_by_path Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't make sense, so pass NULL instead and remove the holder argument from the call chains the only end up in non-FMODE_EXCL blkdev_get_by_path calls. Exclusive mode for device scanning is not used since commit 50d281fc434c ("btrfs: scan device in non-exclusive mode")". Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> Link: https://lore.kernel.org/r/20230608110258.189493-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
981a37ba |
|
05-Jun-2023 |
Chris Mason <clm@fb.com> |
btrfs: properly enable async discard when switching from RO->RW The async discard uses the BTRFS_FS_DISCARD_RUNNING bit in the fs_info to force discards off when the filesystem has aborted or we're generally not able to run discards. This gets flipped on when we're mounted rw, and also when we go from ro->rw. Commit 63a7cb13071842 ("btrfs: auto enable discard=async when possible") enabled async discard by default, and this meant "mount -o ro /dev/xxx /yyy" had async discards turned on. Unfortunately, this meant our check in btrfs_remount_cleanup() would see that discards are already on: /* If we toggled discard async */ if (!btrfs_raw_test_opt(old_opts, DISCARD_ASYNC) && btrfs_test_opt(fs_info, DISCARD_ASYNC)) btrfs_discard_resume(fs_info); So, we'd never call btrfs_discard_resume() when remounting the root filesystem from ro->rw. drgn shows this really nicely: import os import sys from drgn.helpers.linux.fs import path_lookup from drgn import NULL, Object, Type, cast def btrfs_sb(sb): return cast("struct btrfs_fs_info *", sb.s_fs_info) if len(sys.argv) == 1: path = "/" else: path = sys.argv[1] fs_info = cast("struct btrfs_fs_info *", path_lookup(prog, path).mnt.mnt_sb.s_fs_info) BTRFS_FS_DISCARD_RUNNING = 1 << prog['BTRFS_FS_DISCARD_RUNNING'] if fs_info.flags & BTRFS_FS_DISCARD_RUNNING: print("discard running flag is on") else: print("discard running flag is off") [root]# mount | grep nvme /dev/nvme0n1p3 on / type btrfs (rw,relatime,compress-force=zstd:3,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/) [root]# ./discard_running.drgn discard running flag is off [root]# mount -o remount,discard=sync / [root]# mount -o remount,discard=async / [root]# ./discard_running.drgn discard running flag is on The fix is to call btrfs_discard_resume() when we're going from ro->rw. It already checks to make sure the async discard flag is on, so it'll do the right thing. Fixes: 63a7cb13071842 ("btrfs: auto enable discard=async when possible") CC: stable@vger.kernel.org # 6.3+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
1d6a4fc8 |
|
28-Apr-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: make clear_cache mount option to rebuild FST without disabling it Previously clear_cache mount option would simply disable free-space-tree feature temporarily then re-enable it to rebuild the whole free space tree. But this is problematic for block-group-tree feature, as we have an artificial dependency on free-space-tree feature. If we go the existing method, after clearing the free-space-tree feature, we would flip the filesystem to read-only mode, as we detect a super block write with block-group-tree but no free-space-tree feature. This patch would change the behavior by properly rebuilding the free space tree without disabling this feature, thus allowing clear_cache mount option to work with block group tree. Now we can mount a filesystem with block-group-tree feature and clear_mount option: $ mkfs.btrfs -O block-group-tree /dev/test/scratch1 -f $ sudo mount /dev/test/scratch1 /mnt/btrfs -o clear_cache $ sudo dmesg -t | head -n 5 BTRFS info (device dm-1): force clearing of disk cache BTRFS info (device dm-1): using free space tree BTRFS info (device dm-1): auto enabling async discard BTRFS info (device dm-1): rebuilding free space tree BTRFS info (device dm-1): checking UUID tree CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
64b5d5b2 |
|
26-Apr-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: properly reject clear_cache and v1 cache for block-group-tree [BUG] With block-group-tree feature enabled, mounting it with clear_cache would cause the following transaction abort at mount or remount: BTRFS info (device dm-4): force clearing of disk cache BTRFS info (device dm-4): using free space tree BTRFS info (device dm-4): auto enabling async discard BTRFS info (device dm-4): clearing free space tree BTRFS info (device dm-4): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1) BTRFS info (device dm-4): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2) BTRFS error (device dm-4): block-group-tree feature requires fres-space-tree and no-holes BTRFS error (device dm-4): super block corruption detected before writing it to disk BTRFS: error (device dm-4) in write_all_supers:4288: errno=-117 Filesystem corrupted (unexpected superblock corruption detected) BTRFS warning (device dm-4: state E): Skipping commit of aborted transaction. [CAUSE] For block-group-tree feature, we have an artificial dependency on free-space-tree. This means if we detect block-group-tree without v2 cache, we consider it a corruption and cause the problem. For clear_cache mount option, it would temporary disable v2 cache, then re-enable it. But unfortunately for that temporary v2 cache disabled status, we refuse to write a superblock with bg tree only flag, thus leads to the above transaction abortion. [FIX] For now, just reject clear_cache and v1 cache mount option for block group tree. So now we got a graceful rejection other than a transaction abort: BTRFS info (device dm-4): force clearing of disk cache BTRFS error (device dm-4): cannot disable free space tree with block-group-tree feature BTRFS error (device dm-4): open_ctree failed CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6e7a367e |
|
04-Apr-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: don't print the crc32c implementation at module load time Btrfs can use various different checksumming algorithms, and prints the one used for a given file system at mount time. Don't bother printing the crc32c implementation at module load time, the information is available in /sys/fs/btrfs/FSID/checksum. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
13b98989 |
|
07-Feb-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: use btrfs_handle_fs_error in btrfs_fill_super While trying to track down a lost EIO problem I hit the following assertion while doing my error injection testing BTRFS warning (device nvme1n1): transaction 1609 (with 180224 dirty metadata bytes) is not committed assertion failed: !found, in fs/btrfs/disk-io.c:4456 ------------[ cut here ]------------ kernel BUG at fs/btrfs/messages.h:169! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 1445 Comm: mount Tainted: G W 6.2.0-rc5+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:btrfs_assertfail.constprop.0+0x18/0x1a RSP: 0018:ffffb95fc3b0bc68 EFLAGS: 00010286 RAX: 0000000000000034 RBX: ffff9941c2ac2000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffb6741f7d RDI: 00000000ffffffff RBP: ffff9941c2ac2428 R08: 0000000000000000 R09: ffffb95fc3b0bb38 R10: 0000000000000003 R11: ffffffffb71438a8 R12: ffff9941c2ac2428 R13: ffff9941c2ac2450 R14: ffff9941c2ac2450 R15: 000000000002c000 FS: 00007fcea2d07800(0000) GS:ffff9941fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f00cc7c83a8 CR3: 000000010c686000 CR4: 0000000000350ef0 Call Trace: <TASK> close_ctree+0x426/0x48f btrfs_mount_root.cold+0x7e/0xee ? legacy_parse_param+0x2b/0x220 legacy_get_tree+0x2b/0x50 vfs_get_tree+0x29/0xc0 vfs_kern_mount.part.0+0x73/0xb0 btrfs_mount+0x11d/0x3d0 ? legacy_parse_param+0x2b/0x220 legacy_get_tree+0x2b/0x50 vfs_get_tree+0x29/0xc0 path_mount+0x438/0xa40 __x64_sys_mount+0xe9/0x130 do_syscall_64+0x3e/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc This is because the error injection did an EIO for the root inode lookup and we simply jumped to closing the ctree. However because we didn't mark the file system as having an error we skipped all of the broken transaction cleanup stuff, and thus triggered this ASSERT(). Fix this by calling btrfs_handle_fs_error() in this case so we have the error set on the file system. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
68d99ab0 |
|
28-Mar-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: fix fast csum implementation detection The BTRFS_FS_CSUM_IMPL_FAST flag is currently set whenever a non-generic crc32c is detected, which is the incorrect check if the file system uses a different checksumming algorithm. Refactor the code to only check this if crc32c is actually used. Note that in an ideal world the information if an algorithm is hardware accelerated or not should be provided by the crypto API instead, but that's left for another day. CC: stable@vger.kernel.org # 5.4.x: c8a5f8ca9a9c: btrfs: print checksum type and implementation at mount time CC: stable@vger.kernel.org # 5.4.x Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
40fac647 |
|
27-Mar-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues Commit d7b9416fe5c5 ("btrfs: remove btrfs_end_io_wq") converted the read and I/O handling from btrfs_workqueues to Linux workqueues, and as part of that lost the code to apply the thread_pool= based max_active limit on remount. Restore it. Fixes: d7b9416fe5c5 ("btrfs: remove btrfs_end_io_wq") CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
67da05b3 |
|
17-Jan-2023 |
Colin Ian King <colin.i.king@gmail.com> |
btrfs: fix spelling mistakes found using codespell There quite a few spelling mistakes as found using codespell. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cfc2de0f |
|
15-Dec-2022 |
Boris Burkov <boris@bur.io> |
btrfs: pass find_free_extent_ctl to allocator tracepoints The allocator tracepoints currently have a pile of values from ffe_ctl. In modifying the allocator and adding more tracepoints, I found myself adding to the already long argument list of the tracepoints. It makes it a lot simpler to just send in the ffe_ctl itself. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2ba48b20 |
|
21-Dec-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: fix compat_ro checks against remount [BUG] Even with commit 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling"), btrfs can still mount a fs with unsupported compat_ro flags read-only, then remount it RW: # btrfs ins dump-super /dev/loop0 | grep compat_ro_flags -A 3 compat_ro_flags 0x403 ( FREE_SPACE_TREE | FREE_SPACE_TREE_VALID | unknown flag: 0x400 ) # mount /dev/loop0 /mnt/btrfs mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. ^^^ RW mount failed as expected ^^^ # dmesg -t | tail -n5 loop0: detected capacity change from 0 to 1048576 BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146) BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm BTRFS info (device loop0): using free space tree BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403) BTRFS error (device loop0): open_ctree failed # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs ^^^ RW remount succeeded unexpectedly ^^^ [CAUSE] Currently we use btrfs_check_features() to check compat_ro flags against our current mount flags. That function get reused between open_ctree() and btrfs_remount(). But for btrfs_remount(), the super block we passed in still has the old mount flags, thus btrfs_check_features() still believes we're mounting read-only. [FIX] Replace the existing @sb argument with @is_rw_mount. As originally we only use @sb to determine if the mount is RW. Now it's callers' responsibility to determine if the mount is RW, and since there are only two callers, the check is pretty simple: - caller in open_ctree() Just pass !sb_rdonly(). - caller in btrfs_remount() Pass !(*flags & SB_RDONLY), as our check should be against the new flags. Now we can correctly reject the RW remount: # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs mount: /mnt/btrfs: mount point not mounted or bad option. dmesg(1) may have more information after failed mount system call. # dmesg -t | tail -n 1 BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403) Reported-by: Chung-Chiang Cheng <shepjeng@gmail.com> Fixes: 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c68f7290 |
|
13-Dec-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix leak of fs devices after removing btrfs module When removing the btrfs module we are not calling btrfs_cleanup_fs_uuids() which results in leaking btrfs_fs_devices structures and other resources. This is a regression recently introduced by a refactoring of the module initialization and exit sequence, which simply removed the call to btrfs_cleanup_fs_uuids() in the exit path, resulting in the leaks. So fix this by calling btrfs_cleanup_fs_uuids() at exit_btrfs_fs(). Fixes: 5565b8e0adcd ("btrfs: make module init/exit match their sequence") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
103c1972 |
|
15-Nov-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: split the bio submission path into a separate file The code used by btrfs_submit_bio only interacts with the rest of volumes.c through __btrfs_map_block (which itself is a more generic version of two exported helpers) and does not really have anything to do with volumes.c. Create a new bio.c file and a bio.h header going along with it for the btrfs_bio-based storage layer, which will grow even more going forward. Also update the file with my copyright notice given that a large part of the moved code was written or rewritten by me. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cb3e217b |
|
12-Nov-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: use btrfs_dev_name() helper to handle missing devices better [BUG] If dev-replace failed to re-construct its data/metadata, the kernel message would be incorrect for the missing device: BTRFS info (device dm-1): dev_replace from <missing disk> (devid 2) to /dev/mapper/test-scratch2 started BTRFS error (device dm-1): failed to rebuild valid logical 38862848 for dev (efault) Note the above "dev (efault)" of the second line. While the first line is properly reporting "<missing disk>". [CAUSE] Although dev-replace is using btrfs_dev_name(), the heavy lifting work is still done by scrub (scrub is reused by both dev-replace and regular scrub). Unfortunately scrub code never uses btrfs_dev_name() helper, as it's only declared locally inside dev-replace.c. [FIX] Fix the output by: - Move the btrfs_dev_name() helper to volumes.h - Use btrfs_dev_name() to replace open-coded rcu_str_deref() calls Only zoned code is not touched, as I'm not familiar with degraded zoned code. - Constify return value and parameter Now the output looks pretty sane: BTRFS info (device dm-1): dev_replace from <missing disk> (devid 2) to /dev/mapper/test-scratch2 started BTRFS error (device dm-1): failed to rebuild valid logical 38862848 for dev <missing disk> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c03b2207 |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move super prototypes into super.h Move these out of ctree.h into super.h to cut down on code in ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5c11adcc |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move verity prototypes into verity.h Move these out of ctree.h into verity.h to cut down on code in ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2fc6822c |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move scrub prototypes into scrub.h Move these out of ctree.h into scrub.h to cut down on code in ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
7572dec8 |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move ioctl prototypes into ioctl.h Move these out of ctree.h into ioctl.h to cut down on code in ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f2b39277 |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move dir-item prototypes into dir-item.h Move these prototypes out of ctree.h and into their own header file. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
59b818e0 |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move defrag related prototypes to their own header Now that the defrag code is all in one file, create a defrag.h and move all the defrag related prototypes and helper out of ctree.h and into defrag.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
083bd7e5 |
|
26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move the printk and assert helpers to messages.c These helpers are core to btrfs, and in order to more easily sync various parts of the btrfs kernel code into btrfs-progs we need to be able to carry these helpers with us. However we want to have our own implementation for the helpers themselves, currently they're implemented in different files that we want to sync inside of btrfs-progs itself. Move these into their own C file, this will allow us to contain our overrides in btrfs-progs in it's own file without messing with the rest of the codebase. In copying things over I fixed up a few whitespace errors that already existed. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6db75318 |
|
19-Oct-2022 |
Sweet Tea Dorminy <sweettea-kernel@dorminy.me> |
btrfs: use struct fscrypt_str instead of struct qstr While struct qstr is more natural without fscrypt, since it's provided by dentries, struct fscrypt_str is provided by the fscrypt handlers processing dentries, and is thus more natural in the fscrypt world. Replace all of the struct qstr uses with struct fscrypt_str. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e43eec81 |
|
19-Oct-2022 |
Sweet Tea Dorminy <sweettea-kernel@dorminy.me> |
btrfs: use struct qstr instead of name and namelen pairs Many functions throughout btrfs take name buffer and name length arguments. Most of these functions at the highest level are usually called with these arguments extracted from a supplied dentry's name. But the entire name can be passed instead, making each function a little more elegant. Each function whose arguments are currently the name and length extracted from a dentry is herein converted to instead take a pointer to the name in the dentry. The couple of calls to these calls without a struct dentry are converted to create an appropriate qstr to pass in. Additionally, every function which is only called with a name/len extracted directly from a qstr is also converted. This change has positive effect on stack consumption, frame of many functions is reduced but this will be used in the future for fscrypt related structures. Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
82c0efd3 |
|
19-Oct-2022 |
Anand Jain <anand.jain@oracle.com> |
btrfs: merge module cleanup sequence to one helper The module exit function exit_btrfs_fs() is duplicating a section of code in init_btrfs_fs(). Add a helper to remove the duplicated code. Due to the init/exit section requirements the function must be inline and not a plain static as it could cause section mismatch. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
07e81dc9 |
|
19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move accessor helpers into accessors.h This is a large patch, but because they're all macros it's impossible to split up. Simply copy all of the item accessors in ctree.h and paste them in accessors.h, and then update any files to include the header so everything compiles. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comments, style fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c52cc7b7 |
|
19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add a BTRFS_FS_NEED_TRANS_COMMIT flag Currently we are only using fs_info->pending_changes to indicate that we need a transaction commit. The original users for this were removed years ago and we don't have more usage in sight, so this is the only remaining reason to have this field. Add a flag so we can remove this code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
bbde07a4 |
|
19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: push printk index code into their respective helpers The printk index work can be pushed into the printk helpers themselves, this allows us to further sanitize messages.h, removing the last include in the header itself. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9b569ea0 |
|
19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move the printk helpers out of ctree.h We have a bunch of printk helpers that are in ctree.h. These have nothing to do with ctree.c, so move them into their own header. Subsequent patches will cleanup the printk helpers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e118578a |
|
19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move assert helpers out of ctree.h These call functions that aren't defined in, or will be moved out of, ctree.h Move them to super.c where the other assert/error message code is defined. Drop the __noreturn attribute for btrfs_assertfail as objtool does not like it and fails with warnings like fs/btrfs/dir-item.o: warning: objtool: .text.unlikely: unexpected end of section fs/btrfs/xattr.o: warning: objtool: btrfs_setxattr() falls through to next function btrfs_setxattr_trans.cold() fs/btrfs/xattr.o: warning: objtool: .text.unlikely: unexpected end of section Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c7f13d42 |
|
19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move fs wide helpers out of ctree.h We have several fs wide related helpers in ctree.h. The bulk of these are the incompat flag test helpers, but there are things such as btrfs_fs_closing() and the read only helpers that also aren't directly related to the ctree code. Move these into a fs.h header, which will serve as the location for file system wide related helpers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
63a7cb13 |
|
26-Jul-2022 |
David Sterba <dsterba@suse.com> |
btrfs: auto enable discard=async when possible There's a request to automatically enable async discard for capable devices. We can do that, the async mode is designed to wait for larger freed extents and is not intrusive, with limits to iops, kbps or latency. The status and tunables will be exported in /sys/fs/btrfs/FSID/discard . The automatic selection is done if there's at least one discard capable device in the filesystem (not capable devices are skipped). Mounting with any other discard option will honor that option, notably mounting with nodiscard will keep it disabled. Link: https://lore.kernel.org/linux-btrfs/CAEg-Je_b1YtdsCR0zS5XZ_SbvJgN70ezwvRwLiCZgDGLbeMB=w@mail.gmail.com/ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5565b8e0 |
|
12-Oct-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: make module init/exit match their sequence [BACKGROUND] In theory init_btrfs_fs() and exit_btrfs_fs() should match their sequence, thus normally they should look like this: init_btrfs_fs() | exit_btrfs_fs() ----------------------+------------------------ init_A(); | init_B(); | init_C(); | | exit_C(); | exit_B(); | exit_A(); So is for the error path of init_btrfs_fs(). But it's not the case, some exit functions don't match their init functions sequence in init_btrfs_fs(). Furthermore in init_btrfs_fs(), we need to have a new error label for each new init function we added. This is not really expandable, especially recently we may add several new functions to init_btrfs_fs(). [ENHANCEMENT] The patch will introduce the following things to enhance the situation: - struct init_sequence Just a wrapper of init and exit function pointers. The init function must use int type as return value, thus some init functions need to be updated to return 0. The exit function can be NULL, as there are some init sequence just outputting a message. - struct mod_init_seq[] array This is a const array, recording all the initialization we need to do in init_btrfs_fs(), and the order follows the old init_btrfs_fs(). - bool mod_init_result[] array This is a bool array, recording if we have initialized one entry in mod_init_seq[]. The reason to split mod_init_seq[] and mod_init_result[] is to avoid section mismatch in reference. All init function are in .init.text, but if mod_init_seq[] records the @initialized member it can no longer be const, thus will be put into .data section, and cause modpost warning. For init_btrfs_fs() we just call all init functions in their order in mod_init_seq[] array, and after each call, setting corresponding mod_init_result[] to true. For exit_btrfs_fs() and error handling path of init_btrfs_fs(), we just iterate mod_init_seq[] in reverse order, and skip all uninitialized entry. With this patch, init_btrfs_fs()/exit_btrfs_fs() will be much easier to expand and will always follow the strict order. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
eda517fd |
|
14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move free space cachep's out of ctree.h This is local to the free-space-cache.c code, remove it from ctree.h and inode.c, create new init/exit functions for the cachep, and move it locally to free-space-cache.c. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
226463d7 |
|
14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move btrfs_path_cachep out of ctree.h This is local to the ctree code, remove it from ctree.h and inode.c, create new init/exit functions for the cachep, and move it locally to ctree.c. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
956504a3 |
|
14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move trans_handle_cachep out of ctree.h This is local to the transaction code, remove it from ctree.h and inode.c, create new helpers in the transaction to handle the init work and move the cachep locally to transaction.c. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3d17adea |
|
17-Oct-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: make thaw time super block check to also verify checksum Previous commit a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time") only checks the content of the super block, but it doesn't really check if the on-disk super block has a matching checksum. This patch will add the checksum verification to thaw time superblock verification. This involves the following extra changes: - Export btrfs_check_super_csum() As we need to call it in super.c. - Change the argument list of btrfs_check_super_csum() Instead of passing a char *, directly pass struct btrfs_super_block * pointer. - Verify that our checksum type didn't change before checking the checksum value, like it's done at mount time Fixes: a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d7f67ac9 |
|
11-Sep-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: relax block-group-tree feature dependency checks [BUG] When one user did a wrong attempt to clear block group tree, which can not be done through mount option, by using "-o clear_cache,space_cache=v2", it will cause the following error on a fs with block-group-tree feature: BTRFS info (device dm-1): force clearing of disk cache BTRFS info (device dm-1): using free space tree BTRFS info (device dm-1): clearing free space tree BTRFS info (device dm-1): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1) BTRFS info (device dm-1): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2) BTRFS error (device dm-1): block-group-tree feature requires fres-space-tree and no-holes BTRFS error (device dm-1): super block corruption detected before writing it to disk BTRFS: error (device dm-1) in write_all_supers:4318: errno=-117 Filesystem corrupted (unexpected superblock corruption detected) BTRFS warning (device dm-1: state E): Skipping commit of aborted transaction. [CAUSE] Although the dependency for block-group-tree feature is just an artificial one (to reduce test matrix), we put the dependency check into btrfs_validate_super(). This is too strict, and during space cache clearing, we will have a window where free space tree is cleared, and we need to commit the super block. In that window, we had block group tree without v2 cache, and triggered the artificial dependency check. This is not necessary at all, especially for such a soft dependency. [FIX] Introduce a new helper, btrfs_check_features(), to do all the runtime limitation checks, including: - Unsupported incompat flags check - Unsupported compat RO flags check - Setting missing incompat flags - Artificial feature dependency checks Currently only block group tree will rely on this. - Subpage runtime check for v1 cache With this helper, we can move quite some checks from open_ctree()/btrfs_remount() into it, and just call it after btrfs_parse_options(). Now "-o clear_cache,space_cache=v2" will not trigger the above error anymore. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ edit messages ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a62a3bd9 |
|
09-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: separate out the extent state and extent buffer init code In order to help separate the extent buffer from the extent io tree code we need to break up the init functions. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
81d5d614 |
|
08-Aug-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: enhance unsupported compat RO flags handling Currently there are two corner cases not handling compat RO flags correctly: - Remount We can still mount the fs RO with compat RO flags, then remount it RW. We should not allow any write into a fs with unsupported RO flags. - Still try to search block group items In fact, behavior/on-disk format change to extent tree should not need a full incompat flag. And since we can ensure fs with unsupported RO flags never got any writes (with above case fixed), then we can even skip block group items search at mount time. This patch will enhance the unsupported RO compat flags by: - Reject read-write remount if there are unsupported RO compat flags - Go dummy block group items directly for unsupported RO compat flags In fact, only changes to chunk/subvolume/root/csum trees should go incompat flags. The latter part should allow future change to extent tree to be compat RO flags. Thus this patch also needs to be backported to all stable trees. CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8e327b9c |
|
25-Aug-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: dump all space infos if we abort transaction due to ENOSPC We have hit some transaction abort due to -ENOSPC internally. Normally we should always reserve enough space for metadata for every transaction, thus hitting -ENOSPC should really indicate some cases we didn't expect. But unfortunately current error reporting will only give a kernel warning and stack trace, not really helpful to debug what's causing the problem. And mount option debug_enospc can only help when user can reproduce the problem, but under most cases, such transaction abort by -ENOSPC is really hard to reproduce. So this patch will dump all space infos (data, metadata, system) when we abort the first transaction with -ENOSPC. This should at least provide some clue to us. The example of a dump would look like this: BTRFS: Transaction aborted (error -28) WARNING: CPU: 8 PID: 3366 at fs/btrfs/transaction.c:2137 btrfs_commit_transaction+0xf81/0xfb0 [btrfs] <call trace skipped> ---[ end trace 0000000000000000 ]--- BTRFS info (device dm-1: state A): dumping space info: BTRFS info (device dm-1: state A): space_info DATA has 6791168 free, is not full BTRFS info (device dm-1: state A): space_info total=8388608, used=1597440, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 BTRFS info (device dm-1: state A): space_info METADATA has 257114112 free, is not full BTRFS info (device dm-1: state A): space_info total=268435456, used=131072, pinned=180224, reserved=65536, may_use=10878976, readonly=65536 zone_unusable=0 BTRFS info (device dm-1: state A): space_info SYSTEM has 8372224 free, is not full BTRFS info (device dm-1: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 BTRFS info (device dm-1: state A): global_block_rsv: size 3670016 reserved 3670016 BTRFS info (device dm-1: state A): trans_block_rsv: size 0 reserved 0 BTRFS info (device dm-1: state A): chunk_block_rsv: size 0 reserved 0 BTRFS info (device dm-1: state A): delayed_block_rsv: size 4063232 reserved 4063232 BTRFS info (device dm-1: state A): delayed_refs_rsv: size 3145728 reserved 3145728 BTRFS: error (device dm-1: state A) in btrfs_commit_transaction:2137: errno=-28 No space left BTRFS info (device dm-1: state EA): forced readonly Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a05d3c91 |
|
24-Aug-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: check superblock to ensure the fs was not modified at thaw time [BACKGROUND] There is an incident report that, one user hibernated the system, with one btrfs on removable device still mounted. Then by some incident, the btrfs got mounted and modified by another system/OS, then back to the hibernated system. After resuming from the hibernation, new write happened into the victim btrfs. Now the fs is completely broken, since the underlying btrfs is no longer the same one before the hibernation, and the user lost their data due to various transid mismatch. [REPRODUCER] We can emulate the situation using the following small script: truncate -s 1G $dev mkfs.btrfs -f $dev mount $dev $mnt fsstress -w -d $mnt -n 500 sync xfs_freeze -f $mnt cp $dev $dev.backup # There is no way to mount the same cloned fs on the same system, # as the conflicting fsid will be rejected by btrfs. # Thus here we have to wipe the fs using a different btrfs. mkfs.btrfs -f $dev.backup dd if=$dev.backup of=$dev bs=1M xfs_freeze -u $mnt fsstress -w -d $mnt -n 20 umount $mnt btrfs check $dev The final fsck will fail due to some tree blocks has incorrect fsid. This is enough to emulate the problem hit by the unfortunate user. [ENHANCEMENT] Although such case should not be that common, it can still happen from time to time. From the view of btrfs, we can detect any unexpected super block change, and if there is any unexpected change, we just mark the fs read-only, and thaw the fs. By this we can limit the damage to minimal, and I hope no one would lose their data by this anymore. Suggested-by: Goffredo Baroncelli <kreijack@libero.it> Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d45cfb88 |
|
06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: move btrfs_bio allocation to volumes.c volumes.c is the place that implements the storage layer using the btrfs_bio structure, so move the bio_set and allocation helpers there as well. To make up for the new initialization boilerplate, merge the two init/exit helpers in extent_io.c into a single one. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
dbecac26 |
|
23-Aug-2022 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
btrfs: don't print information about space cache or tree every remount btrfs currently prints information about space cache or free space tree being in use on every remount, regardless whether such remount actually enabled or disabled one of these features. This is actually unnecessary since providing remount options changing the state of these features will explicitly print the appropriate notice. Let's instead print such unconditional information just on an initial mount to avoid filling the kernel log when, for example, laptop-mode-tools remount the fs on some events. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e33c267a |
|
31-May-2022 |
Roman Gushchin <roman.gushchin@linux.dev> |
mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
d09cb9e1 |
|
23-Jun-2022 |
David Sterba <dsterba@suse.com> |
btrfs: use mask for all RAID1* profiles in btrfs_calc_avail_data_space There's a sequence of hard coded values for RAID1 profiles that are already stored in the raid_attr table that should be used instead. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
37f85ec3 |
|
13-Jun-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: use named constant for reserved device space There's a reserved space on each device of size 1MiB that can be used by bootloaders or to avoid accidental overwrite. Use a symbolic constant with the explaining comment instead of hard coding the value and multiple comments. Note: since btrfs-progs v4.1, mkfs.btrfs will reserve the first 1MiB for the primary super block (at offset 64KiB), until then the range could have been used by mistake. Kernel has been always respecting the 1MiB range for writes. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d7b9416f |
|
26-May-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove btrfs_end_io_wq All reads bio that go through btrfs_map_bio need to be completed in user context. And read I/Os are the most common and timing critical in almost any file system workloads. Embed a work_struct into struct btrfs_bio and use it to complete all read bios submitted through btrfs_map, using the REQ_META flag to decide which workqueue they are placed on. This removes the need for a separate 128 byte allocation (typically rounded up to 192 bytes by slab) for all reads with a size increase of 24 bytes for struct btrfs_bio. Future patches will reorganize struct btrfs_bio to make use of this extra space for writes as well. (All sizes are based a on typical 64-bit non-debug build) Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fed8a72d |
|
26-May-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: don't use btrfs_bio_wq_end_io for compressed writes Compressed write bio completion is the only user of btrfs_bio_wq_end_io for writes, and the use of btrfs_bio_wq_end_io is a little suboptimal here as we only real need user context for the final completion of a compressed_bio structure, and not every single bio completion. Add a work_struct to struct compressed_bio instead and use that to call finish_compressed_bio_write. This allows to remove all handling of write bios in the btrfs_bio_wq_end_io infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b8bea09a |
|
01-Jun-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: add trace event for submitted RAID56 bio Add tracepoint for better insight to how the RAID56 data are submitted. The output looks like this: (trace event header and UUID skipped) raid56_read_partial: full_stripe=389152768 devid=3 type=DATA1 offset=32768 opf=0x0 physical=323059712 len=32768 raid56_read_partial: full_stripe=389152768 devid=1 type=DATA2 offset=0 opf=0x0 physical=67174400 len=65536 raid56_write_stripe: full_stripe=389152768 devid=3 type=DATA1 offset=0 opf=0x1 physical=323026944 len=32768 raid56_write_stripe: full_stripe=389152768 devid=2 type=PQ1 offset=0 opf=0x1 physical=323026944 len=32768 The above debug output is from a 32K data write into an empty RAID56 data chunk. Some explanation on the event output: full_stripe: the logical bytenr of the full stripe devid: btrfs devid type: raid stripe type. DATA1: the first data stripe DATA2: the second data stripe PQ1: the P stripe PQ2: the Q stripe offset: the offset inside the stripe. opf: the bio op type physical: the physical offset the bio is for len: the length of the bio The first two lines are from partial RMW read, which is reading the remaining data stripes from disks. The last two lines are for full stripe RMW write, which is writing the involved two 16K stripes (one for DATA1 stripe, one for P stripe). The stripe for DATA2 doesn't need to be written. There are 5 types of trace events: - raid56_read_partial Read remaining data for regular read/write path. - raid56_write_stripe Write the modified stripes for regular read/write path. - raid56_scrub_read_recover Read remaining data for scrub recovery path. - raid56_scrub_write_stripe Write the modified stripes for scrub path. - raid56_scrub_read Read remaining data for scrub path. Also, since the trace events are included at super.c, we have to export needed structure definitions to 'raid56.h' and include the header in super.c, or we're unable to access those members. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comments ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
143823cf |
|
25-May-2022 |
David Sterba <dsterba@suse.com> |
btrfs: fix typos in comments Codespell has found a few typos. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e3a4167c |
|
02-Jun-2022 |
David Sterba <dsterba@suse.com> |
btrfs: add error messages to all unrecognized mount options Almost none of the errors stemming from a valid mount option but wrong value prints a descriptive message which would help to identify why mount failed. Like in the linked report: $ uname -r v4.19 $ mount -o compress=zstd /dev/sdb /mnt mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error. $ dmesg ... BTRFS error (device sdb): open_ctree failed Errors caused by memory allocation failures are left out as it's not a user error so reporting that would be confusing. Link: https://lore.kernel.org/linux-btrfs/9c3fec36-fc61-3a33-4977-a7e207c3fa4e@gmx.de/ CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0591f040 |
|
17-May-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: prevent remounting to v1 space cache for subpage mount Upstream commit 9f73f1aef98b ("btrfs: force v2 space cache usage for subpage mount") forces subpage mount to use v2 cache, to avoid deprecated v1 cache which doesn't support subpage properly. But there is a loophole that user can still remount to v1 cache. The existing check will only give users a warning, but does not really prevent to do the remount. Although remounting to v1 will not cause any problems since the v1 cache will always be marked invalid when mounted with a different page size, it's still better to prevent v1 cache at all for subpage mounts. Fixes: 9f73f1aef98b ("btrfs: force v2 space cache usage for subpage mount") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
be539518 |
|
17-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: use normal workqueues for scrub All three scrub workqueues don't need ordered execution or thread disabling threshold (as the thresh parameter is less than DFT_THRESHOLD). Just switch to the normal workqueues that use a lot less resources, especially in the work_struct vs btrfs_work structures. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a31b4a43 |
|
17-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue Just let the one caller that wants optional WQ_HIGHPRI handling allocate a separate btrfs_workqueue for that. This allows to rename struct __btrfs_workqueue to btrfs_workqueue, remove a pointer indirection and separate allocation for all btrfs_workqueue users and generally simplify the code. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b0a66a31 |
|
17-Mar-2022 |
Jonathan Lassoff <jof@thejof.com> |
btrfs: add messages to printk index In order for end users to quickly react to new issues that come up in production, it is proving useful to leverage this printk indexing system. This printk index enables kernel developers to use calls to printk() with changeable ad-hoc format strings, while still enabling end users to detect changes and develop a semi-stable interface for detecting and parsing these messages. So that detailed Btrfs messages are captured by this printk index, this patch wraps btrfs_printk and btrfs_handle_fs_error with macros. Example of the generated list: https://lore.kernel.org/lkml/12588e13d51a9c3bf59467d3fc1ac2162f1275c1.1647539056.git.jof@thejof.com Signed-off-by: Jonathan Lassoff <jof@thejof.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c067da87 |
|
23-Feb-2022 |
Sweet Tea Dorminy <sweettea-kernel@dorminy.me> |
btrfs: add filesystems state details to error messages When a filesystem goes read-only due to an error, multiple errors tend to be reported, some of which are knock-on failures. Logging fs_states, in btrfs_handle_fs_error() and btrfs_printk() helps distinguish the first error from subsequent messages which may only exist due to an error state. Under the new format, most initial errors will look like: `BTRFS: error (device loop0) in ...` while subsequent errors will begin with: `error (device loop0: state E) in ...` An initial transaction abort error will look like `error (device loop0: state A) in ...` and subsequent messages will contain `(device loop0: state EA) in ...` In addition to the error states we can also print other states that are temporary, like remounting, device replace, or indicate a global state that may affect functionality. Now implemented: E - filesystem error detected A - transaction aborted L - log tree errors M - remounting in progress R - device replace in progress C - data checksums not verified (mounted with ignoredatacsums) Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
63cd070d |
|
15-Dec-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: disable space cache related mount options for extent tree v2 We cannot fall back on the slow caching for extent tree v2, which means we can't just arbitrarily clear the free space trees at mount time. Furthermore we can't do v1 space cache with extent tree v2. Simply ignore these mount options for extent tree v2 as they aren't relevant. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
16cab91a |
|
11-Jan-2022 |
Anand Jain <anand.jain@oracle.com> |
btrfs: match stale devices by dev_t After the commit "btrfs: harden identification of the stale device", we don't have to match the device path anymore. Instead, we match the dev_t. So pass in the dev_t instead of the device path, in the call chain btrfs_forget_devices()->btrfs_free_stale_devices(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0a4ee518 |
|
21-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
mm: remove cleancache Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit 814bbf49dcd0 ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f26c9238 |
|
14-Dec-2021 |
Qu Wenruo <wqu@suse.com> |
btrfs: remove reada infrastructure Currently there is only one user for btrfs metadata readahead, and that's scrub. But even for the single user, it's not providing the correct functionality it needs, as scrub needs reada for commit root, which current readahead can't provide. (Although it's pretty easy to add such feature). Despite this, there are some extra problems related to metadata readahead: - Duplicated feature with btrfs_path::reada - Partly duplicated feature of btrfs_fs_info::buffer_radix Btrfs already caches its metadata in buffer_radix, while readahead tries to read the tree block no matter if it's already cached. - Poor layer separation Metadata readahead works kinda at device level. This is definitely not the correct layer it should be, since metadata is at btrfs logical address space, it should not bother device at all. This brings extra chance for bugs to sneak in, while brings unnecessary complexity. - Dead code In the very beginning of scrub.c we have #undef DEBUG, rendering all the debug related code useless and unable to test. Thus here I purpose to remove the metadata readahead mechanism completely. [BENCHMARK] There is a full benchmark for the scrub performance difference using the old btrfs_reada_add() and btrfs_path::reada. For the worst case (no dirty metadata, slow HDD), there could be a 5% performance drop for scrub. For other cases (even SATA SSD), there is no distinguishable performance difference. The number is reported scrub speed, in MiB/s. The resolution is limited by the reported duration, which only has a resolution of 1 second. Old New Diff SSD 455.3 466.332 +2.42% HDD 103.927 98.012 -5.69% Comprehensive test methodology is in the cover letter of the patch. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
84961539 |
|
05-Oct-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add a BTRFS_FS_ERROR helper We have a few flags that are inconsistently used to describe the fs in different states of failure. As of 5963ffcaf383 ("btrfs: always abort the transaction if we abort a trans handle") we will always set BTRFS_FS_STATE_ERROR if we abort, so we don't have to check both ABORTED and ERROR to see if things have gone wrong. Add a helper to check BTRFS_FS_STATE_ERROR and then convert all checkers of FS_STATE_ERROR to use the helper. The TRANS_ABORTED bit check was added in af7227338135 ("Btrfs: clean up resources during umount after trans is aborted") but is not actually specific. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6605fd2f |
|
23-Aug-2021 |
Anand Jain <anand.jain@oracle.com> |
btrfs: use latest_dev in btrfs_show_devname The test case btrfs/238 reports the warning below: WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 btrfs_show_devname+0x104/0x1e8 [btrfs] CPU: 2 PID: 1 Comm: systemd Tainted: G W O 5.14.0-rc1-custom #72 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_show_devname+0x108/0x1b4 [btrfs] show_mountinfo+0x234/0x2c4 m_show+0x28/0x34 seq_read_iter+0x12c/0x3c4 vfs_read+0x29c/0x2c8 ksys_read+0x80/0xec __arm64_sys_read+0x28/0x34 invoke_syscall+0x50/0xf8 do_el0_svc+0x88/0x138 el0_svc+0x2c/0x8c el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x198/0x19c Reason: While btrfs_prepare_sprout() moves the fs_devices::devices into fs_devices::seed_list, the btrfs_show_devname() searches for the devices and found none, leading to the warning as in above. Fix: latest_dev is updated according to the changes to the device list. That means we could use the latest_dev->name to show the device name in /proc/self/mounts, the pointer will be always valid as it's assigned before the device is deleted from the list in remove or replace. The RCU protection is sufficient as the device structure is freed after synchronization. Reported-by: Su Yue <l@damenly.su> Tested-by: Su Yue <l@damenly.su> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d24fa5c1 |
|
23-Aug-2021 |
Anand Jain <anand.jain@oracle.com> |
btrfs: convert latest_bdev type to btrfs_device and rename In preparation to fix a bug in btrfs_show_devname(). Convert fs_devices::latest_bdev type from struct block_device to struct btrfs_device and, rename the member to fs_devices::latest_dev. So that btrfs_show_devname() can use fs_devices::latest_dev::name. Tested-by: Su Yue <l@damenly.su> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5b9b26f5 |
|
26-Jul-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
btrfs: allow idmapped mount Now that we converted btrfs internally to account for idmapped mounts allow the creation of idmapped mounts on by setting the FS_ALLOW_IDMAP flag. We only need to raise this flag on the btrfs_root_fs_type filesystem since btrfs_mount_root() is ultimately responsible for allocating the superblock and is called into from btrfs_mount() associated with btrfs_fs_type. The conversion of the btrfs inode operations was straightforward. Regarding btrfs specific ioctls that perform checks based on inode permissions only those have been allowed that are not filesystem wide operations and hence can be reasonably charged against a specific mount. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0ff40a91 |
|
29-Jul-2021 |
Marcos Paulo de Souza <mpdesouza@suse.com> |
btrfs: introduce btrfs_search_backwards function It's a common practice to start a search using offset (u64)-1, which is the u64 maximum value, meaning that we want the search_slot function to be set in the last item with the same objectid and type. Once we are in this position, it's a matter to start a search backwards by calling btrfs_previous_item, which will check if we'll need to go to a previous leaf and other necessary checks, only to be sure that we are in last offset of the same object and type. The new btrfs_search_backwards function does the all these steps when necessary, and can be used to avoid code duplication. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ea3dc7d2 |
|
28-Jul-2021 |
David Sterba <dsterba@suse.com> |
btrfs: print if fsverity support is built in when loading module As fsverity support depends on a config option, print that at module load time like we do for similar features. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
14605409 |
|
30-Jun-2021 |
Boris Burkov <boris@bur.io> |
btrfs: initial fsverity support Add support for fsverity in btrfs. To support the generic interface in fs/verity, we add two new item types in the fs tree for inodes with verity enabled. One stores the per-file verity descriptor and btrfs verity item and the other stores the Merkle tree data itself. Verity checking is done in end_page_read just before a page is marked uptodate. This naturally handles a variety of edge cases like holes, preallocated extents, and inline extents. Some care needs to be taken to not try to verity pages past the end of the file, which are accessed by the generic buffered file reading code under some circumstances like reading to the end of the last page and trying to read again. Direct IO on a verity file falls back to buffered reads. Verity relies on PageChecked for the Merkle tree data itself to avoid re-walking up shared paths in the tree. For this reason, we need to cache the Merkle tree data. Since the file is immutable after verity is turned on, we can cache it at an index past EOF. Use the new inode ro_flags to store verity on the inode item, so that we can enable verity on a file, then rollback to an older kernel and still mount the file system and read the file. Since we can't safely write the file anymore without ruining the invariants of the Merkle tree, we mark a ro_compat flag on the file system when a file has verity enabled. Acked-by: Eric Biggers <ebiggers@google.com> Co-developed-by: Chris Mason <clm@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
95ea0486 |
|
26-Jul-2021 |
Qu Wenruo <wqu@suse.com> |
btrfs: allow read-write for 4K sectorsize on 64K page size systems Since now we support data and metadata read-write for subpage, remove the RO requirement for subpage mount. There are some extra limitations though: - For now, subpage RW mount is still considered experimental Thus that mount warning will still be there. - No compression support There are still quite some PAGE_SIZE hard coded and quite some call sites use extent_clear_unlock_delalloc() to unlock locked_page. This will screw up subpage helpers. Now for subpage RW mount, no matter what mount option or inode attr is set, all writes will not be compressed. Although reading compressed data has no problem. - No defrag for subpage case The defrag support for subpage case will come in later patches, which will also rework the defrag workflow. - No inline extent will be created This is mostly due to the fact that filemap_fdatawrite_range() will trigger more write than the range specified. In fallocate calls, this behavior can make us to writeback which can be inlined, before we enlarge the i_size. This is a very special corner case, and even current btrfs check won't report error on such inline extent + regular extent. But considering how much effort has been put to prevent such inline + regular, I'd prefer to cut off inline extent completely until we have a good solution. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
214cc184 |
|
26-Jul-2021 |
David Sterba <dsterba@suse.com> |
btrfs: constify and cleanup variables in comparators Comparators just read the data and thus get const parameters. This should be also preserved by the local variables, update all comparators passed to sort or bsearch. Cleanups: - unnecessary casts are dropped - btrfs_cmp_device_free_bytes is cleaned up to follow the common pattern and 'inline' is dropped as the function address is taken Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cbeaae4f |
|
18-Jun-2021 |
David Sterba <dsterba@suse.com> |
btrfs: shorten integrity checker extent data mount option Subjectively, CHECK_INTEGRITY_INCLUDING_EXTENT_DATA is quite long and calling it CHECK_INTEGRITY_DATA still keeps the meaning and matches the mount option name. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5963ffca |
|
20-May-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: always abort the transaction if we abort a trans handle While stress testing our error handling I noticed that sometimes we would still commit the transaction even though we had aborted the transaction. Currently we track if a trans handle has dirtied any metadata, and if it hasn't we mark the filesystem as having an error (so no new transactions can be started), but we will allow the current transaction to complete as we do not mark the transaction itself as having been aborted. This sounds good in theory, but we were not properly tracking IO errors in btrfs_finish_ordered_io, and thus committing the transaction with bogus free space data. This isn't necessarily a problem per-se with the free space cache, as the other guards in place would have kept us from accepting the free space cache as valid, but highlights a real world case where we had a bug and could have corrupted the filesystem because of it. This "skip abort on empty trans handle" is nice in theory, but assumes we have perfect error handling everywhere, which we clearly do not. Also we do not allow further transactions to be started, so all this does is save the last transaction that was happening, which doesn't necessarily gain us anything other than the potential for real corruption. Remove this particular bit of code, if we decide we need to abort the transaction then abort the current one and keep us from doing real harm to the file system, regardless of whether this specific trans handle dirtied anything or not. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e9306ad4 |
|
24-Feb-2021 |
Qu Wenruo <wqu@suse.com> |
btrfs: more graceful errors/warnings on 32bit systems when reaching limits Btrfs uses internally mapped u64 address space for all its metadata. Due to the page cache limit on 32bit systems, btrfs can't access metadata at or beyond (ULONG_MAX + 1) << PAGE_SHIFT. See how MAX_LFS_FILESIZE and page::index are defined. This is 16T for 4K page size while 256T for 64K page size. Users can have a filesystem which doesn't have metadata beyond the boundary at mount time, but later balance can cause it to create metadata beyond the boundary. And modification to MM layer is unrealistic just for such minor use case. We can't do more than to prevent mounting such filesystem or warn early when the numbers are still within the limits. To address such problem, this patch will introduce the following checks: - Mount time rejection This will reject any fs which has metadata chunk at or beyond the boundary. - Mount time early warning If there is any metadata chunk beyond 5/8th of the boundary, we do an early warning and hope the end user will see it. - Runtime extent buffer rejection If we're going to allocate an extent buffer at or beyond the boundary, reject such request with EOVERFLOW. This is definitely going to cause problems like transaction abort, but we have no better ways. - Runtime extent buffer early warning If an extent buffer beyond 5/8th of the max file size is allocated, do an early warning. Above error/warning message will only be printed once for each fs to reduce dmesg flood. If the mount is rejected, the filesystem will be mountable only on a 64bit host. Link: https://lore.kernel.org/linux-btrfs/1783f16d-7a28-80e6-4c32-fdf19b705ed0@gmx.com/ Reported-by: Erik Jensen <erikjensen@rkjnsn.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c55a4319 |
|
23-Feb-2021 |
Boris Burkov <boris@bur.io> |
btrfs: fix spurious free_space_tree remount warning The intended logic of the check is to catch cases where the desired free_space_tree setting doesn't match the mounted setting, and the remount is anything but ro->rw. However, it makes the mistake of checking equality on a masked integer (btrfs_test_opt) against a boolean (btrfs_fs_compat_ro). If you run the reproducer: $ mount -o space_cache=v2 dev mnt $ mount -o remount,ro mnt you would expect no warning, because the remount is not attempting to change the free space tree setting, but we do see the warning. To fix this, add explicit bool type casts to the condition. I tested a variety of transitions: sudo mount -o space_cache=v2 /dev/vg0/lv0 mnt/lol (fst enabled) mount -o remount,ro mnt/lol (no warning, no fst change) sudo mount -o remount,rw,space_cache=v1,clear_cache (no warning, ro->rw) sudo mount -o remount,rw,space_cache=v2 mnt (warning, rw->rw with change) sudo mount -o remount,ro mnt (no warning, no fst change) sudo mount -o remount,rw,space_cache=v2 mnt (no warning, no fst change) Reported-by: Chris Murphy <lists@colorremedies.com> CC: stable@vger.kernel.org # 5.11 Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0bb3eb3e |
|
26-Jan-2021 |
Qu Wenruo <wqu@suse.com> |
btrfs: allow read-only mount of 4K sector size fs on 64K page system This adds the basic RO mount ability for 4K sector size on 64K page system. Currently we only plan to support 4K and 64K page system. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cac06d84 |
|
26-Jan-2021 |
Qu Wenruo <wqu@suse.com> |
btrfs: introduce the skeleton of btrfs_subpage structure For sectorsize < page size support, we need a structure to record extra status info for each sector of a page. Introduce the skeleton structure, all subpage related code would go to subpage.[ch]. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a8cc263e |
|
14-Dec-2020 |
Filipe Manana <fdmanana@suse.com> |
btrfs: run delayed iputs when remounting RO to avoid leaking them When remounting RO, after setting the superblock with the RO flag, the cleaner task will start sleeping and do nothing, since the call to btrfs_need_cleaner_sleep() keeps returning 'true'. However, when the cleaner task goes to sleep, the list of delayed iputs may not be empty. As long as we are in RO mode, the cleaner task will keep sleeping and never run the delayed iputs. This means that if a filesystem unmount is started, we get into close_ctree() with a non-empty list of delayed iputs, and because the filesystem is in RO mode and is not in an error state (or a transaction aborted), btrfs_error_commit_super() and btrfs_commit_super(), which run the delayed iputs, are never called, and later we fail the assertion that checks if the delayed iputs list is empty: assertion failed: list_empty(&fs_info->delayed_iputs), in fs/btrfs/disk-io.c:4049 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3153! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 1 PID: 3780621 Comm: umount Tainted: G L 5.6.0-rc2-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs] Code: 8b 7b 58 48 85 ff 74 (...) RSP: 0018:ffffb748c89bbdf8 EFLAGS: 00010246 RAX: 0000000000000051 RBX: ffff9608f2584000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff91998988 RDI: 00000000ffffffff RBP: ffff9608f25870d8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0cbc500 R13: ffffffff92411750 R14: 0000000000000000 R15: ffff9608f2aab250 FS: 00007fcbfaa66c80(0000) GS:ffff960936c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fffc2c2dd38 CR3: 0000000235e54002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: close_ctree+0x1a2/0x2e6 [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x93/0xc0 exit_to_usermode_loop+0xf9/0x100 do_syscall_64+0x20d/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fcbfaca6307 Code: eb 0b 00 f7 d8 64 89 (...) RSP: 002b:00007fffc2c2ed68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000558203b559b0 RCX: 00007fcbfaca6307 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000558203b55bc0 RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fffc2c2dad0 R10: 0000558203b55bf0 R11: 0000000000000246 R12: 0000558203b55bc0 R13: 00007fcbfadcc204 R14: 0000558203b55aa8 R15: 0000000000000000 Modules linked in: btrfs dm_flakey dm_log_writes (...) ---[ end trace d44d303790049ef6 ]--- So fix this by making the remount RO path run any remaining delayed iputs after waiting for the cleaner to become inactive. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a0a1db70 |
|
14-Dec-2020 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix race between RO remount and the cleaner task When we are remounting a filesystem in RO mode we can race with the cleaner task and result in leaking a transaction if the filesystem is unmounted shortly after, before the transaction kthread had a chance to commit that transaction. That also results in a crash during unmount, due to a use-after-free, if hardware acceleration is not available for crc32c. The following sequence of steps explains how the race happens. 1) The filesystem is mounted in RW mode and the cleaner task is running. This means that currently BTRFS_FS_CLEANER_RUNNING is set at fs_info->flags; 2) The cleaner task is currently running delayed iputs for example; 3) A filesystem RO remount operation starts; 4) The RO remount task calls btrfs_commit_super(), which commits any currently open transaction, and it finishes; 5) At this point the cleaner task is still running and it creates a new transaction by doing one of the following things: * When running the delayed iput() for an inode with a 0 link count, in which case at btrfs_evict_inode() we start a transaction through the call to evict_refill_and_join(), use it and then release its handle through btrfs_end_transaction(); * When deleting a dead root through btrfs_clean_one_deleted_snapshot(), a transaction is started at btrfs_drop_snapshot() and then its handle is released through a call to btrfs_end_transaction_throttle(); * When the remount task was still running, and before the remount task called btrfs_delete_unused_bgs(), the cleaner task also called btrfs_delete_unused_bgs() and it picked and removed one block group from the list of unused block groups. Before the cleaner task started a transaction, through btrfs_start_trans_remove_block_group() at btrfs_delete_unused_bgs(), the remount task had already called btrfs_commit_super(); 6) So at this point the filesystem is in RO mode and we have an open transaction that was started by the cleaner task; 7) Shortly after a filesystem unmount operation starts. At close_ctree() we stop the transaction kthread before it had a chance to commit the transaction, since less than 30 seconds (the default commit interval) have elapsed since the last transaction was committed; 8) We end up calling iput() against the btree inode at close_ctree() while there is an open transaction, and since that transaction was used to update btrees by the cleaner, we have dirty pages in the btree inode due to COW operations on metadata extents, and therefore writeback is triggered for the btree inode. So btree_write_cache_pages() is invoked to flush those dirty pages during the final iput() on the btree inode. This results in creating a bio and submitting it, which makes us end up at btrfs_submit_metadata_bio(); 9) At btrfs_submit_metadata_bio() we end up at the if-then-else branch that calls btrfs_wq_submit_bio(), because check_async_write() returned a value of 1. This value of 1 is because we did not have hardware acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not set in fs_info->flags; 10) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the workqueue at fs_info->workers, which was already freed before by the call to btrfs_stop_all_workers() at close_ctree(). This results in an invalid memory access due to a use-after-free, leading to a crash. When this happens, before the crash there are several warnings triggered, since we have reserved metadata space in a block group, the delayed refs reservation, etc: ------------[ cut here ]------------ WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 4 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs] Code: f0 01 00 00 48 39 c2 75 (...) RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206 RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8 RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800 RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110 R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0x17f/0x2f0 [btrfs] close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 01 48 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c6 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 2 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs] Code: 48 83 bb b0 03 00 00 00 (...) RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206 RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110 R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0x24c/0x2f0 [btrfs] close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 01 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c7 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 5 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs] Code: ad de 49 be 22 01 00 (...) RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206 RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246 RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00 R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c8 ]--- BTRFS info (device sdc): space_info 4 has 268238848 free, is not full BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536 BTRFS info (device sdc): global_block_rsv: size 0 reserved 0 BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0 BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0 And the crash, which only happens when we do not have crc32c hardware acceleration, produces the following trace immediately after those warnings: stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 1749129 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs] Code: 54 55 53 48 89 f3 (...) RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282 RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0 RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8 R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000 FS: 00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_wq_submit_bio+0xb3/0xd0 [btrfs] btrfs_submit_metadata_bio+0x44/0xc0 [btrfs] submit_one_bio+0x61/0x70 [btrfs] btree_write_cache_pages+0x414/0x450 [btrfs] ? kobject_put+0x9a/0x1d0 ? trace_hardirqs_on+0x1b/0xf0 ? _raw_spin_unlock_irqrestore+0x3c/0x60 ? free_debug_processing+0x1e1/0x2b0 do_writepages+0x43/0xe0 ? lock_acquired+0x199/0x490 __writeback_single_inode+0x59/0x650 writeback_single_inode+0xaf/0x120 write_inode_now+0x94/0xd0 iput+0x187/0x2b0 close_ctree+0x2c6/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f3cfebabee7 Code: ff 0b 00 f7 d8 64 89 01 (...) RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000 RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0 R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60 Modules linked in: btrfs dm_snapshot dm_thin_pool (...) ---[ end trace dd74718fef1ed5cc ]--- Finally when we remove the btrfs module (rmmod btrfs), there are several warnings about objects that were allocated from our slabs but were never freed, consequence of the transaction that was never committed and got leaked: ============================================================================= BUG btrfs_delayed_ref_head (Tainted: G B W ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown() ----------------------------------------------------------------------------- INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200 CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? lock_release+0x20e/0x4c0 kmem_cache_destroy+0x55/0x120 btrfs_delayed_ref_exit+0x11/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x0000000050cbdd61 @offset=12104 INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] btrfs_free_tree_block+0x128/0x360 [btrfs] __btrfs_cow_block+0x489/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Object 0x0000000086e9b0ff @offset=12776 INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] btrfs_alloc_tree_block+0x2bf/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs] commit_cowonly_roots+0x248/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 btrfs_delayed_ref_exit+0x11/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 0b (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 ============================================================================= BUG btrfs_delayed_tree_ref (Tainted: G B W ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown() ----------------------------------------------------------------------------- INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200 CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? lock_release+0x20e/0x4c0 kmem_cache_destroy+0x55/0x120 btrfs_delayed_ref_exit+0x1d/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x000000001a340018 @offset=4408 INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] btrfs_free_tree_block+0x128/0x360 [btrfs] __btrfs_cow_block+0x489/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] btrfs_commit_transaction+0x60/0xc40 [btrfs] create_subvol+0x56a/0x990 [btrfs] btrfs_mksubvol+0x3fb/0x4a0 [btrfs] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs] btrfs_ioctl_snap_create+0x58/0x80 [btrfs] btrfs_ioctl+0x1a92/0x36f0 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Object 0x000000002b46292a @offset=13648 INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] btrfs_alloc_tree_block+0x2bf/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 btrfs_delayed_ref_exit+0x1d/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 ============================================================================= BUG btrfs_delayed_extent_op (Tainted: G B W ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown() ----------------------------------------------------------------------------- INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200 CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? __mutex_unlock_slowpath+0x45/0x2a0 kmem_cache_destroy+0x55/0x120 exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x000000004cf95ea8 @offset=6264 INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1 So fix this by making the remount path to wait for the cleaner task before calling btrfs_commit_super(). The remount path now waits for the bit BTRFS_FS_CLEANER_RUNNING to be cleared from fs_info->flags before calling btrfs_commit_super() and this ensures the cleaner can not start a transaction after that, because it sleeps when the filesystem is in RO mode and we have already flagged the filesystem as RO before waiting for BTRFS_FS_CLEANER_RUNNING to be cleared. This also introduces a new flag BTRFS_FS_STATE_RO to be used for fs_info->fs_state when the filesystem is in RO mode. This is because we were doing the RO check using the flags of the superblock and setting the RO mode simply by ORing into the superblock's flags - those operations are not atomic and could result in the cleaner not seeing the update from the remount task after it clears BTRFS_FS_CLEANER_RUNNING. Tested-by: Fabian Vogt <fvogt@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cb13eea3 |
|
14-Dec-2020 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan If we remount a filesystem in RO mode while the qgroup rescan worker is running, we can end up having it still running after the remount is done, and at unmount time we may end up with an open transaction that ends up never getting committed. If that happens we end up with several memory leaks and can crash when hardware acceleration is unavailable for crc32c. Possibly it can lead to other nasty surprises too, due to use-after-free issues. The following steps explain how the problem happens. 1) We have a filesystem mounted in RW mode and the qgroup rescan worker is running; 2) We remount the filesystem in RO mode, and never stop/pause the rescan worker, so after the remount the rescan worker is still running. The important detail here is that the rescan task is still running after the remount operation committed any ongoing transaction through its call to btrfs_commit_super(); 3) The rescan is still running, and after the remount completed, the rescan worker started a transaction, after it finished iterating all leaves of the extent tree, to update the qgroup status item in the quotas tree. It does not commit the transaction, it only releases its handle on the transaction; 4) A filesystem unmount operation starts shortly after; 5) The unmount task, at close_ctree(), stops the transaction kthread, which had not had a chance to commit the open transaction since it was sleeping and the commit interval (default of 30 seconds) has not yet elapsed since the last time it committed a transaction; 6) So after stopping the transaction kthread we still have the transaction used to update the qgroup status item open. At close_ctree(), when the filesystem is in RO mode and no transaction abort happened (or the filesystem is in error mode), we do not expect to have any transaction open, so we do not call btrfs_commit_super(); 7) We then proceed to destroy the work queues, free the roots and block groups, etc. After that we drop the last reference on the btree inode by calling iput() on it. Since there are dirty pages for the btree inode, corresponding to the COWed extent buffer for the quotas btree, btree_write_cache_pages() is invoked to flush those dirty pages. This results in creating a bio and submitting it, which makes us end up at btrfs_submit_metadata_bio(); 8) At btrfs_submit_metadata_bio() we end up at the if-then-else branch that calls btrfs_wq_submit_bio(), because check_async_write() returned a value of 1. This value of 1 is because we did not have hardware acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not set in fs_info->flags; 9) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the workqueue at fs_info->workers, which was already freed before by the call to btrfs_stop_all_workers() at close_ctree(). This results in an invalid memory access due to a use-after-free, leading to a crash. When this happens, before the crash there are several warnings triggered, since we have reserved metadata space in a block group, the delayed refs reservation, etc: ------------[ cut here ]------------ WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 4 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs] Code: f0 01 00 00 48 39 c2 75 (...) RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206 RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8 RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800 RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110 R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0x17f/0x2f0 [btrfs] close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 01 48 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c6 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 2 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs] Code: 48 83 bb b0 03 00 00 00 (...) RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206 RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110 R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_free_block_groups+0x24c/0x2f0 [btrfs] close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 01 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c7 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs] Modules linked in: btrfs dm_snapshot dm_thin_pool (...) CPU: 5 PID: 1729896 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs] Code: ad de 49 be 22 01 00 (...) RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206 RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246 RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00 R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100 FS: 00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: close_ctree+0x2ba/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15ee221ee7 Code: ff 0b 00 f7 d8 64 89 (...) RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000 RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0 R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000 R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace dd74718fef1ed5c8 ]--- BTRFS info (device sdc): space_info 4 has 268238848 free, is not full BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536 BTRFS info (device sdc): global_block_rsv: size 0 reserved 0 BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0 BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0 And the crash, which only happens when we do not have crc32c hardware acceleration, produces the following trace immediately after those warnings: stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 1749129 Comm: umount Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs] Code: 54 55 53 48 89 f3 (...) RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282 RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0 RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8 R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000 FS: 00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_wq_submit_bio+0xb3/0xd0 [btrfs] btrfs_submit_metadata_bio+0x44/0xc0 [btrfs] submit_one_bio+0x61/0x70 [btrfs] btree_write_cache_pages+0x414/0x450 [btrfs] ? kobject_put+0x9a/0x1d0 ? trace_hardirqs_on+0x1b/0xf0 ? _raw_spin_unlock_irqrestore+0x3c/0x60 ? free_debug_processing+0x1e1/0x2b0 do_writepages+0x43/0xe0 ? lock_acquired+0x199/0x490 __writeback_single_inode+0x59/0x650 writeback_single_inode+0xaf/0x120 write_inode_now+0x94/0xd0 iput+0x187/0x2b0 close_ctree+0x2c6/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f3cfebabee7 Code: ff 0b 00 f7 d8 64 89 01 (...) RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7 RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000 RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0 R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60 Modules linked in: btrfs dm_snapshot dm_thin_pool (...) ---[ end trace dd74718fef1ed5cc ]--- Finally when we remove the btrfs module (rmmod btrfs), there are several warnings about objects that were allocated from our slabs but were never freed, consequence of the transaction that was never committed and got leaked: ============================================================================= BUG btrfs_delayed_ref_head (Tainted: G B W ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown() ----------------------------------------------------------------------------- INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200 CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? lock_release+0x20e/0x4c0 kmem_cache_destroy+0x55/0x120 btrfs_delayed_ref_exit+0x11/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x0000000050cbdd61 @offset=12104 INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] btrfs_free_tree_block+0x128/0x360 [btrfs] __btrfs_cow_block+0x489/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Object 0x0000000086e9b0ff @offset=12776 INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] btrfs_alloc_tree_block+0x2bf/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs] commit_cowonly_roots+0x248/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 btrfs_delayed_ref_exit+0x11/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 0b (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 ============================================================================= BUG btrfs_delayed_tree_ref (Tainted: G B W ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown() ----------------------------------------------------------------------------- INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200 CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? lock_release+0x20e/0x4c0 kmem_cache_destroy+0x55/0x120 btrfs_delayed_ref_exit+0x1d/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x000000001a340018 @offset=4408 INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] btrfs_free_tree_block+0x128/0x360 [btrfs] __btrfs_cow_block+0x489/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] btrfs_commit_transaction+0x60/0xc40 [btrfs] create_subvol+0x56a/0x990 [btrfs] btrfs_mksubvol+0x3fb/0x4a0 [btrfs] __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs] btrfs_ioctl_snap_create+0x58/0x80 [btrfs] btrfs_ioctl+0x1a92/0x36f0 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 INFO: Object 0x000000002b46292a @offset=13648 INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] btrfs_alloc_tree_block+0x2bf/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 btrfs_delayed_ref_exit+0x1d/0x35 [btrfs] exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 ============================================================================= BUG btrfs_delayed_extent_op (Tainted: G B W ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown() ----------------------------------------------------------------------------- INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200 CPU: 5 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 slab_err+0xb7/0xdc ? lock_acquired+0x199/0x490 __kmem_cache_shutdown+0x1ac/0x3c0 ? __mutex_unlock_slowpath+0x45/0x2a0 kmem_cache_destroy+0x55/0x120 exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 f5 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 INFO: Object 0x000000004cf95ea8 @offset=6264 INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873 __slab_alloc.isra.0+0x109/0x1c0 kmem_cache_alloc+0x7bb/0x830 btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs] __btrfs_cow_block+0x12d/0x5f0 [btrfs] btrfs_cow_block+0xf7/0x220 [btrfs] btrfs_search_slot+0x62a/0xc40 [btrfs] btrfs_del_orphan_item+0x65/0xd0 [btrfs] btrfs_find_orphan_roots+0x1bf/0x200 [btrfs] open_ctree+0x125a/0x18a0 [btrfs] btrfs_mount_root.cold+0x13/0xed [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xe0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803 kmem_cache_free+0x34c/0x3c0 __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] btrfs_run_delayed_refs+0x81/0x210 [btrfs] commit_cowonly_roots+0xfb/0x300 [btrfs] btrfs_commit_transaction+0x367/0xc40 [btrfs] close_ctree+0x113/0x2fa [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x68/0xb0 exit_to_user_mode_prepare+0x1bb/0x1c0 syscall_exit_to_user_mode+0x4b/0x260 entry_SYSCALL_64_after_hwframe+0x44/0xa9 kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects CPU: 3 PID: 1729921 Comm: rmmod Tainted: G B W 5.10.0-rc4-btrfs-next-73 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8d/0xb5 kmem_cache_destroy+0x119/0x120 exit_btrfs_fs+0xa/0x59 [btrfs] __x64_sys_delete_module+0x194/0x260 ? fpregs_assert_state_consistent+0x1e/0x40 ? exit_to_user_mode_prepare+0x55/0x1c0 ? trace_hardirqs_on+0x1b/0xf0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f693e305897 Code: 73 01 c3 48 8b 0d f9 (...) RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8 RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740 R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760 BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1 Fix this issue by having the remount path stop the qgroup rescan worker when we are remounting RO and teach the rescan worker to stop when a remount is in progress. If later a remount in RW mode happens, we are already resuming the qgroup rescan worker through the call to btrfs_qgroup_rescan_resume(), so we do not need to worry about that. Tested-by: Fabian Vogt <fvogt@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2838d255 |
|
18-Nov-2020 |
Boris Burkov <boris@bur.io> |
btrfs: warn when remount will not change the free space tree If the remount is ro->ro, rw->ro, or rw->rw, we will not create or clear the free space tree. This can be surprising, so print a warning to dmesg to make the failure more visible. It is also important to ensure that the space cache options (SPACE_CACHE, FREE_SPACE_TREE) are consistent, so ensure those are set to properly match the current on disk state (which won't be changing). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
04c41559 |
|
18-Nov-2020 |
Boris Burkov <boris@bur.io> |
btrfs: use superblock state to print space_cache mount option To make the contents of /proc/mounts better match the actual state of the filesystem, base the display of the space cache mount options off the contents of the super block rather than the last mount options passed in. Since there are many scenarios where the mount will ignore a space cache option, simply showing the passed in option is misleading. For example, if we mount with -o remount,space_cache=v2 on a read-write file system without an existing free space tree, we won't build a free space tree, but /proc/mounts will read space_cache=v2 (until we mount again and it goes away) cache_generation is set iff space_cache=v1, FREE_SPACE_TREE is set iff space_cache=v2, and if neither is the case, we print nospace_cache. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
94846229 |
|
18-Nov-2020 |
Boris Burkov <boris@bur.io> |
btrfs: keep sb cache_generation consistent with space_cache When mounting, btrfs uses the cache_generation in the super block to determine if space cache v1 is in use. However, by mounting with nospace_cache or space_cache=v2, it is possible to disable space cache v1, which does not result in un-setting cache_generation back to 0. In order to base some logic, like mount option printing in /proc/mounts, on the current state of the space cache rather than just the values of the mount option, keep the value of cache_generation consistent with the status of space cache v1. We ensure that cache_generation > 0 iff the file system is using space_cache v1. This requires committing a transaction on any mount which changes whether we are using v1. (v1->nospace_cache, v1->v2, nospace_cache->v1, v2->v1). Since the mechanism for writing out the cache generation is transaction commit, but we want some finer grained control over when we un-set it, we can't just rely on the SPACE_CACHE mount option, and introduce an fs_info flag that mount can use when it wants to unset the generation. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8cd29088 |
|
18-Nov-2020 |
Boris Burkov <boris@bur.io> |
btrfs: clear oneshot options on mount and remount Some options only apply during mount time and are cleared at the end of mount. For now, the example is USEBACKUPROOT, but CLEAR_CACHE also fits the bill, and this is a preparation patch for also clearing that option. One subtlety is that the current code only resets USEBACKUPROOT on rw mounts, but the option is meaningfully "consumed" by a ro mount, so it feels appropriate to clear in that case as well. A subsequent read-write remount would not go through open_ctree, which is the only place that checks the option, so the change should be benign. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
44c0ca21 |
|
18-Nov-2020 |
Boris Burkov <boris@bur.io> |
btrfs: lift read-write mount setup from mount and remount Mounting rw and remounting from ro to rw naturally share invariants and functionality which result in a correctly setup rw filesystem. Luckily, there is even a strong unity in the code which implements them. In mount's open_ctree, these operations mostly happen after an early return for ro file systems, and in remount, they happen in a section devoted to remounting ro->rw, after some remount specific validation passes. However, there are unfortunately a few differences. There are small deviations in the order of some of the operations, remount does not start orphan cleanup in root_tree or fs_tree, remount does not create the free space tree, and remount does not handle "one-shot" mount options like clear_cache and uuid tree rescan. Since we want to add building the free space tree to remount, and also to start the same orphan cleanup process on a filesystem mounted as ro then remounted rw, we would benefit from unifying the logic between the two code paths. This patch only lifts the existing common functionality, and leaves a natural path for fixing the discrepancies. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5297199a |
|
26-Nov-2020 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: remove inode number cache feature It's been deprecated since commit b547a88ea577 ("btrfs: start deprecation of mount option inode_cache") which enumerates the reasons. A filesystem that uses the feature (mount -o inode_cache) tracks the inode numbers in bitmaps, that data stay on the filesystem after this patch. The size is roughly 5MiB for 1M inodes [1], which is considered small enough to be left there. Removal of the change can be implemented in btrfs-progs if needed. [1] https://lore.kernel.org/linux-btrfs/20201127145836.GZ6430@twin.jikos.cz/ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5d1ab66c |
|
10-Nov-2020 |
Naohiro Aota <naohiro.aota@wdc.com> |
btrfs: disallow space_cache in ZONED mode As updates to the space cache v1 are in-place, the space cache cannot be located over sequential zones and there is no guarantees that the device will have enough conventional zones to store this cache. Resolve this problem by disabling completely the space cache v1. This does not introduce any problems with sequential block groups: all the free space is located after the allocation pointer and no free space before the pointer. There is no need to have such cache. Note: we can technically use free-space-tree (space cache v2) on ZONED mode. But, since ZONED mode now always allocates extents in a block group sequentially regardless of underlying device zone type, it's no use to enable and maintain the tree. For the same reason, NODATACOW is also disabled. In summary, ZONED will disable: | Disabled features | Reason | |-------------------+-----------------------------------------------------| | RAID/DUP | Cannot handle two zone append writes to different | | | zones | |-------------------+-----------------------------------------------------| | space_cache (v1) | In-place updating | | NODATACOW | In-place updating | |-------------------+-----------------------------------------------------| | fallocate | Reserved extent will be a write hole | |-------------------+-----------------------------------------------------| | MIXED_BG | Allocated metadata region will be write holes for | | | data writes | Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b70f5097 |
|
10-Nov-2020 |
Naohiro Aota <naohiro.aota@wdc.com> |
btrfs: check and enable ZONED mode Introduce function btrfs_check_zoned_mode() to check if ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5b316468 |
|
10-Nov-2020 |
Naohiro Aota <naohiro.aota@wdc.com> |
btrfs: get zone information of zoned block devices If a zoned block device is found, get its zone information (number of zones and zone size). To avoid costly run-time zone report commands to test the device zones type during block allocation, attach the seq_zones bitmap to the device structure to indicate if a zone is sequential or accept random writes. Also it attaches the empty_zones bitmap to indicate if a zone is empty or not. This patch also introduces the helper function btrfs_dev_is_sequential() to test if the zone storing a block is a sequential write required zone and btrfs_dev_is_empty_zone() to test if the zone is a empty zone. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a0f6d924 |
|
13-Nov-2020 |
David Sterba <dsterba@suse.com> |
btrfs: remove stub device info from messages when we have no fs_info Without a NULL fs_info the helpers will print something like BTRFS error (device <unknown>): ... This can happen in contexts where fs_info is not available at all or it's potentially unsafe due to object lifetime. The <unknown> stub does not bring much information and with the prefix makes the message unnecessarily longer. Remove it for the NULL fs_info case. BTRFS error: ... Callers can add the device information to the message itself if needed. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b9729ce0 |
|
20-Aug-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: locking: rip out path->leave_spinning We no longer distinguish between blocking and spinning, so rip out all this code. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
265fdfa6 |
|
01-Jul-2020 |
David Sterba <dsterba@suse.com> |
btrfs: replace s_blocksize_bits with fs_info::sectorsize_bits The value of super_block::s_blocksize_bits is the same as fs_info::sectorsize_bits, but we don't need to do the extra dereferences in many functions and storing the bits as u32 (in fs_info) generates shorter assembly. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ab1405aa |
|
29-Sep-2020 |
David Sterba <dsterba@suse.com> |
btrfs: generate lockdep keyset names at compile time The names in btrfs_lockdep_keysets are generated from a simple pattern using snprintf but we can generate them directly with some macro magic and remove the helpers. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9037d3cb |
|
16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: introduce mount option rescue=all Now that we have the building blocks for some better recovery options with corrupted file systems, add a rescue=all option to enable all of the relevant rescue options. This will allow distros to simply default to rescue=all for the "oh dear lord the world's on fire" recovery without needing to know all the different options that we have and may add in the future. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
882dbe0c |
|
16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: introduce mount option rescue=ignoredatacsums There are cases where you can end up with bad data csums because of misbehaving applications. This happens when an application modifies a buffer in-flight when doing an O_DIRECT write. In order to recover the file we need a way to turn off data checksums so you can copy the file off, and then you can delete the file and restore it properly later. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
42437a63 |
|
16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: introduce mount option rescue=ignorebadroots In the face of extent root corruption, or any other core fs wide root corruption we will fail to mount the file system. This makes recovery kind of a pain, because you need to fall back to userspace tools to scrape off data. Instead provide a mechanism to gracefully handle bad roots, so we can at least mount read-only and possibly recover data from the file system. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
68319c18 |
|
16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: show rescue=usebackuproot in /proc/mounts The standalone option usebackuproot was intended as one-time use and it was not necessary to keep it in the option list. Now that we're going to have more rescue options, it's desirable to keep them intact as it could be confusing why the option disappears. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ remove the btrfs_clear_opt part from open_ctree ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ab0b4a3e |
|
16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add a helper to print out rescue= options We're going to have a lot of rescue options, add a helper to collapse the /proc/mounts output to rescue=option1:option2:option3 format. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d70bf748 |
|
16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: unify the ro checking for mount options We're going to be adding more options that require RDONLY, so add a helper to do the check and error out if we don't have RDONLY set. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
72804905 |
|
01-Sep-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: kill the RCU protection for fs_info->space_info We have this thing wrapped in an RCU lock, but it's really not needed. We create all the space_info's on mount, and we destroy them on unmount. The list never changes and we're protected from messing with it by the normal mount/umount path, so kill the RCU stuff around it. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
57056740 |
|
21-Jul-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: do async reclaim for data reservations Now that we have the data ticketing stuff in place, move normal data reservations to use an async reclaim helper to satisfy tickets. Before we could have multiple tasks race in and both allocate chunks, resulting in more data chunks than we would necessarily need. Serializing these allocations and making a single thread responsible for flushing will only allocate chunks as needed, as well as cut down on transaction commits and other flush related activities. Priority reservations will still work as they have before, simply trying to allocate a chunk until they can make their reservation. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
282dd7d7 |
|
03-Aug-2020 |
Marcos Paulo de Souza <mpdesouza@suse.com> |
btrfs: reset compression level for lzo on remount Currently a user can set mount "-o compress" which will set the compression algorithm to zlib, and use the default compress level for zlib (3): relatime,compress=zlib:3,space_cache If the user remounts the fs using "-o compress=lzo", then the old compress_level is used: relatime,compress=lzo:3,space_cache But lzo does not expose any tunable compression level. The same happens if we set any compress argument with different level, also with zstd. Fix this by resetting the compress_level when compress=lzo is specified. With the fix applied, lzo is shown without compress level: relatime,compress=lzo,space_cache CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
faa00889 |
|
30-Jul-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: make sure SB_I_VERSION doesn't get unset by remount There's some inconsistency around SB_I_VERSION handling with mount and remount. Since we don't really want it to be off ever just work around this by making sure we don't get the flag cleared on remount. There's a tiny cpu cost of setting the bit, otherwise all changes to i_version also change some of the times (ctime/mtime) so the inode needs to be synced. We wouldn't save anything by disabling it. Reported-by: Eric Sandeen <sandeen@redhat.com> CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add perf impact analysis ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3ef3959b |
|
22-Jul-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: don't show full path of bind mounts in subvol= Chris Murphy reported a problem where rpm ostree will bind mount a bunch of things for whatever voodoo it's doing. But when it does this /proc/mounts shows something like /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0 /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0 Despite subvolid=256 being subvol=/foo. This is because we're just spitting out the dentry of the mount point, which in the case of bind mounts is the source path for the mountpoint. Instead we should spit out the path to the actual subvol. Fix this by looking up the name for the subvolid we have mounted. With this fix the same test looks like this /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0 /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0 Reported-by: Chris Murphy <chris@colorremedies.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
27942c99 |
|
23-Jul-2020 |
David Sterba <dsterba@suse.com> |
btrfs: fix messages after changing compression level by remount Reported by Forza on IRC that remounting with compression options does not reflect the change in level, or at least it does not appear to do so according to the messages: mount -o compress=zstd:1 /dev/sda /mnt mount -o remount,compress=zstd:15 /mnt does not print the change to the level to syslog: [ 41.366060] BTRFS info (device vda): use zstd compression, level 1 [ 41.368254] BTRFS info (device vda): disk space caching is enabled [ 41.390429] BTRFS info (device vda): disk space caching is enabled What really happens is that the message is lost but the level is actualy changed. There's another weird output, if compression is reset to 'no': [ 45.413776] BTRFS info (device vda): use no compression, level 4 To fix that, save the previous compression level and print the message in that case too and use separate message for 'no' compression. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: David Sterba <dsterba@suse.com>
|
#
88c4703f |
|
22-Jul-2020 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
btrfs: open-code remount flag setting in btrfs_remount When we're (re)mounting a btrfs filesystem we set the BTRFS_FS_STATE_REMOUNTING state in fs_info to serialize against async reclaim or defrags. This flag is set in btrfs_remount_prepare() called by btrfs_remount(). As btrfs_remount_prepare() does nothing but setting this flag and doesn't have a second caller, we can just open-code the flag setting in btrfs_remount(). Similarly do for so clearing of the flag by moving it out of btrfs_remount_cleanup() into btrfs_remount() to be symmetrical. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
59131393 |
|
21-Jul-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: document special case error codes for fs errors We've had some discussions about what to do in certain scenarios for error codes, specifically EUCLEAN and EROFS. Document these near the error handling code so its clear what their intentions are. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4faf55b0 |
|
10-Jul-2020 |
Anand Jain <anand.jain@oracle.com> |
btrfs: don't traverse into the seed devices in show_devname ->show_devname currently shows the lowest devid in the list. As the seed devices have the lowest devid in the sprouted filesystem, the userland tool such as findmnt end up seeing seed device instead of the device from the read-writable sprouted filesystem. As shown below. mount /dev/sda /btrfs mount: /btrfs: WARNING: device write-protected, mounted read-only. findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 btrfs dev add -f /dev/sdb /btrfs umount /btrfs mount /dev/sdb /btrfs findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 All sprouts from a single seed will show the same seed device and the same fsid. That's confusing. This is causing problems in our prototype as there isn't any reference to the sprout file-system(s) which is being used for actual read and write. This was added in the patch which implemented the show_devname in btrfs commit 9c5085c14798 ("Btrfs: implement ->show_devname"). I tried to look for any particular reason that we need to show the seed device, there isn't any. So instead, do not traverse through the seed devices, just show the lowest devid in the sprouted fsid. After the patch: mount /dev/sda /btrfs mount: /btrfs: WARNING: device write-protected, mounted read-only. findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111 btrfs dev add -f /dev/sdb /btrfs mount -o rw,remount /dev/sdb /btrfs findmnt --output SOURCE,TARGET,UUID /btrfs SOURCE TARGET UUID /dev/sdb /btrfs 595ca0e6-b82e-46b5-b9e2-c72a6928be48 mount /dev/sda /btrfs1 mount: /btrfs1: WARNING: device write-protected, mounted read-only. btrfs dev add -f /dev/sdc /btrfs1 findmnt --output SOURCE,TARGET,UUID /btrfs1 SOURCE TARGET UUID /dev/sdc /btrfs1 ca1dbb7a-8446-4f95-853c-a20f3f82bdbb cat /proc/self/mounts | grep btrfs /dev/sdb /btrfs btrfs rw,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0 /dev/sdc /btrfs1 btrfs ro,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0 Reported-by: Martin K. Petersen <martin.petersen@oracle.com> CC: stable@vger.kernel.org # 4.19+ Tested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b90a4ab6 |
|
01-Jul-2020 |
David Sterba <dsterba@suse.com> |
btrfs: remove deprecated mount option subvolrootid The option subvolrootid used to be a workaround for mounting subvolumes and ineffective since 5e2a4b25da23 ("btrfs: deprecate subvolrootid mount option"). We have subvol= that works and we don't need to keep the cruft, let's remove it. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d801e7a3 |
|
01-Jul-2020 |
David Sterba <dsterba@suse.com> |
btrfs: remove deprecated mount option alloc_start The mount option alloc_start has no effect since 0d0c71b31720 ("btrfs: obsolete and remove mount option alloc_start") which has details why it's been deprecated. We can remove it. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b547a88e |
|
18-Jun-2020 |
David Sterba <dsterba@suse.com> |
btrfs: start deprecation of mount option inode_cache Estimated time of removal of the functionality is 5.11, the option will be still parsed but will have no effect. Reasons for deprecation and removal: - very poor naming choice of the mount option, it's supposed to cache and reuse the inode _numbers_, but it sounds a some generic cache for inodes - the only known usecase where this option would make sense is on a 32bit architecture where inode numbers in one subvolume would be exhausted due to 32bit inode::i_ino - the cache is stored on disk, consumes space, needs to be loaded and written back - new inode number allocation is slower due to lookups into the cache (compared to a simple increment which is the default) - uses the free-space-cache code that is going to be deprecated as well in the future Known problems: - since 2011, returning EEXIST when there's not enough space in a page to store all checksums, see commit 4b9465cb9e38 ("Btrfs: add mount -o inode_cache") Remaining issues: - if the option was enabled, new inodes created, the option disabled again, the cache is still stored on the devices and there's currently no way to remove it Signed-off-by: David Sterba <dsterba@suse.com>
|
#
74ef0018 |
|
04-Jun-2020 |
Qu Wenruo <wqu@suse.com> |
btrfs: introduce "rescue=" mount option This patch introduces a new "rescue=" mount option group for all mount options for data recovery. Different rescue sub options are seperated by ':'. E.g "ro,rescue=nologreplay:usebackuproot". The original plan was to use ';', but ';' needs to be escaped/quoted, or it will be interpreted by bash, similar to '|'. And obviously, user can specify rescue options one by one like: "ro,rescue=nologreplay,rescue=usebackuproot". The following mount options are converted to "rescue=", old mount options are deprecated but still available for compatibility purpose: - usebackuproot Now it's "rescue=usebackuproot" - nologreplay Now it's "rescue=nologreplay" Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c730ae0c |
|
16-Jun-2020 |
Marcos Paulo de Souza <mpdesouza@suse.com> |
btrfs: convert comments to fallthrough annotations Convert fall through comments to the pseudo-keyword which is now the preferred way. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0202e83f |
|
15-May-2020 |
David Sterba <dsterba@suse.com> |
btrfs: simplify iget helpers The inode lookup starting at btrfs_iget takes the full location key, while only the objectid is used to match the inode, because the lookup happens inside the given root thus the inode number is unique. The entire location key is properly set up in btrfs_init_locked_inode. Simplify the helpers and pass only inode number, renaming it to 'ino' instead of 'objectid'. This allows to remove temporary variables key, saving some stack space. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
56e9357a |
|
15-May-2020 |
David Sterba <dsterba@suse.com> |
btrfs: simplify root lookup by id The main function to lookup a root by its id btrfs_get_fs_root takes the whole key, while only using the objectid. The value of offset is preset to (u64)-1 but not actually used until btrfs_find_root that does the actual search. Switch btrfs_get_fs_root to use only objectid and remove all local variables that existed just for the lookup. The actual key for search is set up in btrfs_get_fs_root, reusing another key variable. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fb8521ca |
|
28-Apr-2020 |
David Sterba <dsterba@suse.com> |
btrfs: add more codes to decoder table I've grepped logs for 'errno=.*unknown' and found -95, -117 and -122, now added to the table. The wording is adjusted so it makes sense in context of filesystem. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d54f8144 |
|
28-Apr-2020 |
David Sterba <dsterba@suse.com> |
btrfs: sort error decoder entries Add the raw errnos and sort them accordingly. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
7e8f19e5 |
|
27-Nov-2019 |
David Sterba <dsterba@suse.com> |
btrfs: adjust message level for unrecognized mount option An unrecognized option is a failure that should get user/administrator attention, the info level is often below what gets logged, so make it error. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c0c907a4 |
|
21-Feb-2020 |
Marcos Paulo de Souza <mpdesouza@suse.com> |
btrfs: export helpers for subvolume name/id resolution The functions will be used outside of export.c and super.c to allow resolving subvolume name from a given id, eg. for subvolume deletion by id ioctl. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ split from the next patch ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
bf31f87f |
|
05-Feb-2020 |
David Sterba <dsterba@suse.com> |
btrfs: add wrapper for transaction abort predicate The status of aborted transaction can change between calls and it needs to be accessed by READ_ONCE. Add a helper that also wraps the unlikely hint. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
00246528 |
|
24-Jan-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root We are now using these for all roots, rename them to btrfs_put_root() and btrfs_grab_root(); Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8260edba |
|
24-Jan-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: make the init of static elements in fs_info separate In adding things like eb leak checking and root leak checking there were a lot of weird corner cases that come from the fact that 1) We do not init the fs_info until we get to open_ctree time in the normal case and 2) The test infrastructure half-init's the fs_info for things that it needs. This makes it really annoying to make changes because you have to add init in two different places, have special cases for testing fs_info's that may not have certain things initialized, and cases for fs_info's that didn't make it to open_ctree and thus are not fully set up. Fix this by extracting out the non-allocating init of the fs info into it's own public function and use that to make sure we're all getting consistent views of an allocated fs_info. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
bc44d7c4 |
|
24-Jan-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root Now that all callers of btrfs_get_fs_root are subsequently calling btrfs_grab_fs_root and handling dropping the ref when they are done appropriately, go ahead and push btrfs_grab_fs_root up into btrfs_get_fs_root. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0d4b0463 |
|
24-Jan-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: export and rename free_fs_info We're going to start freeing roots and doing other complicated things in free_fs_info, so we need to move it to disk-io.c and export it in order to use things lik btrfs_put_fs_root(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5168489a |
|
06-Feb-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: hold a ref on the root in get_subvol_name_from_objectid We lookup the name of a subvol which means we'll cross into different roots. Hold a ref while we're doing the look ups in the fs_root we're searching. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3619c94f |
|
24-Jan-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: open code btrfs_read_fs_root_no_name All this does is call btrfs_get_fs_root() with check_ref == true. Just use btrfs_get_fs_root() so we don't have a bunch of different helpers that do the same thing. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cfe953c8 |
|
03-Feb-2020 |
Su Yue <Damenly_Su@gmx.com> |
btrfs: update the comment of btrfs_control_ioctl() Btrfsctl was removed in 2012, now the function btrfs_control_ioctl() is only used for devices ioctls. So update the comment. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Su Yue <Damenly_Su@gmx.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
10a3a3ed |
|
05-Feb-2020 |
David Sterba <dsterba@suse.com> |
btrfs: log message when rw remount is attempted with unclean tree-log A remount to a read-write filesystem is not safe when there's tree-log to be replayed. Files that could be opened until now might be affected by the changes in the tree-log. A regular mount is needed to replay the log so the filesystem presents the consistent view with the pending changes included. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d55966c4 |
|
31-Jan-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: do not zero f_bavail if we have available space There was some logic added a while ago to clear out f_bavail in statfs() if we did not have enough free metadata space to satisfy our global reserve. This was incorrect at the time, however didn't really pose a problem for normal file systems because we would often allocate chunks if we got this low on free metadata space, and thus wouldn't really hit this case unless we were actually full. Fast forward to today and now we are much better about not allocating metadata chunks all of the time. Couple this with d792b0f19711 ("btrfs: always reserve our entire size for the global reserve") which now means we'll easily have a larger global reserve than our free space, we are now more likely to trip over this while still having plenty of space. Fix this by skipping this logic if the global rsv's space_info is not full. space_info->full is 0 unless we've attempted to allocate a chunk for that space_info and that has failed. If this happens then the space for the global reserve is definitely sacred and we need to report b_avail == 0, but before then we can just use our calculated b_avail. Reported-by: Martin Steigerwald <martin@lichtvoll.de> Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted") CC: stable@vger.kernel.org # 4.5+ Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-By: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b0643e59 |
|
13-Dec-2019 |
Dennis Zhou <dennis@kernel.org> |
btrfs: add the beginning of async discard, discard workqueue When discard is enabled, everytime a pinned extent is released back to the block_group's free space cache, a discard is issued for the extent. This is an overeager approach when it comes to discarding and helping the SSD maintain enough free space to prevent severe garbage collection situations. This adds the beginning of async discard. Instead of issuing a discard prior to returning it to the free space, it is just marked as untrimmed. The block_group is then added to a LRU which then feeds into a workqueue to issue discards at a much slower rate. Full discarding of unused block groups is still done and will be addressed in a future patch of the series. For now, we don't persist the discard state of extents and bitmaps. Therefore, our failure recovery mode will be to consider extents untrimmed. This lets us handle failure and unmounting as one in the same. On a number of Facebook webservers, I collected data every minute accounting the time we spent in btrfs_finish_extent_commit() (col. 1) and in btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit() is where we discard extents synchronously before returning them to the free space cache. discard=sync: p99 total per minute p99 total per minute Drive | extent_commit() (ms) | commit_trans() (ms) --------------------------------------------------------------- Drive A | 434 | 1170 Drive B | 880 | 2330 Drive C | 2943 | 3920 Drive D | 4763 | 5701 discard=async: p99 total per minute p99 total per minute Drive | extent_commit() (ms) | commit_trans() (ms) -------------------------------------------------------------- Drive A | 134 | 956 Drive B | 64 | 1972 Drive C | 59 | 1032 Drive D | 62 | 1200 While it's not great that the stats are cumulative over 1m, all of these servers are running the same workload and and the delta between the two are substantial. We are spending significantly less time in btrfs_finish_extent_commit() which is responsible for discarding. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
46b27f50 |
|
13-Dec-2019 |
Dennis Zhou <dennis@kernel.org> |
btrfs: rename DISCARD mount option to to DISCARD_SYNC This series introduces async discard which will use the flag DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as it is synchronously done in transaction commit. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8d6fac00 |
|
02-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: add support for 4-copy replication (raid1c4) Add new block group profile to store 4 copies in a simliar way that current RAID1 does. The profile attributes and constraints are defined in the raid table and used by the same code that already handles the 2- and 3-copy RAID1. The minimum number of devices is 4, the maximum number of devices/chunks that can be lost/damaged is 3. There is no comparable traditional RAID level, the profile is added for future needs to accompany triple-parity and beyond. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
47e6f742 |
|
02-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: add support for 3-copy replication (raid1c3) Add new block group profile to store 3 copies in a simliar way that current RAID1 does. The profile attributes and constraints are defined in the raid table and used by the same code that already handles the 2-copy RAID1. The minimum number of devices is 3, the maximum number of devices/chunks that can be lost/damaged is 2. Like RAID6 but with 33% space utilization. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f5389f33 |
|
24-Oct-2019 |
Johannes Thumshirn <jthumshirn@suse.de> |
btrfs: remove cached space_info in btrfs_statfs() In btrfs_statfs() we cache fs_info::space_info in a local variable only to use it once in a list_for_each_rcu() statement. Not only is the local variable unnecessary it even makes the code harder to follow as it's not clear which list it is iterating. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
352ae07b |
|
07-Oct-2019 |
David Sterba <dsterba@suse.com> |
btrfs: add blake2b to checksumming algorithms Add blake2b (with 256 bit digest) to the list of possible checksumming algorithms used by BTRFS. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3831bf00 |
|
07-Oct-2019 |
Johannes Thumshirn <jthumshirn@suse.de> |
btrfs: add sha256 to checksumming algorithm Add sha256 to the list of possible checksumming algorithms used by BTRFS. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3951e7f0 |
|
07-Oct-2019 |
Johannes Thumshirn <jthumshirn@suse.de> |
btrfs: add xxhash64 to checksumming algorithms Add xxhash64 to the list of possible checksumming algorithms used by BTRFS. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ba8a9d07 |
|
10-Jul-2019 |
Chris Mason <clm@fb.com> |
Btrfs: delete the entire async bio submission framework Now that we're not using btrfs_schedule_bio() anymore, delete all the code that supported it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4143cb8b |
|
01-Oct-2019 |
David Sterba <dsterba@suse.com> |
btrfs: add const function attribute For some reason the attribute is called __attribute_const__ and not __const, marks functions that have no observable effects on program state, IOW not reading pointers, just the arguments and calculating a value. Allows the compiler to do some optimizations, based on -Wsuggest-attribute=const . The effects are rather small, though, about 60 bytes decrese of btrfs.ko. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b105e927 |
|
01-Oct-2019 |
David Sterba <dsterba@suse.com> |
btrfs: add __cold attribute to more functions The attribute can mark functions supposed to be called rarely if at all and the text can be moved to sections far from the other code. The attribute has been added to several functions already, this patch is based on hints given by gcc -Wsuggest-attribute=cold. The net effect of this patch is decrease of btrfs.ko by 1000-1300, depending on the config options. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4c66e0d4 |
|
03-Oct-2019 |
David Sterba <dsterba@suse.com> |
btrfs: drop unused parameter is_new from btrfs_iget The parameter is now always set to NULL and could be dropped. The last user was get_default_root but that got reworked in 05dbe6837b60 ("Btrfs: unify subvol= and subvolid= mounting") and the parameter became unused. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6f0d04f8 |
|
23-Sep-2019 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: separate out the extent io init function We are moving extent_io_tree into it's on file, so separate out the extent_state init stuff from extent_io_tree_init(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
1832f2d8 |
|
11-Sep-2018 |
Arnd Bergmann <arnd@arndb.de> |
compat_ioctl: move more drivers to compat_ptr_ioctl The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
89439109 |
|
01-Aug-2019 |
David Sterba <dsterba@suse.com> |
btrfs: move sysfs declarations out of ctree.h As the header for sysfs code already exists, use it to clean up ctree.h. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
aac0023c |
|
20-Jun-2019 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move basic block_group definitions to their own header This is prep work for moving all of the block group cache code into its own file. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor comment updates ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
559ca6ea |
|
03-Jul-2019 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Refactor btrfs_calc_avail_data_space Simplify the code by removing variables that don't bring any real value as well as simplifying the checks when buidling the candidate list of devices. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8719aaae |
|
18-Jun-2019 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move space_info to space-info.h Migrate the struct definition and the one helper that's in ctree.h into space-info.h Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e1ea2bee |
|
18-Jun-2019 |
David Sterba <dsterba@suse.com> |
btrfs: use raid_attr for minimum stripe count in btrfs_calc_avail_data_space Minimum stripe count matches the minimum devices required for a given profile. The open coded assignments match the raid_attr table. What's changed here is the meaning for RAID5/6. Previously their min_stripes would be 1, while newly it's devs_min. This however shold be the same as before because it's not possible to create filesystem on fewer devices than the raid_attr table allows. There's no adjustment regarding the parity stripes (like calc_data_stripes does), because we're interested in overall space that would fit on the devices. Missing devices make no difference for the whole calculation, we have the size stored in the structures. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4f080f57 |
|
18-Jun-2019 |
David Sterba <dsterba@suse.com> |
btrfs: use raid_attr to adjust minimal stripe size in btrfs_calc_avail_data_space Special case for DUP can be replaced by lookup to the attribute table, where the dev_stripes is the right coefficient. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d5178578 |
|
03-Jun-2019 |
Johannes Thumshirn <jthumshirn@suse.de> |
btrfs: directly call into crypto framework for checksumming Currently btrfs_csum_data() relied on the crc32c() wrapper around the crypto framework for calculating the CRCs. As we have our own crypto_shash structure in the fs_info now, we can directly call into the crypto framework without going trough the wrapper. This way we can even remove the btrfs_csum_data() and btrfs_csum_final() wrappers. The module dependency on crc32c is preserved via MODULE_SOFTDEP("pre: crc32c"), which was previously provided by LIBCRC32C config option doing the same. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cebf05ca |
|
05-Oct-2018 |
Goldwyn Rodrigues <rgoldwyn@suse.de> |
btrfs: Remove unused variable mode in btrfs_mount This is a leftover from 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()"), the mode was used for opening devices that's not done here anymore. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9b4e675a |
|
16-May-2019 |
David Sterba <dsterba@suse.com> |
btrfs: detect fast implementation of crc32c on all architectures Currently, there's only check for fast crc32c implementation on X86, based on the CPU flags. This is used to decide if checksumming should be offloaded to worker threads or can be calculated by the caller. As there are more architectures that implement a faster version of crc32c (ARM, SPARC, s390, MIPS, PowerPC), also there are specialized hw cards. The detection is based on driver name, all generic C implementations contain 'generic', while the specialized versions do not. Alternatively the priority could be used, but this is not currently provided by the crypto API. The flag is set per-filesystem at mount time and used for the offloading decisions. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
26602cab |
|
10-Apr-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: use ->free_inode() a lot of stuff remains in ->destroy_inode() Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ae0bc863 |
|
29-Mar-2019 |
Anand Jain <anand.jain@oracle.com> |
btrfs: drop unused parameter in mount_subvol @device_name in mount_subvol() is not used, drop it. Also see: 5bedc48a8f9e ("btrfs: drop unused parameters from mount_subvol"). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3f93aef5 |
|
04-Feb-2019 |
Dennis Zhou <dennis@kernel.org> |
btrfs: add zstd compression level support Zstd compression requires different amounts of memory for each level of compression. The prior patches implemented indirection to allow for each compression type to manage their workspaces independently. This patch uses this indirection to implement compression level support for zstd. To manage the additional memory require, each compression level has its own queue of workspaces. A global LRU is used to help with reclaim. Reclaim is done via a timer which provides a mechanism to decrease memory utilization by keeping only workspaces around that are sized appropriately. Forward progress is guaranteed by a preallocated max workspace hidden from the LRU. When getting a workspace, it uses a bitmap to identify the levels that are populated and scans up. If it finds a workspace that is greater than it, it uses it, but does not update the last_used time and the corresponding place in the LRU. If we hit memory pressure, we sleep on the max level workspace. We continue to rescan in case we can use a smaller workspace, but eventually should be able to obtain the max level workspace or allocate one again should memory pressure subside. The memory requirement for decompression is the same as level 1, and therefore can use any of available workspace. The number of workspaces is bound by an upper limit of the workqueue's limit which currently is 2 (percpu limit). The reclaim timer is used to free inactive/improperly sized workspaces and is set to 307s to avoid colliding with transaction commit (every 30s). Repeating the experiment from v2 [1], the Silesia corpus was copied to a btrfs filesystem 10 times and then read back after dropping the caches. The btrfs filesystem was on an SSD. Level Ratio Compression (MB/s) Decompression (MB/s) Memory (KB) 1 2.658 438.47 910.51 780 2 2.744 364.86 886.55 1004 3 2.801 336.33 828.41 1260 4 2.858 286.71 886.55 1260 5 2.916 212.77 556.84 1388 6 2.363 119.82 990.85 1516 7 3.000 154.06 849.30 1516 8 3.011 159.54 875.03 1772 9 3.025 100.51 940.15 1772 10 3.033 118.97 616.26 1772 11 3.036 94.19 802.11 1772 12 3.037 73.45 931.49 1772 13 3.041 55.17 835.26 2284 14 3.087 44.70 716.78 2547 15 3.126 37.30 878.84 2547 [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@fb.com/ Cc: Nick Terrell <terrelln@fb.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d0ab62ce |
|
04-Feb-2019 |
Dennis Zhou <dennis@kernel.org> |
btrfs: change set_level() to bound the level passed in Currently, the only user of set_level() is zlib which sets an internal workspace parameter. As level is now plumbed into get_workspace(), this can be handled there rather than separately. This repurposes set_level() to bound the level passed in so it can be used when setting the mounts compression level and as well as verifying the level before getting a workspace. The other benefit is this divides the meaning of compress(0) and get_workspace(0). The former means we want to use the default compression level of the compression type. The latter means we can use any workspace available. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
228a73ab |
|
03-Jan-2019 |
Anand Jain <anand.jain@oracle.com> |
btrfs: introduce new ioctl to unregister a btrfs device Support for a new command that can be used eg. as a command $ btrfs device scan --forget [dev]' (the final name may change though) to undo the effects of 'btrfs device scan [dev]'. For this purpose this patch proposes to use ioctl #5 as it was empty and is next to the SCAN ioctl. The new ioctl BTRFS_IOC_FORGET_DEV works only on the control device (/dev/btrfs-control) to unregister one or all devices, devices that are not mounted. The argument is struct btrfs_ioctl_vol_args, ::name specifies the device path. To unregister all device, the path is an empty string. Again, the devices are removed only if they aren't part of a mounte filesystem. This new ioctl provides: - release of unwanted btrfs_fs_devices and btrfs_devices structures from memory if the device is not going to be mounted - ability to mount filesystem in degraded mode, when one devices is corrupted like in split brain raid1 - running test cases which would require reloading the kernel module but this is not possible eg. due to mounted filesystem or built-in Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
532b618b |
|
30-Jan-2019 |
Eric W. Biederman <ebiederm@xmission.com> |
btrfs: On error always free subvol_name in btrfs_mount The subvol_name is allocated in btrfs_parse_subvol_options and is consumed and freed in mount_subvol. Add a free to the error paths that don't call mount_subvol so that it is guaranteed that subvol_name is freed when an error happens. Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
204cc0cc |
|
13-Dec-2018 |
Al Viro <viro@zeniv.linux.org.uk> |
LSM: hide struct security_mnt_opts from any generic code Keep void * instead, allocate on demand (in parse_str_opts, at the moment). Eventually both selinux and smack will be better off with private structures with several strings in those, rather than this "counter and two pointers to dynamically allocated arrays" ugliness. This commit allows to do that at leisure, without disrupting anything outside of given module. Changes: * instead of struct security_mnt_opt use an opaque pointer initialized to NULL. * security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and security_free_mnt_opts() take it as var argument (i.e. as void **); call sites are unchanged. * security_sb_set_mnt_opts() and security_sb_remount() take it by value (i.e. as void *). * new method: ->sb_free_mnt_opts(). Takes void *, does whatever freeing that needs to be done. * ->sb_set_mnt_opts() and ->sb_remount() might get NULL as mnt_opts argument, meaning "empty". Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
a65001e8 |
|
10-Dec-2018 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: sanitize security_mnt_opts use 1) keeping a copy in btrfs_fs_info is completely pointless - we never use it for anything. Getting rid of that allows for simpler calling conventions for setup_security_options() (caller is responsible for freeing mnt_opts in all cases). 2) on remount we want to use ->sb_remount(), not ->sb_set_mnt_opts(), same as we would if not for FS_BINARY_MOUNTDATA. Behaviours *are* close (in fact, selinux sb_set_mnt_opts() ought to punt to sb_remount() in "already initialized" case), but let's handle that uniformly. And the only reason why the original btrfs changes didn't go for security_sb_remount() in btrfs_remount() case is that it hadn't been exported. Let's export it for a while - it'll be going away soon anyway. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f5c0c26d |
|
16-Nov-2018 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: security_sb_eat_lsm_opts() combination of alloc_secdata(), security_sb_copy_data(), security_sb_parse_opt_str() and free_secdata(). Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
52042d8e |
|
27-Nov-2018 |
Andrea Gelmini <andrea.gelmini@gelma.net> |
btrfs: Fix typos in comments and strings The typos accumulate over time so once in a while time they get fixed in a large patch. Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
de37aa51 |
|
30-Oct-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Remove fsid/metadata_fsid fields from btrfs_info Currently btrfs_fs_info structure contains a copy of the fsid/metadata_uuid fields. Same values are also contained in the btrfs_fs_devices structure which fs_info has a reference to. Let's reduce duplication by removing the fields from fs_info and always refer to the ones in fs_devices. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f505754f |
|
14-Nov-2018 |
Filipe Manana <fdmanana@suse.com> |
Btrfs: ensure path name is null terminated at btrfs_control_ioctl We were using the path name received from user space without checking that it is null terminated. While btrfs-progs is well behaved and does proper validation and null termination, someone could call the ioctl and pass a non-null terminated patch, leading to buffer overrun problems in the kernel. The ioctl is protected by CAP_SYS_ADMIN. So just set the last byte of the path to a null character, similar to what we do in other ioctls (add/remove/resize device, snapshot creation, etc). CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
7e17916b |
|
03-Nov-2018 |
Arnd Bergmann <arnd@arndb.de> |
btrfs: avoid link error with CONFIG_NO_AUTO_INLINE Note: this patch fixes a problem in a feature outside of btrfs ("kernel hacking: add a config option to disable compiler auto-inlining") and is applied ahead of time due to cross-subsystem dependencies. On 32-bit ARM with gcc-8, I see a link error with the addition of the CONFIG_NO_AUTO_INLINE option: fs/btrfs/super.o: In function `btrfs_statfs': super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod' fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to `__aeabi_uldivmod' follow So far this is the only file that shows the behavior, so I'd propose to just work around it by marking the functions as 'static inline' that normally get inlined here. The reference to __aeabi_uldivmod comes from a div_u64() which has an optimization for a constant division that uses a straight '/' operator when the result should be known to the compiler. My interpretation is that as we turn off inlining, gcc still expects the result to be constant but fails to use that constant value. Link: https://lkml.kernel.org/r/20181103153941.1881966-1-arnd@arndb.de Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ add the note ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4fd786e6 |
|
05-Aug-2018 |
Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: Remove 'objectid' member from struct btrfs_root There are two members in struct btrfs_root which indicate root's objectid: objectid and root_key.objectid. They are both set to the same value in __setup_root(): static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, u64 objectid) { ... root->objectid = objectid; ... root->root_key.objectid = objecitd; ... } and not changed to other value after initialization. grep in btrfs directory shows both are used in many places: $ grep -rI "root->root_key.objectid" | wc -l 133 $ grep -rI "root->objectid" | wc -l 55 (4.17, inc. some noise) It is confusing to have two similar variable names and it seems that there is no rule about which should be used in a certain case. Since ->root_key itself is needed for tree reloc tree, let's remove 'objecitd' member and unify code to use ->root_key.objectid in all places. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
672d5990 |
|
02-Aug-2018 |
Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: Use wrapper macro for rcu string to remove duplicate code Cleanup patch and no functional changes. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3c427693 |
|
20-Jul-2018 |
Josef Bacik <jbacik@fb.com> |
Btrfs: fix btrfs_write_inode vs delayed iput deadlock We recently ran into the following deadlock involving btrfs_write_inode(): [ +0.005066] __schedule+0x38e/0x8c0 [ +0.007144] schedule+0x36/0x80 [ +0.006447] bit_wait+0x11/0x60 [ +0.006446] __wait_on_bit+0xbe/0x110 [ +0.007487] ? bit_wait_io+0x60/0x60 [ +0.007319] __inode_wait_for_writeback+0x96/0xc0 [ +0.009568] ? autoremove_wake_function+0x40/0x40 [ +0.009565] inode_wait_for_writeback+0x21/0x30 [ +0.009224] evict+0xb0/0x190 [ +0.006099] iput+0x1a8/0x210 [ +0.006103] btrfs_run_delayed_iputs+0x73/0xc0 [ +0.009047] btrfs_commit_transaction+0x799/0x8c0 [ +0.009567] btrfs_write_inode+0x81/0xb0 [ +0.008008] __writeback_single_inode+0x267/0x320 [ +0.009569] writeback_sb_inodes+0x25b/0x4e0 [ +0.008702] wb_writeback+0x102/0x2d0 [ +0.007487] wb_workfn+0xa4/0x310 [ +0.006794] ? wb_workfn+0xa4/0x310 [ +0.007143] process_one_work+0x150/0x410 [ +0.008179] worker_thread+0x6d/0x520 [ +0.007490] kthread+0x12c/0x160 [ +0.006620] ? put_pwq_unlocked+0x80/0x80 [ +0.008185] ? kthread_park+0xa0/0xa0 [ +0.007484] ? do_syscall_64+0x53/0x150 [ +0.007837] ret_from_fork+0x29/0x40 Writeback calls: btrfs_write_inode btrfs_commit_transaction btrfs_run_delayed_iputs If iput() is called on that same inode, evict() will wait for writeback forever. btrfs_write_inode() was originally added way back in 4730a4bc5bf3 ("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode() hasn't been used for O_SYNC since 148f948ba877 ("vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so btrfs_write_inode() is actually unnecessary (and leads to a bunch of unnecessary commits). Get rid of it, which also gets rid of the deadlock. CC: stable@vger.kernel.org # 3.2+ Signed-off-by: Josef Bacik <jbacik@fb.com> [Omar: new commit message] Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
46df06b8 |
|
13-Jul-2018 |
David Sterba <dsterba@suse.com> |
btrfs: refactor block group replication factor calculation to a helper There are many places that open code the duplicity factor of the block group profiles, create a common helper. This can be easily extended for more copies. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fa59f27c |
|
16-Jul-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: rename btrfs_parse_early_options Rename btrfs_parse_early_options() to btrfs_parse_device_options(). As btrfs_parse_early_options() parses the -o device options and scan the device provided. So this rename specifies its action. Also the function name is in line with btrfs_parse_subvol_options(). No functional changes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
36350e95 |
|
12-Jul-2018 |
Gu Jinxiang <gujx@cn.fujitsu.com> |
btrfs: return device pointer from btrfs_scan_one_device Return device pointer (with the IS_ERR semantics) from btrfs_scan_one_device so we don't have to return in through pointer. And since btrfs_fs_devices can be obtained from btrfs_device, return that. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ fixed conflics after recent changes to btrfs_scan_one_device ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d64dcbd1 |
|
12-Jul-2018 |
Gu Jinxiang <gujx@cn.fujitsu.com> |
btrfs: make fs_devices a local variable in btrfs_parse_early_options fs_devices is always passed to btrfs_scan_one_device which overrides it. In the call stack below fs_devices is passed to btrfs_scan_one_device from btrfs_mount_root. In btrfs_mount_root the output fs_devices of this call stack is not used. btrfs_mount_root btrfs_parse_early_options btrfs_scan_one_device So, it is not necessary to pass fs_devices from btrfs_mount_root, using a local variable in btrfs_parse_early_options is enough. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Anand Jain <Anand.Jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
81ffd56b |
|
19-Jun-2018 |
David Sterba <dsterba@suse.com> |
btrfs: fix mount and ioctl device scan ioctl race Technically this extends the critical section covered by uuid_mutex to: - parse early mount options -- here we can call device scan on paths that can be passed as 'device=/dev/...' - scan the device passed to mount - open the devices related to the fs_devices -- this increases fs_devices::opened The race can happen when mount calls one of the scans and there's another one called eg. by mkfs or 'btrfs dev scan': Mount Scan ----- ---- scan_one_device (dev1, fsid1) scan_one_device (dev2, fsid1) add the device free stale devices fsid1 fs_devices::opened == 0 find fsid1:dev1 free fsid1:dev1 if it's the last one, free fs_devices of fsid1 too open_devices (dev1, fsid1) dev1 not found When fixed, the uuid mutex will make sure that mount will increase fs_devices::opened and this will not be touched by the racing scan ioctl. Reported-and-tested-by: syzbot+909a5177749d7990ffa4@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+ceb2606025ec1cc3479c@syzkaller.appspotmail.com Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
399f7f4c |
|
19-Jun-2018 |
David Sterba <dsterba@suse.com> |
btrfs: reorder initialization before the mount locks uuid_mutex In preparation to take a big lock, move resource initialization before the critical section. It's not obvious from the diff, the desired order is: - initialize mount security options - allocate temporary fs_info - allocate superblock buffers Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5139cff5 |
|
19-Jun-2018 |
David Sterba <dsterba@suse.com> |
btrfs: lift uuid_mutex to callers of btrfs_parse_early_options Prepartory work to fix race between mount and device scan. btrfs_parse_early_options calls the device scan from mount and we'll need to let mount completely manage the critical section. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f5194e34 |
|
19-Jun-2018 |
David Sterba <dsterba@suse.com> |
btrfs: lift uuid_mutex to callers of btrfs_open_devices Prepartory work to fix race between mount and device scan. The callers will have to manage the critical section, eg. mount wants to scan and then call btrfs_open_devices without the ioctl scan walking in and modifying the fs devices in the meantime. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
899f9307 |
|
19-Jun-2018 |
David Sterba <dsterba@suse.com> |
btrfs: lift uuid_mutex to callers of btrfs_scan_one_device Prepartory work to fix race between mount and device scan. The callers will have to manage the critical section, eg. mount wants to scan and then call btrfs_open_devices without the ioctl scan walking in and modifying the fs devices in the meantime. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
93b9bcdf |
|
09-Jul-2018 |
Gu Jinxiang <gujx@cn.fujitsu.com> |
btrfs: remove unused parameter from btrfs_parse_subvol_options Since parameter flags is no more used since commit d7407606564c ("btrfs: split parse_early_options() in two"), remove it. Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d7f663fa |
|
29-Jun-2018 |
David Sterba <dsterba@suse.com> |
btrfs: prune unused includes Remove includes if none of the interfaces and exports is used in the given source file. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
edf57cbf |
|
20-Jun-2018 |
Bart Van Assche <bvanassche@acm.org> |
btrfs: Fix a C compliance issue The C programming language does not allow to use preprocessor statements inside macro arguments (pr_info() is defined as a macro). Hence rework the pr_info() statement in btrfs_print_mod_info() such that it becomes compliant. This patch allows tools like sparse to analyze the BTRFS source code. Fixes: 62e855771dac ("btrfs: convert printk(KERN_* to use pr_* calls") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
acd43e3c |
|
20-Jun-2018 |
Bart Van Assche <bvanassche@acm.org> |
btrfs: Annotate fall-through when parsing mount option This patch avoids that the compiler complains that a fall-through annotation is missing when building with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
37becec9 |
|
21-May-2018 |
Omar Sandoval <osandov@fb.com> |
Btrfs: allow empty subvol= again I got a report that after upgrading to 4.16, someone's filesystems weren't mounting: [ 23.845852] BTRFS info (device loop0): unrecognized mount option 'subvol=' Before 4.16, this mounted the default subvolume. It turns out that this empty "subvol=" is actually an application bug, but it was causing the application to fail, so it's an ABI break if you squint. The generic parsing code we use for mount options (match_token()) doesn't match an empty string as "%s". Previously, setup_root_args() removed the "subvol=" string, but the mount path was cleaned up to not need that. Add a dummy Opt_subvol_empty to fix this. The simple workaround is to use / or . for the value of 'subvol=' . Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()") CC: stable@vger.kernel.org # 4.16+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
891f41cb |
|
09-May-2018 |
Chengguang Xu <cgxu519@gmx.com> |
btrfs: return original error code when failing from option parsing It's not good to overwrite -ENOMEM using -EINVAL when failing from mount option parsing, so just return original error code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c1d7c514 |
|
03-Apr-2018 |
David Sterba <dsterba@suse.com> |
btrfs: replace GPL boilerplate by SPDX -- sources Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
88c14590 |
|
15-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: use RCU in btrfs_show_devname for device list traversal The show_devname callback is used to print device name in /proc/self/mounts, we need to traverse the device list consistently and read the name that's copied to a seq buffer so we don't need further locking. If the first device is being deleted at the same time, the RCU will allow us to read the device name, though it will become stale right after the RCU protection ends. This is unavoidable and the user can expect that the device will disappear from the filesystem's list at some point. The device_list_mutex was pretty heavy as it is used eg. for writing superblock and a few other IO related contexts. This can stall any application that reads the proc file for no reason. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
416a7202 |
|
09-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: sort and group mount option definitions Sort mount options by the primary name, followed by the 'no-' counterpart if it exists. Group the deprecated and debugging options. Enum and token defintions are synced. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
62b8e077 |
|
08-Mar-2018 |
Howard McLauchlan <hmclauchlan@fb.com> |
btrfs: Add nossd_spread mount option Btrfs has two mount options for SSD optimizations: ssd and ssd_spread. Presently there is an option to disable all SSD optimizations, but there isn't an option to disable just ssd_spread. This patch adds a mount option nossd_spread that disables ssd_spread only. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d612ac59 |
|
26-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: unify types for metadata_ratio and data_chunk_allocations We have btrfs_fs_info::data_chunk_allocations and btrfs_fs_info::metadata_ratio declared as unsigned which would be unsinged int and kernel style prefers unsigned int over bare unsigned. So this patch changes them to u32. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e67c718b |
|
19-Feb-2018 |
David Sterba <dsterba@suse.com> |
btrfs: add more __cold annotations The __cold functions are placed to a special section, as they're expected to be called rarely. This could help i-cache prefetches or help compiler to decide which branches are more/less likely to be taken without any other annotations needed. Though we can't add more __exit annotations, it's still possible to add __cold (that's also added with __exit). That way the following function categories are tagged: - printf wrappers, error messages - exit helpers Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ccb0e7d1 |
|
14-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: verify subvolid mount parameter We aren't verifying the parameter passed to the subvolid mount option, so we won't report and fail the mount if a junk value is specified for example, -o subvolid=abc. This patch verifies the subvolid option with match_u64. Up to now the memparse function accepts the K/M/G/ suffixes, that are usually meant for size values and do not make sense for a subvolume it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9678c543 |
|
08-Jan-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Remove custom crc32c init code The custom crc32 init code was introduced in 14a958e678cd ("Btrfs: fix btrfs boot when compiled as built-in") to enable using btrfs as a built-in. However, later as pointed out by 60efa5eb2e88 ("Btrfs: use late_initcall instead of module_init") this wasn't enough and finally btrfs was switched to late_initcall which comes after the generic crc32c implementation is initiliased. The latter commit superseeded the former. Now that we don't have to maintain our own code let's just remove it and switch to using the generic implementation. Despite touching a lot of files the patch is really simple. Here is the gist of the changes: 1. Select LIBCRC32C rather than the low-level modules. 2. s/btrfs_crc32c/crc32c/g 3. replace hash.h with linux/crc32c.h 4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly. I've tested this with btrfs being both a module and a built-in and xfstest doesn't complain. Does seem to fix the longstanding problem of not automatically selectiong the crc32c module when btrfs is used. Possibly there is a workaround in dracut. The modinfo confirms that now all the module dependencies are there: before: depends: zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate after: depends: libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add more info to changelog from mails ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
eceff22a |
|
13-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: add a comment to mark the deprecated mount option The options alloc_start and subvolrootid are deprecated, comment them in the tokens list. And leave them as it is. No functional changes. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d3740608 |
|
13-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: manage commit mount option as %u As the commit mount option is unsigned so manage it as %u for token verifications, instead of %d. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
02453bde |
|
13-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: manage check_int_print_mask mount option as %u As check_int_print_mask mount option is unsigned so manage it as %u for token verifications, instead of %d. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
764cb8b4 |
|
13-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: manage metadata_ratio mount option as %u As metadata_ratio mount option is unsinged so manage it as %u for token verifications, instead of %d. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f7b885be |
|
13-Feb-2018 |
Anand Jain <anand.jain@oracle.com> |
btrfs: manage thread_pool mount option as %u The mount option thread_pool is always unsigned. Manage it that way all around. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a8fd1f71 |
|
15-Feb-2018 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: use kvzalloc to allocate btrfs_fs_info The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On kernels built with NR_CPUS=8192, this can result in kmalloc failures that prevent mounting. There is work in progress to try to resolve this for every user of srcu_struct but using kvzalloc will work around the failures until that is complete. As an example with NR_CPUS=512 on x86_64: the overall size of subvol_srcu is 3460 bytes, fs_info is 6496. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
922ea899 |
|
04-Jan-2018 |
Anand Jain <Anand.Jain@oracle.com> |
btrfS: collapse btrfs_handle_error() into __btrfs_handle_fs_error() There is no other consumer for btrfs_handle_error() other than __btrfs_handle_fs_error(), further this function quite small. Merge it into its parent. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ reformat comment ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
61ecda68 |
|
04-Jan-2018 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: remove check for BTRFS_FS_STATE_ERROR which we just set __btrfs_handle_fs_error() sets BTRFS_FS_STATE_ERROR, and calls btrfs_handle_error() so no need to check if the BTRFS_FS_STATE_ERROR is set in btrfs_handle_error(). And there is no other user of btrfs_handle_error() as well. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6528b99d |
|
18-Dec-2017 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: factor btrfs_check_rw_degradable() to check given device Update btrfs_check_rw_degradable() to check against the given device if its lost. We can use this function to know if the volume is going to be in degraded mode OR failed state, when the given device fails. Which is needed when we are handling the device failed state. A preparatory patch does not affect the flow as such. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ enhance comment ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5bedc48a |
|
02-Jan-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop unused parameters from mount_subvol Recent patches reworking the mount path left some unused parameters. We pass a vfsmount to mount_subvol, the flags and data (ie. mount options) have been already applied and we will not need them. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e215772c |
|
14-Dec-2017 |
Misono, Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: cleanup unnecessary string dup in btrfs_parse_options() Long ago, commit edf24abe51493 ("btrfs: sanity mount option parsing and early mount code") split the btrfs_parse_options() into two parts (btrfs_parse_early_options() and btrfs_parse_options()). As a result, btrfs_parse_optins no longer gets called twice and is the last one to parse mount option string. Therefore there is no need to dup it. Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
78f6beac |
|
17-Jan-2018 |
Misono, Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: remove unused arg from parse_subvol_options() Remove unused arg 'holder' from parse_subvol_options(), which has been forgotten to be cleaned in the commit b99beb110e2d ("btrfs: split parse_early_options() in two"). Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
83085935 |
|
14-Dec-2017 |
Misono, Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: remove unused setup_root_args() Since setup_root_args() is not used anymore, just remove it. Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d7407606 |
|
14-Dec-2017 |
Misono, Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: split parse_early_options() in two Now parse_early_options() is used by both btrfs_mount() and btrfs_mount_root(). However, the former only needs subvol related part and the latter needs the others. Therefore extract the subvol related parts from parse_early_options() and move it to new parse function (parse_subvol_options()). Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
312c89fb |
|
14-Dec-2017 |
Misono, Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: cleanup btrfs_mount() using btrfs_mount_root() Cleanup btrfs_mount() by using btrfs_mount_root(). This avoids getting btrfs_mount() called twice in mount path. Old btrfs_mount() will do: 0. VFS layer calls vfs_kern_mount() with registered file_system_type (for btrfs, btrfs_fs_type). btrfs_mount() is called on the way. 1. btrfs_parse_early_options() parses "subvolid=" mount option and set the value to subvol_objectid. Otherwise, subvol_objectid has the initial value of 0 2. check subvol_objectid is 5 or not. Assume this time id is not 5, then btrfs_mount() returns by calling mount_subvol() 3. In mount_subvol(), original mount options are modified to contain "subvolid=0" in setup_root_args(). Then, vfs_kern_mount() is called with btrfs_fs_type and new options 4. btrfs_mount() is called again 5. btrfs_parse_early_options() parses "subvolid=0" and set 5 (instead of 0) to subvol_objectid 6. check subvol_objectid is 5 or not. This time id is 5 and mount_subvol() is not called. btrfs_mount() finishes mounting a root 7. (in mount_subvol()) with using a return vale of vfs_kern_mount(), it calls mount_subtree() 8. return subvolume's dentry Reusing the same file_system_type (and btrfs_mount()) for vfs_kern_mount() is the cause of complication. Instead, new btrfs_mount() will do: 1. parse subvol id related options for later use in mount_subvol() 2. mount device's root by calling vfs_kern_mount() with btrfs_root_fs_type, which is not registered to VFS by register_filesystem(). As a result, btrfs_mount_root() is called 3. return by calling mount_subvol() The code of 2. is moved from the first part of mount_subvol(). The semantics of device holder changes from btrfs_fs_type to btrfs_root_fs_type and has to be used in all contexts. Otherwise we'd get wrong results when mount and dev scan would not check the same thing. (this has been found indendently and the fix is folded into this patch) Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ fold the btrfs_control_ioctl fixup, extend the comment ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
72fa39f5 |
|
14-Dec-2017 |
Misono, Tomohiro <misono.tomohiro@jp.fujitsu.com> |
btrfs: add btrfs_mount_root() and new file_system_type Add btrfs_mount_root() and new file_system_type for preparation of cleanup of btrfs_mount(). Code path is not changed yet. btrfs_mount_root() is almost the same as current btrfs_mount(), but doesn't have subvolume related part. Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
87c46ec7 |
|
06-Dec-2017 |
Pravin Shedge <pravin.shedge4linux@gmail.com> |
btrfs: remove duplicate includes These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0f628c63 |
|
31-Oct-2017 |
David Sterba <dsterba@suse.com> |
btrfs: show options: use helper to convert compression type string Use the helper, if the COMPRESS option is set, the result is always defined and not empty. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
401e29c1 |
|
03-Dec-2017 |
Anand Jain <anand.jain@oracle.com> |
btrfs: cleanup device states define BTRFS_DEV_STATE_REPLACE_TGT Currently device state is being managed by each individual int variable such as struct btrfs_device::is_tgtdev_for_dev_replace. Instead of that declare btrfs_device::dev_state BTRFS_DEV_STATE_MISSING and use the bit operations. Signed-off-by: Anand Jain <anand.jain@oracle.com> [ whitespace adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e6e674bd |
|
03-Dec-2017 |
Anand Jain <anand.jain@oracle.com> |
btrfs: cleanup device states define BTRFS_DEV_STATE_MISSING Currently device state is being managed by each individual int variable such as struct btrfs_device::missing. Instead of that declare btrfs_device::dev_state BTRFS_DEV_STATE_MISSING and use the bit operations. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by : Nikolay Borisov <nborisov@suse.com> [ whitespace adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e12c9621 |
|
03-Dec-2017 |
Anand Jain <anand.jain@oracle.com> |
btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA Currently device state is being managed by each individual int variable such as struct btrfs_device::in_fs_metadata. Instead of that declare device state BTRFS_DEV_STATE_IN_FS_METADATA and use the bit operations. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ whitespace adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f5c29bd9 |
|
02-Nov-2017 |
Liu Bo <bo.li.liu@oracle.com> |
Btrfs: add __init macro to btrfs init functions Adding __init macro gives kernel a hint that this function is only used during the initialization phase and its memory resources can be freed up after. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
1751e8a6 |
|
27-Nov-2017 |
Linus Torvalds <torvalds@linux-foundation.org> |
Rename superblock flags (MS_xyz -> SB_xyz) This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
eae8d825 |
|
05-Nov-2017 |
Qu Wenruo <wqu@suse.com> |
btrfs: Fix wild memory access in compression level parser [BUG] Kernel panic when mounting with "-o compress" mount option. KASAN will report like: ------ ================================================================== BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0 Read of size 1 at addr d86735fce994f800 by task mount/662 ... Call Trace: dump_stack+0xe3/0x175 kasan_report+0x163/0x370 __asan_load1+0x47/0x50 strncmp+0x31/0xc0 btrfs_compress_str2level+0x20/0x70 [btrfs] btrfs_parse_options+0xff4/0x1870 [btrfs] open_ctree+0x2679/0x49f0 [btrfs] btrfs_mount+0x1b7f/0x1d30 [btrfs] mount_fs+0x49/0x190 vfs_kern_mount.part.29+0xba/0x280 vfs_kern_mount+0x13/0x20 btrfs_mount+0x31e/0x1d30 [btrfs] mount_fs+0x49/0x190 vfs_kern_mount.part.29+0xba/0x280 do_mount+0xaad/0x1a00 SyS_mount+0x98/0xe0 entry_SYSCALL_64_fastpath+0x1f/0xbe ------ [Cause] For 'compress' and 'compress_force' options, its token doesn't expect any parameter so its args[0] contains uninitialized data. Accessing args[0] will cause above wild memory access. [Fix] For Opt_compress and Opt_compress_force, set compression level to the default. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ set the default in advance ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fa4d885a |
|
15-Sep-2017 |
Adam Borowski <kilobyte@angband.pl> |
btrfs: allow setting zlib compression level via :9 This is bikeshedding, but it seems people are drastically more likely to understand "zlib:9" as compression level rather than an algorithm version compared to "zlib9". Based on feedback on the mailinglist, the ":9" will be the only accepted syntax. The level must be a single digit. Unrecognized format will result to the default, for forward compatibility in a similar way the compression algorithm specifier was relaxed in commit a7164fa4e055daf6368c ("btrfs: prepare for extensions in compression options"). Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: David Sterba <dsterba@suse.com> [ tighten the accepted format ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f51d2b59 |
|
15-Sep-2017 |
David Sterba <dsterba@suse.com> |
btrfs: allow to set compression level for zlib Preliminary support for setting compression level for zlib, the following works: $ mount -o compess=zlib # default $ mount -o compess=zlib0 # same $ mount -o compess=zlib9 # level 9, slower sync, less data $ mount -o compess=zlib1 # level 1, faster sync, more data $ mount -o remount,compress=zlib3 # level set by remount The compress-force works the same as compress'. The level is visible in the same format in /proc/mounts. Level set via file property does not work yet. Required patch: "btrfs: prepare for extensions in compression options" Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d4417e22 |
|
16-Oct-2017 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Replace opencoded sizes with their symbolic constants Currently btrfs' code uses a mix of opencoded sizes and defines from sizes.h. Let's unifiy the code base to always use the symbolic constants. No functional changes Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fb592373 |
|
29-Sep-2017 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add ref-verify mount option This adds the infrastructure for turning ref verify on and off for a mount, to be used by a later patch. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhnance btrfs_print_mod_info to print if ref-verify is compiled in ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a7e3c5f2 |
|
10-Oct-2017 |
Rakesh Pandit <rakesh@tuxera.com> |
btrfs: use appropriate replacements for __sb_{start,end}_write calls Commit a53f4f8e9c8eb ("btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.") started using internal calls and we replace them with more suitable ones. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d8953d69 |
|
12-Sep-2017 |
Satoru Takeuchi <satoru.takeuchi@gmail.com> |
btrfs: convert all mount option checking code to use btrfs_test_opt Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3993b112 |
|
11-Sep-2017 |
Colin Ian King <colin.king@canonical.com> |
btrfs: avoid null pointer dereference on fs_info when calling btrfs_crit There are checks on fs_info in __btrfs_panic to avoid dereferencing a null fs_info, however, there is a call to btrfs_crit that may also dereference a null fs_info. Fix this by adding a check to see if fs_info is null and only print the s_id if fs_info is non-null. Detected by CoverityScan CID#401973 ("Dereference after null check") Fixes: efe120a067c8 ("Btrfs: convert printk to btrfs_ and fix BTRFS prefix") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
357fdad0 |
|
18-Oct-2017 |
Matthew Garrett <mjg59@google.com> |
Convert fs/*/* to SB_I_VERSION [AV: in addition to the fix in previous commit] Signed-off-by: Matthew Garrett <mjg59@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
583b7231 |
|
28-Jul-2017 |
Hans van Kranenburg <hans.van.kranenburg@mendix.com> |
btrfs: Do not use data_alloc_cluster in ssd mode This patch provides a band aid to improve the 'out of the box' behaviour of btrfs for disks that are detected as being an ssd. In a general purpose mixed workload scenario, the current ssd mode causes overallocation of available raw disk space for data, while leaving behind increasing amounts of unused fragmented free space. This situation leads to early ENOSPC problems which are harming user experience and adoption of btrfs as a general purpose filesystem. This patch modifies the data extent allocation behaviour of the ssd mode to make it behave identical to nossd mode. The metadata behaviour and additional ssd_spread option stay untouched so far. Recommendations for future development are to reconsider the current oversimplified nossd / ssd distinction and the broken detection mechanism based on the rotational attribute in sysfs and provide experienced users with a more flexible way to choose allocator behaviour for data and metadata, optimized for certain use cases, while keeping sane 'out of the box' default settings. The internals of the current btrfs code have more potential than what currently gets exposed to the user to choose from. The SSD story... In the first year of btrfs development, around early 2008, btrfs gained a mount option which enables specific functionality for filesystems on solid state devices. The first occurance of this functionality is in commit e18e4809, labeled "Add mount -o ssd, which includes optimizations for seek free storage". The effect on allocating free space for doing (data) writes is to 'cluster' writes together, writing them out in contiguous space, as opposed to a 'tetris' way of putting all separate writes into any free space fragment that fits (which is what the -o nossd behaviour does). A somewhat simplified explanation of what happens is that, when for example, the 'cluster' size is set to 2MiB, when we do some writes, the data allocator will search for a free space block that is 2MiB big, and put the writes in there. The ssd mode itself might allow a 2MiB cluster to be composed of multiple free space extents with some existing data in between, while the additional ssd_spread mount option kills off this option and requires fully free space. The idea behind this is (commit 536ac8ae): "The [...] clusters make it more likely a given IO will completely overwrite the ssd block, so it doesn't have to do an internal rwm cycle."; ssd block meaning nand erase block. So, effectively this means applying a "locality based algorithm" and trying to outsmart the actual ssd. Since then, various changes have been made to the involved code, but the basic idea is still present, and gets activated whenever the ssd mount option is active. This also happens by default, when the rotational flag as seen at /sys/block/<device>/queue/rotational is set to 0. However, there's a number of problems with this approach. First, what the optimization is trying to do is outsmart the ssd by assuming there is a relation between the physical address space of the block device as seen by btrfs and the actual physical storage of the ssd, and then adjusting data placement. However, since the introduction of the Flash Translation Layer (FTL) which is a part of the internal controller of an ssd, these attempts are futile. The use of good quality FTL in consumer ssd products might have been limited in 2008, but this situation has changed drastically soon after that time. Today, even the flash memory in your automatic cat feeding machine or your grandma's wheelchair has a full featured one. Second, the behaviour as described above results in the filesystem being filled up with badly fragmented free space extents because of relatively small pieces of space that are freed up by deletes, but not selected again as part of a 'cluster'. Since the algorithm prefers allocating a new chunk over going back to tetris mode, the end result is a filesystem in which all raw space is allocated, but which is composed of underutilized chunks with a 'shotgun blast' pattern of fragmented free space. Usually, the next problematic thing that happens is the filesystem wanting to allocate new space for metadata, which causes the filesystem to fail in spectacular ways. Third, the default mount options you get for an ssd ('ssd' mode enabled, 'discard' not enabled), in combination with spreading out writes over the full address space and ignoring freed up space leads to worst case behaviour in providing information to the ssd itself, since it will never learn that all the free space left behind is actually free. There are two ways to let an ssd know previously written data does not have to be preserved, which are sending explicit signals using discard or fstrim, or by simply overwriting the space with new data. The worst case behaviour is the btrfs ssd_spread mount option in combination with not having discard enabled. It has a side effect of minimizing the reuse of free space previously written in. Fourth, the rotational flag in /sys/ does not reliably indicate if the device is a locally attached ssd. For example, iSCSI or NBD displays as non-rotational, while a loop device on an ssd shows up as rotational. The combination of the second and third problem effectively means that despite all the good intentions, the btrfs ssd mode reliably causes the ssd hardware and the filesystem structures and performance to be choked to death. The clickbait version of the title of this story would have been "Btrfs ssd optimizations considered harmful for ssds". The current nossd 'tetris' mode (even still without discard) allows a pattern of overwriting much more previously used space, causing many more implicit discards to happen because of the overwrite information the ssd gets. The actual location in the physical address space, as seen from the point of view of btrfs is irrelevant, because the actual writes to the low level flash are reordered anyway thanks to the FTL. Changes made in the code 1. Make ssd mode data allocation identical to tetris mode, like nossd. 2. Adjust and clean up filesystem mount messages so that we can easily identify if a kernel has this patch applied or not, when providing support to end users. Also, make better use of the *_and_info helpers to only trigger messages on actual state changes. Backporting notes Notes for whoever wants to backport this patch to their 4.9 LTS kernel: * First apply commit 951e7966 "btrfs: drop the nossd flag when remounting with -o ssd", or fixup the differences manually. * The rest of the conflicts are because of the fs_info refactoring. So, for example, instead of using fs_info, it's root->fs_info in extent-tree.c Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a7164fa4 |
|
17-Jul-2017 |
David Sterba <dsterba@suse.com> |
btrfs: prepare for extensions in compression options This is a minimal patch intended to be backported to older kernels. We're going to extend the string specifying the compression method and this would fail on kernels before that change (the string is compared exactly). Relax the string matching only to the prefix, ie. ignoring anything that goes after "zlib" or "lzo", regardless of th format extension we decide to use. This applies to the mount options and properties. That way, patched old kernels could be booted on systems already utilizing the new compression spec. Applicable since commit 63541927c8d11, v3.14. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3ec83621 |
|
21-Jun-2017 |
David Sterba <dsterba@suse.com> |
btrfs: use GFP_KERNEL in mount and remount We don't need to restrict the allocation flags in btrfs_mount or _remount. No big filesystem locks are held (possibly s_umount but that does no count here). Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b382cfe8 |
|
08-Mar-2017 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Do chunk level check for degraded remount Just the same for mount time check, use btrfs_check_rw_degradable() to check if we are OK to be remounted rw. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6c6b5a39 |
|
04-Jul-2017 |
Aleksa Sarai <asarai@suse.de> |
btrfs: resume qgroup rescan on rw remount Several distributions mount the "proper root" as ro during initrd and then remount it as rw before pivot_root(2). Thus, if a rescan had been aborted by a previous shutdown, the rescan would never be resumed. This issue would manifest itself as several btrfs ioctl(2)s causing the entire machine to hang when btrfs_qgroup_wait_for_completion was hit (due to the fs_info->qgroup_rescan_running flag being set but the rescan itself not being resumed). Notably, Docker's btrfs storage driver makes regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT (causing this problem to be manifested on boot for some machines). Cc: <stable@vger.kernel.org> # v3.11+ Cc: Jeff Mahoney <jeffm@suse.com> Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount") Signed-off-by: Aleksa Sarai <asarai@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
00142756 |
|
12-Jul-2017 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: backref, add tracepoints for prelim_ref insertion and merging This patch adds a tracepoint event for prelim_ref insertion and merging. For each, the ref being inserted or merged and the count of tree nodes is issued. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5c1aab1d |
|
09-Aug-2017 |
Nick Terrell <terrelln@fb.com> |
btrfs: Add zstd support Add zstd compression and decompression support to BtrFS. zstd at its fastest level compresses almost as well as zlib, while offering much faster compression and decompression, approaching lzo speeds. I benchmarked btrfs with zstd compression against no compression, lzo compression, and zlib compression. I benchmarked two scenarios. Copying a set of files to btrfs, and then reading the files. Copying a tarball to btrfs, extracting it to btrfs, and then reading the extracted files. After every operation, I call `sync` and include the sync time. Between every pair of operations I unmount and remount the filesystem to avoid caching. The benchmark files can be found in the upstream zstd source repository under `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}` [1] [2]. I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and a SSD. The first compression benchmark is copying 10 copies of the unzipped Silesia corpus [3] into a BtrFS filesystem mounted with `-o compress-force=Method`. The decompression benchmark times how long it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is measured by comparing the output of `df` and `du`. See the benchmark file [1] for details. I benchmarked multiple zstd compression levels, although the patch uses zstd level 1. | Method | Ratio | Compression MB/s | Decompression speed | |---------|-------|------------------|---------------------| | None | 0.99 | 504 | 686 | | lzo | 1.66 | 398 | 442 | | zlib | 2.58 | 65 | 241 | | zstd 1 | 2.57 | 260 | 383 | | zstd 3 | 2.71 | 174 | 408 | | zstd 6 | 2.87 | 70 | 398 | | zstd 9 | 2.92 | 43 | 406 | | zstd 12 | 2.93 | 21 | 408 | | zstd 15 | 3.01 | 11 | 354 | The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it measures the compression ratio, extracts the tar, and deletes the tar. Then it measures the compression ratio again, and `tar`s the extracted files into `/dev/null`. See the benchmark file [2] for details. | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) | |--------|-----------|---------------|----------|------------|----------| | None | 0.97 | 0.78 | 0.981 | 5.501 | 8.807 | | lzo | 2.06 | 1.38 | 1.631 | 8.458 | 8.585 | | zlib | 3.40 | 1.86 | 7.750 | 21.544 | 11.744 | | zstd 1 | 3.57 | 1.85 | 2.579 | 11.479 | 9.389 | [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz zstd source repository: https://github.com/facebook/zstd Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
bc98a42c |
|
17-Jul-2017 |
David Howells <dhowells@redhat.com> |
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c3d98ea0 |
|
05-Jul-2017 |
David Howells <dhowells@redhat.com> |
VFS: Don't use save/replace_mount_options if not using generic_show_options btrfs, debugfs, reiserfs and tracefs call save_mount_options() and reiserfs calls replace_mount_options(), but they then implement their own ->show_options() methods and don't touch s_options, rendering the saved options unnecessary. I'm trying to eliminate s_options to make it easier to implement a context-based mount where the mount options can be passed individually over a file descriptor. Remove the calls to save/replace_mount_options() call in these cases. Signed-off-by: David Howells <dhowells@redhat.com> cc: Chris Mason <clm@fb.com> cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Steven Rostedt <rostedt@goodmis.org> cc: linux-btrfs@vger.kernel.org cc: reiserfs-devel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6374e57a |
|
23-Jun-2017 |
Chris Mason <clm@fb.com> |
btrfs: fix integer overflow in calc_reclaim_items_nr Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with v4.12-rc6. This was because commit 70e7af244 made it possible for calc_reclaim_items_nr() to return a negative number. It's not really a bug in that commit, it just didn't go far enough down the stack to find all the possible 64->32 bit overflows. This switches calc_reclaim_items_nr() to return a u64 and changes everyone that uses the results of that math to u64 as well. Reported-by: Dave Jones <davej@codemonkey.org.uk> Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow") Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0d0c71b3 |
|
14-Jun-2017 |
David Sterba <dsterba@suse.com> |
btrfs: obsolete and remove mount option alloc_start The mount option alloc_start was used in the past for debugging and stressing the chunk allocator. Not meant to be used by users, so we're not breaking anybody's setup. There was some added complexity handling changes of the value and when it was not same as default. Such code has likely been untested and I think it's better to remove it. This patch kills all use of alloc_start, and by doing that also fixes a bug when alloc_size is set, potentially called from statfs: in btrfs_calc_avail_data_space, traversing the list in RCU, the RCU protection is temporarily dropped so btrfs_account_dev_extents_size can be called and then RCU is locked again! Doing that inside list_for_each_entry_rcu is just asking for trouble, but unlikely to be observed in practice. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fac03c8d |
|
15-Jun-2017 |
David Sterba <dsterba@suse.com> |
btrfs: move fs_info::fs_frozen to the flags We can keep the state among the other fs_info flags, there's no reason why fs_frozen would need to be separate. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6a44517d |
|
15-Jun-2017 |
David Sterba <dsterba@suse.com> |
btrfs: use GFP_KERNEL in btrfs_calc_avail_data_space We don't hold any locks here. Inidirectly called from statfs. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
1b86826d |
|
17-May-2017 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: cleanup root usage by btrfs_get_alloc_profile There are two places where we don't already know what kind of alloc profile we need before calling btrfs_get_alloc_profile, but we need access to a root everywhere we call it. This patch adds helpers for btrfs_{data,metadata,system}_alloc_profile() and relegates btrfs_system_alloc_profile to a static for use in those two cases. The next patch will eliminate one of those. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9e11ceee |
|
11-Apr-2017 |
Jan Kara <jack@suse.cz> |
btrfs: Convert to separately allocated bdi Allocate struct backing_dev_info separately instead of embedding it inside superblock. This unifies handling of bdi among users. CC: Chris Mason <clm@fb.com> CC: Josef Bacik <jbacik@fb.com> CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
4d339d01 |
|
03-Mar-2017 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
btrfs: No need to check !(flags & MS_RDONLY) twice Code cleanup. The code block is for !(*flags & MS_RDONLY). We don't need to check it again. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
951e7966 |
|
31-Mar-2017 |
Adam Borowski <kilobyte@angband.pl> |
btrfs: drop the nossd flag when remounting with -o ssd The opposite case was already handled right in the very next switch entry. And also when turning on nossd, drop ssd_spread. Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
56e033a7 |
|
10-Feb-2017 |
David Sterba <dsterba@suse.com> |
btrfs: remove unused parameter from btrfs_fill_super Never used for anything meaningful since we have our own superblock filler. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
20c7bcec |
|
15-Dec-2016 |
Seraphime Kirkovski <kirkseraph@gmail.com> |
Btrfs: ACCESS_ONCE cleanup This replaces ACCESS_ONCE macro with the corresponding READ|WRITE macros Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
40f7828b |
|
14-Dec-2016 |
Petr Mladek <pmladek@suse.com> |
btrfs: better handle btrfs_printk() defaults Commit 262c5e86fec7 ("printk/btrfs: handle more message headers") triggers: warning: `ratelimit' may be used uninitialized in this function with gcc (4.1.2) and probably many other versions. The code actually is correct but a bit twisted. Let's make it more straightforward and set the default values at the beginning. Link: http://lkml.kernel.org/r/20161213135246.GQ3506@pathway.suse.cz Signed-off-by: Petr Mladek <pmladek@suse.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
262c5e86 |
|
12-Dec-2016 |
Petr Mladek <pmladek@suse.com> |
printk/btrfs: handle more message headers Commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") allows to define more message headers for a single message. The motivation is that continuous lines might get mixed. Therefore it make sense to define the right log level for every piece of a cont line. The current btrfs_printk() macros do not support continuous lines at the moment. But better be prepared for a custom messages and avoid potential "lvl" buffer overflow. This patch iterates over the entire message header. It is interested only into the message level like the original code. This patch also introduces PRINTK_MAX_SINGLE_HEADER_LEN. Three bytes are enough for the message level header at the moment. But it used to be three, see the commit 04d2c8c83d0e ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Also I fixed the default ratelimit level. It looked very strange when it was different from the default log level. [pmladek@suse.com: Fix a check of the valid message level] Link: http://lkml.kernel.org/r/20161111183236.GD2145@dhcp128.suse.cz Link: http://lkml.kernel.org/r/1478695291-12169-4-git-send-email-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: David Sterba <dsterba@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3a45bb20 |
|
09-Sep-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: remove root parameter from transaction commit/end routines Now we only use the root parameter to print the root objectid in a tracepoint. We can use the root parameter from the transaction handle for that. It's also used to join the transaction with async commits, so we remove the comment that it's just for checking. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2ff7e61e |
|
22-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: take an fs_info directly when the root is not used otherwise There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0b246afa |
|
22-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: root->fs_info cleanup, add fs_info convenience variables In routines where someptr->fs_info is referenced multiple times, we introduce a convenience variable. This makes the code considerably more readable. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
da17066c |
|
15-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: pull node/sector/stripe sizes out of root and into fs_info We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6bccf3ab |
|
21-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: call functions that always use the same root with fs_info instead There are many functions that are always called with the same root argument. Rather than passing the same root every time, we can pass an fs_info pointer instead and have the function get the root pointer itself. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5d9dbe61 |
|
05-Oct-2016 |
David Sterba <dsterba@suse.com> |
btrfs: remove stale comment from btrfs_statfs Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ab8d0fc4 |
|
20-Sep-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: convert pr_* to btrfs_* where possible For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
62e85577 |
|
20-Sep-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: convert printk(KERN_* to use pr_* calls This patch converts printk(KERN_* style messages to use the pr_* versions. One side effect is that anything that was KERN_DEBUG is now automatically a dynamic debug message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5d163e0e |
|
20-Sep-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: unsplit printed strings CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
afcdd129 |
|
02-Sep-2016 |
Josef Bacik <jbacik@fb.com> |
Btrfs: add a flags field to btrfs_fs_info We have a lot of random ints in btrfs_fs_info that can be put into flags. This is mostly equivalent with the exception of how we deal with quota going on or off, now instead we set a flag when we are turning it on or off and deal with that appropriately, rather than just having a pending state that the current quota_enabled gets set to. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9e7cc91a |
|
31-Jul-2016 |
Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> |
btrfs: fix fsfreeze hang caused by delayed iputs deal When running fstests generic/068, sometimes we got below deadlock: xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080 ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000 ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001 ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8 Call Trace: [<ffffffff816a9045>] schedule+0x35/0x80 [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140 [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100 [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30 [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs] [<ffffffff810d32b5>] percpu_down_read+0x35/0x50 [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40 [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs] [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs] [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs] [<ffffffff81230a1a>] evict+0xba/0x1a0 [<ffffffff812316b6>] iput+0x196/0x200 [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs] [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs] [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs] [<ffffffff81218040>] freeze_super+0xf0/0x190 [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0 [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140 [<ffffffff81229409>] SyS_ioctl+0x79/0x90 [<ffffffff81003c12>] do_syscall_64+0x62/0x110 [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25 >From this warning, freeze_super() already holds SB_FREEZE_FS, but btrfs_freeze() will call btrfs_commit_transaction() again, if btrfs_commit_transaction() finds that it has delayed iputs to handle, it'll start_transaction(), which will try to get SB_FREEZE_FS lock again, then deadlock occurs. The root cause is that in btrfs, sync_filesystem(sb) does not make sure all metadata is updated. There still maybe some codes adding delayed iputs, see below sample race window: CPU1 | CPU2 |-> freeze_super() | |-> sync_filesystem(sb); | | |-> cleaner_kthread() | | |-> btrfs_delete_unused_bgs() | | |-> btrfs_remove_chunk() | | |-> btrfs_remove_block_group() | | |-> btrfs_add_delayed_iput() | | |-> sb->s_writers.frozen = SB_FREEZE_FS; | |-> sb_wait_write(sb, SB_FREEZE_FS); | | acquire SB_FREEZE_FS lock. | | | |-> btrfs_freeze() | |-> btrfs_commit_transaction() | |-> btrfs_run_delayed_iputs() | | will handle delayed iputs, | | that means start_transaction() | | will be called, which will try | | to get SB_FREEZE_FS lock. | To fix this issue, introduce a "int fs_frozen" to record internally whether fs has been frozen. If fs has been frozen, we can not handle delayed iputs. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment to btrfs_freeze ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
66642832 |
|
10-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: btrfs_abort_transaction, drop root parameter __btrfs_abort_transaction doesn't use its root parameter except to obtain an fs_info pointer. We can obtain that from trans->root->fs_info for now and from trans->fs_info in a later patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8632daae |
|
20-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: tests, move initialization into tests/ We have all these stubs that only exist because they're called from btrfs_run_sanity_tests, which is a static inside super.c. Let's just move it all into tests/btrfs-tests.c and only have one stub. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3cdde224 |
|
09-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: btrfs_test_opt and friends should take a btrfs_fs_info btrfs_test_opt and friends only use the root pointer to access the fs_info. Let's pass the fs_info directly in preparation to eliminate similar patterns all over btrfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
bc074524 |
|
09-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: prefix fsid to all trace events When using trace events to debug a problem, it's impossible to determine which file system generated a particular event. This patch adds a macro to prefix standard information to the head of a trace event. The extent_state alloc/free events are all that's left without an fs_info available. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9f8d4909 |
|
15-Jul-2016 |
David Sterba <dsterba@suse.com> |
btrfs: remove obsolete part of comment in statfs The mixed blockgroup reporting has been fixed by commit ae02d1bd070767e109f4a6f1bb1f466e9698a355 "btrfs: fix mixed block count of available space" Signed-off-by: David Sterba <dsterba@suse.com>
|
#
35f4e5e6 |
|
13-Jul-2016 |
Nikolay Borisov <kernel@kyup.com> |
btrfs: Add ratelimit to btrfs printing This patch adds ratelimiting to all messages which are not using the _rl version of the various printing APIs in btrfs. This is designed to be used as a safety net, since a flood messages might cause the softlockup detector to trigger. To reduce interference between different classes of messages use a separate ratelimit state for every class of message. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
90c711ab |
|
12-Jun-2016 |
Zygo Blaxell <ce3g8jdj@umail.furryterror.org> |
btrfs: avoid blocking open_ctree from cleaner_kthread This fixes a problem introduced in commit 2f3165ecf103599f82bf0ea254039db335fb5005 "btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes". open_ctree eventually calls btrfs_replay_log which in turn calls btrfs_commit_super which tries to lock the cleaner_mutex, causing a recursive mutex deadlock during mount. Instead of playing whack-a-mole trying to keep up with all the functions that may want to lock cleaner_mutex, put all the cleaner_mutex lockers back where they were, and attack the problem more directly: keep cleaner_kthread asleep until the filesystem is mounted. When filesystems are mounted read-only and later remounted read-write, open_ctree did not set fs_info->open and neither does anything else. Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs nor cleaner_kthread get confused by the common case of "/" filesystem read-only mount followed by read-write remount. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
64c12921 |
|
07-Jun-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: account for non-CoW'd blocks in btrfs_abort_transaction The test for !trans->blocks_used in btrfs_abort_transaction is insufficient to determine whether it's safe to drop the transaction handle on the floor. btrfs_cow_block, informed by should_cow_block, can return blocks that have already been CoW'd in the current transaction. trans->blocks_used is only incremented for new block allocations. If an operation overlaps the blocks in the current transaction entirely and must abort the transaction, we'll happily let it clean up the trans handle even though it may have modified the blocks and will commit an incomplete operation. In the long-term, I'd like to do closer tracking of when the fs is actually modified so we can still recover as gracefully as possible, but that approach will need some discussion. In the short term, since this is the only code using trans->blocks_used, let's just switch it to a bool indicating whether any blocks were used and set it when should_cow_block returns false. Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d94f43b4 |
|
01-Jun-2016 |
Feifei Xu <xufeifei@linux.vnet.ibm.com> |
Btrfs: self-tests: Support testing all possible sectorsizes and nodesizes To test all possible sectorsizes, this commit adds a sectorsize array. This commit executes the tests for all possible sectorsizes and nodesizes. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5f9e1059 |
|
16-Sep-2015 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: advertise which crc32c implementation is being used at module load Since several architectures support hardware-accelerated crc32c calculation, it would be nice to confirm that btrfs is actually using it. We can see an elevated use count for the module, but it doesn't actually show who the users are. This patch simply prints the name of the driver after successfully initializing the shash. Signed-off-by: Jeff Mahoney <jeffm@suse.com> [ added a helper and used in module load-time message ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b9ef22de |
|
01-Jun-2016 |
Feifei Xu <xufeifei@linux.vnet.ibm.com> |
Btrfs: self-tests: Support non-4k page size self-tests code assumes 4k as the sectorsize and nodesize. This commit fix hardcoded 4K. Enables the self-tests code to be executed on non-4k page sized systems (e.g. ppc64). Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
01327610 |
|
19-May-2016 |
Nicholas D Steeves <nsteeves@gmail.com> |
btrfs: fix string and comment grammatical issues and typos Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
578def7c |
|
26-Apr-2016 |
Filipe Manana <fdmanana@suse.com> |
Btrfs: don't wait for unrelated IO to finish before relocation Before the relocation process of a block group starts, it sets the block group to readonly mode, then flushes all delalloc writes and then finally it waits for all ordered extents to complete. This last step includes waiting for ordered extents destinated at extents allocated in other block groups, making us waste unecessary time. So improve this by waiting only for ordered extents that fall into the block group's range. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
|
#
41b34acc |
|
30-Mar-2016 |
Luis de Bethencourt <luisbg@osg.samsung.com> |
btrfs: avoid overflowing f_bfree Since mixed block groups accounting isn't byte-accurate and f_bree is an unsigned integer, it could overflow. Avoid this. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Suggested-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ae02d1bd |
|
30-Mar-2016 |
Luis de Bethencourt <luisbg@osg.samsung.com> |
btrfs: fix mixed block count of available space Metadata for mixed block is already accounted in total data and should not be counted as part of the free metadata space. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=114281 Signed-off-by: David Sterba <dsterba@suse.com>
|
#
180e4d47 |
|
04-Apr-2016 |
Luis de Bethencourt <luisbg@osg.samsung.com> |
btrfs: fix typos in comments Correct a typo in the chunk_mutex name to make it grepable. Since it is better to fix several typos at once, fixing the 2 more in the same file. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
0713d90c |
|
16-Mar-2016 |
Anand Jain <anand.jain@oracle.com> |
btrfs: remove save_error_info() Actually save_error_info() sets the FS state to error and nothing else. Further the word save doesn't induce caffeine when compared to the word set in what actually it does. So to make it better understandable move save_error_info() code to its only consumer itself. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
34d97007 |
|
16-Mar-2016 |
Anand Jain <anand.jain@oracle.com> |
btrfs: rename btrfs_std_error to btrfs_handle_fs_error btrfs_std_error() handles errors, puts FS into readonly mode (as of now). So its good idea to rename it to btrfs_handle_fs_error(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ edit changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8ae1af3c |
|
09-Mar-2016 |
Anand Jain <anand.jain@oracle.com> |
btrfs: rename btrfs_print_info to btrfs_print_mod_info So that it indicates what it does. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d5131b65 |
|
17-Feb-2016 |
David Sterba <dsterba@suse.com> |
btrfs: drop unused argument in btrfs_ioctl_get_supported_features Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c5868f83 |
|
17-Feb-2016 |
David Sterba <dsterba@suse.com> |
btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls The control device is accessible when no filesystem is mounted and we may want to query features supported by the module. This is already possible using the sysfs files, this ioctl is for parity and convenience. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fed8f166 |
|
18-Jan-2016 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Introduce new mount option alias for nologreplay Introduce new mount option alias "norecovery" for nologreplay, to keep "norecovery" behavior the same with other filesystems. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
96da0919 |
|
18-Jan-2016 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Introduce new mount option to disable tree log replay Introduce a new mount option "nologreplay" to co-operate with "ro" mount option to get real readonly mount, like "norecovery" in ext* and xfs. Since the new parse_options() need to check new flags at remount time, so add a new parameter for parse_options(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8dcddfa0 |
|
18-Jan-2016 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Introduce new mount option usebackuproot to replace recovery Current "recovery" mount option will only try to use backup root. However the word "recovery" is too generic and may be confusing for some users. Here introduce a new and more specific mount option, "usebackuproot" to replace "recovery" mount option. "Recovery" will be kept for compatibility reason, but will be deprecated. Also, since "usebackuproot" will only affect mount behavior and after open_ctree() it has nothing to do with the filesystem, so clear the flag after mount succeeded. This provides the basis for later unified "norecovery" mount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ dropped usebackuproot from show_mount, added note about 'recovery' to docs ] Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e410e34f |
|
29-Jan-2016 |
Chris Mason <clm@fb.com> |
Revert "btrfs: synchronize incompat feature bits with sysfs files" This reverts commit 14e46e04958df740c6c6a94849f176159a333f13. This ends up doing sysfs operations from deep in balance (where we should be GFP_NOFS) and under heavy balance load, we're making races against sysfs internals. Revert it for now while we figure things out. Signed-off-by: Chris Mason <clm@fb.com>
|
#
14e46e04 |
|
21-Jan-2016 |
David Sterba <dsterba@suse.com> |
btrfs: synchronize incompat feature bits with sysfs files The files under /sys/fs/UUID/features get out of sync with the actual incompat bits set for the filesystem if they change after mount (eg. the LZO compression). Synchronize the feature bits with the sysfs files representing them right after we set/clear them. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b7c47bbb |
|
06-Jan-2016 |
Tsutomu Itoh <t-itoh@jp.fujitsu.com> |
Btrfs: fix output of compression message in btrfs_parse_options() The compression message might not be correctly output. Fix it. [[before fix]] # mount -o compress /dev/sdb3 /test3 [ 996.874264] BTRFS info (device sdb3): disk space caching is enabled [ 996.874268] BTRFS: has skinny extents # mount | grep /test3 /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/) # mount -o remount,compress-force /dev/sdb3 /test3 [ 1035.075017] BTRFS info (device sdb3): force zlib compression [ 1035.075021] BTRFS info (device sdb3): disk space caching is enabled # mount | grep /test3 /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/) # mount -o remount,compress /dev/sdb3 /test3 [ 1053.679092] BTRFS info (device sdb3): disk space caching is enabled # mount | grep /test3 /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/) [[after fix]] # mount -o compress /dev/sdb3 /test3 [ 401.021753] BTRFS info (device sdb3): use zlib compression [ 401.021758] BTRFS info (device sdb3): disk space caching is enabled [ 401.021760] BTRFS: has skinny extents # mount | grep /test3 /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/) # mount -o remount,compress-force /dev/sdb3 /test3 [ 439.824624] BTRFS info (device sdb3): force zlib compression [ 439.824629] BTRFS info (device sdb3): disk space caching is enabled # mount | grep /test3 /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/) # mount -o remount,compress /dev/sdb3 /test3 [ 459.918430] BTRFS info (device sdb3): use zlib compression [ 459.918434] BTRFS info (device sdb3): disk space caching is enabled # mount | grep /test3 /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/) Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ca8a51b3 |
|
10-Oct-2015 |
David Sterba <dsterba@suse.com> |
btrfs: statfs: report zero available if metadata are exhausted There is one ENOSPC case that's very confusing. There's Available greater than zero but no file operation succeds (besides removing files). This happens when the metadata are exhausted and there's no possibility to allocate another chunk. In this scenario it's normal that there's still some space in the data chunk and the calculation in df reflects that in the Avail value. To at least give some clue about the ENOSPC situation, let statfs report zero value in Avail, even if there's still data space available. Current: /dev/sdb1 4.0G 3.3G 719M 83% /mnt/test New: /dev/sdb1 4.0G 3.3G 0 100% /mnt/test We calculate the remaining metadata space minus global reserve. If this is (supposedly) smaller than zero, there's no space. But this does not hold in practice, the exhausted state happens where's still some positive delta. So we apply some guesswork and compare the delta to a 4M threshold. (Practically observed delta was 2M.) We probably cannot calculate the exact threshold value because this depends on the internal reservations requested by various operations, so some operations that consume a few metadata will succeed even if the Avail is zero. But this is better than the other way around. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
4d4ab6d6 |
|
19-Nov-2015 |
David Sterba <dsterba@suse.com> |
btrfs: constify static arrays There are a few statically initialized arrays that can be made const. The remaining (like file_system_type, sysfs attributes or prop handlers) do not allow that due to type mismatch when passed to the APIs or because the structures are modified through other members. Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ee22184b |
|
14-Dec-2015 |
Byongho Lee <bhlee.kernel@gmail.com> |
Btrfs: use linux/sizes.h to represent constants We use many constants to represent size and offset value. And to make code readable we use '256 * 1024 * 1024' instead of '268435456' to represent '256MB'. However we can make far more readable with 'SZ_256MB' which is defined in the 'linux/sizes.h'. So this patch replaces 'xxx * 1024 * 1024' kind of expression with single 'SZ_xxxMB' if 'xxx' is a power of 2 then 'xxx * SZ_1M' if 'xxx' is not a power of 2. And I haven't touched to '4096' & '8192' because it's more intuitive than 'SZ_4KB' & 'SZ_8KB'. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a1c6f057 |
|
13-Apr-2015 |
Dmitry Monakhov <dmonakhov@openvz.org> |
fs: use block_device name vsprintf helper Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
70f6d82e |
|
29-Sep-2015 |
Omar Sandoval <osandov@fb.com> |
Btrfs: add free space tree mount option Now we can finally hook up everything so we can actually use free space tree. The free space tree is enabled by passing the space_cache=v2 mount option. On the first mount with the this option set, the free space tree will be created and the FREE_SPACE_TREE read-only compat bit will be set. Any time the filesystem is mounted from then on, we must use the free space tree. The clear_cache option will also clear the free space tree. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
7c55ee0c |
|
29-Sep-2015 |
Omar Sandoval <osandov@fb.com> |
Btrfs: add free space tree sanity tests This tests the operations on the free space tree trying to excercise all of the main cases for both formats. Between this and xfstests, the free space tree should have pretty good coverage. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
d0bd4560 |
|
23-Sep-2015 |
Josef Bacik <jbacik@fb.com> |
Btrfs: add fragment=* debug mount option In tracking down these weird bitmap problems it was helpful to artificially create an extremely fragmented file system. These mount options let us either fragment data or metadata or both. With these options I could reproduce all sorts of weird latencies and hangs that occur under extreme fragmentation and get them fixed. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
57d816a1 |
|
14-Aug-2015 |
Anand Jain <anand.jain@oracle.com> |
Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined error handling logic behaves differently with or without CONFIG_PRINTK defined, since there are two copies of the same function which a bit of different logic One, when CONFIG_PRINTK is defined, code is __btrfs_std_error(..) { :: save_error_info(fs_info); if (sb->s_flags & MS_BORN) btrfs_handle_error(fs_info); } and two when CONFIG_PRINTK is not defined, the code is __btrfs_std_error(..) { :: if (sb->s_flags & MS_BORN) { save_error_info(fs_info); btrfs_handle_error(fs_info); } } I doubt if this was intentional ? and appear to have caused since we maintain two copies of the same function and they got diverged with commits. Now to decide which logic is correct reviewed changes as below, 533574c6bc30cf526cc1c41bde050c854a945efb Commit added two copies of this function cf79ffb5b79e8a2b587fbf218809e691bb396c98 Commit made change to only one copy of the function and to the copy when CONFIG_PRINTK is defined. To fix this, instead of maintaining two copies of same function approach, maintain single function, and just put the extra portion of the code under CONFIG_PRINTK define. This patch just does that. And keeps code of with CONFIG_PRINTK defined. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
85e0a0f2 |
|
10-Sep-2015 |
Filipe Manana <fdmanana@suse.com> |
Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock After commmit e44163e17796 ("btrfs: explictly delete unused block groups in close_ctree and ro-remount"), added in the 4.3 merge window, we have calls to btrfs_delete_unused_bgs() while holding the cleaner_mutex. This can cause a deadlock with a concurrent block group relocation (when a filesystem balance or shrink operation is in progress for example) because btrfs_delete_unused_bgs() locks delete_unused_bgs_mutex and the relocation path locks first delete_unused_bgs_mutex and then it locks cleaner_mutex, resulting in a classic ABBA deadlock: CPU 0 CPU 1 lock fs_info->cleaner_mutex __btrfs_balance() || btrfs_shrink_device() lock fs_info->delete_unused_bgs_mutex btrfs_relocate_chunk() btrfs_relocate_block_group() lock fs_info->cleaner_mutex btrfs_delete_unused_bgs() lock fs_info->delete_unused_bgs_mutex Fix this by not taking the cleaner_mutex before calling btrfs_delete_unused_bgs() because it's no longer needed after commit 67c5e7d464bc ("Btrfs: fix race between balance and unused block group deletion"). The mutex fs_info->delete_unused_bgs_mutex, the spinlock fs_info->unused_bgs_lock and a block group's spinlock are enough to get correct serialization between tasks running relocation and unused block group deletion (as well as between multiple tasks concurrently calling btrfs_delete_unused_bgs()). This issue was discussed (in the mailing list) during the review of the patch titled "btrfs: explictly delete unused block groups in close_ctree and ro-remount" and it was agreed that acquiring the cleaner mutex had to be dropped after the patch titled "Btrfs: fix race between balance and unused block group deletion" got merged (both patches were submitted at about the same time, but one landed in kernel 4.2 and the other in the 4.3 merge window). Signed-off-by: Filipe Manana <fdmanana@suse.com>
|
#
da2f0f74 |
|
02-Jul-2015 |
Chris Mason <clm@fb.com> |
Btrfs: add support for blkio controllers This attaches accounting information to bios as we submit them so the new blkio controllers can throttle on btrfs filesystems. Not much is required, we're just associating bios with blkcgs during clone, calling wbc_init_bio()/wbc_account_io() during writepages submission, and attaching the bios to the current context during direct IO. Finally if we are splitting bios during btrfs_map_bio, this attaches accounting information to the split. The end result is able to throttle nicely on single disk filesystems. A little more work is required for multi-device filesystems. Signed-off-by: Chris Mason <clm@fb.com>
|
#
f368ed60 |
|
30-Jul-2015 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
char: make misc_deregister a void function With well over 200+ users of this api, there are a mere 12 users that actually checked the return value of this function. And all of them really didn't do anything with that information as the system or module was shutting down no matter what. So stop pretending like it matters, and just return void from misc_deregister(). If something goes wrong in the call, you will get a WARNING splat in the syslog so you know how to fix up your driver. Other than that, there's nothing that can go wrong. Cc: Alasdair Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
e33e17ee |
|
15-Jun-2015 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: add missing discards when unpinning extents with -o discard When we clear the dirty bits in btrfs_delete_unused_bgs for extents in the empty block group, it results in btrfs_finish_extent_commit being unable to discard the freed extents. The block group removal patch added an alternate path to forget extents other than btrfs_finish_extent_commit. As a result, any extents that would be freed when the block group is removed aren't discarded. In my test run, with a large copy of mixed sized files followed by removal, it left nearly 2/3 of extents undiscarded. To clean up the block groups, we add the removed block group onto a list that will be discarded after transaction commit. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
e44163e1 |
|
15-Jun-2015 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: explictly delete unused block groups in close_ctree and ro-remount The cleaner thread may already be sleeping by the time we enter close_ctree. If that's the case, we'll skip removing any unused block groups queued for removal, even during a normal umount. They'll be cleaned up automatically at next mount, but users expect a umount to be a clean synchronization point, especially when used on thin-provisioned storage with -odiscard. We also explicitly remove unused block groups in the ro-remount path for the same reason. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
c8d3fe02 |
|
18-May-2015 |
Omar Sandoval <osandov@osandov.com> |
Btrfs: show subvol= and subvolid= in /proc/mounts Now that we're guaranteed to have a meaningful root dentry, we can just export seq_dentry() and use it in btrfs_show_options(). The subvolume ID is easy to get and can also be useful, so put that in there, too. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
05dbe683 |
|
18-May-2015 |
Omar Sandoval <osandov@osandov.com> |
Btrfs: unify subvol= and subvolid= mounting Currently, mounting a subvolume with subvolid= takes a different code path than mounting with subvol=. This isn't really a big deal except for the fact that mounts done with subvolid= or the default subvolume don't have a dentry that's connected to the dentry tree like in the subvol= case. To unify the code paths, when given subvolid= or using the default subvolume ID, translate it into a subvolume name by walking ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
bb289b7b |
|
18-May-2015 |
Omar Sandoval <osandov@osandov.com> |
Btrfs: fail on mismatched subvol and subvolid mount options There's nothing to stop a user from passing both subvol= and subvolid= to mount, but if they don't refer to the same subvolume, someone is going to be surprised at some point. Error out on this case, but allow users to pass in both if they do match (which they could, for example, get out of /proc/mounts). Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
fa330659 |
|
18-May-2015 |
Omar Sandoval <osandov@osandov.com> |
Btrfs: clean up error handling in mount_subvol() In preparation for new functionality in mount_subvol(), give it ownership of subvol_name and tidy up the error paths. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
e6e4dbe8 |
|
18-May-2015 |
Omar Sandoval <osandov@osandov.com> |
Btrfs: remove all subvol options before mounting top-level Currently, setup_root_args() substitutes 's/subvol=[^,]*/subvolid=0/'. But, this means that if the user passes both a subvol and subvolid for some reason, we won't actually mount the top-level when we recursively mount. For example, consider: mkfs.btrfs -f /dev/sdb mount /dev/sdb /mnt btrfs subvol create /mnt/subvol1 # subvolid=257 btrfs subvol create /mnt/subvol2 # subvolid=258 umount /mnt mount -osubvol=/subvol1,subvolid=258 /dev/sdb /mnt In the final mount, subvol=/subvol1,subvolid=258 becomes subvolid=0,subvolid=258, and the last option takes precedence, so we mount subvol2 and try to look up subvol1 inside of it, which fails. So, instead, do a thorough scan through the argument list and remove any subvol= and subvolid= options, then append subvolid=0 to the end. This implicitly makes subvol= take precedence over subvolid=, but we're about to add a stricter check for that. This also makes setup_root_args() more generic, which we'll need soon. Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
773cd04e |
|
18-May-2015 |
Omar Sandoval <osandov@osandov.com> |
Btrfs: lock superblock before remounting for rw subvol Since commit 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options"), when mounting a subvolume read/write when another subvolume has previously been mounted read-only, we first do a remount. However, this should be done with the superblock locked, as per sync_filesystem(): /* * We need to be protected against the filesystem going from * r/o to r/w or vice versa. */ WARN_ON(!rwsem_is_locked(&sb->s_umount)); This WARN_ON can easily be hit with: mkfs.btrfs -f /dev/vdb mount /dev/vdb /mnt btrfs subvol create /mnt/vol1 btrfs subvol create /mnt/vol2 umount /mnt mount -oro,subvol=/vol1 /dev/vdb /mnt mount -orw,subvol=/vol2 /dev/vdb /mnt2 Fixes: 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options") Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
c0d19e2b |
|
24-Apr-2015 |
David Sterba <dsterba@suse.cz> |
btrfs: add 'cold' compiler annotations to all error handling functions The annotated functios will be placed into .text.unlikely section. The annotation also hints compiler to move the code out of the hot paths, and may implicitly mark if-statement leading to that block as unlikely. This is a heuristic, the impact on the generated code is not significant. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
1a9a8a71 |
|
24-Apr-2015 |
David Sterba <dsterba@suse.cz> |
btrfs: report exact callsite where transaction abort occurs WARN is called from a single location and all bugreports say that's in super.c __btrfs_abort_transaction. This is slightly confusing as we'd rather want to know the exact callsite. Whereas this information is printed in the syslog below the stacktrace, this requires further look and we usually see only the headline from WARNING. Moving the WARN into the macro has to inline some code and increases code by a few kilobytes: text data bss dec hex filename 835481 20305 14120 869906 d4612 btrfs.ko.before 842883 20305 14120 877308 d62fc btrfs.ko.after The delta is +7k (130+ calls), measured on 3.19 x86_64, distro config. The increase is not small and could lead to worse icache use. The code is on error/exit paths that can be recognized by compiler as cold and moved out of the way so the impact is speculated to be low, if measurable at all. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
2b0143b5 |
|
17-Mar-2015 |
David Howells <dhowells@redhat.com> |
VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
727b9784 |
|
20-Mar-2015 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: cleanup orphans while looking up default subvolume Orphans in the fs tree are cleaned up via open_ctree and subvolume orphans are cleaned via btrfs_lookup_dentry -- except when a default subvolume is in use. The name for the default subvolume uses a manual lookup that doesn't trigger orphan cleanup and needs to trigger it manually as well. This doesn't apply to the remount case since the subvolumes are cleaned up by walking the root radix tree. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
d8620958 |
|
24-Mar-2015 |
Tom Van Braeckel <tomvanbraeckel@gmail.com> |
btrfs: explicitly set control file's private_data The private_data member of the Btrfs control device file (/dev/btrfs-control) is used to hold the current transaction and needs to be initialized to NULL to signify that no transaction is in progress. We explicitly set the control file's private_data to NULL to be independent of whatever value the misc subsystem initializes it to. Backstory: ---------- The misc subsystem (which is used by /dev/btrfs-control) initializes a file's private_data to point to the misc device when a driver has registered a custom open file operation and initializes it to NULL when a custom open file operation has *not* been provided. This subtle quirk is confusing, to the point where kernel code registers *empty* file open operations to have private_data point to the misc device structure. And it leads to bugs, where the addition or removal of a custom open file operation surprisingly changes the initial contents of a file's private_data structure. To simplify things in the misc subsystem, a patch [1] has been proposed to *always* set private_data to point to the misc device instead of only doing this when a custom open file operation has been registered. But before we can fix this in the misc subsystem itself, we need to modify the (few) drivers that rely on this very subtle behavior. [1] https://lkml.org/lkml/2014/12/4/939 Signed-off-by: Martin Kepplinger <martink@posteo.de> Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
f8c269d7 |
|
16-Jan-2015 |
David Sterba <dsterba@suse.cz> |
btrfs: cleanup 64bit/32bit divs, compile time constants Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
16068ec1 |
|
16-Jan-2015 |
David Sterba <dsterba@suse.cz> |
btrfs: cleanup 64bit/32bit divs, compile time constants Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
730a78c7 |
|
20-Jan-2015 |
David Sterba <dsterba@suse.cz> |
btrfs: remove a no-op unfreeze superbock callback Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
a53f4f8e |
|
19-Jan-2015 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending changes) will call btrfs_start_transaction() in sync_fs(), to handle some operations needed to be done in next transaction. However this can cause deadlock if the filesystem is frozen, with the following sys_r+w output: [ 143.255932] Call Trace: [ 143.255936] [<ffffffff816c0e09>] schedule+0x29/0x70 [ 143.255939] [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100 [ 143.255971] [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0 [btrfs] [ 143.255992] [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20 [btrfs] [ 143.256003] [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs] [ 143.256007] [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30 [ 143.256011] [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0 [ 143.256014] [<ffffffff811f7d75>] sys_sync+0x55/0x90 [ 143.256017] [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17 [ 143.256111] Call Trace: [ 143.256114] [<ffffffff816c0e09>] schedule+0x29/0x70 [ 143.256119] [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0 [ 143.256123] [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20 [ 143.256131] [<ffffffff811caae8>] thaw_super+0x28/0xc0 [ 143.256135] [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540 [ 143.256187] [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0 [ 143.256213] [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17 The reason is like the following: (Holding s_umount) VFS sync_fs staff: |- btrfs_sync_fs() |- btrfs_start_transaction() |- sb_start_intwrite() (Waiting thaw_fs to unfreeze) VFS thaw_fs staff: thaw_fs() (Waiting sync_fs to release s_umount) So deadlock happens. This can be easily triggered by fstest/generic/068 with inode_cache mount option. The fix is to check if the fs is frozen, if the fs is frozen, just return and waiting for the next transaction. Cc: David Sterba <dsterba@suse.cz> Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [enhanced comment, changed to SB_FREEZE_WRITE] Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
98bd5c54 |
|
19-Jan-2015 |
David Sterba <dsterba@suse.cz> |
btrfs: sync ioctl, handle errors after transaction start The version merged to 3.19 did not handle errors from start_trancaction and could pass an invalid pointer to commit_transaction. Fixes: 6b5fe46dfa52441f ("btrfs: do commit in sync_fs if there are pending changes") Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
c92f6be3 |
|
26-Nov-2014 |
Filipe Manana <fdmanana@suse.com> |
Btrfs: make btrfs_abort_transaction consider existence of new block groups If the transaction handle doesn't have used blocks but has created new block groups make sure we turn the fs into readonly mode too. This is because the new block groups didn't get all their metadata persisted into the chunk and device trees, and therefore if a subsequent transaction starts, allocates space from the new block groups, writes data or metadata into that space, commits successfully and then after we unmount and mount the filesystem again, the same space can be allocated again for a new block group, resulting in file data or metadata corruption. Example where we don't abort the transaction when we fail to finish the chunk allocation (add items to the chunk and device trees) and later a future transaction where the block group is removed fails because it can't find the chunk item in the chunk tree: [25230.404300] WARNING: CPU: 0 PID: 7721 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x50/0xfc [btrfs]() [25230.404301] BTRFS: Transaction aborted (error -28) [25230.404302] Modules linked in: btrfs dm_flakey nls_utf8 fuse xor raid6_pq ntfs vfat msdos fat xfs crc32c_generic libcrc32c ext3 jbd ext2 dm_mod nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse i2c_piix4 i2ccore parport_pc parport processor button pcspkr serio_raw thermal_sys evdev microcode ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy e1000 ata_piix libata virtio_pci virtio_ring scsi_mod virtio [last unloaded: btrfs] [25230.404325] CPU: 0 PID: 7721 Comm: xfs_io Not tainted 3.17.0-rc5-btrfs-next-1+ #1 [25230.404326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [25230.404328] 0000000000000000 ffff88004581bb08 ffffffff813e7a13 ffff88004581bb50 [25230.404330] ffff88004581bb40 ffffffff810423aa ffffffffa049386a 00000000ffffffe4 [25230.404332] ffffffffa05214c0 000000000000240c ffff88010fc8f800 ffff88004581bba8 [25230.404334] Call Trace: [25230.404338] [<ffffffff813e7a13>] dump_stack+0x45/0x56 [25230.404342] [<ffffffff810423aa>] warn_slowpath_common+0x7f/0x98 [25230.404351] [<ffffffffa049386a>] ? __btrfs_abort_transaction+0x50/0xfc [btrfs] [25230.404353] [<ffffffff8104240b>] warn_slowpath_fmt+0x48/0x50 [25230.404362] [<ffffffffa049386a>] __btrfs_abort_transaction+0x50/0xfc [btrfs] [25230.404374] [<ffffffffa04a8c43>] btrfs_create_pending_block_groups+0x10c/0x135 [btrfs] [25230.404387] [<ffffffffa04b77fd>] __btrfs_end_transaction+0x7e/0x2de [btrfs] [25230.404398] [<ffffffffa04b7a6d>] btrfs_end_transaction+0x10/0x12 [btrfs] [25230.404408] [<ffffffffa04a3d64>] btrfs_check_data_free_space+0x111/0x1f0 [btrfs] [25230.404421] [<ffffffffa04c53bd>] __btrfs_buffered_write+0x160/0x48d [btrfs] [25230.404425] [<ffffffff811a9268>] ? cap_inode_need_killpriv+0x2d/0x37 [25230.404429] [<ffffffff810f6501>] ? get_page+0x1a/0x2b [25230.404441] [<ffffffffa04c7c95>] btrfs_file_write_iter+0x321/0x42f [btrfs] [25230.404443] [<ffffffff8110f5d9>] ? handle_mm_fault+0x7f3/0x846 [25230.404446] [<ffffffff813e98c5>] ? mutex_unlock+0x16/0x18 [25230.404449] [<ffffffff81138d68>] new_sync_write+0x7c/0xa0 [25230.404450] [<ffffffff81139401>] vfs_write+0xb0/0x112 [25230.404452] [<ffffffff81139c9d>] SyS_pwrite64+0x66/0x84 [25230.404454] [<ffffffff813ebf52>] system_call_fastpath+0x16/0x1b [25230.404455] ---[ end trace 5aa5684fdf47ab38 ]--- [25230.404458] BTRFS warning (device sdc): btrfs_create_pending_block_groups:9228: Aborting unused transaction(No space left). [25288.084814] BTRFS: error (device sdc) in btrfs_free_chunk:2509: errno=-2 No such entry (Failed lookup while freeing chunk.) Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
0d95c1be |
|
14-Nov-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: fix wrong accounting of raid1 data profile in statfs The sizes that are obtained from space infos are in raw units and have to be adjusted according to the raid factor. This was missing for f_bavail and df reported doubled size for raid1. Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Fixes: ba7b6e62f420 ("btrfs: adjust statfs calculations according to raid profiles") CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
7e33fd99 |
|
03-Nov-2014 |
Josef Bacik <jbacik@fb.com> |
Btrfs: don't take the chunk_mutex/dev_list mutex in statfs V2 Our gluster boxes get several thousand statfs() calls per second, which begins to suck hardcore with all of the lock contention on the chunk mutex and dev list mutex. We don't really need to hold these things, if we have transient weirdness with statfs() because of the chunk allocator we don't care, so remove this locking. We still need the dev_list lock if you mount with -o alloc_start however, which is a good argument for nuking that thing from orbit, but that's a patch for another day. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
7e1876ac |
|
05-Feb-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: switch inode_cache option handling to pending changes The pending mount option(s) now share namespace and bits with the normal options, and the existing one for (inode_cache) is unset unconditionally at each transaction commit. Introduce a separate namespace for pending changes and enhance the descriptions of the intended change to use separate bits for each action. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
6b5fe46d |
|
28-Mar-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: do commit in sync_fs if there are pending changes If a pending change is requested, it's not processed unless there is a transaction commit about to happen, not even after sync or SYNC_FS ioctl. For example a remount that toggles the inode_cache option will not take effect after sync on a quiescent filesystem. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
5ed5f588 |
|
15-Oct-2014 |
Josef Bacik <jbacik@fb.com> |
Btrfs: properly clean up btrfs_end_io_wq_cache In one of Dave's cleanup commits he forgot to call btrfs_end_io_wq_exit on unload, which makes us unable to unload and then re-load the btrfs module. This fixes the problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
a43bb39b |
|
07-Oct-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Fix compile error when CONFIG_SECURITY is not set. Fix the following compile error when CONFIG_SECURITY is not set: error: 'struct security_mnt_opts' has no member named 'num_mnt_opts' Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
f667aef6 |
|
22-Sep-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Make btrfs handle security mount options internally to avoid losing security label. [BUG] Originally when mount btrfs with "-o subvol=" mount option, btrfs will lose all security lable. And if the btrfs fs is mounted somewhere else, due to the lost of security lable, SELinux will refuse to mount since the same super block is being mounted using different security lable. [REPRODUCER] With SELinux enabled: #mkfs -t btrfs /dev/sda5 #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs #btrfs subvolume create /mnt/btrfs/subvol #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test kernel message: SELinux: mount invalid. Same superblock, different security settings for (dev sda5, type btrfs) [REASON] This happens because btrfs will call vfs_kern_mount() and then mount_subtree() to handle subvolume name lookup. First mount will cut off all the security lables and when it comes to the second vfs_kern_mount(), it has no security label now. [FIX] This patch will makes btrfs behavior much more like nfs, which has the type flag FS_BINARY_MOUNTDATA, making btrfs handles the security label internally. So security label will be set in the real mount time and won't lose label when use with "subvol=" mount option. Reported-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
97eb6b69 |
|
29-Jul-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: use slab for end_io_wq structures The structure is frequently reused. Rename it according to the slab name. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
af13b492 |
|
29-Jul-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: fix error labels in init_btrfs_fs btrfs_interface_init rarely fails but we could leak the prelim_ref slab. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
95ac567a |
|
08-Aug-2013 |
Filipe David Borba Manana <fdmanana@gmail.com> |
Btrfs: set default max_inline to 8KiB instead of 8MiB 8MiB is way too large and likely set by mistake. This is not a significant issue as in practice the max amount of data added to an inline extent is also limited by the page cache and btree leaf sizes. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
143f3636 |
|
29-Jul-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: remove unused variable from btrfs_parse_options Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
15484377 |
|
03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix unprotected device list access when getting the fs information When we get the fs information, we forgot to acquire the mutex of device list, it might cause the problem we might access a device that was removed. Fix it by acquiring the device list mutex. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
d3982100 |
|
17-Jul-2014 |
Mark Fasheh <mfasheh@suse.de> |
btrfs: add trace for qgroup accounting We want this to debug qgroup changes on live systems. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
4027e0f4 |
|
29-Jun-2014 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: clear compress-force when remounting with compress option Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount /dev/sdb /mnt -o compress-force=lzo # mount /dev/sdb /mnt -o remount,compress=zlib # cat /proc/mounts Remounting from compress-force to compress could not clear compress-force option. The problem is there is no way for users to clear compress-force option separately. Fix this problem by clearing @FORCE_COMPRESS flag when remounting to compress=xxx. Suggested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Tested-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
3abdbd78 |
|
04-Jun-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: make close_ctree return void There's no user of the return value and we can get rid of the comment in put_super. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
ba7b6e62 |
|
01-Jul-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: adjust statfs calculations according to raid profiles This has been discussed in thread: http://thread.gmane.org/gmane.comp.file-systems.btrfs/32528 and this patch implements this proposal: http://thread.gmane.org/gmane.comp.file-systems.btrfs/32536 Works fine for "clean" raid profiles where the raid factor correction does the right job. Otherwise it's pessimistic and may show low space although there's still some left. The df nubmers are lightly wrong in case of mixed block groups, but this is not a major usecase and can be addressed later. The RAID56 numbers are wrong almost the same way as before and will be addressed separately. CC: Hugo Mills <hugo@carfax.org.uk> CC: cwillu <cwillu@cwillu.com> CC: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
1a0a397e |
|
14-Feb-2014 |
J. Bruce Fields <bfields@redhat.com> |
dcache: d_obtain_alias callers don't all want DISCONNECTED There are a few d_obtain_alias callers that are using it to get the root of a filesystem which may already have an alias somewhere else. This is not the same as the filehandle-lookup case, and none of them actually need DCACHE_DISCONNECTED set. It isn't really a serious problem, but it would really be clearer if we reserved DCACHE_DISCONNECTED for those cases where it's actually needed. In the btrfs case this was causing a spurious printk from nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED manually in 3a0dfa6a12e "Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol", and this replaces that workaround. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0aeb8a6e |
|
30-Jun-2014 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: fix null pointer dereference in btrfs_show_devname when name is null dev->name is null but missing flag is not set. Strictly speaking the missing flag should have been set, but there are more places where code just checks if name is null. For now this patch does the same. stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000064 IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs] [<ffffffff81198879>] show_vfsmnt+0x39/0x130 [<ffffffff81178056>] m_show+0x16/0x20 [<ffffffff8117d706>] seq_read+0x296/0x390 [<ffffffff8115aa7d>] vfs_read+0x9d/0x160 [<ffffffff8115b549>] SyS_read+0x49/0x90 [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b reproducer: mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2 btrfstune -S 1 /dev/sdg1 modprobe -r btrfs && modprobe btrfs mount -o degraded /dev/sdg1 /btrfs btrfs dev add /dev/sdg3 /btrfs Signed-off-by: Anand Jain <Anand.Jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
2aa06a35 |
|
27-Jun-2014 |
Eric Sandeen <sandeen@redhat.com> |
btrfs: fix nossd and ssd_spread mount option regression The commit 0780253 btrfs: Cleanup the btrfs_parse_options for remount. broke ssd options quite badly; it stopped making ssd_spread imply ssd, and it made "nossd" unsettable. Put things back at least as well as they were before (though ssd mount option handling is still pretty odd: # mount -o "nossd,ssd_spread" works?) Reported-by: Roman Mamedov <rm@romanrm.net> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
5f316481 |
|
25-Jun-2014 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: fix race between balance recovery and root deletion Balance recovery is called when RW mounting or remounting from RO to RW, it is called to finish roots merging. When doing balance recovery, relocation root's corresponding fs root(whose root refs is 0) might be destroyed by cleaner thread, this will make btrfs fail to mount. Fix this problem by holding @cleaner_mutex when doing balance recovery. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
351fd353 |
|
15-May-2014 |
David Sterba <dsterba@suse.cz> |
btrfs: remove stale newlines from log messages I've noticed an extra line after "use no compression", but search revealed much more in messages of more critical levels and rare errors. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
45ff35d6 |
|
11-May-2014 |
Guangliang Zhao <lucienchao@gmail.com> |
Btrfs: remove OPT_acl parse when acl disabled Even CONFIG_BTRFS_FS_POSIX_ACL is not defined, the acl still could been enabled using a mount option, and now fs/btrfs/acl.o is not built, so the mount options will appear to be supported but will be silently ignored. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
|
#
faa2dbf0 |
|
07-May-2014 |
Josef Bacik <jbacik@fb.com> |
Btrfs: add sanity tests for new qgroup accounting code This exercises the various parts of the new qgroup accounting code. We do some basic stuff and do some things with the shared refs to make sure all that code works. I had to add a bunch of infrastructure because I needed to be able to insert items into a fake tree without having to do all the hard work myself, hopefully this will be usefull in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
21c7e756 |
|
13-May-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: reclaim the reserved metadata space at background Before applying this patch, the task had to reclaim the metadata space by itself if the metadata space was not enough. And When the task started the space reclamation, all the other tasks which wanted to reserve the metadata space were blocked. At some cases, they would be blocked for a long time, it made the performance fluctuate wildly. So we introduce the background metadata space reclamation, when the space is about to be exhausted, we insert a reclaim work into the workqueue, the worker of the workqueue helps us to reclaim the reserved space at the background. By this way, the tasks needn't reclaim the space by themselves at most cases, and even if the tasks have to reclaim the space or are blocked for the space reclamation, they will get enough space more quickly. Here is my test result(Tested by compilebench): Memory: 2GB CPU: 2Cores * 1CPU Partition: 40GB(SSD) Test command: # compilebench -D <mnt> -m Without this patch: intial create total runs 30 avg 54.36 MB/s (user 0.52s sys 2.44s) compile total runs 30 avg 123.72 MB/s (user 0.13s sys 1.17s) read compiled tree total runs 3 avg 81.15 MB/s (user 0.74s sys 4.89s) delete compiled tree total runs 30 avg 5.32 seconds (user 0.35s sys 4.37s) With this patch: intial create total runs 30 avg 59.80 MB/s (user 0.52s sys 2.53s) compile total runs 30 avg 151.44 MB/s (user 0.13s sys 1.11s) read compiled tree total runs 3 avg 83.25 MB/s (user 0.76s sys 4.91s) delete compiled tree total runs 30 avg 5.29 seconds (user 0.34s sys 4.34s) Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
9d89ce65 |
|
23-Apr-2014 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: move btrfs_{set,clear}_and_info() to ctree.h Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
0040e606 |
|
12-Apr-2014 |
Christoph Jaeger <christophjaeger@linux.com> |
btrfs: fix use-after-free in mount_subvol() Pointer 'newargs' is used after the memory that it points to has already been freed. Picked up by Coverity - CID 1201425. Fixes: 0723a0473f ("btrfs: allow mounting btrfs subvolumes with different ro/rw options") Signed-off-by: Christoph Jaeger <christophjaeger@linux.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
0723a047 |
|
19-Nov-2013 |
Harald Hoyer <harald@redhat.com> |
btrfs: allow mounting btrfs subvolumes with different ro/rw options Given the following /etc/fstab entries: /dev/sda3 /mnt/foo btrfs subvol=foo,ro 0 0 /dev/sda3 /mnt/bar btrfs subvol=bar,rw 0 0 you can't issue: $ mount /mnt/foo $ mount /mnt/bar You would have to do: $ mount /mnt/foo $ mount -o remount,rw /mnt/foo $ mount --bind -o remount,ro /mnt/foo $ mount /mnt/bar or $ mount /mnt/bar $ mount --rw /mnt/foo $ mount --bind -o remount,ro /mnt/foo With this patch you can do $ mount /mnt/foo $ mount /mnt/bar $ cat /proc/self/mountinfo 49 33 0:41 /foo /mnt/foo ro,relatime shared:36 - btrfs /dev/sda3 rw,ssd,space_cache 87 33 0:41 /bar /mnt/bar rw,relatime shared:74 - btrfs /dev/sda3 rw,ssd,space_cache Signed-off-by: Chris Mason <clm@fb.com>
|
#
02b9984d |
|
13-Mar-2014 |
Theodore Ts'o <tytso@mit.edu> |
fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org
|
#
a046e9c8 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Cleanup the old btrfs_worker. Since all the btrfs_worker is replaced with the newly created btrfs_workqueue, the old codes can be easily remove. Signed-off-by: Quwenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
0339ef2f |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue. Replace the fs_info->scrub_* with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
5b3bc44e |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->delayed_workers workqueue with btrfs_workqueue. Replace the fs_info->delayed_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
dc6e3209 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue. Replace the fs_info->fixup_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
736cfa15 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->readahead_workers workqueue with btrfs_workqueue. Replace the fs_info->readahead_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
e66f0bb1 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue. Replace the fs_info->cache_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
fccb5d86 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue. Replace the fs_info->endio_* workqueues with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
a8c93d4e |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->submit_workers with btrfs_workqueue. Much like the fs_info->workers, replace the fs_info->submit_workers use the same btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
afe3d242 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue Much like the fs_info->workers, replace the fs_info->delalloc_workers use the same btrfs_workqueue. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
5cdc7ad3 |
|
27-Feb-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Replace fs_info->workers with btrfs_workqueue. Use the newly created btrfs_workqueue_struct to replace the original fs_info->workers Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
2c6a92b0 |
|
20-Feb-2014 |
Justin Maggard <jmaggard10@gmail.com> |
btrfs: wake up transaction thread upon remount Now that we can adjust the commit interval with a remount, we need to wake up the transaction thread or else he will continue to sleep until the previous transaction interval has elapsed before waking up. So, if we go from a large commit interval to something smaller, the transaction thread will not wake up until the large interval has expired. This also causes the cleaner thread to stay sleeping, since it gets woken up by the transaction thread. Fix it by simply waking up the transaction thread during a remount. Signed-off-by: Justin Maggard <jmaggard10@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com>
|
#
3a0dfa6a |
|
14-Feb-2014 |
Josef Bacik <jbacik@fb.com> |
Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol A user was running into errors from an NFS export of a subvolume that had a default subvol set. When we mount a default subvol we will use d_obtain_alias() to find an existing dentry for the subvolume in the case that the root subvol has already been mounted, or a dummy one is allocated in the case that the root subvol has not already been mounted. This allows us to connect the dentry later on if we wander into the path. However if we don't ever wander into the path we will keep DCACHE_DISCONNECTED set for a long time, which angers NFS. It doesn't appear to cause any problems but it is annoying nonetheless, so simply unset DCACHE_DISCONNECTED in the get_default_root case and switch btrfs_lookup() to use d_materialise_unique() instead which will make everything play nicely together and reconnect stuff if we wander into the defaul subvol path from a different way. With this patch I'm no longer getting the NFS errors when exporting a volume that has been mounted with a default subvol set. Thanks, cc: bfields@fieldses.org cc: ebiederm@xmission.com Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
feb5f965 |
|
13-Feb-2014 |
Mitch Harder <mitch.harder@sabayonlinux.org> |
Btrfs: fix max_inline mount option Currently, the only mount option for max_inline that has any effect is max_inline=0. Any other value that is supplied to max_inline will be adjusted to a minimum of 4k. Since max_inline has an effective maximum of ~3900 bytes due to page size limitations, the current behaviour only has meaning for max_inline=0. This patch will allow the the max_inline mount option to accept non-zero values as indicated in the documentation. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <clm@fb.com>
|
#
60efa5eb |
|
01-Feb-2014 |
Filipe David Borba Manana <fdmanana@gmail.com> |
Btrfs: use late_initcall instead of module_init It seems that when init_btrfs_fs() is called, crc32c/crc32c-intel might not always be already initialized, which results in the call to crypto_alloc_shash() returning -ENOENT, as experienced by Ahmet who reported this. Therefore make sure init_btrfs_fs() is called after crc32c is initialized (which is at initialization level 6, module_init), by using late_initcall (which is at initialization level 7) instead of module_init for btrfs. Reported-and-Tested-by: Ahmet Inan <ainan@mathematik.uni-freiburg.de> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
07802534 |
|
12-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Cleanup the btrfs_parse_options for remount. Since remount will pending the new mount options to the original mount options, which will make btrfs_parse_options check the old options then new options, causing some stupid output like "enabling XXX" following by "disable XXX". This patch will add extra check before every btrfs_info to skip the output from old options checking. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
3818aea2 |
|
12-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add noinode_cache mount option Add noinode_cache mount option for btrfs. Since inode map cache involves all the btrfs_find_free_ino/return_ino things and if just trigger the mount_opt, an inode number get from inode map cache will not returned to inode map cache. To keep the find and return inode both in the same behavior, a new bit in mount_opt, CHANGE_INODE_CACHE, is introduced for this idea. CHANGE_INODE_CACHE is set/cleared in remounting, and the original INODE_MAP_CACHE is set/cleared according to CHANGE_INODE_CACHE after a success transaction. Since find/return inode is all done between btrfs_start_transaction and btrfs_commit_transaction, this will keep consistent behavior. Also noinode_cache mount option will not stop the caching_kthread. Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
14a958e6 |
|
11-Jan-2014 |
Filipe David Borba Manana <fdmanana@gmail.com> |
Btrfs: fix btrfs boot when compiled as built-in After the change titled "Btrfs: add support for inode properties", if btrfs was built-in the kernel (i.e. not as a module), it would cause a kernel panic, as reported recently by Fengguang: [ 2.024722] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2.027814] IP: [<ffffffff81501594>] crc32c+0xc/0x6b [ 2.028684] PGD 0 [ 2.028684] Oops: 0000 [#1] SMP [ 2.028684] Modules linked in: [ 2.028684] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc7-04795-ga7b57c2 #1 [ 2.028684] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 2.028684] task: ffff88000edba100 ti: ffff88000edd6000 task.ti: ffff88000edd6000 [ 2.028684] RIP: 0010:[<ffffffff81501594>] [<ffffffff81501594>] crc32c+0xc/0x6b [ 2.028684] RSP: 0000:ffff88000edd7e58 EFLAGS: 00010246 [ 2.028684] RAX: 0000000000000000 RBX: ffffffff82295550 RCX: 0000000000000000 [ 2.028684] RDX: 0000000000000011 RSI: ffffffff81efe393 RDI: 00000000fffffffe [ 2.028684] RBP: ffff88000edd7e60 R08: 0000000000000003 R09: 0000000000015d20 [ 2.028684] R10: ffffffff81ef225e R11: ffffffff811b0222 R12: ffffffffffffffff [ 2.028684] R13: 0000000000000239 R14: 0000000000000000 R15: 0000000000000000 [ 2.028684] FS: 0000000000000000(0000) GS:ffff88000fa00000(0000) knlGS:0000000000000000 [ 2.028684] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2.028684] CR2: 0000000000000000 CR3: 000000000220c000 CR4: 00000000000006f0 [ 2.028684] Stack: [ 2.028684] ffffffff82295550 ffff88000edd7e80 ffffffff8238af62 ffffffff8238ac05 [ 2.028684] 0000000000000000 ffff88000edd7e98 ffffffff8238ac0f ffffffff8238ac05 [ 2.028684] ffff88000edd7f08 ffffffff810002ba ffff88000edd7f00 ffffffff810e2404 [ 2.028684] Call Trace: [ 2.028684] [<ffffffff8238af62>] btrfs_props_init+0x4f/0x96 [ 2.028684] [<ffffffff8238ac05>] ? ftrace_define_fields_btrfs_space_reservation+0x145/0x145 [ 2.028684] [<ffffffff8238ac0f>] init_btrfs_fs+0xa/0xf0 [ 2.028684] [<ffffffff8238ac05>] ? ftrace_define_fields_btrfs_space_reservation+0x145/0x145 [ 2.028684] [<ffffffff810002ba>] do_one_initcall+0xa4/0x13a [ 2.028684] [<ffffffff810e2404>] ? parse_args+0x25f/0x33d [ 2.028684] [<ffffffff8234cf75>] kernel_init_freeable+0x1aa/0x230 [ 2.028684] [<ffffffff8234c785>] ? do_early_param+0x88/0x88 [ 2.028684] [<ffffffff819f61b5>] ? rest_init+0x89/0x89 [ 2.028684] [<ffffffff819f61c3>] kernel_init+0xe/0x109 The issue here is that the initialization function of btrfs (super.c:init_btrfs_fs) started using crc32c (from lib/libcrc32c.c). But when it needs to call crc32c (as part of the properties initialization routine), the libcrc32c is not yet initialized, so crc32c derreferenced a NULL pointer (lib/libcrc32c.c:tfm), causing the kernel panic on boot. The approach to fix this is to use crypto component directly to use its crc32c (which is basically what lib/libcrc32c.c is, a wrapper around crypto). This is what ext4 is doing as well, it uses crypto directly to get crc32c functionality. Verified this works both when btrfs is built-in and when it's loadable kernel module. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
63541927 |
|
07-Jan-2014 |
Filipe David Borba Manana <fdmanana@gmail.com> |
Btrfs: add support for inode properties This change adds infrastructure to allow for generic properties for inodes. Properties are name/value pairs that can be associated with inodes for different purposes. They are stored as xattrs with the prefix "btrfs." Properties can be inherited - this means when a directory inode has inheritable properties set, these are added to new inodes created under that directory. Further, subvolumes can also have properties associated with them, and they can be inherited from their parent subvolume. Naturally, directory properties have priority over subvolume properties (in practice a subvolume property is just a regular property associated with the root inode, objectid 256, of the subvolume's fs tree). This change also adds one specific property implementation, named "compression", whose values can be "lzo" or "zlib" and it's an inheritable property. The corresponding changes to btrfs-progs were also implemented. A patch with xfstests for this feature will follow once there's agreement on this change/feature. Further, the script at the bottom of this commit message was used to do some benchmarks to measure any performance penalties of this feature. Basically the tests correspond to: Test 1 - create a filesystem and mount it with compress-force=lzo, then sequentially create N files of 64Kb each, measure how long it took to create the files, unmount the filesystem, mount the filesystem and perform an 'ls -lha' against the test directory holding the N files, and report the time the command took. Test 2 - create a filesystem and don't use any compression option when mounting it - instead set the compression property of the subvolume's root to 'lzo'. Then create N files of 64Kb, and report the time it took. The unmount the filesystem, mount it again and perform an 'ls -lha' like in the former test. This means every single file ends up with a property (xattr) associated to it. Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the compression property, have no real effect other than adding more work when inheriting properties and taking more btree leaf space. Test 4 - same as test 3 but with 10 properties per file. Results (in seconds, and averages of 5 runs each), for different N numbers of files follow. * Without properties (test 1) file creation time ls -lha time 10 000 files 3.49 0.76 100 000 files 47.19 8.37 1 000 000 files 518.51 107.06 * With 1 property (compression property set to lzo - test 2) file creation time ls -lha time 10 000 files 3.63 0.93 100 000 files 48.56 9.74 1 000 000 files 537.72 125.11 * With 4 properties (test 3) file creation time ls -lha time 10 000 files 3.94 1.20 100 000 files 52.14 11.48 1 000 000 files 572.70 142.13 * With 10 properties (test 4) file creation time ls -lha time 10 000 files 4.61 1.35 100 000 files 58.86 13.83 1 000 000 files 656.01 177.61 The increased latencies with properties are essencialy because of: *) When creating an inode, we now synchronously write 1 more item (an xattr item) for each property inherited from the parent dir (or subvolume). This could be done in an asynchronous way such as we do for dir intex items (delayed-inode.c), which could help reduce the file creation latency; *) With properties, we now have larger fs trees. For this particular test each xattr item uses 75 bytes of leaf space in the fs tree. This could be less by using a new item for xattr items, instead of the current btrfs_dir_item, since we could cut the 'location' and 'type' fields (saving 18 bytes) and maybe 'transid' too (saving a total of 26 bytes per xattr item) from the btrfs_dir_item type. Also tried batching the xattr insertions (ignoring proper hash collision handling, since it didn't exist) when creating files that inherit properties from their parent inode/subvolume, but the end results were (surprisingly) essentially the same. Test script: $ cat test.pl #!/usr/bin/perl -w use strict; use Time::HiRes qw(time); use constant NUM_FILES => 10_000; use constant FILE_SIZES => (64 * 1024); use constant DEV => '/dev/sdb4'; use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev'; use constant TEST_DIR => (MNT_POINT . '/testdir'); system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!"; # following line for testing without properties #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!"; # following 2 lines for testing with properties system("mount", DEV, MNT_POINT) == 0 or die "mount failed!"; system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!"; system("mkdir", TEST_DIR) == 0 or die "mkdir failed!"; my ($t1, $t2); $t1 = time(); for (my $i = 1; $i <= NUM_FILES; $i++) { my $p = TEST_DIR . '/file_' . $i; open(my $f, '>', $p) or die "Error opening file!"; $f->autoflush(1); for (my $j = 0; $j < FILE_SIZES; $j += 4096) { print $f ('A' x 4096) or die "Error writing to file!"; } close($f); } $t2 = time(); print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n"; system("umount", DEV) == 0 or die "umount failed!"; system("mount", DEV, MNT_POINT) == 0 or die "mount failed!"; $t1 = time(); system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!"; $t2 = time(); print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n"; system("umount", DEV) == 0 or die "umount failed!"; Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
a88998f2 |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add treelog mount option. Add treelog mount option to enable tree log with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
d399167d |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add datasum mount option. Add datasum mount option to enable checksum with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
a258af7a |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add datacow mount option. Add datacow mount option to enable copy-on-write with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
bd0330ad |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add acl mount option. Add acl mount option to enable acl with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
2c9ee856 |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add noflushoncommit mount option. Add noflushoncommit mount option to disable flush on commit with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
53036293 |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add noenospc_debug mount option. Add noenospc_debug mount option to disable ENOSPC debug with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
e07a2ade |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add nodiscard mount option. Add nodiscard mount option to disable discard with remount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
fc0ca9af |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add noautodefrag mount option. Btrfs has autodefrag mount option but no pairing noautodefrag option, which makes it impossible to disable autodefrag without umount. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
842bef58 |
|
05-Jan-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Add "barrier" option to support "-o remount,barrier" Btrfs can be remounted without barrier, but there is no "barrier" option so nobody can remount btrfs back with barrier on. Only umount and mount again can re-enable barrier.(Quite awkward) Also the mount options in the document is also changed slightly for the further pairing options changes. Reported-by: Daniel Blueman <daniel@quora.org> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
efe120a0 |
|
20-Dec-2013 |
Frank Holton <fholton@gmail.com> |
Btrfs: convert printk to btrfs_ and fix BTRFS prefix Convert all applicable cases of printk and pr_* to the btrfs_* macros. Fix all uses of the BTRFS prefix. Signed-off-by: Frank Holton <fholton@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
a7e252af |
|
22-Nov-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: don't clear the default compression type We met a oops caused by the wrong compression type: [ 556.512356] BUG: unable to handle kernel NULL pointer dereference at (null) [ 556.512370] IP: [<ffffffff811dbaa0>] __list_del_entry+0x1/0x98 [SNIP] [ 556.512490] [<ffffffff811dbb44>] ? list_del+0xd/0x2b [ 556.512539] [<ffffffffa05dd5ce>] find_workspace+0x97/0x175 [btrfs] [ 556.512546] [<ffffffff813c14b5>] ? _raw_spin_lock+0xe/0x10 [ 556.512576] [<ffffffffa05de276>] btrfs_compress_pages+0x2d/0xa2 [btrfs] [ 556.512601] [<ffffffffa05af060>] compress_file_range.constprop.54+0x1f2/0x4e8 [btrfs] [ 556.512627] [<ffffffffa05af388>] async_cow_start+0x32/0x4d [btrfs] [ 556.512655] [<ffffffffa05cc7a1>] worker_loop+0x144/0x4c3 [btrfs] [ 556.512661] [<ffffffff81059404>] ? finish_task_switch+0x80/0xb8 [ 556.512689] [<ffffffffa05cc65d>] ? btrfs_queue_worker+0x244/0x244 [btrfs] [ 556.512695] [<ffffffff8104fa4e>] kthread+0x8d/0x95 [ 556.512699] [<ffffffff81050000>] ? bit_waitqueue+0x34/0x7d [ 556.512704] [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65 [ 556.512709] [<ffffffff813c7eec>] ret_from_fork+0x7c/0xb0 [ 556.512713] [<ffffffff8104f9c1>] ? __kthread_parkme+0x65/0x65 Steps to reproduce: # mkfs.btrfs -f <dev> # mount -o nodatacow <dev> <mnt> # touch <mnt>/<file> # chattr =c <mnt>/<file> # dd if=/dev/zero of=<mnt>/<file> bs=1M count=10 It is because we cleared the default compression type when setting the nodatacow. In fact, we needn't do it because we have used COMPRESS flag to indicate if we need compressed the file data or not, needn't use the variant -- compress_type -- in btrfs_info to do the same thing, and just use it to hold the default compression type. Or we would get a wrong compress type for a file whose own compress flag is set but the compress flag of its filesystem is not set. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
b0244199 |
|
04-Nov-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
d9b0d9ba |
|
30-Oct-2013 |
Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> |
btrfs: Replace kmalloc with kmalloc_array Replace kmalloc(size * nr, ) with kmalloc_array(nr, size), thus making it easier to check is that the calculation doesn't wrap or return a smaller allocation Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
8b558c5f |
|
16-Oct-2013 |
Zach Brown <zab@redhat.com> |
btrfs: remove fs/btrfs/compat.h fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
361c093d |
|
11-Oct-2013 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: Wait for uuid-tree rebuild task on remount read-only If the user remounts the filesystem read-only while the uuid-tree scan and rebuild task is still running (this happens once after the filesystem was mounted with an old kernel, or when forced with the mount options), the remount should wait on the tasks completion before setting the filesystem read-only. Otherwise the background task continues to write to the filesystem which is apparently not what users expect. The reproducer: TEST_DEV=/dev/sdzzzzz1 TEST_MNT=/mnt mkfs.btrfs -f $TEST_DEV mount $TEST_DEV $TEST_MNT for i in `seq 50000`; do btrfs subvolume create ${TEST_MNT}/$i; done umount $TEST_MNT mount $TEST_DEV $TEST_MNT -o rescan_uuid_tree sleep 1 ps -elf | fgrep '[btrfs-uuid]' | grep -v grep mount $TEST_DEV $TEST_MNT -o ro,remount ps -elf | fgrep '[btrfs-uuid]' | grep -v grep sleep 1 umount $TEST_MNT Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
aaedb55b |
|
11-Oct-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: add tests for btrfs_get_extent I'm going to be removing hole extents in the near future so I wanted to make a sanity test for btrfs_get_extent to make sure I don't break anything in the meantime. This patch just puts btrfs_get_extent through its paces by giving it a completely unreasonable mapping to look at and make sure it is giving us back maps that make sense. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
294e30fe |
|
08-Oct-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: add tests for find_lock_delalloc_range So both Liu and I made huge messes of find_lock_delalloc_range trying to fix stuff, me first by fixing extent size, then him by fixing something I broke and then me again telling him to fix it a different way. So this is obviously a candidate for some testing. This patch adds a pseudo fs so we can allocate fake inodes for tests that need an inode or pages. Then it addes a bunch of tests to make sure find_lock_delalloc_range is acting the way it is supposed to. With this patch and all of our previous patches to find_lock_delalloc_range I am sure it is working as expected now. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
06ea65a3 |
|
19-Sep-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: add a sanity test for btrfs_split_item While looking at somebodys corruption I became completely convinced that btrfs_split_item was broken, so I wrote this test to verify that it was working as it was supposed to. Thankfully it appears to be working as intended, so just add this test to make sure nobody breaks it in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
94aebfb2 |
|
20-Sep-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: create the uuid tree on remount rw Users have been complaining of the uuid tree stuff warning that there is no uuid root when trying to do snapshot operations. This is because if you mount -o ro we will not create the uuid tree. But then if you mount -o rw,remount we will still not create it and then any subsequent snapshot/subvol operations you try to do will fail gloriously. Fix this by creating the uuid_root on remount rw if it was not already there. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
6ef3de9c |
|
13-Sep-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: refuse to remount read-write after abort It's still possible to flip the filesystem into RW mode after it's remounted RO due to an abort. There are lots of places that check for the superblock error bit and will not write data, but we should not let the filesystem appear read-write. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
f0de181c |
|
17-Sep-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: kill delay_iput arg to the wait_ordered functions This is a left over of how we used to wait for ordered extents, which was to grab the inode and then run filemap flush on it. However if we have an ordered extent then we already are holding a ref on the inode, and we just use btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on the inode to start work on the ordered extent. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
79556c3d |
|
03-Sep-2013 |
Stefan Behrens <sbehrens@giantdisaster.de> |
btrfs: show compiled-in config features at module load time We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. (This commit message is a copy of David Sterba's commit message when he introduced btrfs_print_info()). Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
b9e9a6cb |
|
08-Aug-2013 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: allocate prelim_ref with a slab allocater struct __prelim_ref is allocated and freed frequently when walking backref tree, using slab allocater can not only speed up allocating but also detect memory leaks. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
c1c9ff7c |
|
20-Aug-2013 |
Geert Uytterhoeven <geert@linux-m68k.org> |
Btrfs: Remove superfluous casts from u64 to unsigned long long u64 is "unsigned long long" on all architectures now, so there's no need to cast it when formatting it using the "ll" length modifier. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
f420ee1e |
|
15-Aug-2013 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: add mount option to force UUID tree checking This should never be needed, but since all functions are there to check and rebuild the UUID tree, a mount option is added that allows to force this check and rebuild procedure. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
dc11dd5d |
|
14-Aug-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: separate out tests into their own directory The plan is to have a bunch of unit tests that run when btrfs is loaded when you build with the appropriate config option. My ultimate goal is to have a test for every non-static function we have, but at first I'm going to focus on the things that cause us the most problems. To start out with this just adds a tests/ directory and moves the existing free space cache tests into that directory and sets up all of the infrastructure. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
8b87dc17 |
|
01-Aug-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: add mount option to set commit interval I'ts hardcoded to 30 seconds which is fine for most users. Higher values defer data being synced to permanent storage with obvious consequences when the system crashes. The upper bound is not forced, but a warning is printed if it's more than 300 seconds (5 minutes). Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
8507d216 |
|
23-Jul-2013 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: add missing mounting options in btrfs_show_options() Some options are missing in btrfs_show_options(), this patch adds them. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
1493381f |
|
23-Jul-2013 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: use u64 for subvolid when parsing mount options Although for most time, int is enough for subvolid, we should ensure safety in theory. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
2c334e87 |
|
23-Jul-2013 |
Wang Shilong <wangsl.fnst@cn.fujitsu.com> |
Btrfs: add sanity checks regarding to parsing mount options I just notice the following commands succeed: mount <dev> <mnt> -o thread_pool=-1 This is ridiculous, only positive thread_pool makes sense,this patch adds sanity checks for them, and also catches the error of ENOMEM if allocating memory fails. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
501407aa |
|
10-Jun-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: stop waiting on current trans if we aborted I hit a hang when run_delayed_refs returned an error in the beginning of btrfs_commit_transaction. If we decide we need to commit the transaction in btrfs_end_transaction we'll set BLOCKED and start to commit, but if we get an error this early on we'll just exit without committing. This is fine, except that anybody else who tried to start a transaction will sit in wait_current_trans() since we're set to BLOCKED and we never set it to something else and woke people up. To fix this we want to check for trans->aborted everywhere we wait for the transaction state to change, and make btrfs_abort_transaction() wake up any waiters there may be. All the callers will notice that the transaction has aborted and exit out properly. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
c73e2936 |
|
16-May-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: do delay iput in sync_fs We get lock inversion with umount if we allow iputs from sync_fs, so use the delay iput flag to keep this from happening. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
199c2a9c |
|
15-May-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: introduce per-subvolume ordered extent list The reason we introduce per-subvolume ordered extent list is the same as the per-subvolume delalloc inode list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
3c64a1ab |
|
13-May-2013 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: cleanup: don't check the same thing twice btrfs_read_fs_root_no_name() already checks if btrfs_root_refs() is zero and returns ENOENT in this case. There is no need to do it again in six places. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
85965600 |
|
30-Apr-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: show compiled-in config features at module load time We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. Also kill version.h file that serves no purpose, we don't use any version tag for kernel module. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
e6d29605 |
|
30-Apr-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: move ifdef around sanity checks out of init_btrfs_fs Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
061594ef |
|
15-May-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: pause the space balance when remounting to R/O Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
62dbd717 |
|
16-Apr-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: fix lockdep warning The locking order for stuff is __sb_start_write ordered_mutex but with sync() we don't do __sb_start_write for some strange reason, which means that our iput in wait_ordered_extents could start a transaction which does the __sb_start_write while we're holding the ordered_mutex. Fix this by using delayed iput in sync. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
f42a34b2 |
|
11-Apr-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix unblocked autodefraggers when remount The new mount option is set after parsing the remount arguments, so it is wrong that checking the autodefrag is close or not at btrfs_remount_prepare(). Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
cf79ffb5 |
|
01-Apr-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: fix infinite loop when we abort on mount Testing my enospc log code I managed to abort a transaction during mount, which put me into an infinite loop. This is because of two things, first we don't reset trans_no_join if we abort during transaction commit, which will force anybody trying to start a transaction to just loop endlessly waiting for it to be set to 0. But this is still just a symptom, the second issue is we don't set the fs state to error during errors on mount. This is because we don't want to do the flip read only thing during mount, but we still really want to set the fs state to an error to keep us from even getting to the trans_no_join check. So fix both of these things, make sure to reset trans_no_join if we abort during a commit, and make sure we set the fs state to error no matter if we're mounting or not. This should keep us from getting into this infinite loop again. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
94ef7280 |
|
20-Mar-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: cover more error codes in btrfs_decode_error Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
5e2a4b25 |
|
20-Mar-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: deprecate subvolrootid mount option This mount option was a workaround when subvol= assumed path relative to the default subvolume, not the toplevel one. This was fixed long time ago and subvolrootid has no effect. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
c2cf52eb |
|
19-Mar-2013 |
Simon Kirby <sim@hostway.ca> |
Btrfs: Include the device in most error printk()s With more than one btrfs volume mounted, it can be very difficult to find out which volume is hitting an error. btrfs_error() will print this, but it is currently rigged as more of a fatal error handler, while many of the printk()s are currently for debugging and yet-unhandled cases. This patch just changes the functions where the device information is already available. Some cases remain where the root or fs_info is not passed to the function emitting the error. This may introduce some confusion with volumes backed by multiple devices emitting errors referring to the primary device in the set instead of the one on which the error occurred. Use btrfs_printk(fs_info, format, ...) rather than writing the device string every time, and introduce macro wrappers ala XFS for brevity. Since the function already cannot be used for continuations, print a newline as part of the btrfs_printk() message rather than at each caller. Signed-off-by: Simon Kirby <sim@hostway.ca> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
08748810 |
|
12-Mar-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: clean up transaction abort messages The transaction abort stacktrace is printed only once per module lifetime, but we'd like to see it each time it happens per mounted filesystem. Introduce a fs_state flag that records it. Tweak the messages around abort: * add error number to the first abort * print the exact negative errno from btrfs_decode_error * clean up btrfs_decode_error and callers * no dots at the end of the messages Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
bbece8a3 |
|
11-Mar-2013 |
David Sterba <dsterba@suse.cz> |
btrfs: merge save_error_info helpers into one Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
74255aa0 |
|
15-Mar-2013 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: add some free space cache tests We keep hitting bugs in the tree log replay because btrfs_remove_free_space doesn't account for some corner case. So add a bunch of tests to try and fully test btrfs_remove_free_space since the only time it is called is during tree log replay. These tests all finish successfully, so as we find more of these bugs we need to add to these tests to make sure we don't regress in fixing things. I've hidden the tests behind a Kconfig option, but they take no time to run so all btrfs developers should have this turned on all the time. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
7f78e035 |
|
02-Mar-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
fs: Limit sys_mount to only request filesystem modules. Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
dc81cdc5 |
|
20-Feb-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix remount vs autodefrag If we remount the fs to close the auto defragment or make the fs R/O, we should stop the auto defragment. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
d4edf39b |
|
20-Feb-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix uncompleted transaction In some cases, we need commit the current transaction, but don't want to start a new one if there is no running transaction, so we introduce the function - btrfs_attach_transaction(), which can catch the current transaction, and return -ENOENT if there is no running transaction. But no running transaction doesn't mean the current transction completely, because we removed the running transaction before it completes. In some cases, it doesn't matter. But in some special cases, such as freeze fs, we hope the transaction is fully on disk, it will introduce some bugs, for example, we may feeze the fs and dump the data in the disk, if the transction doesn't complete, we would dump inconsistent data. So we need fix the above problem for those cases. We fixes this problem by introducing a function: btrfs_attach_transaction_barrier() if we hope all the transaction is fully on the disk, even they are not running, we can use this function. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
37252a66 |
|
30-Jan-2013 |
Eric Sandeen <sandeen@redhat.com> |
btrfs: fix varargs in __btrfs_std_error __btrfs_std_error didn't always properly call va_end, and might call va_start even if fmt was NULL. Move all the varargs handling into the block where we have fmt. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
1c697d4a |
|
30-Jan-2013 |
Eric Sandeen <sandeen@redhat.com> |
btrfs: annotate intentional switch case fallthroughs This keeps static checkers happy. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
aa43a17c |
|
30-Jan-2013 |
Eric Sandeen <sandeen@redhat.com> |
btrfs: handle null fs_info in btrfs_panic() At least backref_tree_panic() can apparently pass in a null fs_info, so handle that in __btrfs_panic to get the message out on the console. The btrfs_panic macro also uses fs_info, but that's largely pointless; it's testing to see if BTRFS_MOUNT_PANIC_ON_FATAL_ERROR is not set. But if it *were* set, __btrfs_panic() would have, well, paniced and we wouldn't be here, testing it! So just BUG() at this point. And since we only use fs_info once now, just use it directly. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
5a016047 |
|
30-Jan-2013 |
Eric Sandeen <sandeen@redhat.com> |
btrfs: remove unused fs_info from btrfs_decode_error() Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
87533c47 |
|
29-Jan-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: use bit operation for ->fs_state There is no lock to protect fs_info->fs_state, it will introduce some problems, such as the value may be covered by the other task when several tasks modify it. For example: Task0 - CPU0 Task1 - CPU1 mov %fs_state rax or $0x1 rax mov %fs_state rax or $0x2 rax mov rax %fs_state mov rax %fs_state The expected value is 3, but in fact, it is 2. Though this problem doesn't happen now (because there is only one flag currently), the code is error prone, if we add other flags, the above problem will happen to a certainty. Now we use bit operation for it to fix the above problem. In this way, we can make the code more robust and be easy to add new flags. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
c018daec |
|
29-Jan-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: protect fs_info->alloc_start fs_info->alloc_start is a 64bits variant, can be accessed by multi-task, but it is not protected strictly, it can be changed while we are accessing it. On 32bit machine, we will get wrong value because we access it by two instructions.(In fact, it is also possible that the same problem happens on the 64bit machine, because the compiler may split the 64bit operation into two 32bit operation.) For example: Assuming -> alloc_start is 0x0000 0000 0001 0000 at the beginning, then we remount and set ->alloc_start to 0x0000 0100 0000 0000. Task0 Task1 load high 32 bits set high 32 bits set low 32 bits load low 32 bits Task1 will get 0. This patch fixes this problem by using two locks to protect it fs_info->chunk_mutex sb->s_umount On the read side, we just need get one of these two locks, and on the write side, we must lock all of them. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
55e301fd |
|
28-Jan-2013 |
Filipe Brandenburger <filbranden@google.com> |
Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h The header file will then be installed under /usr/include/linux so that userspace applications can refer to Btrfs ioctls by name and use the same structs used internally in the kernel. Signed-off-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
78a6184a |
|
20-Nov-2012 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: use slabs for delayed reference allocation The delayed reference allocation is in the fast path of the IO, so use slabs to improve the speed of the allocation. And besides that, it can do check for leaked objects when the module is removed. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
#
8d25a086 |
|
14-Jan-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: Add ACCESS_ONCE() to transaction->abort accesses We may access and update transaction->aborted on the different CPUs without lock, so we need ACCESS_ONCE() wrapper to prevent the compiler from creating unsolicited accesses and make sure we can get the right value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
9247f317 |
|
26-Nov-2012 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: use slabs for auto defrag allocation The auto defrag allocation is in the fast path of the IO, so use slabs to improve the speed of the allocation. And besides that, it can do check for leaked objects when the module is removed. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
8dabb742 |
|
06-Nov-2012 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: change core code of btrfs to support the device replace operations This commit contains all the essential changes to the core code of Btrfs for support of the device replace procedure. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
ff023aac |
|
06-Nov-2012 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: add code to scrub to copy read data to another disk The device replace procedure makes use of the scrub code. The scrub code is the most efficient code to read the allocated data of a disk, i.e. it reads sequentially in order to avoid disk head movements, it skips unallocated blocks, it uses read ahead mechanisms, and it contains all the code to detect and repair defects. This commit adds code to scrub to allow the scrub code to copy read data to another disk. One goal is to be able to perform as fast as possible. Therefore the write requests are collected until huge bios are built, and the write process is decoupled from the read process with some kind of flow control, of course, in order to limit the allocated memory. The best performance on spinning disks could by reached when the head movements are avoided as much as possible. Therefore a single worker is used to interface the read process with the write process. The regular scrub operation works as fast as before, it is not negatively influenced and actually it is more or less unchanged. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
63a212ab |
|
05-Nov-2012 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: disallow some operations on the device replace target device This patch adds some code to disallow operations on the device that is used as the target for the device replace operation. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
1acd6831 |
|
05-Nov-2012 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: avoid risk of a deadlock in btrfs_handle_error Remove the attempt to cancel a running scrub or device replace operation in btrfs_handle_error() because it adds the risk of a deadlock. The only penalty of not canceling the operation is that some I/O remains active until the procedure completes. This is basically the same thing that happens to other tasks that are running in user mode context, they are not affected or stopped in btrfs_handle_error(), these tasks just need to handle write errors correctly. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
aa1b8cd4 |
|
05-Nov-2012 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: pass fs_info instead of root A small number of functions that are used in a device replace procedure when the operation is resumed at mount time are unable to pass the same root pointer that would be used in the regular (ioctl) context. And since the root pointer is not required, only the fs_info is, the root pointer argument is replaced with the fs_info pointer argument. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
292fd7fc |
|
30-Oct-2012 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: don't allow degraded mount if too many devices are missing The current behavior is to allow mounting or remounting a filesystem writeable in degraded mode if at least one writeable device is present. The next failed write access to a missing device which is above the tolerance of the configured level of redundancy results in an read-only enforcement. Even without this, the next time barrier_all_devices() is called and more devices are missing than tolerable, the switch to read-only mode takes place. In order to behave predictably and to provide proper feedback to the user at mount time, this patch compares the number of missing devices with the number of devices that are tolerated to be missing according to the configured RAID level. If more devices are missing than tolerated, e.g. if two devices are missing in case of RAID1, only a read-only mount and remount is allowed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
bedb2cca |
|
20-Sep-2012 |
Andrei Popa <andrei.popa@i-neo.ro> |
Btrfs: make compress and nodatacow mount options mutually exclusive If a filesystem is mounted with compression and then remounted by adding nodatacow, the compression is disabled but the compress flag is still visible. Also, if a filesystem is mounted with nodatacow and then remounted with compression, nodatacow flag is still present but it's not active. This patch: - removes compress flags and notifies that the compression has been disabled if the filesystem is mounted with nodatacow - removes nodatacow and nodatasum flags if mounted with compress. Signed-off-by: Andrei Popa <andrei.popa@i-neo.ro>
|
#
48940662 |
|
07-May-2012 |
Daniel J Blueman <daniel@quora.org> |
btrfs: fix message printing Fix various messages to include newline and module prefix. Signed-off-by: Daniel J Blueman <daniel@quora.org>
|
#
354aa0fb |
|
20-Sep-2012 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix orphan transaction on the freezed filesystem With the following debug patch: static int btrfs_freeze(struct super_block *sb) { + struct btrfs_fs_info *fs_info = btrfs_sb(sb); + struct btrfs_transaction *trans; + + spin_lock(&fs_info->trans_lock); + trans = fs_info->running_transaction; + if (trans) { + printk("Transid %llu, use_count %d, num_writer %d\n", + trans->transid, atomic_read(&trans->use_count), + atomic_read(&trans->num_writers)); + } + spin_unlock(&fs_info->trans_lock); return 0; } I found there was a orphan transaction after the freeze operation was done. It is because the transaction may not be committed when the transaction handle end even though it is the last handle of the current transaction. This design avoid committing the transaction frequently, but also introduce the above problem. So I add btrfs_attach_transaction() which can catch the current transaction and commit it. If there is no transaction, it will return ENOENT, and do not anything. This function also can be used to instead of btrfs_join_transaction_freeze() because it don't increase the writer counter and don't start a new transaction, so it also can fix the deadlock between sync and freeze. Besides that, it is used to instead of btrfs_join_transaction() in transaction_kthread(), because if there is no transaction, the transaction kthread needn't anything. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
#
926ced12 |
|
14-Sep-2012 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: don't do anything in our ->freeze_fs and ->unfreeze_fs We do not need to do anything special to freeze or unfreeze, it's all taken care of by the generic work, and what we currently have is wrong anyway since we shouldn't be returnning to userspace with mutexes held anyway. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
6bbe3a9c |
|
14-Sep-2012 |
Liu Bo <bo.li.liu@oracle.com> |
Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents nocow_only is now an obsolete argument. Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
|
#
60376ce4 |
|
14-Sep-2012 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: fix race in sync and freeze again I screwed this up, there is a race between checking if there is a running transaction and actually starting a transaction in sync where we could race with a freezer and get ourselves into trouble. To fix this we need to make a new join type to only do the try lock on the freeze stuff. If it fails we'll return EPERM and just return from sync. This fixes a hang Liu Bo reported when running xfstest 68 in a loop. Thanks, Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
69ce977a |
|
06-Sep-2012 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: output more information when aborting a unused transaction handle Though we dump the stack information when aborting a unused transaction handle, we don't know the correct place where we decide to abort the transaction handle if one function has several place where the transaction abort function is invoked and jumps to the same place after this call. And beside that we also don't know the reason why we jump to abort the current handle. So I modify the transaction abort function and make it output the function name, line and error information. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
#
6352b91d |
|
06-Sep-2012 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: use a slab for ordered extents allocation The ordered extent allocation is in the fast path of the IO, so use a slab to improve the speed of the allocation. "Size of the struct is 280, so this will fall into the size-512 bucket, giving 8 objects per page, while own slab will pack 14 objects into a page. Another benefit I see is to check for leaked objects when the module is removed (and the cache destroy takes place)." -- David Sterba Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
|
#
bd7de2c9 |
|
24-Aug-2012 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: fix deadlock with freeze and sync V2 We can deadlock with freeze right now because we unconditionally start a transaction in our ->sync_fs() call. To fix this just check and see if we have a running transaction to commit. This saves us from the deadlock because at this point we'll have the umount sem for the sb so we're safe from freezes coming in after we've done our check. With this patch the freeze xfstests no longer deadlocks. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
aa9ddcd4 |
|
02-Aug-2012 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: do not use missing devices when showing devname If you do the following mkfs.btrfs /dev/sdb /dev/sdc rmmod btrfs dd if=/dev/zero of=/dev/sdb bs=1M count=1 mount -o degraded /dev/sdc /mnt/btrfs-test the box will panic trying to deref the name for the missing dev since it is the lower numbered devid. So fix show_devname to not use missing devices. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
34eaadaf |
|
25-Jul-2012 |
Artem Bityutskiy <artem.bityutskiy@linux.intel.com> |
btrfs: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from btrfs. Cc: Chris Mason <chris.mason@fusionio.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
533574c6 |
|
30-Jul-2012 |
Joe Perches <joe@perches.com> |
btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout Use the generic printk_get_level() to search a message for a kern_level. Add __printf to verify format and arguments. Fix a few messages that had mismatches in format and arguments. Add #ifdef CONFIG_PRINTK blocks to shrink the object size a bit when not using printk. [akpm@linux-foundation.org: whitespace tweak] Signed-off-by: Joe Perches <joe@perches.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2b0ce2c2 |
|
24-Jul-2012 |
Mitch Harder <mitch.harder@sabayonlinux.org> |
Btrfs: Check INCOMPAT flags on remount and add helper function In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
02db0844 |
|
21-Jun-2012 |
Josef Bacik <jbacik@fusionio.com> |
Btrfs: add DEVICE_READY ioctl This will be used in conjunction with btrfs device ready <dev>. This is needed for initrd's to have a nice and lightweight way to tell if all of the devices needed for a file system are in the cache currently. This keeps them from having to do mount+sleep loops waiting for devices to show up. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
#
063849ea |
|
16-Apr-2012 |
Arnd Hannemann <arnd@arndnet.de> |
Btrfs: allow mount -o remount,compress=no Btrfs allows to turn on compression on a mounted and used filesystem by issuing mount -o remount,compress=lzo. This patch allows to turn compression off again while the filesystem is mounted. As suggested by David Sterba if the compress-force option was set, it is implicitly cleared if compression is turned off. Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
|
#
c5c3c5f3 |
|
05-Apr-2012 |
Josef Bacik <josef@redhat.com> |
Btrfs: remove ->dirty_inode We do all of our inode updating when we change it, and now that we do ->update_time we don't need ->dirty_inode for atime updates anymore, so just remove it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
9249e17f |
|
24-Jun-2012 |
David Howells <dhowells@redhat.com> |
VFS: Pass mount flags to sget() Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2b6ba629 |
|
22-Jun-2012 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: resume balance on rw (re)mounts properly This introduces btrfs_resume_balance_async(), which, given that restriper state was recovered earlier by btrfs_recover_balance(), resumes balance in btrfs-balance kthread. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
9c5085c1 |
|
05-Jun-2012 |
Josef Bacik <josef@redhat.com> |
Btrfs: implement ->show_devname Because btrfs can remove the device that was mounted we need to have a ->show_devname so that in this case we can print out some other device in the file system to /proc/mount. So if there are multiple devices in a btrfs file system we will just print the device with the lowest devid that we can find. This will make everything consistent and deal with device removal properly. The drawback is if you mount with a device that is higher than the lowest devicd it won't show up as the mounted device in /proc/mounts, but this is a small price to pay. This was inspired by Miao Xie's patch. Thanks, Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
f60d16a8 |
|
25-Apr-2012 |
Jim Meyering <meyering@redhat.com> |
Btrfs: avoid buffer overrun in mount option handling There is an off-by-one error: allocating room for a maximal result string but without room for a trailing NUL. That, can lead to returning a transformed string that is not NUL-terminated, and then to a caller reading beyond end of the malloc'd buffer. Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy (the result is guaranteed to fit), remove dead strlen at end, and change a few variable names and comments. Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Jim Meyering <meyering@redhat.com>
|
#
f07c9a79 |
|
26-Apr-2012 |
Jim Meyering <jim@meyering.net> |
Btrfs: avoid buffer overrun in btrfs_printk The buffer read-overrun would be triggered by a printk format starting with <N>, where N is a single digit. NUL-terminate after strncpy. Use memcpy, not strncpy, since we know the string we're copying fits in the destination buffer and contains no NUL byte. Signed-off-by: Jim Meyering <meyering@redhat.com>
|
#
0d2450ab |
|
24-Apr-2012 |
Sergei Trofimovich <slyfox@gentoo.org> |
btrfs: allow changing 'thread_pool' size at remount time Changing 'mount -oremount,thread_pool=2 /' didn't make any effect: maximum amount of worker threads is specified in 2 places: - in 'strict btrfs_fs_info::thread_pool_size' - in each worker struct: 'struct btrfs_workers::max_workers' 'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'. Fix it by pushing new maximum value to all created worker structures as well. Cc: Josef Bacik <josef@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
#
0c4d2d95 |
|
05-Apr-2012 |
Josef Bacik <josef@redhat.com> |
Btrfs: use i_version instead of our own sequence We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version which serves the same purpose, so use i_version where we used the incore inode's sequence number and that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
996d282c |
|
23-Apr-2012 |
Josef Bacik <josef@redhat.com> |
Btrfs: do not start delalloc inodes during sync btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it, it's not needed anyway since sync will start writeback on all inodes anyway, all we need to do is wait for ordered extents and then we can commit the transaction. In my horrible torture test sync goes from taking 4 minutes to about 1.5 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
8a3db184 |
|
15-Apr-2012 |
Sergei Trofimovich <slyfox@gentoo.org> |
btrfs: fix early abort in 'remount' Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
#
e565d4b9 |
|
23-Mar-2012 |
Jan Schmidt <list.btrfs@jan-o-sch.net> |
Btrfs: actually call btrfs_init_lockdep btrfs_init_lockdep only makes our lockdep class names look prettier, thus it did never hurt we forgot to actually call it. This turns our lockdep identifier strings from lockdep auto-set #[id] into really pretty "btrfs-fs-01" or "btrfs-csum-03". Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
|
#
79787eaa |
|
12-Mar-2012 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: replace many BUG_ONs with proper error handling btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>
|
#
49b25e05 |
|
01-Mar-2012 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: enhance transaction abort infrastructure Signed-off-by: Jeff Mahoney <jeffm@suse.com>
|
#
4da35113 |
|
01-Mar-2012 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: add varargs to btrfs_error btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>
|
#
143bede5 |
|
01-Mar-2012 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: return void in functions without error conditions Signed-off-by: Jeff Mahoney <jeffm@suse.com>
|
#
8c342930 |
|
03-Oct-2011 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: Add btrfs_panic() As part of the effort to eliminate BUG_ON as an error handling technique, we need to determine which errors are actual logic errors, which are on-disk corruption, and which are normal runtime errors e.g. -ENOMEM. Annotating these error cases is helpful to understand and report them. This patch adds a btrfs_panic() routine that will either panic or BUG depending on the new -ofatal_errors={panic,bug} mount option. Since there are still so many BUG_ONs, it defaults to BUG for now but I expect that to change once the error handling effort has made significant progress. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
|
#
48fde701 |
|
08-Jan-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
switch open-coded instances of d_make_root() to new helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
9555c6c1 |
|
16-Jan-2012 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: add skip_balance mount option Since restriper kthread starts involuntarily on mount and can suck cpu and memory bandwidth add a mount option to forcefully skip it. The restriper in that case hangs around in paused state and can be resumed from userspace when it's convenient. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
f84a8bd6 |
|
17-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: take allocation of ->tree_root into open_ctree() now that we don't need it for sget() anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
815745cf |
|
17-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: let ->s_fs_info point to fs_info, not root... the latter can be obtained from the former (by looking as ->tree_root) just as cheaply as we currently are doing the other way round. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
59553edf |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: consolidate failure exits in btrfs_mount() a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d22ca7de |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: make free_fs_info() call ->kill_sb() unconditional ... and don't bother with it after btrfs_fill_super() failure - ->kill_sb() (unlike ->put_super()) will be called even if we have not got non-NULL ->s_root. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
be7e0950 |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: merge free_fs_info() calls on fill_super failures ... all the way up into btrfs_mount(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
29db78aa |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super() We do not (fortunately) modify ->s_fs_info of superblock on the fly in btrfs_fill_super(); apparent assignment is a no-op. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ad2b2c80 |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: make open_ctree() return int It returns either ERR_PTR(-ve) or sb->s_fs_info. The latter can be found by caller just as well, TYVM, no need to return it. Just return -ve or 0... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6f07e42e |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: sanitizing ->fs_info, part 4 A new helper: btrfs_alloc_root(fs_info); allocates btrfs_root and sets ->fs_info. All places allocating the suckers converted to it. At that point we *never* reassign ->fs_info of btrfs_root; it's set before anyone sees the address of newly allocated struct btrfs_root and never assigned anywhere else. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6de1d09d |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: fix mount/umount race Do *NOT* skip doomed superblocks in btrfs_test_super(); we want sget() to wait for their shutdown to complete. Since we don't mutilate ->s_fs_info in ->put_super() anymore (or free what it used to point to until the superblock is past being findable by sget()), we can just DTRT there and report a match. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aea52e19 |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: get ->kill_sb() of its own ... and move free_fs_info() to that, out of ->put_super(). Do NOT set ->s_fs_info to NULL in the latter; we need it for sget() to be able to see and wait for fs in the middle of umount if we get a mount/umount race. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
98c7089c |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: preparation to fixing mount/umount race We need fs_info and root to live until the moment when the victim superblock leaves the list, so we need to postpone free_fs_info() until after ->put_super(). The call is buried in close_ctree(), though, so we need to lift it into the callers (including btrfs_put_super()) first. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
34c80b1d |
|
08-Dec-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
vfs: switch ->show_options() to struct dentry * Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e407699e |
|
24-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs, nfs, apparmor: don't pull mnt_namespace.h for no reason... it's not needed anymore; we used to, back when we had to do mount_subtree() by hand, complete with put_mnt_ns() in it. No more... Apparmor didn't need it since the __d_path() fix. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
21adbd5c |
|
09-Nov-2011 |
Stefan Behrens <sbehrens@giantdisaster.de> |
Btrfs: integrate integrity check module into btrfs This is the last part of the patch series. It modifies the btrfs code to use the integrity check module if configured to do so with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set, the only effective change is that code is added that handles the mount option to activate the integrity check. If the mount option is set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code complains in the log and the mount fails with EINVAL. Add the mount option to activate the usage of the integrity check code. Add invocation of btrfs integrity check code init and cleanup function on mount and umount, respectively. Add hook to call btrfs integrity check code version of submit_bh/submit_bio. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
|
#
22c44fe6 |
|
30-Nov-2011 |
Josef Bacik <josef@redhat.com> |
Btrfs: deal with enospc from dirtying inodes properly Now that we're properly keeping track of delayed inode space we've been getting a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is because a bunch of people call mark_inode_dirty, which is void so we can't return ENOSPC. This needs to be fixed in a few areas 1) file_update_time - this updates the mtime and such when writing to a file, which will call mark_inode_dirty. So copy file_update_time into btrfs so we can call btrfs_dirty_inode directly and return an error if we get one appropriately. 2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't setting ->setattr for symlinks, even though we should have been. This catches one of the cases where we were getting errors in mark_inode_dirty. 3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly instead of mark_inode_dirty. This lets us return errors properly for truncate and chown/anything related to setattr. 4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and print an error if we have one. The only remaining user we can't control for this is touch_atime(), but we don't really want to keep people from walking down the tree if we don't have space to save the atime update, so just complain but don't worry about it. With this patch xfstests 83 complains a handful of times instead of hundreds of times. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
39fb26c3 |
|
14-Dec-2011 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix inaccurate available space on raid0 profile When we use raid0 as the data profile, df command may show us a very inaccurate value of the available space, which may be much less than the real one. It may make the users puzzled. Fix it by changing the calculation of the available space, and making it be more similar to a fake chunk allocation. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b772a86e |
|
28-Nov-2011 |
Li Zefan <lizf@cn.fujitsu.com> |
Btrfs: fix oops when calling statfs on readonly device To reproduce this bug: # dd if=/dev/zero of=img bs=1M count=256 # mkfs.btrfs img # losetup -r /dev/loop1 img # mount /dev/loop1 /mnt OOPS!! It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space(). To fix this, instead of checking write-only devices, we check all open deivces: # df -h /dev/loop1 Filesystem Size Used Avail Use% Mounted on /dev/loop1 250M 28K 238M 1% /mnt Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
|
#
ea441d11 |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: mount_subtree() takes vfsmount and relative path, does lookup within that vfsmount (possibly triggering automounts) and returns the result as root of subtree suitable for return by ->mount() (i.e. a reference to dentry and an active reference to its superblock grabbed, superblock locked exclusive). btrfs and nfs switched to it instead of open-coding the sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c1334495 |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfs Life is much saner if create_mnt_ns(mnt) drops mnt in case of error... Switch it to such calling conventions, switch callers, fix double mntput() in fs/nfs/super.c one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8d514bbf |
|
16-Nov-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
btrfs: fix double mntput() in mount_subvol() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8965593e |
|
11-Nov-2011 |
David Sterba <dsterba@suse.cz> |
btrfs: rename the option to nospace_cache Rename no_space_cache option to nospace_cache to be more consistent with the rest, where the simple prefix 'no' is used to negate an option. The option has been introduced during the -rc1 cycle and there are has not been widely used, so it's safe. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
04d21a24 |
|
09-Nov-2011 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: rework error handling in btrfs_mount() Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer dereference on error paths, also we would leave all devices busy and leak fs_info with all sub-structures on error when trying to mount an already mounted fs to a different directory. Fix this by doing all allocations before trying to open any of the devices, adjust error path for mount-already-mounted-fs case. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
f23c8af8 |
|
08-Nov-2011 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: fix subvol_name leak on error in btrfs_mount() btrfs_parse_early_options() can fail due to error while scanning devices (-o device= option), but still strdup() subvol_name string: mount -o subvol=SUBV,device=BAD_DEVICE <dev> <mnt> So free subvol_name string on error. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
a90e8b6f |
|
08-Nov-2011 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: fix memory leak in btrfs_parse_early_options() Don't leak subvol_name string in case multiple subvol= options are given. "The lastest option is effective" behavior (consistent with subvolid= and subvolrootid= options) is preserved. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
45ea6095 |
|
07-Nov-2011 |
slyich@gmail.com <slyich@gmail.com> |
btrfs: fix double-free 'tree_root' in 'btrfs_mount()' On error path 'tree_root' is treed in 'free_fs_info()'. No need to free it explicitely. Noticed by SLUB in debug mode: Complete reproducer under usermode linux (discovered on real machine): bdev=/dev/ubda btr_root=/btr /mkfs.btrfs $bdev mount $bdev $btr_root mkdir $btr_root/subvols/ cd $btr_root/subvols/ /btrfs su cr foo /btrfs su cr bar mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar umount $btr_root/subvols/bar which gives device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda ============================================================================= BUG kmalloc-2048: Object already free ----------------------------------------------------------------------------- INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277 INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277 INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081 INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968 ... Call Trace: 70b31948: [<6008c522>] print_trailer+0xe2/0x130 70b31978: [<6008c5aa>] object_err+0x3a/0x50 70b319a8: [<6008e242>] free_debug_processing+0x142/0x2a0 70b319e0: [<600ebf6f>] btrfs_mount+0x55f/0x7f0 70b319f8: [<6008e5c1>] __slab_free+0x221/0x2d0 Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Cc: Arne Jansen <sensille@gmx.net> Cc: Chris Mason <chris.mason@oracle.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
af31f5e5 |
|
03-Nov-2011 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add a log of past tree roots This takes some of the free space in the btrfs super block to record information about most of the roots in the last four commits. It also adds a -o recovery to use the root history log when we're not able to read the tree of tree roots, the extent tree root, the device tree root or the csum root. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6c41761f |
|
13-Apr-2011 |
David Sterba <dsterba@suse.cz> |
btrfs: separate superblock items out of fs_info fs_info has now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. Top space consumers are super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64) Add a wrapper for freeing fs_info and all of it's dynamically allocated members. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
f9d9ef62 |
|
29-Sep-2011 |
David Sterba <dsterba@suse.cz> |
btrfs: do not allow mounting non-subvolumes via subvol option There's a missing test whether the path passed to subvol=path option during mount is a real subvolume, allowing any directory located in default subovlume to be passed and accepted for mount. (current btrfs progs prevent this early) $ btrfs subvol snapshot . p1-snap ERROR: '.' is not a subvolume (with "is subvolume?" test bypassed) $ btrfs subvol snapshot . p1-snap Create a snapshot of '.' in './p1-snap' $ btrfs subvol list -p . ID 258 parent 5 top level 5 path subvol ID 259 parent 5 top level 5 path subvol1 ID 260 parent 5 top level 5 path default-subvol1 ID 262 parent 5 top level 5 path p1/p1-snapshot ID 263 parent 259 top level 5 path subvol1/subvol1-snap The problem I see is that this makes a false impression of snapshotting the given subvolume but in fact snapshots the default one: a user expects outcome like ID 263 but in fact gets ID 262 . This patch makes mount fail with EINVAL with a message in syslog. Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
5f524444 |
|
12-Oct-2011 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: fix a bug when opening seed devices Initialize fs_info->bdev_holder a bit earlier to be able to pass a correct holder id to blkdev_get() when opening seed devices with O_EXCL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
83c8c9bd |
|
14-Sep-2011 |
Jeff Liu <jeff.liu@oracle.com> |
btrfs: trivial fix, a potential memory leak in btrfs_parse_early_options() Signed-off-by: Jie Liu <jeff.liu@oracle.com>
|
#
73bc1876 |
|
03-Oct-2011 |
Josef Bacik <josef@redhat.com> |
Btrfs: introduce mount option no_space_cache Some users have requested this and I've found I needed a way to disable cache loading without actually clearing the cache, so introduce the no_space_cache option. Before we check the super blocks cache generation field and if it was populated we always turned space caching on. Now we check this and set the space cache option on, and then parse the mount options so that if we want it off it get's turned off. Then we check the mount option all the places we do the caching work instead of checking the super's cache generation. This makes things more consistent and lets us turn space caching off. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
830c4adb |
|
25-Jul-2011 |
Josef Bacik <josef@redhat.com> |
Btrfs: fix how we mount subvol=<whatever> We've only been able to mount with subvol=<whatever> where whatever was a subvol within whatever root we had as the default. This allows us to mount -o subvol=path/to/subvol/you/want relative from the normal fs_tree root. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
ba5b8958 |
|
25-Jul-2011 |
Josef Bacik <josef@redhat.com> |
Btrfs: use d_obtain_alias when mounting subvol/subvolid Currently what we do is just wrong. We either 1) Alloc a new "root" dentry with sb->s_root as it's parent which is just wrong as we could walk into this subvol later on via another path and hilarity could ensue. Also we don't check the return value of d_splice_alias which isn't good either. or 2) Do a d_find_alias() which we could have lost our dentry from cache at this point and found nothing. So use d_obtain_alias(). In the case that we already have the inode/dentry in cache we will get the correct dentry. If not we will get a disconnected dentry tree so if we walk into it later on everything will be connected up properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
0942caa3 |
|
28-Jun-2011 |
David Sterba <dsterba@suse.cz> |
btrfs: add missing options displayed in mount output There are three missed mount options settable by user which are not currently displayed in mount output. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4b9465cb |
|
03-Jun-2011 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add mount -o inode_cache This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
9e1f1de0 |
|
03-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
more conservative S_NOSEC handling Caching "we have already removed suid/caps" was overenthusiastic as merged. On network filesystems we might have had suid/caps set on another client, silently picked by this client on revalidate, all of that *without* clearing the S_NOSEC flag. AFAICS, the only reasonably sane way to deal with that is * new superblock flag; unless set, S_NOSEC is not going to be set. * local block filesystems set it in their ->mount() (more accurately, mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than local block ones clear it) * if any network filesystem (or a cluster one) wants to use S_NOSEC, it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when inode attribute changes are picked from other clients. It's not an earth-shattering hole (anybody that can set suid on another client will almost certainly be able to write to the file before doing that anyway), but it's a bug that needs fixing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4cb5300b |
|
24-May-2011 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add mount -o auto_defrag This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
90a887c9 |
|
26-May-2011 |
Dan Magenheimer <dan.magenheimer@oracle.com> |
btrfs: add cleancache support This sixth patch of eight in this cleancache series "opts-in" cleancache for btrfs. Filesystems must explicitly enable cleancache by calling cleancache_init_fs anytime an instance of the filesystem is mounted. Btrfs uses its own readpage which must be hooked, but all other cleancache hooks are in the VFS layer including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v6-v8: no changes] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
|
#
b0839166 |
|
14-May-2011 |
Julia Lawall <julia@diku.dk> |
fs/btrfs: Add missing btrfs_free_path Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) \(return <+...x...+>; \| return@p2...; \) } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
16cdcec7 |
|
22-Apr-2011 |
Miao Xie <miaox@cn.fujitsu.com> |
btrfs: implement delayed inode items operation Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. - Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. - Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dave@jikos.cz> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
bcd53741 |
|
12-Apr-2011 |
Arne Jansen <sensille@gmx.net> |
btrfs: move btrfs_cmp_device_free_bytes to super.c this function won't be used here anymore, so move it super.c where it is used for df-calculation
|
#
306e16ce |
|
19-Apr-2011 |
David Sterba <dsterba@suse.cz> |
btrfs: rename variables clashing with global function names reported by gcc -Wshadow: page_index, page_offset, new_inode, dev_name Signed-off-by: David Sterba <dsterba@suse.cz>
|
#
e15d0542 |
|
06-Apr-2011 |
Xin Zhong <xin.zhong@intel.com> |
Btrfs: fix subvolume mount by name problem when default mount subvolume is set We create two subvolumes (meego_root and meego_home) in btrfs root directory. And set meego_root as default mount subvolume. After we remount btrfs, meego_root is mounted to top directory by default. Then when we try to mount meego_home (subvol=meego_home) to a subdirectory, it failed. The problem is when default mount subvolume is set to meego_root, we search meego_home in meego_root but can not find it. So the solution is to add a new mount option (subvolrootid) to specify subvol id of root and search subvol name in it. For our case, now we can use "-o subvolrootid=0,subvol=meego_home) to mount meego_home. Detail information can be found in meego bugzilla: https://bugs.meego.com/show_bug.cgi?id=15055 Signed-off-by: Zhong, Xin <xin.zhong@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
200da64e |
|
30-Mar-2011 |
Tsutomu Itoh <t-itoh@jp.fujitsu.com> |
Btrfs: fix /proc/mounts info. Some mount options are not displayed by /proc/mounts. This patch displays the option such as compress_type by /proc/mounts. Ex. [before] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress 0 0 [after] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress=lzo,space_cache 0 0 Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
1abe9b8a |
|
24-Mar-2011 |
liubo <liubo2009@cn.fujitsu.com> |
Btrfs: add initial tracepoint support for btrfs Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
91435650 |
|
16-Feb-2011 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: put ENOSPC debugging under a mount option ENOSPC in btrfs is getting to the point where the extra debugging isn't required. I've put it under mount -o enospc_debug just in case someone is having difficult problems. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
98d5dc13 |
|
19-Jan-2011 |
Tsutomu Itoh <t-itoh@jp.fujitsu.com> |
btrfs: fix return value check of btrfs_start_transaction() The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3f3d0bc0 |
|
27-Dec-2010 |
Tero Roponen <tero.roponen@gmail.com> |
Btrfs: Free correct pointer after using strsep We must save and free the original kstrdup()'ed pointer because strsep() modifies its first argument. Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
|
#
bdc924bb |
|
27-Dec-2010 |
Ian Kent <raven@themaw.net> |
Btrfs: Fix memory leak on finding existing super We missed a memory deallocation in commit 450ba0ea. If an existing super block is found at mount and there is no error condition then the pre-allocated tree_root and fs_info are no not used and are not freeded. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
|
#
acce952b |
|
06-Jan-2011 |
liubo <liubo2009@cn.fujitsu.com> |
Btrfs: forced readonly mounts on errors This patch comes from "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all of resource will be freed and released at umount time. - make sure that fter FS is forced readonly on error, there will be no more disk change before FS is corrected. For this, we should stop write operation. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6d07bcec |
|
05-Jan-2011 |
Miao Xie <miaox@cn.fujitsu.com> |
btrfs: fix wrong free space information of btrfs When we store data by raid profile in btrfs with two or more different size disks, df command shows there is some free space in the filesystem, but the user can not write any data in fact, df command shows the wrong free space information of btrfs. # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10 # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 28.00KB devid 1 size 5.01GB used 2.03GB path /dev/sda9 devid 2 size 10.00GB used 2.01GB path /dev/sda10 # btrfs device scan /dev/sda9 /dev/sda10 # mount /dev/sda9 /mnt # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999 (fill the filesystem) # sync # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 3.99GB devid 1 size 5.01GB used 5.01GB path /dev/sda9 devid 2 size 10.00GB used 4.99GB path /dev/sda10 It is because btrfs cannot allocate chunks when one of the pairing disks has no space, the free space on the other disks can not be used for ever, and should be subtracted from the total space, but btrfs doesn't subtract this space from the total. It is strange to the user. This patch fixes it by calcing the free space that can be used to allocate chunks. Implementation: 1. get all the devices free space, and align them by stripe length. 2. sort the devices by the free space. 3. check the free space of the devices, 3.1. if it is not zero, and then check the number of the devices that has more free space than this device, if the number of the devices is beyond the min stripe number, the free space can be used, and add into total free space. if the number of the devices is below the min stripe number, we can not use the free space, the check ends. 3.2. if the free space is zero, check the next devices, goto 3.1 This implementation is just likely fake chunk allocation. After appling this patch, df can show correct space information: # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 0 100% /mnt Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
299a08b1 |
|
05-Jan-2011 |
Miao Xie <miaox@cn.fujitsu.com> |
btrfs: fix wrong data space statistics Josef has implemented mixed data/metadata chunks, we must add those chunks' space just like data chunks. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
af53d29a |
|
20-Dec-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
switch btrfs, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
a6fa6fae |
|
25-Oct-2010 |
Li Zefan <lizf@cn.fujitsu.com> |
btrfs: Add lzo compression support Lzo is a much faster compression algorithm than gzib, so would allow more users to enable transparent compression, and some users can choose from compression ratio and speed for different applications Usage: # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt or # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt "-o compress" without argument is still allowed for compatability. Compatibility: If we mount a filesystem with lzo compression, it will not be able be mounted in old kernels. One reason is, otherwise btrfs will directly dump compressed data, which sits in inline extent, to user. Performance: The test copied a linux source tarball (~400M) from an ext4 partition to the btrfs partition, and then extracted it. (time in second) lzo zlib nocompress copy: 10.6 21.7 14.9 extract: 70.1 94.4 66.6 (data size in MB) lzo zlib nocompress copy: 185.87 108.69 394.49 extract: 193.80 132.36 381.21 Changelog: v1 -> v2: - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig. - Add incompability flag. - Fix error handling in compress code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
|
#
261507a0 |
|
16-Dec-2010 |
Li Zefan <lizf@cn.fujitsu.com> |
btrfs: Allow to add new compression algorithm Make the code aware of compression type, instead of always assuming zlib compression. Also make the zlib workspace function as common code for all compression types. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
|
#
f106e82c |
|
06-Dec-2010 |
Li Zefan <lizf@cn.fujitsu.com> |
Btrfs: Fix a crash when mounting a subvolume We should drop dentry before deactivating the superblock, otherwise we can hit this bug: BUG: Dentry f349a690{i=100,n=/} still in use (1) [unmount of btrfs loop1] ... Steps to reproduce the bug: # mount /dev/loop1 /mnt # mkdir save # btrfs subvolume snapshot /mnt save/snap1 # umount /mnt # mount -o subvol=save/snap1 /dev/loop1 /mnt (crash) Reported-by: Michael Niederle <mniederle@gmx.at> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
450ba0ea |
|
19-Nov-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: setup blank root and fs_info for mount time There is a problem with how we use sget, it searches through the list of supers attached to the fs_type looking for a super with the same fs_devices as what we're trying to mount. This depends on sb->s_fs_info being filled, but we don't fill that in until we get to btrfs_fill_super, so we could hit supers on the fs_type super list that have a null s_fs_info. In order to fix that we need to go ahead and setup a blank root with a blank fs_info to hold fs_devices, that way our test will work out right and then we can set s_fs_info in btrfs_set_super, and then open_ctree will simply use our pre-allocated root and fs_info when setting everything up. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
619c8c76 |
|
21-Nov-2010 |
Ian Kent <raven@themaw.net> |
Btrfs - fix race between btrfs_get_sb() and umount When mounting a btrfs file system btrfs_test_super() may attempt to use sb->s_fs_info, the btrfs root, of a super block that is going away and that has had the btrfs root set to NULL in its ->put_super(). But if the super block is going away it cannot be an existing super block so we can return false in this case. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
0de90876 |
|
19-Nov-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: handle the space_cache option properly When I added the clear_cache option I screwed up and took the break out of the space_cache case statement, so whenever you mount with space_cache you also get clear_cache, which does you no good if you say set space_cache in fstab so it always gets set. This patch adds the break back in properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4260f7c7 |
|
29-Oct-2010 |
Sage Weil <sage@newdream.net> |
Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed Add a mount option user_subvol_rm_allowed that allows users to delete a (potentially non-empty!) subvol when they would otherwise we allowed to do an rmdir(2). We duplicate the may_delete() checks from the core VFS code to implement identical security checks (minus the directory size check). We additionally require that the user has write+exec permission on the subvol root inode. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
559af821 |
|
29-Oct-2010 |
Andi Kleen <andi@firstfloor.org> |
Btrfs: cleanup warnings from gcc 4.6 (nonbugs) These are all the cases where a variable is set, but not read which are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d0b678cb |
|
29-Oct-2010 |
Julia Lawall <julia@diku.dk> |
Btrfs: Use ERR_CAST helpers Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
88c2ba3b |
|
21-Sep-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: Add a clear_cache mount option If something goes wrong with the free space cache we need a way to make sure it's not loaded on mount and that it's cleared for everybody. When you pass the clear_cache option it will make it so all block groups are setup to be cleared, which keeps them from being loaded and then they will be truncated when the transaction is committed. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
061dbc6b |
|
26-Jul-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
convert btrfs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0af3d00b |
|
21-Jun-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: create special free space cache inode In order to save free space cache, we need an inode to hold the data, and we need a special item to point at the right inode for the right block group. So first, create a special item that will point to the right inode, and the number of extent entries we will have and the number of bitmaps we will have. We truncate and pre-allocate space everytime to make sure it's uptodate. This feature will be turned on as soon as you mount with -o space_cache, however it is safe to boot into old kernels, they will just generate the cache the old fashion way. When you boot back into a newer kernel we will notice that we modified and not the cache and automatically discard the cache. Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
0e78340f |
|
22-Oct-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: fix error handling in btrfs_get_sb If we failed to find the root subvol id, or the subvol=<name>, we would deactivate the locked super and close the devices. The problem is at this point we have gotten the SB all setup, which includes setting super_operations, so when we'd deactiveate the super, we'd do a close_ctree() which closes the devices, so we'd end up closing the devices twice. So if you do something like this mount /dev/sda1 /mnt/test1 mount /dev/sda1 /mnt/test2 -o subvol=xxx umount /mnt/test1 it would blow up (if subvol xxx doesn't exist). This patch fixes that problem. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
89a55897 |
|
14-Oct-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: fix df regression The new ENOSPC stuff breaks out the raid types which breaks the way we were reporting df to the system. This fixes it back so that Available is the total space available to data and used is the actual bytes used by the filesystem. This means that Available is Total - data used - all of the metadata space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
|
#
6038f373 |
|
15-Aug-2010 |
Arnd Bergmann <arnd@arndb.de> |
llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
|
#
bd555975 |
|
07-Jun-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
convert btrfs to ->evict_inode() NB: do we want btrfs_wait_ordered_range() on eviction of inodes with positive i_nlink on subvolume with zero root_refs? If not, btrfs_evict_inode() can be simplified by unconditionally bailing out in case of i_nlink > 0 in the very beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4cbd1149 |
|
29-May-2010 |
Dan Carpenter <error27@gmail.com> |
Btrfs: btrfs_iget() returns ERR_PTR btrfs_iget() returns an ERR_PTR() on failure and not null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
fb4f6f91 |
|
29-May-2010 |
Dan Carpenter <error27@gmail.com> |
Btrfs: handle error returns from btrfs_lookup_dir_item() If btrfs_lookup_dir_item() fails, we should can just let the mount fail with an error. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
578454ff |
|
20-May-2010 |
Kay Sievers <kay.sievers@vrfy.org> |
driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
d68fc57b |
|
16-May-2010 |
Yan, Zheng <zheng.yan@oracle.com> |
Btrfs: Metadata reservation for orphan inodes reserve metadata space for handling orphan inodes Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a22285a6 |
|
16-May-2010 |
Yan, Zheng <zheng.yan@oracle.com> |
Btrfs: Integrate metadata reservation with start_transaction Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b742bb82 |
|
16-May-2010 |
Yan, Zheng <zheng.yan@oracle.com> |
Btrfs: Link block groups of different raid types The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
287a0ab9 |
|
19-Mar-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: kill max_extent mount option As Yan pointed out, theres not much reason for all this complicated math to account for file extents being split up into max_extent chunks, since they are likely to all end up in the same leaf anyway. Since there isn't much reason to use max_extent, just remove the option altogether so we have one less thing we need to test. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
91748467 |
|
28-Feb-2010 |
Akinobu Mita <akinobu.mita@gmail.com> |
btrfs: use memparse Use memparse() instead of its own private implementation. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
bd4d1088 |
|
05-Mar-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: make df be a little bit more understandable The way we report df usage is way confusing for everybody, including some other utilities (bacula for one). So this patch makes df a little bit more understandable. First we make used actually count the total amount of used space in all space info's. This will give us a real view of how much disk space is in use. Second, for blocks available, only count data space. This makes things like bacula work because it says 0 when you can no longer write anymore data to the disk. I think this is a nice compromise, since you will end up with something like the following [root@alpha ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/VolGroup-lv_root 148G 30G 111G 21% / /dev/sda1 194M 116M 68M 64% /boot tmpfs 985M 12K 985M 1% /dev/shm /dev/mapper/VolGroup-LogVol02 145G 140G 0 100% /mnt/btrfs-test Compare this with btrfsctl -i output [root@alpha btrfs-progs-unstable]# ./btrfsctl -i /mnt/btrfs-test/ Metadata, DUP: total=4.62GB, used=2.46GB System, DUP: total=8.00MB, used=24.00KB Data: total=134.80GB, used=134.80GB Metadata: total=8.00MB, used=0.00 System: total=4.00MB, used=0.00 operation complete This way we show that there is no more data space to be used, but we have another 5GB of space left for metadata. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4849f01d |
|
14-Dec-2009 |
Josef Bacik <josef@redhat.com> |
Btrfs: make subvolid=0 mount the original default root Since theres not a good way to make sure the user sees the original default root tree id, and not to mention it's 5 so is way different than any other volume, just make subvol=0 mount the original default root. This makes it a bit easier for users to handle in the long run. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
73f73415 |
|
04-Dec-2009 |
Josef Bacik <josef@redhat.com> |
Btrfs: change how we mount subvolumes This work is in preperation for being able to set a different root as the default mounting root. There is currently a problem with how we mount subvolumes. We cannot currently mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the default subvolume. So say you take a snapshot of the default subvolume and call it snap1, and then take a snapshot of snap1 and call it snap2, so now you have / /snap1 /snap1/snap2 as your available volumes. Currently you can only mount / and /snap1, you cannot mount /snap1/snap2. To fix this problem instead of passing subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is the tree id that gets spit out via the subvolume listing you get from the subvolume listing patches (btrfs filesystem list). This allows us to mount /, /snap1 and /snap1/snap2 as the root volume. In addition to the above, we also now read the default dir item in the tree root to get the root key that it points to. For now this just points at what has always been the default subvolme, but later on I plan to change it to point at whatever root you want to be the new default root, so you can just set the default mount and not have to mount with -o subvolid=<treeid>. I tested this out with the above scenario and it worked perfectly. Thanks, mount -o subvol operates inside the selected subvolid. For example: mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt /mnt will have the snap1 directory for the subvolume with id 256. mount -o subvol=snap /dev/xxx /mnt /mnt will be the snap directory of whatever the default subvolume is. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
da495ecc |
|
25-Feb-2010 |
Josef Bacik <josef@redhat.com> |
Btrfs: kfree correct pointer during mount option parsing We kstrdup the options string, but then strsep screws with the pointer, so when we kfree() it, we're not giving it the right pointer. Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a555f810 |
|
28-Jan-2010 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount -o compress-force The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
20a5239a |
|
14-Dec-2009 |
Matthew Wilcox <willy@infradead.org> |
Btrfs: Show discard option in /proc/mounts Christoph's patch e244a0aeb6a599c19a7c802cda6e2d67c847b154 doesn't display the discard option in /proc/mounts, leading to some confusion for me. Here's the missing bit. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a7a3f7ca |
|
06-Nov-2009 |
Sage Weil <sage@newdream.net> |
Btrfs: fail mount on bad mount options We shouldn't silently ignore unrecognized options. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
24bbcf04 |
|
12-Nov-2009 |
Yan, Zheng <zheng.yan@oracle.com> |
Btrfs: Add delayed iput iput() can trigger new transactions if we are dropping the final reference, so calling it in btrfs_commit_transaction may end up deadlock. This patch adds delayed iput to avoid the issue. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e244a0ae |
|
14-Oct-2009 |
Christoph Hellwig <hch@lst.de> |
Btrfs: add -o discard option Enable discard by default is not a good idea given the the trim speed of SSD prototypes we've seen, and the carecteristics for many high-end arrays. Turn of discards by default and require the -o discard option to enable them on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
0eda294d |
|
13-Oct-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fix btrfs acl #ifdef checks The btrfs acl code was #ifdefing for a define that didn't exist. This correctly matches it to the values used by the Kconfig file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
49cf6f45 |
|
29-Sep-2009 |
Chris Ball <cjb@laptop.org> |
Btrfs: Fix setting umask when POSIX ACLs are not enabled We currently set sb->s_flags |= MS_POSIXACL unconditionally, which is incorrect -- it tells the VFS that it shouldn't set umask because we will, yet we don't set it ourselves if we aren't using POSIX ACLs, so the umask ends up ignored. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b87221de |
|
21-Sep-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
const: mark remaining super_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
76dda93c |
|
21-Sep-2009 |
Yan, Zheng <zheng.yan@oracle.com> |
Btrfs: add snapshot/subvolume destroy ioctl This patch adds snapshot/subvolume destroy ioctl. A subvolume that isn't being used and doesn't contains links to other subvolumes can be destroyed. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
405f5571 |
|
11-Jul-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5af7926f |
|
05-May-2009 |
Christoph Hellwig <hch@lst.de> |
enforce ->sync_fs is only called for rw superblock Make sure a superblock really is writeable by checking MS_RDONLY under s_umount. sync_filesystems needed some re-arragement for that, but all but one sync_filesystem caller had the correct locking already so that we could add that check there. cachefiles grew s_umount locking. I've also added a WARN_ON to sync_filesystem to assert this for future callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
59d697b7 |
|
27-Apr-2009 |
Christoph Hellwig <hch@infradead.org> |
btrfs: remove ->write_super and stop maintaining ->s_dirt Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
067c28ad |
|
11-Jun-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fix -o nodatasum printk spelling It was printing nodatacsum, which was not the correct option name. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
c289811c |
|
10-Jun-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: autodetect SSD devices During mount, btrfs will check the queue nonrot flag for all the devices found in the FS. If they are all non-rotating, SSD mode is enabled by default. If the FS was mounted with -o nossd, the non-rotating flag is ignored. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
451d7585 |
|
09-Jun-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add mount -o ssd_spread to spread allocations out Some SSDs perform best when reusing block numbers often, while others perform much better when clustering strictly allocates big chunks of unused space. The default mount -o ssd will find rough groupings of blocks where there are a bunch of free blocks that might have some allocated blocks mixed in. mount -o ssd_spread will make sure there are no allocated blocks mixed in. It should perform better on lower end SSDs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3b30c22f |
|
09-Jun-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount -o nossd This allows you to turn off the ssd mode via remount. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5d4f98a2 |
|
10-Jun-2009 |
Yan Zheng <zheng.yan@oracle.com> |
Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) This commit introduces a new kind of back reference for btrfs metadata. Once a filesystem has been mounted with this commit, IT WILL NO LONGER BE MOUNTABLE BY OLDER KERNELS. When a tree block in subvolume tree is cow'd, the reference counts of all extents it points to are increased by one. At transaction commit time, the old root of the subvolume is recorded in a "dead root" data structure, and the btree it points to is later walked, dropping reference counts and freeing any blocks where the reference count goes to 0. The increments done during cow and decrements done after commit cancel out, and the walk is a very expensive way to go about freeing the blocks that are no longer referenced by the new btree root. This commit reduces the transaction overhead by avoiding the need for dead root records. When a non-shared tree block is cow'd, we free the old block at once, and the new block inherits old block's references. When a tree block with reference count > 1 is cow'd, we increase the reference counts of all extents the new block points to by one, and decrease the old block's reference count by one. This dead tree avoidance code removes the need to modify the reference counts of lower level extents when a non-shared tree block is cow'd. But we still need to update back ref for all pointers in the block. This is because the location of the block is recorded in the back ref item. We can solve this by introducing a new type of back ref. The new back ref provides information about pointer's key, level and in which tree the pointer lives. This information allow us to find the pointer by searching the tree. The shortcoming of the new back ref is that it only works for pointers in tree blocks referenced by their owner trees. This is mostly a problem for snapshots, where resolving one of these fuzzy back references would be O(number_of_snapshots) and quite slow. The solution used here is to use the fuzzy back references in the common case where a given tree block is only referenced by one root, and use the full back references when multiple roots have a reference on a given block. This commit adds per subvolume red-black tree to keep trace of cached inodes. The red-black tree helps the balancing code to find cached inodes whose inode numbers within a given range. This commit improves the balancing code by introducing several data structures to keep the state of balancing. The most important one is the back ref cache. It caches how the upper level tree blocks are referenced. This greatly reduce the overhead of checking back ref. The improved balancing code scales significantly better with a large number of snapshots. This is a very large commit and was written in a number of pieces. But, they depend heavily on the disk format change and were squashed together to make sure git bisect didn't end up in a bad state wrt space balancing or the format change. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6b65c5c6 |
|
14-May-2009 |
Sage Weil <sage@newdream.net> |
Btrfs: make show_options result match actual option names The notreelog and flushoncommit mount options were being printed slightly differently. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6f5bbff9 |
|
05-May-2009 |
Al Viro <viro@zeniv.linux.org.uk> |
Convert obvious places to deactivate_locked_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
21380931 |
|
21-Apr-2009 |
Joel Becker <joel.becker@oracle.com> |
Btrfs: Fix a bunch of printk() warnings. Just happened to notice a bunch of %llu vs u64 warnings. Here's a patch to cast them all. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
97e728d4 |
|
21-Apr-2009 |
Josef Bacik <jbacik@redhat.com> |
Btrfs: try to keep a healthy ratio of metadata vs data block groups This patch makes the chunk allocator keep a good ratio of metadata vs data block groups. By default for every 8 data block groups, we'll allocate 1 metadata chunk, or about 12% of the disk will be allocated for metadata. This can be changed by specifying the metadata_ratio mount option. This is simply the number of data block groups that have to be allocated to force a metadata chunk allocation. By making sure we allocate metadata chunks more often, we are less likely to get into situations where the whole disk has been allocated as data block groups. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
dae7b665 |
|
08-Apr-2009 |
Li Zefan <lizf@cn.fujitsu.com> |
btrfs: use memdup_user() Remove open-coded memdup_user(). Note this changes some GFP_NOFS to GFP_KERNEL, since copy_from_user() may cause pagefault, it's pointless to pass GFP_NOFS to kmalloc(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
dccae999 |
|
02-Apr-2009 |
Sage Weil <sage@newdream.net> |
Btrfs: add flushoncommit mount option The 'flushoncommit' mount option forces any data dirtied by a write in a prior transaction to commit as part of the current commit. This makes the committed state a fully consistent view of the file system from the application's perspective (i.e., it includes all completed file system operations). This was previously the behavior only when a snapshot is created. This is used by Ceph to ensure that completed writes make it to the platter along with the metadata operations they are bound to (by BTRFS_IOC_TRANS_{START,END}). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3a5e1404 |
|
02-Apr-2009 |
Sage Weil <sage@newdream.net> |
Btrfs: notreelog mount option Add a 'notreelog' mount option to disable the tree log (used by fsync, O_SYNC writes). This is much slower, but the tree logging produces inconsistent views into the FS for ceph. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a9572a15 |
|
02-Apr-2009 |
Eric Paris <eparis@redhat.com> |
Btrfs: introduce btrfs_show_options btrfs options can change at times other than mount, yet /proc/mounts shows the options string used when the fs was mounted (an example would be when btrfs determines that barriers aren't useful and turns them off.) This patch instead outputs the actual options in use by btrfs. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e1df36d2 |
|
12-Feb-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: don't clean old snapshots on sync(1) Cleaning old snapshots can make sync(1) somewhat slow, and some users and applications still use it in a global fsync kind of workload. This patch changes btrfs not to clean old snapshots during sync, which is safe from a FS consistency point of view. The major downside is that it makes it difficult to tell when old snapshots have been reaped and the space they were using has been reclaimed. A new ioctl will be added for this purpose instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b288052e |
|
12-Feb-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: process mount options on mount -o remount, Btrfs wasn't parsing any new mount options during remount, making it difficult to set mount options on a root drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7eaebe7d |
|
21-Jan-2009 |
Huang Weiyi <weiyi.huang@gmail.com> |
Btrfs: removed unused #include <version.h>'s Removed unused #include <version.h>'s in btrfs Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
19d00cc1 |
|
21-Jan-2009 |
Wang Cong <xiyou.wangcong@gmail.com> |
Btrfs: cleanup fs/btrfs/super.c::btrfs_control_ioctl() - Remove the unused local variable 'len'; - Check return value of kmalloc(). Signed-off-by: Wang Cong <wangcong@zeuux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
c071fcfd |
|
16-Jan-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fix ioctl arg size (userland incompatible change!) The structure used to send device in btrfs ioctl calls was not properly aligned, and so 32 bit ioctls would not work properly on 64 bit kernels. We could fix this with compat ioctls, but we're just one byte away and it doesn't make sense at this stage to carry about the compat ioctls forever at this stage in the project. This patch brings the ioctl arg up to an evenly aligned 4k. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
1bcbf313 |
|
15-Jan-2009 |
Qinghuang Feng <qhfeng.kernel@gmail.com> |
btrfs & squashfs: Move btrfs and squashfsto's magic number to <linux/magic.h> Use the standard magic.h for btrfs and squashfs. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0176260f |
|
10-Jan-2009 |
Linus Torvalds <torvalds@linux-foundation.org> |
btrfs: fix for write_super_lockfs/unlockfs error handling Commit c4be0c1dc4cdc37b175579be1460f15ac6495e9a added the ability for write_super_lockfs to return errors, and renamed them to match. But btrfs didn't get converted. Do the minimal conversion to make it compile again. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d397712b |
|
05-Jan-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Fix checkpatch.pl warnings There were many, most are fixed now. struct-funcs.c generates some warnings but these are bogus. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
1f483660 |
|
05-Jan-2009 |
Shen Feng <shen@cn.fujitsu.com> |
Btrfs: fix a memory leak in btrfs_get_sb subvol_name should be freed if error occurs. Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
|
#
e441d54d |
|
05-Jan-2009 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add permission checks to the ioctls Only root can add/remove devices Only root can defrag subtrees Only files open for writing can be defragged Only files open for writing can be the destination for a clone Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e4404d6e |
|
12-Dec-2008 |
Yan Zheng <zheng.yan@oracle.com> |
Btrfs: shared seed device This patch makes seed device possible to be shared by multiple mounted file systems. The sharing is achieved by cloning seed device's btrfs_fs_devices structure. Thanks you, Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
|
#
97288f2c |
|
02-Dec-2008 |
Christoph Hellwig <hch@lst.de> |
Btrfs: corret fmode_t annotations Make sure to propagate fmode_t properly and use the right constants for it. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b2950863 |
|
02-Dec-2008 |
Christoph Hellwig <hch@lst.de> |
Btrfs: make things static and include the right headers Shut up various sparse warnings about symbols that should be either static or have their declarations in scope. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
4b4e25f2 |
|
20-Nov-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: compat code fixes The btrfs git kernel trees is used to build a standalone tree for compiling against older kernels. This commit makes the standalone tree work with 2.6.27 Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3de4586c |
|
17-Nov-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Allow subvolumes and snapshots anywhere in the directory tree Before, all snapshots and subvolumes lived in a single flat directory. This was awkward and confusing because the single flat directory was only writable with the ioctls. This commit changes the ioctls to create subvols and snapshots at any point in the directory tree. This requires making separate ioctls for snapshot and subvol creation instead of a combining them into one. The subvol ioctl does: btrfsctl -S subvol_name parent_dir After the ioctl is done subvol_name lives inside parent_dir. The snapshot ioctl does: btrfsctl -s path_for_snapshot root_to_snapshot path_for_snapshot can be an absolute or relative path. btrfsctl breaks it up into directory and basename components. root_to_snapshot can be any file or directory in the FS. The snapshot is taken of the entire root where that file lives. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2b82032c |
|
17-Nov-2008 |
Yan Zheng <zheng.yan@oracle.com> |
Btrfs: Seed device support Seed device is a special btrfs with SEEDING super flag set and can only be mounted in read-only mode. Seed devices allow people to create new btrfs on top of it. The new FS contains the same contents as the seed device, but it can be mounted in read-write mode. This patch does the following: 1) split code in btrfs_alloc_chunk into two parts. The first part does makes the newly allocated chunk usable, but does not do any operation that modifies the chunk tree. The second part does the the chunk tree modifications. This division is for the bootstrap step of adding storage to the seed device. 2) Update device management code to handle seed device. The basic idea is: For an FS grown from seed devices, its seed devices are put into a list. Seed devices are opened on demand at mounting time. If any seed device is missing or has been changed, btrfs kernel module will refuse to mount the FS. 3) make btrfs_find_block_group not return NULL when all block groups are read-only. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
|
#
c146afad |
|
12-Nov-2008 |
Yan Zheng <zheng.yan@oracle.com> |
Btrfs: mount ro and remount support This patch adds mount ro and remount support. The main changes in patch are: adding btrfs_remount and related helper function; splitting the transaction related code out of close_ctree into btrfs_commit_super; updating allocator to properly handle read only block group. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
|
#
771ed689 |
|
06-Nov-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Optimize compressed writeback and reads When reading compressed extents, try to put pages into the page cache for any pages covered by the compressed extent that readpages didn't already preload. Add an async work queue to handle transformations at delayed allocation processing time. Right now this is just compression. The workflow is: 1) Find offsets in the file marked for delayed allocation 2) Lock the pages 3) Lock the state bits 4) Call the async delalloc code The async delalloc code clears the state lock bits and delalloc bits. It is important this happens before the range goes into the work queue because otherwise it might deadlock with other work queue items that try to lock those extent bits. The file pages are compressed, and if the compression doesn't work the pages are written back directly. An ordered work queue is used to make sure the inodes are written in the same order that pdflush or writepages sent them down. This changes extent_write_cache_pages to let the writepage function update the wbc nr_written count. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
c8b97818 |
|
29-Oct-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add zlib compression support This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths. Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk. If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts later. * While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf. * Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages. * All of the in-memory extent representations (extent_map.c, ordered-data.c etc) are changed to record both an in-memory size and an on disk size, as well as a flag for compression. From a disk format point of view, the extent pointers in the file are changed to record the on disk size of a given extent and some encoding flags. Space in the disk format is allocated for compression encoding, as well as encryption and a generic 'other' field. Neither the encryption or the 'other' field are currently used. In order to limit the amount of data read for a single random read in the file, the size of a compressed extent is limited to 128k. This is a software only limit, the disk format supports u64 sized compressed extents. In order to limit the ram consumed while processing extents, the uncompressed size of a compressed extent is limited to 256k. This is a software only limit and will be subject to tuning later. Checksumming is still done on compressed extents, and it is done on the uncompressed version of the data. This way additional encodings can be layered on without having to figure out which encoding to checksum. Compression happens at delalloc time, which is basically singled threaded because it is usually done by a single pdflush thread. This makes it tricky to spread the compression load across all the cpus on the box. We'll have to look at parallel pdflush walks of dirty inodes at a later time. Decompression is hooked into readpages and it does spread across CPUs nicely. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d352ac68 |
|
29-Sep-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add and improve comments This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work todo in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2b1f55b0 |
|
24-Sep-2008 |
Chris Mason <chris.mason@oracle.com> |
Remove Btrfs compat code for older kernels Btrfs had compatibility code for kernels back to 2.6.18. These have been removed, and will be maintained in a separate backport git tree from now on. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
76fcef19 |
|
19-Aug-2008 |
David Woodhouse <David.Woodhouse@intel.com> |
Btrfs: Reinstate '-osubvol=.' option to mount entire tree Date: Tue, 19 Aug 2008 16:49:35 +0100 This disappeared when I removed the special case for '.' in btrfs_lookup() Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
32d48fa1 |
|
18-Aug-2008 |
David Woodhouse <David.Woodhouse@intel.com> |
Mask root object ID into f_fsid in btrfs_statfs() Date: Mon, 18 Aug 2008 13:10:20 +0100 This means that subvolumes get a different fsid, and NFS exporting them works properly. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
9d03632e |
|
17-Aug-2008 |
David Woodhouse <David.Woodhouse@intel.com> |
Fill f_fsid field in btrfs_statfs() Date: Mon, 18 Aug 2008 12:01:52 +0100 Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
be6e8dc0 |
|
20-Jul-2008 |
Balaji Rao <balajirrao@gmail.com> |
NFS support for btrfs - v3 Date: Mon, 21 Jul 2008 02:01:56 +0530 Here's an implementation of NFS support for btrfs. It relies on the fixes which are going in to 2.6.28 for the NFS readdir/lookup deadlock. This uses the btrfs_iget helper introduced previously. [dwmw2: Tidy up a little, switch to d_obtain_alias() w/compat routine, change fh_type, store parent's root object ID where needed, fix some get_parent() and fs_to_dentry() bugs] Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b48652c1 |
|
04-Aug-2008 |
Yan Zheng <zheng.yan@oracle.com> |
Btrfs: Various small fixes. This trivial patch contains two locking fixes and a off by one fix. --- Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
33268eaf |
|
23-Jul-2008 |
Josef Bacik <jbacik@redhat.com> |
Btrfs: Add ACL support Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b3c3da71 |
|
22-Jul-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add version strings on module load Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3f157a2f |
|
25-Jun-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Online btree defragmentation fixes The btree defragger wasn't making forward progress because the new key wasn't being saved by the btrfs_search_forward function. This also disables the automatic btree defrag, it wasn't scaling well to huge filesystems. The auto-defrag needs to be done differently. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a74a4b97 |
|
25-Jun-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Replace the transaction work queue with kthreads This creates one kthread for commits and one kthread for deleting old snapshots. All the work queues are removed. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a2135011 |
|
25-Jun-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Replace the big fs_mutex with a collection of other locks Extent alloctions are still protected by a large alloc_mutex. Objectid allocations are covered by a objectid mutex Other btree operations are protected by a lock on individual btree nodes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4543df7e |
|
11-Jun-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add a mount option to control worker thread pool size mount -o thread_pool_size changes the default, which is min(num_cpus + 2, 8). Larger thread pools would make more sense on very large disk arrays. This mount option controls the max size of each thread pool. There are multiple thread pools, so the total worker count will be larger than the mount option. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
15ada040 |
|
11-Jun-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Fix mount -o max_inline=0 max_inline=0 used to force the max_inline size to one sector instead. Now it properly disables inline data items, while still being able to read any that happen to exist on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
43e570b0 |
|
10-Jun-2008 |
Christoph Hellwig <hch@lst.de> |
btrfs: allow scanning multiple devices during mount Allows to specify one or multiple device=/dev/foo options during mount so that ioctls on the control device can be avoided. Especially useful when trying to mount a multi-device setup as root. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
edf24abe |
|
10-Jun-2008 |
Christoph Hellwig <hch@lst.de> |
btrfs: sanity mount option parsing and early mount code Also adds lots of comments to describe what's going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6bf13c0c |
|
10-Jun-2008 |
Sage Weil <sage@newdream.net> |
Btrfs: transaction ioctls These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
f819d837 |
|
09-Jun-2008 |
Linda Knippers <linda.knippers@hp.com> |
btrfsctl -A error code fixup Send the error back to userland if the ioctl fails Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e1b81e67 |
|
27-May-2008 |
Mingming <cmm@us.ibm.com> |
btrfs delete ordered inode handling fix Use btrfs_release_file instead of a put_inode call Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
dfe25020 |
|
13-May-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount -o degraded to allow mounts to continue with missing devices Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a061fc8d |
|
07-May-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add support for online device removal This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
788f20eb |
|
28-Apr-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add new ioctl to add devices Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e58ca020 |
|
01-Apr-2008 |
Yan <yanzheng@21cn.com> |
Fix btrfs_fill_super to return -EINVAL when no FS found Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
8a4b83cc |
|
24-Mar-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add support for device scanning and detection ioctls Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a9218f6b |
|
24-Mar-2008 |
Chris Mason <chris.mason@oracle.com> |
Add /dev/btrfs-control for device scanning ioctls Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6885f308 |
|
20-Feb-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Misc 2.6.25 updates Remove the btrfs read_inode method, and use save_mount_options Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6f568d35 |
|
29-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: mount -o max_inline=size to control the maximum inline extent size Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d1310b2e |
|
24-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Split the extent_map code into two parts There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_bufers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
ed0dab6b |
|
21-Jan-2008 |
Yan <yanzheng@21cn.com> |
Btrfs: Add basic lockfs calls Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e18e4809 |
|
18-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount -o ssd, which includes optimizations for seek free storage Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2da98f00 |
|
16-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Run igrab on data=ordered inodes to prevent deadlocks during writeout Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
61295eb8 |
|
14-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add drop inode func to avoid data=ordered deadlock Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
21ad10cf |
|
09-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add flush barriers on commit Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
8f662a76 |
|
02-Jan-2008 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add readahead to the online shrinker, and a mount -o alloc_start= for testing Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
edbd8d4e |
|
21-Dec-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Support for online FS resize (grow and shrink) Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6da6abae |
|
18-Dec-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Back port to 2.6.18-el kernels Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
c59f8951 |
|
17-Dec-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount option to enforce a max extent size Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
be20aa9d |
|
17-Dec-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount option to turn off data cow A number of workloads do not require copy on write data or checksumming. mount -o nodatasum to disable checksums and -o nodatacow to disable both copy on write and checksumming. In nodatacow mode, copy on write is still performed when a given extent is under snapshot. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b6cda9bc |
|
14-Dec-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add mount -o nodatasum to turn of file data checksumming Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2f4cbe64 |
|
19-Nov-2007 |
Wyatt Banks <wyatt@banksresearch.com> |
Btrfs: Return value checking in module init Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5103e947 |
|
16-Nov-2007 |
Josef Bacik <jbacik@redhat.com> |
xattr support for btrfs Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3326d1b0 |
|
15-Oct-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Allow tails larger than one page Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
db94535d |
|
15-Oct-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Allow tree blocks larger than the page size Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5f39d397 |
|
15-Oct-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Create extent_buffer interface for large blocksizes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
95e05289 |
|
29-Aug-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Use mount -o subvol to select the subvol directory instead of dev: Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4b82d6e4 |
|
29-Aug-2007 |
Yan <yanzheng@21cn.com> |
Btrfs: Add mount into directory support Modified form of original patch from Christoph Hellwig to make btrfs mount into the default subvolume by default. mount /dev/somedevice:subvolumename to get other subvolumes or mount /dev/somedevice:. to get the root Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
58176a96 |
|
29-Aug-2007 |
Josef Bacik <jbacik@redhat.com> |
Btrfs: Add per-root block accounting and sysfs entries Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b888db2b |
|
27-Aug-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Add delayed allocation to the extent based page tree code Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a52d9a80 |
|
27-Aug-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Extent based page cache code. This uses an rbtree of extents and tests instead of buffer heads. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e9d0b13b |
|
10-Aug-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Btree defrag on the extent-mapping tree as well Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4b52dff6 |
|
26-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Fix super block updates during transaction commit The super block written during commit was not consistent with the state of the trees. This change adds an in-memory copy of the super so that we can make sure to write out consistent data during a commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
54aa1f4d |
|
22-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: Audit callers and return codes to make sure -ENOSPC gets up the stack Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6cbd5570 |
|
12-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add GPLv2 Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
39279cc3 |
|
12-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: split up super.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5276aeda |
|
11-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fix oops after block group lookup Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
0cf6c620 |
|
09-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: remove device tree Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
08607c1b |
|
08-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add compat ioctl Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
fabb5681 |
|
07-Jun-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: d_type optimization Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
1de037a4 |
|
29-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fixup various fsx failures Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3a686375 |
|
24-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: sparse files! Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2b8d99a7 |
|
24-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: symlinks and hard links Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e06afa83 |
|
23-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: rename Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
f9f3c6b6 |
|
21-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: 2.6.21-git fixes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
de428b63 |
|
18-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: allocator optimizations, truncate readahead Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
098f59c2 |
|
11-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: patch queue: fix corruption when splitting large items Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e37c9e69 |
|
09-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: many allocator fixes, pretty solid Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
be744175 |
|
06-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: more allocator enhancements Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
be08c1b9 |
|
03-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: early metadata/data split Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
35b7e476 |
|
02-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fix page cache memory leak Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
090d1875 |
|
01-May-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: directory readahead Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
31f3c99b |
|
30-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: allocator improvements, inode block groups Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7c4452b9 |
|
28-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: smarter transaction writeback Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
cd1bc465 |
|
27-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: more block allocator work Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
9078a3e1 |
|
26-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: start of block group code Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
f68cad0f |
|
23-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: fixup dirty_inode related deadlocks Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
c62a1920 |
|
23-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: get rid of the extent_item type field Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b5133862 |
|
24-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add dirty_inode call Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5e82849e |
|
23-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: new subvolume oops fix Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4d775673 |
|
20-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add owner and type fields to the extents aand block headers Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
8fd17795 |
|
19-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: early fsync support Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7e38180e |
|
19-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: directory inode index is back Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
236454df |
|
19-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: many file_write fixes, inline data Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
a429e513 |
|
18-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: working file_write, reorganized key flags Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
70b2befd |
|
17-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: rework csums and extent item ordering Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b18c6685 |
|
17-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: progress on file_write Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6567e837 |
|
16-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: early work to file_write in big extents Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
b4100d64 |
|
11-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add a device id to device items Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
8352d8a4 |
|
12-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add disk ioctl, mostly working Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7eccb903 |
|
11-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: create a logical->phsyical block number mapping scheme Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2d13d8d0 |
|
10-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: detect duplicate subvol names Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2619ba1f |
|
10-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: subvolumes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2932f3ec |
|
10-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: when forced to cow for file_write, get the page uptodate first Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
79b2cb1f |
|
10-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: only cow in get_block when create==1 Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
48ddc6f4 |
|
10-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: cow file extents before writing Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
1b05da2e |
|
09-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: drop the inode map tree Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
c5739bba |
|
10-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: snapshot progress Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
0f7d52f4 |
|
09-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: groundwork for subvolume and snapshot roots Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d6e4a428 |
|
06-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: start of support for many FS volumes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
3eb0314d |
|
05-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: uuids Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5be6f7f1 |
|
05-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: dirindex optimizations Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7fcde0e3 |
|
04-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: finish off inode indexing in dirs, add overflows Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5f26f772 |
|
05-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: more inode indexed directory work Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
bae45de0 |
|
04-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: add dir inode index Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e8f05c45 |
|
04-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: disable inline data code for now Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d4dbff95 |
|
04-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: support for items bigger than 1/2 the blocksize Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
df24a2b9 |
|
04-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: early inline file data code Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2da566ed |
|
02-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: csum_verify_file_block locking fix Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7cfcc17e |
|
02-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: corruptions fixed Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5caf2a00 |
|
02-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: dynamic allocation of path struct Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2c90e5d6 |
|
02-Apr-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: still corruption hunting Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d6025579 |
|
30-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: corruption hunt continues Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
22b0ebda |
|
30-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: hunting slab corruption Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
f254e52c |
|
29-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: verify csums on read Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
75dfe396 |
|
29-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_file_write -- first pass Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
87cbda5c |
|
28-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: sha256 csums on metadata Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d98237b3 |
|
28-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: use a btree inode instead of sb_getblk Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
5f443fd2 |
|
27-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_rmdir Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
9773a788 |
|
27-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: byte offsets for file keys Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
f4b9aa8d |
|
27-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_truncate Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
6407bf6d |
|
27-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: reference counts on data extents Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
dee26a9f |
|
26-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_get_block, file read/write Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
4730a4bc |
|
25-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_dirty_inode Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
f7922033 |
|
25-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_mkdir Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
dcea7915 |
|
25-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: very simple readdir readahead Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
134e9731 |
|
25-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: unlink and delete_inode Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
78fae27e |
|
25-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: leak fixes, pinning fixes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d561c025 |
|
23-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: very minimal locking Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
7f5c1516 |
|
23-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Add generation number to btrfs_header, readdir fixes, hash collision fixes Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
df2ce34c |
|
23-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: properly set new buffers for new blocks up to date Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
d5719762 |
|
23-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
btrfs_create, btrfs_write_super, btrfs_sync_fs Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
79154b1b |
|
22-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: transaction rework Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
e20d96d6 |
|
21-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Mountable btrfs, with readdir Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
#
2e635a27 |
|
21-Mar-2007 |
Chris Mason <chris.mason@oracle.com> |
Btrfs: initial move to kernel module land Signed-off-by: Chris Mason <chris.mason@oracle.com>
|