#
0e24ec3c |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remember sick inodes that get inactivated If an unhealthy inode gets inactivated, remember this fact in the per-fs health summary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
baf44fa5 |
|
22-Feb-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: report inode corruption errors to the health system Whenever we encounter corrupt inode records, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
038ca189 |
|
09-Nov-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: inode recovery does not validate the recovered inode Discovered when trying to track down a weird recovery corruption issue that wasn't detected at recovery time. The specific corruption was a zero extent count field when big extent counts are in use, and it turns out the dinode verifier doesn't detect that specific corruption case, either. So fix it too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
75d1e312 |
|
04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
xfs: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-75-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
a0a415e3 |
|
05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
xfs: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-80-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
4fcc94d6 |
|
13-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: track the iunlink list pointer in the xfs_inode Having direct access to the i_next_unlinked pointer in unlinked inodes greatly simplifies the processing of inodes on the unlinked list. We no longer need to look up the inode buffer just to find next inode in the list if the xfs_inode is in memory. These improvements will be realised over upcoming patches as other dependencies on the inode buffer for unlinked list processing are removed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e45d7cb2 |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use XFS_IFORK_Q to determine the presence of an xattr fork Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the attribute fork but i_forkoff is zero. This eliminates the ambiguity between i_forkoff and i_af.if_present, which should make it easier to understand the lifetime of attr forks. While we're at it, remove the if_present checks around calls to xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr forks that have already been torn down. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
2ed5b09b |
|
09-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: make inode attribute forks a permanent part of struct xfs_inode Syzkaller reported a UAF bug a while back: ================================================================== BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127 Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958 CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted 5.15.0-0.30.3-20220406_1406 #3 Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106 print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459 xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127 xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159 xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36 __vfs_getxattr+0xdf/0x13d fs/xattr.c:399 cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300 security_inode_need_killpriv+0x4c/0x97 security/security.c:1408 dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912 dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908 do_truncate+0xc3/0x1e0 fs/open.c:56 handle_truncate fs/namei.c:3084 [inline] do_open fs/namei.c:3432 [inline] path_openat+0x30ab/0x396d fs/namei.c:3561 do_filp_open+0x1c4/0x290 fs/namei.c:3588 do_sys_openat2+0x60d/0x98c fs/open.c:1212 do_sys_open+0xcf/0x13c fs/open.c:1228 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 RIP: 0033:0x7f7ef4bb753d Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0 RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0 </TASK> Allocated by task 2953: kasan_save_stack+0x19/0x38 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467 kasan_slab_alloc include/linux/kasan.h:254 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slub.c:3213 [inline] slab_alloc mm/slub.c:3221 [inline] kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226 kmem_cache_zalloc include/linux/slab.h:711 [inline] xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287 xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098 xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59 __vfs_setxattr+0x11b/0x177 fs/xattr.c:180 __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214 __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275 vfs_setxattr+0x154/0x33d fs/xattr.c:301 setxattr+0x216/0x29f fs/xattr.c:575 __do_sys_fsetxattr fs/xattr.c:632 [inline] __se_sys_fsetxattr fs/xattr.c:621 [inline] __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 Freed by task 2949: kasan_save_stack+0x19/0x38 mm/kasan/common.c:38 kasan_set_track+0x1c/0x21 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:1700 [inline] slab_free_freelist_hook mm/slub.c:1726 [inline] slab_free mm/slub.c:3492 [inline] kmem_cache_free+0xdc/0x3ce mm/slub.c:3508 xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773 xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822 xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413 xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684 xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59 __vfs_removexattr+0x106/0x16a fs/xattr.c:468 cap_inode_killpriv+0x24/0x47 security/commoncap.c:324 security_inode_killpriv+0x54/0xa1 security/security.c:1414 setattr_prepare+0x1a6/0x897 fs/attr.c:146 xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682 xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065 xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093 notify_change+0xae5/0x10a1 fs/attr.c:410 do_truncate+0x134/0x1e0 fs/open.c:64 handle_truncate fs/namei.c:3084 [inline] do_open fs/namei.c:3432 [inline] path_openat+0x30ab/0x396d fs/namei.c:3561 do_filp_open+0x1c4/0x290 fs/namei.c:3588 do_sys_openat2+0x60d/0x98c fs/open.c:1212 do_sys_open+0xcf/0x13c fs/open.c:1228 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0x0 The buggy address belongs to the object at ffff88802cec9188 which belongs to the cache xfs_ifork of size 40 The buggy address is located 20 bytes inside of 40-byte region [ffff88802cec9188, ffff88802cec91b0) The buggy address belongs to the page: page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2cec9 flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80 raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb ^ ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb ================================================================== The root cause of this bug is the unlocked access to xfs_inode.i_afp from the getxattr code paths while trying to determine which ILOCK mode to use to stabilize the xattr data. Unfortunately, the VFS does not acquire i_rwsem when vfs_getxattr (or listxattr) call into the filesystem, which means that getxattr can race with a removexattr that's tearing down the attr fork and crash: xfs_attr_set: xfs_attr_get: xfs_attr_fork_remove: xfs_ilock_attr_map_shared: xfs_idestroy_fork(ip->i_afp); kmem_cache_free(xfs_ifork_cache, ip->i_afp); if (ip->i_afp && ip->i_afp = NULL; xfs_need_iread_extents(ip->i_afp)) <KABOOM> ip->i_forkoff = 0; Regrettably, the VFS is much more lax about i_rwsem and getxattr than is immediately obvious -- not only does it not guarantee that we hold i_rwsem, it actually doesn't guarantee that we *don't* hold it either. The getxattr system call won't acquire the lock before calling XFS, but the file capabilities code calls getxattr with and without i_rwsem held to determine if the "security.capabilities" xattr is set on the file. Fixing the VFS locking requires a treewide investigation into every code path that could touch an xattr and what i_rwsem state it expects or sets up. That could take years or even prove impossible; fortunately, we can fix this UAF problem inside XFS. An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to ensure that i_forkoff is always zeroed before i_afp is set to null and changed the read paths to use smp_rmb before accessing i_forkoff and i_afp, which avoided these UAF problems. However, the patch author was too busy dealing with other problems in the meantime, and by the time he came back to this issue, the situation had changed a bit. On a modern system with selinux, each inode will always have at least one xattr for the selinux label, so it doesn't make much sense to keep incurring the extra pointer dereference. Furthermore, Allison's upcoming parent pointer patchset will also cause nearly every inode in the filesystem to have extended attributes. Therefore, make the inode attribute fork structure part of struct xfs_inode, at a cost of 40 more bytes. This patch adds a clunky if_present field where necessary to maintain the existing logic of xattr fork null pointer testing in the existing codebase. The next patch switches the logic over to XFS_IFORK_Q and it all goes away. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
2d6ca832 |
|
07-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: Pre-calculate per-AG agino geometry There is a lot of overhead in functions like xfs_verify_agino() that repeatedly calculate the geometry limits of an AG. These can be pre-calculated as they are static and the verification context has a per-ag context it can quickly reference. In the case of xfs_verify_agino(), we now always have a perag context handy, so we can store the minimum and maximum agino values in the AG in the perag. This means we don't have to calculate it on every call and it can be inlined in callers if we move it to xfs_ag.h. xfs_verify_agino_or_null() gets the same perag treatment. xfs_agino_range() is moved to xfs_ag.c as it's not really a type function, and it's use is largely restricted as the first and last aginos can be grabbed straight from the perag in most cases. Note that we leave the original xfs_verify_agino in place in xfs_types.c as a static function as other callers in that file do not have per-ag contexts so still need to go the long way. It's been renamed to xfs_verify_agno_agino() to indicate it takes both an agno and an agino to differentiate it from new function. $ size --totals fs/xfs/built-in.a text data bss dec hex filename before 1482185 329588 572 1812345 1ba779 (TOTALS) after 1481937 329588 572 1812097 1ba681 (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
1eb70f54 |
|
03-May-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: validate inode fork size against fork format xfs_repair catches fork size/format mismatches, but the in-kernel verifier doesn't, leading to null pointer failures when attempting to perform operations on the fork. This can occur in the xfs_dir_is_empty() where the in-memory fork format does not match the size and so the fork data pointer is accessed incorrectly. Note: this causes new failures in xfs/348 which is testing mode vs ftype mismatches. We now detect a regular file that has been changed to a directory or symlink mode as being corrupt because the data fork is for a symlink or directory should be in local form when there are only 3 bytes of data in the data fork. Hence the inode verify for the regular file now fires w/ -EFSCORRUPTED because the inode fork format does not match the format the corrupted mode says it should be in. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
83a21c18 |
|
29-Mar-2022 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Directory's data fork extent counter can never overflow The maximum file size that can be represented by the data fork extent counter in the worst case occurs when all extents are 1 block in length and each block is 1KB in size. With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with 1KB sized blocks, a file can reach upto, (2^31) * 1KB = 2TB This is much larger than the theoretical maximum size of a directory i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB. Since a directory's inode can never overflow its data fork extent counter, this commit removes all the overflow checks associated with it. xfs_dinode_verify() now performs a rough check to verify if a diretory's data fork is larger than 96GB. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
52a4a148 |
|
08-Mar-2022 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce per-inode 64-bit extent counters This commit introduces new fields in the on-disk inode format to support 64-bit data fork extent counters and 32-bit attribute fork extent counters. The new fields will be used only when an inode has XFS_DIFLAG2_NREXT64 flag set. Otherwise we continue to use the regular 32-bit data fork extent counters and 16-bit attribute fork extent counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Suggested-by: Dave Chinner <dchinner@redhat.com>
|
#
df9ad5cc |
|
16-Nov-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce macros to represent new maximum extent counts for data/attr forks This commit defines new macros to represent maximum extent counts allowed by filesystems which have support for large per-inode extent counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
dd95a6ce |
|
27-Aug-2020 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce xfs_dfork_nextents() helper This commit replaces the macro XFS_DFORK_NEXTENTS() with the helper function xfs_dfork_nextents(). As of this commit, xfs_dfork_nextents() returns the same value as XFS_DFORK_NEXTENTS(). A future commit which extends inode's extent counter fields will add more logic to this helper. This commit also replaces direct accesses to xfs_dinode->di_[a]nextents with calls to xfs_dfork_nextents(). No functional changes have been made. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
bb1d5049 |
|
25-Feb-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Use xfs_extnum_t instead of basic data types xfs_extnum_t is the type to use to declare variables which have values obtained from xfs_dinode->di_[a]nextents. This commit replaces basic types (e.g. uint32_t) with xfs_extnum_t for such variables. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
9feb8f19 |
|
27-Aug-2020 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce xfs_iext_max_nextents() helper xfs_iext_max_nextents() returns the maximum number of extents possible for one of data, cow or attribute fork. This helper will be extended further in a future commit when maximum extent counts associated with data/attribute forks are increased. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
95f0b95e |
|
08-Aug-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Define max extent length based on on-disk format definition The maximum extent length depends on maximum block count that can be stored in a BMBT record. Hence this commit defines MAXEXTLEN based on BMBT_BLOCKCOUNT_BITLEN. While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN. Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
de38db72 |
|
11-Oct-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the xfs_dinode_t typedef Remove the few leftover instances of the xfs_dinode_t typedef. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
9343ee76 |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert bp->b_bn references to xfs_buf_daddr() Stop directly referencing b_bn in code outside the buffer cache, as b_bn is supposed to be used only as an internal cache index. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
04fcad80 |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_buf_daddr() Introduce a helper function xfs_buf_daddr() to extract the disk address of the buffer from the struct xfs_buf. This will replace direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as the XFS_BUF_ADDR() macro. This patch introduces the helper function and replaces all uses of XFS_BUF_ADDR() as this is just a simple sed replacement. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
cf28e17c |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: kill xfs_sb_version_has_v3inode() All callers to xfs_dinode_good_version() and XFS_DINODE_SIZE() in both the kernel and userspace have a xfs_mount structure available which means they can use mount features checks instead looking directly are the superblock. Convert these functions to take a mount and use a xfs_has_v3inodes() check and move it out of the libxfs/xfs_format.h file as it really doesn't have anything to do with the definition of the on-disk format. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ebd9027d |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert xfs_sb_version_has checks to use mount features This is a conversion of the remaining xfs_sb_version_has..(sbp) checks to use xfs_has_..(mp) feature checks. This was largely done with a vim replacement macro that did: :0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR> A couple of other variants were also used, and the rest touched up by hand. $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1127533 311352 484 1439369 15f689 (TOTALS) after 1125360 311352 484 1437196 15ee0c (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
38c26bfd |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: replace xfs_sb_version checks with feature flag checks Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
83193e5e |
|
12-Jul-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: correct the narrative around misaligned rtinherit/extszinherit dirs While auditing the realtime growfs code, I realized that the GROWFSRT ioctl (and by extension xfs_growfs) has always allowed sysadmins to change the realtime extent size when adding a realtime section to the filesystem. Since we also have always allowed sysadmins to set RTINHERIT and EXTSZINHERIT on directories even if there is no realtime device, this invalidates the premise laid out in the comments added in commit 603f000b15f2. In other words, this is not a case of inadequate metadata validation. This is a case of nearly forgotten (and apparently untested) but supported functionality. Update the comments to reflect what we've learned, and remove the log message about correcting the misalignment. Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a7bcb147 |
|
31-May-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: clean up open-coded fs block unit conversions Replace some open-coded fs block unit conversions with the standard conversion macro. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
603f000b |
|
12-May-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: validate extsz hints against rt extent size when rtinherit is set The RTINHERIT bit can be set on a directory so that newly created regular files will have the REALTIME bit set to store their data on the realtime volume. If an extent size hint (and EXTSZINHERIT) are set on the directory, the hint will also be copied into the new file. As pointed out in previous patches, for realtime files we require the extent size hint be an integer multiple of the realtime extent, but we don't perform the same validation on a directory with both RTINHERIT and EXTSZINHERIT set, even though the only use-case of that combination is to propagate extent size hints into new realtime files. This leads to inode corruption errors when the bad values are propagated. Because there may be existing filesystems with such a configuration, we cannot simply amend the inode verifier to trip on these directories and call it a day because that will cause previously "working" filesystems to start throwing errors abruptly. Note that it's valid to have directories with rtinherit set even if there is no realtime volume, in which case the problem does not manifest because rtinherit is ignored if there's no realtime device; and it's possible that someone set the flag, crashed, repaired the filesystem (which clears the hint on the realtime file) and continued. Therefore, mitigate this issue in several ways: First, if we try to write out an inode with both rtinherit/extszinherit set and an unaligned extent size hint, turn off the hint to correct the error. Second, if someone tries to misconfigure a directory via the fssetxattr ioctl, fail the ioctl. Third, reverify both extent size hint values when we propagate heritable inode attributes from parent to child, to prevent misconfigurations from spreading. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
6b69e485 |
|
12-May-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: standardize extent size hint validation While chasing a bug involving invalid extent size hints being propagated into newly created realtime files, I noticed that the xfs_ioctl_setattr checks for the extent size hints weren't the same as the ones now encoded in libxfs and used for validation in repair and mkfs. Because the checks in libxfs are more stringent than the ones in the ioctl, it's possible for a live system to set inode flags that immediately result in corruption warnings. Specifically, it's possible to set an extent size hint on an rtinherit directory without checking if the hint is aligned to the realtime extent size, which makes no sense since that combination is used only to seed new realtime files. Replace the open-coded and inadequate checks with the libxfs verifier versions and update the code comments a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e98d5e88 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_crtime field to struct xfs_inode Move the crtime field from struct xfs_icdinode into stuct xfs_inode and remove the now entirely unused struct xfs_icdinode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
3e09ab8f |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_flags2 field to struct xfs_inode In preparation of removing the historic icinode struct, move the flags2 field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
db07349d |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_flags field to struct xfs_inode In preparation of removing the historic icinode struct, move the flags field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
7821ea30 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_forkoff field to struct xfs_inode In preparation of removing the historic icinode struct, move the forkoff field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ee7b83fd |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: use a union for i_cowextsize and i_flushiter The i_cowextsize field is only used for v3 inodes, and the i_flushiter field is only used for v1/v2 inodes. Use a union to pack the inode a littler better after adding a few missing guards around their usage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
965e0a1a |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_flushiter field to struct xfs_inode In preparation of removing the historic icinode struct, move the flushiter field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b33ce57d |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_cowextsize field to struct xfs_inode In preparation of removing the historic icinode struct, move the cowextsize field into the containing xfs_inode structure. Also switch to use the xfs_extlen_t instead of a uint32_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
031474c2 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_extsize field to struct xfs_inode In preparation of removing the historic icinode struct, move the extsize field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
6e73a545 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_nblocks field to struct xfs_inode In preparation of removing the historic icinode struct, move the nblocks field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
13d2c10b |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_size field to struct xfs_inode In preparation of removing the historic icinode struct, move the on-disk size field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ceaf603c |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move the di_projid field to struct xfs_inode In preparation of removing the historic icinode struct, move the projid field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
9b3beb02 |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the di_dmevmask and di_dmstate fields from struct xfs_icdinode The legacy DMAPI fields were never set by upstream Linux XFS, and have no way to be read using the kernel APIs. So instead of bloating the in-core inode for them just copy them from the on-disk inode into the log when logging the inode. The only caveat is that we need to make sure to zero the fields for newly read or deleted inodes, which is solved using a new flag in the inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
af9dcdde |
|
29-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: split xfs_imap_to_bp Split looking up the dinode from xfs_imap_to_bp, which can be significantly simplified as a result. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
207ddc0e |
|
08-Dec-2020 |
Eric Sandeen <sandeen@redhat.com> |
xfs: don't catch dax+reflink inodes as corruption in verifier We don't yet support dax on reflinked files, but that is in the works. Further, having the flag set does not automatically mean that the inode is actually "in the CPU direct access state," which depends on several other conditions in addition to the flag being set. As such, we should not catch this as corruption in the verifier - simply not actually enabling S_DAX on reflinked files is enough for now. Fixes: 4f435ebe7d04 ("xfs: don't mix reflink and DAX mode for now") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix the scrubber too] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f93e5436 |
|
17-Aug-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: widen ondisk inode timestamps to deal with y2038+ Redesign the ondisk inode timestamps to be a simple unsigned 64-bit counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the 32-bit unix time epoch). This enables us to handle dates up to 2486, which solves the y2038 problem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
5a0bb066 |
|
24-Aug-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: redefine xfs_timestamp_t Redefine xfs_timestamp_t as a __be64 typedef in preparation for the bigtime functionality. Preserve the legacy structure format so that we can let the compiler take care of masking and shifting. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
88947ea0 |
|
17-Aug-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move xfs_log_dinode_to_disk to the log recovery code Move this function to xfs_inode_item_recover.c since there's only one caller of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
e2705b03 |
|
29-Jun-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove xfs_inobp_check() This debug code is called on every xfs_iflush() call, which then checks every inode in the buffer for non-zero unlinked list field. Hence it checks every inode in the cluster buffer every time a single inode on that cluster it flushed. This is resulting in: - 38.91% 5.33% [kernel] [k] xfs_iflush - 17.70% xfs_iflush - 9.93% xfs_inobp_check 4.36% xfs_buf_offset 10% of the CPU time spent flushing inodes is repeatedly checking unlinked fields in the buffer. We don't need to do this. The other place we call xfs_inobp_check() is xfs_iunlink_update_dinode(), and this is after we've done this assert for the agino we are about to write into that inode: ASSERT(xfs_verify_agino_or_null(mp, agno, next_agino)); which means we've already checked that the agino we are about to write is not 0 on debug kernels. The inode buffer verifiers do everything else we need, so let's just remove this debug code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
298f7bec |
|
29-Jun-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: pin inode backing buffer to the inode log item When we dirty an inode, we are going to have to write it disk at some point in the near future. This requires the inode cluster backing buffer to be present in memory. Unfortunately, under severe memory pressure we can reclaim the inode backing buffer while the inode is dirty in memory, resulting in stalling the AIL pushing because it has to do a read-modify-write cycle on the cluster buffer. When we have no memory available, the read of the cluster buffer blocks the AIL pushing process, and this causes all sorts of issues for memory reclaim as it requires inode writeback to make forwards progress. Allocating a cluster buffer causes more memory pressure, and results in more cluster buffers to be reclaimed, resulting in more RMW cycles to be done in the AIL context and everything then backs up on AIL progress. Only the synchronous inode cluster writeback in the the inode reclaim code provides some level of forwards progress guarantees that prevent OOM-killer rampages in this situation. Fix this by pinning the inode backing buffer to the inode log item when the inode is first dirtied (i.e. in xfs_trans_log_inode()). This may mean the first modification of an inode that has been held in cache for a long time may block on a cluster buffer read, but we can do that in transaction context and block safely until the buffer has been allocated and read. Once we have the cluster buffer, the inode log item takes a reference to it, pinning it in memory, and attaches it to the log item for future reference. This means we can always grab the cluster buffer from the inode log item when we need it. When the inode is finally cleaned and removed from the AIL, we can drop the reference the inode log item holds on the cluster buffer. Once all inodes on the cluster buffer are clean, the cluster buffer will be unpinned and it will be available for memory reclaim to reclaim again. This avoids the issues with needing to do RMW cycles in the AIL pushing context, and hence allows complete non-blocking inode flushing to be performed by the AIL pushing context. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
06734e3c |
|
29-Jun-2020 |
Keyur Patel <iamkeyur96@gmail.com> |
xfs: Couple of typo fixes in comments ./xfs/libxfs/xfs_inode_buf.c:56: unnecssary ==> unnecessary ./xfs/libxfs/xfs_inode_buf.c:59: behavour ==> behaviour ./xfs/libxfs/xfs_inode_buf.c:206: unitialized ==> uninitialized Signed-off-by: Keyur Patel <iamkeyur96@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ef838512 |
|
18-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup xfs_idestroy_fork Move freeing the dynamically allocated attr and COW fork, as well as zeroing the pointers where actually needed into the callers, and just pass the xfs_ifork structure to xfs_idestroy_fork. Also simplify the kmem_free calls by not checking for NULL first. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f7e67b20 |
|
18-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: move the fork format fields into struct xfs_ifork Both the data and attr fork have a format that is stored in the legacy idinode. Move it into the xfs_ifork structure instead, where it uses up padding. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
daf83964 |
|
18-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: move the per-fork nextents fields into struct xfs_ifork There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
09c38edd |
|
18-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the XFS_DFORK_Q macro Just checking di_forkoff directly is a little easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
bb8a66af |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_iread There is not much point in the xfs_iread function, as it has a single caller and not a whole lot of code. Move it into the only caller, and trim down the overdocumentation to just documenting the important "why" instead of a lot of redundant "what". Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7f029012 |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: don't reset i_delayed_blks in xfs_iread i_delayed_blks is set to 0 in xfs_inode_alloc and can't have anything assigned to it until the inode is visible to the VFS. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2d6051d4 |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: call xfs_dinode_verify from xfs_inode_from_disk Keep the code dealing with the dinode together, and also ensure we verify the dinode in the owner change log recovery case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0bce8173 |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: handle unallocated inodes in xfs_inode_from_disk Handle inodes with a 0 di_mode in xfs_inode_from_disk, instead of partially duplicating inode reading in xfs_iread. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9229d18e |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: split xfs_iformat_fork xfs_iformat_fork is a weird catchall. Split it into one helper for the data fork and one for the attr fork, and then call both helper as well as the COW fork initialization from xfs_inode_from_disk. Order the COW fork initialization after the attr fork initialization given that it can't fail to simplify the error handling. Note that the newly split helpers are moved down the file in xfs_inode_fork.c to avoid the need for forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
cb7d5859 |
|
14-May-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: call xfs_iformat_fork from xfs_inode_from_disk We always need to fill out the fork structures when reading the inode, so call xfs_iformat_fork from the tail of xfs_inode_from_disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c1995079 |
|
06-May-2020 |
Brian Foster <bfoster@redhat.com> |
xfs: remove unused iget_flags param from xfs_imap_to_bp() iget_flags is unused in xfs_imap_to_bp(). Remove the parameter and fix up the callers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
15fab3b9 |
|
06-May-2020 |
Brian Foster <bfoster@redhat.com> |
xfs: remove unnecessary shutdown check from xfs_iflush() The shutdown check in xfs_iflush() duplicates checks down in the buffer code. If the fs is shut down, xfs_trans_read_buf_map() always returns an error and falls into the same error path. Remove the unnecessary check along with the warning in xfs_imap_to_bp() that generates excessive noise in the log if the fs is shut down. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6471e9c5 |
|
18-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the di_version field from struct icdinode We know the version is 3 if on a v5 file system. For earlier file systems formats we always upgrade the remaining v1 inodes to v2 and thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode helper to check if we deal with small or large dinodes, and thus remove the need for the di_version field in struct icdinode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e9e2eae8 |
|
18-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: only check the superblock version for dinode size calculation The size of the dinode structure is only dependent on the file system version, so instead of checking the individual inode version just use the newly added xfs_sb_version_has_large_dinode helper, and simplify various calling conventions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b81b79f4 |
|
18-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: add a new xfs_sb_version_has_v3inode helper Add a new wrapper to check if a file system supports the v3 inode format with a larger dinode core. Previously we used xfs_sb_version_hascrc for that, which is technically correct but a little confusing to read. Also move xfs_dinode_good_version next to xfs_sb_version_has_v3inode so that we have one place that documents the superblock version to inode version relationship. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
ba8adad5 |
|
21-Feb-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the kuid/kgid conversion wrappers Remove the XFS wrappers for converting from and to the kuid/kgid types. Mostly this means switching to VFS i_{u,g}id_{read,write} helpers, but in a few spots the calls to the conversion functions is open coded. To match the use of sb->s_user_ns in the helpers and other file systems, sb->s_user_ns is also used in the quota code. The ACL code already does the conversion in a grotty layering violation in the VFS xattr code, so it keeps using init_user_ns for the identity mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
54295159 |
|
21-Feb-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the icdinode di_uid/di_gid members Use the Linux inode i_uid/i_gid members everywhere and just convert from/to the scalar value when reading or writing the on-disk inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3d8f2821 |
|
21-Feb-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: ensure that the inode uid/gid match values match the icdinode ones Instead of only synchronizing the uid/gid values in xfs_setup_inode, ensure that they always match to prepare for removing the icdinode fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
048a35d2 |
|
12-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: don't reset the "inode core" in xfs_iread We have the exact same memset in xfs_inode_alloc, which is always called just before xfs_iread. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
de7a866f |
|
12-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: merge the projid fields in struct xfs_icdinode There is no point in splitting the fields like this in an purely in-memory structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8d2d878d |
|
12-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: use a struct timespec64 for the in-core crtime struct xfs_icdinode is purely an in-memory data structure, so don't use a log on-disk structure for it. This simplifies the code a bit, and also reduces our include hell slightly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a minor indenting problem in xfs_trans_ichgtime] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
250d4b4c |
|
28-Jun-2019 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: remove unused header files There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
dbd329f1 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: add struct xfs_mount pointer to struct xfs_buf We need to derive the mount pointer from a buffer in a lot of place. Add a direct pointer to short cut the pointer chasing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4b4d98cc |
|
05-Jun-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: finish converting to inodes_per_cluster Finish converting all the old inode_cluster_size >> inopblog users to inodes_per_cluster. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
ef325959 |
|
05-Jun-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: separate inode geometry Separate the inode geometry information into a distinct structure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
15baadf7 |
|
16-Feb-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix xfs_buf magic number endian checks Create a separate magic16 check function so that we don't run afoul of static checkers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
2bfe7069 |
|
06-Feb-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add inode magic to inode verifier Use xfs_verify_magic to check the magic numbers of inodes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
e34d3e74 |
|
07-Feb-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: always check magic values in on-disk byte order Most verifiers that check on-disk magic values convert the CPU endian magic value constant to disk endian to facilitate compile time optimization of the byte swap and reduce the need for runtime byte swaps in buffer verifiers. Several buffer verifiers do not follow this pattern. Update those verifiers for consistency. Also fix up a random typo in the inode readahead verifier name. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7d36c195 |
|
07-Feb-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: add xfs_verify_agino_or_null helper Add a new helper to check that a per-AG inode pointer is either null or points somewhere valid within that AG. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
339e1a3f |
|
28-Sep-2018 |
Eric Sandeen <sandeen@redhat.com> |
xfs: validate inode di_forkoff Verify the inode di_forkoff, lifted from xfs_repair's process_check_inode_forkoff(). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
d4a34e16 |
|
24-Jul-2018 |
Eric Sandeen <sandeen@redhat.com> |
xfs: properly handle free inodes in extent hint validators When inodes are freed in xfs_ifree(), di_flags is cleared (so extent size hints are removed) but the actual extent size fields are left intact. This causes the extent hint validators to fail on freed inodes which once had extent size hints. This can be observed (for example) by running xfs/229 twice on a non-crc xfs filesystem, or presumably on V5 with ikeep. Fixes: 7d71a67 ("xfs: verify extent size hint is valid in inode verifier") Fixes: 02a0fda ("xfs: verify COW extent size hint is valid in inode verifier") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
23fcb334 |
|
22-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: More robust inode extent count validation When the inode is in extent format, it can't have more extents that fit in the inode fork. We don't currenty check this, and so this corruption goes unnoticed by the inode verifiers. This can lead to crashes operating on invalid in-memory structures. Attempts to access such a inode will now error out in the verifier rather than allowing modification operations to proceed. Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a typedef, add some braces and breaks to shut up compiler warnings] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0b61f8a4 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert to SPDX license tags Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
29cad0b3 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() In xfs_imap_to_bp(), we convert a -EFSCORRUPTED error to -EINVAL if we are doing an untrusted lookup. This is done because we need failed filehandle lookups to report -ESTALE to the caller, and it does this by converting -EINVAL and -ENOENT errors to -ESTALE. The squashing of EFSCORRUPTED in imap_to_bp makes it impossible for for xfs_iget(UNTRUSTED) callers to determine the difference between "inode does not exist" and "corruption detected during lookup". We realy need that distinction in places calling xfS_iget(UNTRUSTED), so move the filehandle error case handling all the way out to xfs_nfs_get_inode() where it is needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
02a0fda8 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: verify COW extent size hint is valid in inode verifier There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. Validate COW extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7d71a671 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: verify extent size hint is valid in inode verifier There are rules for vald extent size hints. We enforce them when applications set them, but fuzzers violate those rules and that screws us over. This results in alignment assertion failures when setting up allocations such as this in direct IO: XFS: Assertion failed: ap->length, file: fs/xfs/libxfs/xfs_bmap.c, line: 3432 .... Call Trace: xfs_bmap_btalloc+0x415/0x910 xfs_bmapi_write+0x71c/0x12e0 xfs_iomap_write_direct+0x2a9/0x420 xfs_file_iomap_begin+0x4dc/0xa70 iomap_apply+0x43/0x100 iomap_file_buffered_write+0x62/0x90 xfs_file_buffered_aio_write+0xba/0x300 __vfs_write+0xd5/0x150 vfs_write+0xb6/0x180 ksys_write+0x45/0xa0 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe And from xfs_db: core.extsize = 10380288 Which is not an integer multiple of the block size, and so violates Rule #7 for setting extent size hints. Validate extent size hint rules in the inode verifier to catch this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
b42db086 |
|
17-Apr-2018 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: enhance dinode verifier Add several more validations to xfs_dinode_verify: - For LOCAL data fork formats, di_nextents must be 0. - For LOCAL attr fork formats, di_anextents must be 0. - For inodes with no attr fork offset, - format must be XFS_DINODE_FMT_EXTENTS if set at all - di_anextents must be 0. Thanks to dchinner for pointing out a couple related checks I had forgotten to add. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199377 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
fa4493f0 |
|
23-Mar-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove dead inode version setting code We can only get into the branch if CRCs are enabled, so there's no need to check inside the branch for CRCs being enabled.... Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6a96c565 |
|
23-Mar-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't accept inode buffers with suspicious unlinked chains When we're verifying inode buffers, sanity-check the unlinked pointer. We don't want to run the risk of trying to purge something that's obviously broken. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
8bb82bc1 |
|
23-Mar-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move inode extent size hint validation to libxfs Extent size hint validation is used by scrub to decide if there's an error, and it will be used by repair to decide to remove the hint. Since these use the same validation functions, move them to libxfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
6edb1810 |
|
23-Mar-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor inode buffer verifier error logging When the inode buffer verifier encounters an error, it's much more helpful to print a buffer from the offending inode instead of just the start of the inode chunk buffer. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
22431bf3 |
|
22-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor inode verifier corruption error printing Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f0e28280 |
|
11-Dec-2017 |
Jeff Layton <jlayton@kernel.org> |
xfs: convert to new i_version API Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com>
|
#
71493b83 |
|
08-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move inode fork verifiers to xfs_dinode_verify Consolidate the fork size and format verifiers to xfs_dinode_verify so that we can reject bad inodes earlier and in a single place. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
50aa90ef |
|
08-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: verify dinode header first Move the v3 inode integrity information (crc, owner, metauuid) before we look at anything else in the inode so that we don't waste time on a torn write or a totally garbled block. This makes xfs_dinode_verify more consistent with the other verifiers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
bc1a09b8 |
|
08-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor verifier callers to print address of failing check Refactor the callers of verifiers to print the instruction address of a failing check. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
a6a781a5 |
|
08-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: have buffer verifier functions report failing address Modify each function that checks the contents of a metadata buffer to return the instruction address of the failing test so that we can report more precise failure errors to the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
31ca03c9 |
|
08-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor xfs_verifier_error and xfs_buf_ioerror Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add a xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check to enable better debugging reports. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
e9e899a2 |
|
31-Oct-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move error injection tags into their own file Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
9e24cfd0 |
|
20-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove unneeded parameter from XFS_TEST_ERROR Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
26788097 |
|
16-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: export various function for the online scrubber Export various internal functions so that the online scrubber can use them to check the state of metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
c8ce540d |
|
16-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove double-underscore integer types This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
bc593eeb |
|
28-Mar-2017 |
Eric Sandeen <sandeen@redhat.com> |
xfs: fix up inode validation failure message "xfs_iread: validation failed for inode 96 failed" One "failed" seems like enough. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a324cbf1 |
|
17-Jan-2017 |
Amir Goldstein <amir73il@gmail.com> |
xfs: sanity check inode di_mode Check for invalid file type in xfs_dinode_verify() and fail to load the inode structure from disk. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3c6f46ea |
|
17-Jan-2017 |
Amir Goldstein <amir73il@gmail.com> |
xfs: sanity check directory inode di_size This changes fixes an assertion hit when fuzzing on-disk i_mode values. The easy case to fix is when changing an empty file i_mode to S_IFDIR. In this case, xfs_dinode_verify() detects an illegal zero size for directory and fails to load the inode structure from disk. For the case of non empty file whose i_mode is changed to S_IFDIR, the ASSERT() statement in xfs_dir2_isblock() is replaced with return -EFSCORRUPTED, to avoid interacting with corrupted jusk also when XFS_DEBUG is disabled. Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
cae028df |
|
04-Dec-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: optimise CRC updates Nick Piggin reported that the CRC overhead in an fsync heavy workload was higher than expected on a Power8 machine. Part of this was to do with the fact that the power8 CRC implementation is not efficient for CRC lengths of less than 512 bytes, and so the way we split the CRCs over the CRC field means a lot of the CRCs are reduced to being less than than optimal size. To optimise this, change the CRC update mechanism to zero the CRC field first, and then compute the CRC in one pass over the buffer and write the result back into the buffer. We can do this safely because anything writing a CRC has exclusive access to the buffer the CRC is being calculated over. We leave the CRC verify code the same - it still splits the CRC calculation - because we do not want read-only operations modifying the underlying buffer. This is because read-only operations may not have an exclusive access to the buffer guaranteed, and so temporary modifications could leak out to to other processes accessing the buffer concurrently. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ef388e20 |
|
04-Dec-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't allow di_size with high bit set The on-disk field di_size is used to set i_size, which is a signed integer of loff_t. If the high bit of di_size is set, we'll end up with a negative i_size, which will cause all sorts of problems. Since the VFS won't let us create a file with such length, we should catch them here in the verifier too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
420fbeb4 |
|
07-Nov-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
libxfs: synchronize dinode_verify with userspace The userspace version of _dinode_verify takes a raw inode number instead of an inode itself. Since neither version actually needs the inode, port the changes to the kernel. This will also reduce the libxfs diff noise. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
8cdcc810 |
|
19-Oct-2016 |
Roger Willcocks <roger@filmlight.ltd.uk> |
libxfs: v3 inodes are only valid on crc-enabled filesystems xfs_repair was not detecting that version 3 inodes are invalid for for non-CRC filesystems. The result is specific inode corruptions go undetected and hence aren't repaired if only the version number is out of range. The core of the problem is that the XFS_DINODE_GOOD_VERSION() macro doesn't know that valid inode versions are dependent on a superblock version number. Fix this in libxfs, and propagate the new function out into the rest of xfsprogs to fix the issue. [Darrick: port to kernel from xfsprogs] Reported-by: Leslie Rhorer <lrhorer@mygrande.net> Signed-off-by: Roger Willcocks <roger@filmlight.ltd.uk> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4f435ebe |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't mix reflink and DAX mode for now Since we don't have a strategy for handling both DAX and reflink, for now we'll just prohibit both being set at the same time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
c8e156ac |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: check for invalid inode reflink flags We don't support sharing blocks on the realtime device. Flag inodes with the reflink or cowextsize flags set when the reflink feature is disabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
f7ca3522 |
|
03-Oct-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: create a separate cow extent size hint for the allocator Create a per-inode extent size allocator hint for copy-on-write. This hint is separate from the existing extent size hint so that CoW can take advantage of the fragmentation-reducing properties of extent size hints without disabling delalloc for regular writes. The extent size hint that's fed to the allocator during a copy on write operation is the greater of the cowextsize and regular extsize hint. During reflink, if we're sharing the entire source file to the entire destination file and the destination file doesn't already have a cowextsize hint, propagate the source file's cowextsize hint to the destination file. Furthermore, zero the bulkstat buffer prior to setting the fields so that we don't copy kernel memory contents into userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3ab78df2 |
|
02-Aug-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: rework xfs_bmap_free callers to use xfs_defer_ops Restructure everything that used xfs_bmap_free to use xfs_defer_ops instead. For now we'll just remove the old symbols and play some cpp magic to make it work; in the next patch we'll actually rename everything. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c19b3b05 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: mode di_mode to vfs inode Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
83e06f21 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: move di_changecount to VFS inode We can store the di_changecount in the i_version field of the VFS inode and remove another 8 bytes from the xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
9e9a2674 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: move inode generation count to VFS inode Pull another 4 bytes out of the xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
54d7b5c1 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: use vfs inode nlink field everywhere The VFS tracks the inode nlink just like the xfs_icdinode. We can remove the variable from the icdinode and use the VFS inode variable everywhere, reducing the size of the xfs_icdinode by a further 4 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
faeb4e47 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: move v1 inode conversion to xfs_inode_from_disk So we don't have to carry an di_onlink variable around anymore, move the inode conversion from v1 inode format to v2 inode format into xfs_inode_from_disk(). This means we can remove the di_onlink fields from the struct xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
93f958f9 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: cull unnecessary icdinode fields Now that the struct xfs_icdinode is not directly related to the on-disk format, we can cull things in it we really don't need to store: - magic number never changes - padding is not necessary - next_unlinked is never used - inode number is redundant - uuid is redundant - lsn is accessed directly from dinode - inode CRC is only accessed directly from dinode Hence we can remove these from the struct xfs_icdinode and redirect the code that uses them to the xfs_dinode appripriately. This reduces the size of the struct icdinode from 152 bytes to 88 bytes, and removes a fair chunk of unnecessary code, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
3987848c |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove timestamps from incore inode The struct xfs_inode has two copies of the current timestamps in it, one in the vfs inode and one in the struct xfs_icdinode. Now that we no longer log the struct xfs_icdinode directly, we don't need to keep the timestamps in this structure. instead we can copy them straight out of the VFS inode when formatting the inode log item or the on-disk inode. This reduces the struct xfs_inode in size by 24 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f8d55aa052 |
|
08-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce inode log format object We currently carry around and log an entire inode core in the struct xfs_inode. A lot of the information in the inode core is duplicated in the VFS inode, but we cannot remove this duplication of infomration because the inode core is logged directly in xfs_inode_item_format(). Add a new function xfs_inode_item_format_core() that copies the inode core data into a struct xfs_icdinode that is pulled directly from the log vector buffer. This means we no longer directly copy the inode core, but copy the structures one member at a time. This will be slightly less efficient than copying, but will allow us to remove duplicate and unnecessary items from the struct xfs_inode. To enable us to do this, call the new structure a xfs_log_dinode, so that we know it's different to the physical xfs_dinode and the in-core xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
7d6a13f0 |
|
11-Jan-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: handle dquot buffer readahead in log recovery correctly When we do dquot readahead in log recovery, we do not use a verifier as the underlying buffer may not have dquots in it. e.g. the allocation operation hasn't yet been replayed. Hence we do not want to fail recovery because we detect an operation to be replayed has not been run yet. This problem was addressed for inodes in commit d891400 ("xfs: inode buffers may not be valid during recovery readahead") but the problem was not recognised to exist for dquots and their buffers as the dquot readahead did not have a verifier. The result of not using a verifier is that when the buffer is then next read to replay a dquot modification, the dquot buffer verifier will only be attached to the buffer if *readahead is not complete*. Hence we can read the buffer, replay the dquot changes and then add it to the delwri submission list without it having a verifier attached to it. This then generates warnings in xfs_buf_ioapply(), which catches and warns about this case. Fix this and make it handle the same readahead verifier error cases as for inode buffers by adding a new readahead verifier that has a write operation as well as a read operation that marks the buffer as not done if any corruption is detected. Also make sure we don't run readahead if the dquot buffer has been marked as cancelled by recovery. This will result in readahead either succeeding and the buffer having a valid write verifier, or readahead failing and the buffer state requiring the subsequent read to resubmit the IO with the new verifier. In either case, this will result in the buffer always ending up with a valid write verifier on it. Note: we also need to fix the inode buffer readahead error handling to mark the buffer with EIO. Brian noticed the code I copied from there wrong during review, so fix it at the same time. Add comments linking the two functions that handle readahead verifier errors together so we don't forget this behavioural link in future. cc: <stable@vger.kernel.org> # 3.12 - current Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b79f4a1c |
|
11-Jan-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: inode recovery readahead can race with inode buffer creation When we do inode readahead in log recovery, we do can do the readahead before we've replayed the icreate transaction that stamps the buffer with inode cores. The inode readahead verifier catches this and marks the buffer as !done to indicate that it doesn't yet contain valid inodes. In adding buffer error notification (i.e. setting b_error = -EIO at the same time as as we clear the done flag) to such a readahead verifier failure, we can then get subsequent inode recovery failing with this error: XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32 This occurs when readahead completion races with icreate item replay such as: inode readahead find buffer lock buffer submit RA io .... icreate recovery xfs_trans_get_buffer find buffer lock buffer <blocks on RA completion> ..... <ra completion> fails verifier clear XBF_DONE set bp->b_error = -EIO release and unlock buffer <icreate gains lock> icreate initialises buffer marks buffer as done adds buffer to delayed write queue releases buffer At this point, we have an initialised inode buffer that is up to date but has an -EIO state registered against it. When we finally get to recovering an inode in that buffer: inode item recovery xfs_trans_read_buffer find buffer lock buffer sees XBF_DONE is set, returns buffer sees bp->b_error is set fail log recovery! Essentially, we need xfs_trans_get_buf_map() to clear the error status of the buffer when doing a lookup. This function returns uninitialised buffers, so the buffer returned can not be in an error state and none of the code that uses this function expects b_error to be set on return. Indeed, there is an ASSERT(!bp->b_error); in the transaction case in xfs_trans_get_buf_map() that would have caught this if log recovery used transactions.... This patch firstly changes the inode readahead failure to set -EIO on the buffer, and secondly changes xfs_buf_get_map() to never return a buffer with an error state set so this first change doesn't cause unexpected log recovery failures. cc: <stable@vger.kernel.org> # 3.12 - current Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
233135b7 |
|
03-Jan-2016 |
Eric Sandeen <sandeen@redhat.com> |
xfs: print name of verifier if it fails This adds a name to each buf_ops structure, so that if a verifier fails we can print the type of verifier that failed it. Should be a slight debugging aid, I hope. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ce748eaa |
|
28-Jul-2015 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: create new metadata UUID field and incompat flag This adds a new superblock field, sb_meta_uuid. If set, along with a new incompat flag, the code will use that field on a V5 filesystem to compare to metadata UUIDs, which allows us to change the user- visible UUID at will. Userspace handles the setting and clearing of the incompat flag as appropriate, as the UUID gets changed; i.e. setting the user-visible UUID back to the original UUID (as stored in the new field) will remove the incompatible feature flag. If the incompat flag is not set, this copies the user-visible UUID into into the meta_uuid slot in memory when the superblock is read from disk; the meta_uuid field is not written back to disk in this case. The remainder of this patch simply switches verifiers, initializers, etc to use the new sb_meta_uuid field. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
88ee2df7 |
|
21-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: return a void pointer from xfs_buf_offset This avoids all kinds of unessecary casts in an envrionment like Linux where we can assume that pointer arithmetics are support on void pointers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
bb58e618 |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: move most of xfs_sb.h to xfs_format.h More on-disk format consolidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4fb6e8ad |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_ag.h into xfs_format.h More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6d3ebaae |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_dinode.h into xfs_format.h More consolidatation for the on-disk format defintions. Note that the XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related to the on disk format, but depends on a CONFIG_ option. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2451337d |
|
24-Jun-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: global error sign conversion Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
30f712c9 |
|
24-Jun-2014 |
Dave Chinner <dchinner@redhat.com> |
libxfs: move source files Move all the source files that are shared with userspace into libxfs/. This is done as one big chunk simpy to get it done quickly Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|