#
9d618d19 |
|
12-Oct-2023 |
Jan Kara <jack@suse.cz> |
ocfs2: Avoid touching renamed directory if parent does not change The VFS will not be locking moved directory if its parent does not change. Change ocfs2 rename code to avoid touching renamed directory if its parent does not change as without locking that can corrupt the filesystem. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1b13a703 |
|
09-Oct-2023 |
Artem Chernyshev <artem.chernyshev@red-soft.ru> |
fs: ocfs2: check status values Test return values before overwriting. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20231009141111.149858-1-artem.chernyshev@red-soft.ru Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
fd6acbbc |
|
04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
ocfs2: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-54-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
6b72e5f9 |
|
03-Aug-2023 |
Artem Chernyshev <artem.chernyshev@red-soft.ru> |
fs: ocfs2: namei: check return value of ocfs2_add_entry() Process result of ocfs2_add_entry() in case we have an error value. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20230803145417.177649-1-artem.chernyshev@red-soft.ru Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Artem Chernyshev <artem.chernyshev@red-soft.ru> Cc: Joel Becker <jlbec@evilplan.org> Cc: Kurt Hackel <kurt.hackel@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
6861de97 |
|
05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
ocfs2: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-60-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
de3004c8 |
|
14-Mar-2023 |
Roberto Sassu <roberto.sassu@huawei.com> |
ocfs2: Switch to security_inode_init_security() In preparation for removing security_old_inode_init_security(), switch to security_inode_init_security(). Extend the existing ocfs2_initxattrs() to take the ocfs2_security_xattr_info structure from fs_info, and populate the name/value/len triple with the first xattr provided by LSMs. As fs_info was not used before, ocfs2_initxattrs() can now handle the case of replicating the behavior of security_old_inode_init_security(), i.e. just obtaining the xattr, in addition to setting all xattrs provided by LSMs. Supporting multiple xattrs is not currently supported where security_old_inode_init_security() was called (mknod, symlink), as it requires non-trivial changes that can be done at a later time. Like for reiserfs, even if EVM is invoked, it will not provide an xattr (if it is not the first to set it, its xattr will be discarded; if it is the first, it does not have xattrs to calculate the HMAC on). Finally, since security_inode_init_security(), unlike security_old_inode_init_security(), returns zero instead of -EOPNOTSUPP if no xattrs were provided by LSMs or if inodes are private, additionally check in ocfs2_init_security_get() if the xattr name is set. If not, act as if security_old_inode_init_security() returned -EOPNOTSUPP, and set si->enable to zero to notify to the functions following ocfs2_init_security_get() that no xattrs are available. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
9452e93e |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port privilege checking helpers to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
f2d40141 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port inode_init_owner() to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
e18275ae |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->rename() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
5ebb29be |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->mknod() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
c54bd91e |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->mkdir() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
7a77db95 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->symlink() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
6c960e68 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->create() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
cac2f8b8 |
|
22-Sep-2022 |
Christian Brauner <brauner@kernel.org> |
fs: rename current get acl method The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. The current inode operation for getting posix acls takes an inode argument but various filesystems (e.g., 9p, cifs, overlayfs) need access to the dentry. In contrast to the ->set_acl() inode operation we cannot simply extend ->get_acl() to take a dentry argument. The ->get_acl() inode operation is called from: acl_permission_check() -> check_acl() -> get_acl() which is part of generic_permission() which in turn is part of inode_permission(). Both generic_permission() and inode_permission() are called in the ->permission() handler of various filesystems (e.g., overlayfs). So simply passing a dentry argument to ->get_acl() would amount to also having to pass a dentry argument to ->permission(). We should avoid this unnecessary change. So instead of extending the existing inode operation rename it from ->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that passes a dentry argument and which filesystems that need access to the dentry can implement instead of ->get_inode_acl(). Filesystems like cifs which allow setting and getting posix acls but not using them for permission checking during lookup can simply not implement ->get_inode_acl(). This is intended to be a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
28f4821b |
|
17-Oct-2022 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
ocfs2: clear dinode links count in case of error In ocfs2_mknod(), if error occurs after dinode successfully allocated, ocfs2 i_links_count will not be 0. So even though we clear inode i_nlink before iput in error handling, it still won't wipe inode since we'll refresh inode from dinode during inode lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as well. Also do the same change for ocfs2_symlink(). Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Yan Wang <wangyan122@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
759a7c61 |
|
17-Oct-2022 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
ocfs2: fix BUG when iput after ocfs2_mknod fails Commit b1529a41f777 "ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed inode if __ocfs2_mknod_locked() fails later. But this introduce a race, the freed bit may be reused immediately by another thread, which will update dinode, e.g. i_generation. Then iput this inode will lead to BUG: inode->i_generation != le32_to_cpu(fe->i_generation) We could make this inode as bad, but we did want to do operations like wipe in some cases. Since the claimed inode bit can only affect that an dinode is missing and will return back after fsck, it seems not a big problem. So just leave it as is by revert the reclaim logic. Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Yan Wang <wangyan122@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
1639a49c |
|
14-Jul-2022 |
Yang Xu <xuyang2018.jy@fujitsu.com> |
fs: move S_ISGID stripping into the vfs_*() helpers Move setgid handling out of individual filesystems and into the VFS itself to stop the proliferation of setgid inheritance bugs. Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current implementation: * S_ISGID stripping logic is entangled with umask stripping. If a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem. If the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). Since umask stripping has an effect on S_ISGID inheritance, e.g., by stripping the S_IXGRP bit from the file to be created and all relevant filesystems have to call posix_acl_create() before inode_init_owner() where we currently take care of S_ISGID handling S_ISGID handling is order dependent. IOW, whether or not you get a setgid bit depends on POSIX ACLs and umask and in what order they are called. Note that technically filesystems are free to impose their own ordering between posix_acl_create() and inode_init_owner() meaning that there's additional ordering issues that influence S_SIGID inheritance. * Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. This is not just ugly it's unsustainably messy especially since we do still have bugs in this area years after the initial round of setgid bugfixes. So the current state is quite messy and while we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the vfs creation operations. This means we alleviate the burden for filesystems to handle S_ISGID stripping correctly and can standardize the ordering between S_ISGID and umask stripping in the vfs. We add a new helper vfs_prepare_mode() so S_ISGID handling is now done in the VFS before umask handling. This has S_ISGID handling is unaffected unaffected by whether umask stripping is done by the VFS itself (if no POSIX ACLs are supported or enabled) or in the filesystem in posix_acl_create() (if POSIX ACLs are supported). The vfs_prepare_mode() helper is called directly in vfs_*() helpers that create new filesystem objects. We need to move them into there to make sure that filesystems like overlayfs hat have callchains like: sys_mknod() -> do_mknodat(mode) -> .mknod = ovl_mknod(mode) -> ovl_create(mode) -> vfs_mknod(mode) get S_ISGID stripping done when calling into lower filesystems via vfs_*() creation helpers. Moving vfs_prepare_mode() into e.g. vfs_mknod() takes care of that. This is in any case semantically cleaner because S_ISGID stripping is VFS security requirement. Security hooks so far have seen the mode with the umask applied but without S_ISGID handling done. The relevant hooks are called outside of vfs_*() creation helpers so by calling vfs_prepare_mode() from vfs_*() helpers the security hooks would now see the mode without umask stripping applied. For now we fix this by passing the mode with umask settings applied to not risk any regressions for LSM hooks. IOW, nothing changes for LSM hooks. It is worth pointing out that security hooks never saw the mode that is seen by the filesystem when actually creating the file. They have always been completely misplaced for that to work. The following filesystems use inode_init_owner() and thus relied on S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus, hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs, reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. All of the above filesystems end up calling inode_init_owner() when new filesystem objects are created through the ->mkdir(), ->mknod(), ->create(), ->tmpfile(), ->rename() inode operations. Since directories always inherit the S_ISGID bit with the exception of xfs when irix_sgid_inherit mode is turned on S_ISGID stripping doesn't apply. The ->symlink() and ->link() inode operations trivially inherit the mode from the target and the ->rename() inode operation inherits the mode from the source inode. All other creation inode operations will get S_ISGID handling via vfs_prepare_mode() when called from their relevant vfs_*() helpers. In addition to this there are filesystems which allow the creation of filesystem objects through ioctl()s or - in the case of spufs - circumventing the vfs in other ways. If filesystem objects are created through ioctl()s the vfs doesn't know about it and can't apply regular permission checking including S_ISGID logic. Therfore, a filesystem relying on S_ISGID stripping in inode_init_owner() in their ioctl() callpath will be affected by moving this logic into the vfs. We audited those filesystems: * btrfs allows the creation of filesystem objects through various ioctls(). Snapshot creation literally takes a snapshot and so the mode is fully preserved and S_ISGID stripping doesn't apply. Creating a new subvolum relies on inode_init_owner() in btrfs_new_subvol_inode() but only creates directories and doesn't raise S_ISGID. * ocfs2 has a peculiar implementation of reflinks. In contrast to e.g. xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with the actual extents ocfs2 uses a separate ioctl() that also creates the target file. Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on inode_init_owner() to strip the S_ISGID bit. This is the only place where a filesystem needs to call mode_strip_sgid() directly but this is self-inflicted pain. * spufs doesn't go through the vfs at all and doesn't use ioctl()s either. Instead it has a dedicated system call spufs_create() which allows the creation of filesystem objects. But spufs only creates directories and doesn't allo S_SIGID bits, i.e. it specifically only allows 0777 bits. * bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created. The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used, and on xfs when the XFS_FEAT_GRPID mount option is used. When any of these filesystems are mounted with their respective GRPID option then newly created files inherit the parent directories group unconditionally. In these cases non of the filesystems call inode_init_owner() and thus did never strip the S_ISGID bit for newly created files. Moving this logic into the VFS means that they now get the S_ISGID bit stripped. This is a user visible change. If this leads to regressions we will either need to figure out a better way or we need to revert. However, given the various setgid bugs that we found just in the last two years this is a regression risk we should take. Associated with this change is a new set of fstests to enforce the semantics for all new filesystems. Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein [1] Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [2] Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [3] Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [4] Link: https://lore.kernel.org/r/1657779088-2242-3-git-send-email-xuyang2018.jy@fujitsu.com Suggested-by: Dave Chinner <david@fromorbit.com> Suggested-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> [<brauner@kernel.org>: rewrote commit message] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
137cebf9 |
|
22-Mar-2022 |
hongnanli <hongnan.li@linux.alibaba.com> |
fs/ocfs2: fix comments mentioning i_mutex inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fa60ce2c |
|
06-May-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
treewide: remove editor modelines and cruft The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently receive a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favoriate editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2b5f52c5 |
|
07-Apr-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
ocfs2: convert to fileattr Use the fileattr API to let the VFS handle locking, permission checking and conversion. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Joel Becker <jlbec@evilplan.org>
|
#
549c7297 |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
fs: make helpers idmap mount aware Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
21cb47be |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
inode: make init and permission helpers idmapped mount aware The inode_owner_or_capable() helper determines whether the caller is the owner of the inode or is capable with respect to that inode. Allow it to handle idmapped mounts. If the inode is accessed through an idmapped mount it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Similarly, allow the inode_init_owner() helper to handle idmapped mounts. It initializes a new inode on idmapped mounts by mapping the fsuid and fsgid of the caller from the mount's user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
45680967 |
|
14-Dec-2020 |
Mauricio Faria de Oliveira <mfo@canonical.com> |
ocfs2: ratelimit the 'max lookup times reached' notice Running stress-ng on ocfs2 completely fills the kernel log with 'max lookup times reached, filesystem may have nested directories.' Let's ratelimit this message as done with others in the code. Test-case: # mkfs.ocfs2 --mount local $DEV # mount $DEV $MNT # cd $MNT # dmesg -C # stress-ng --dirdeep 1 --dirdeep-ops 1000 # dmesg | grep -c 'max lookup times reached' Before: # dmesg -C # stress-ng --dirdeep 1 --dirdeep-ops 1000 ... stress-ng: info: [11116] successful run completed in 3.03s # dmesg | grep -c 'max lookup times reached' 967 After: # dmesg -C # stress-ng --dirdeep 1 --dirdeep-ops 1000 ... stress-ng: info: [739] successful run completed in 0.96s # dmesg | grep -c 'max lookup times reached' 10 # dmesg [ 259.086086] ocfs2_check_if_ancestor: 1990 callbacks suppressed [ 259.086092] (stress-ng-dirde,740,1):ocfs2_check_if_ancestor:1091 max lookup times reached, filesystem may have nested directories, src inode: 18007, dest inode: 17940. ... Link: https://lkml.kernel.org/r/20201001224417.478263-1-mfo@canonical.com Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3f649ab7 |
|
03-Jun-2020 |
Kees Cook <keescook@chromium.org> |
treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
0434c9f4 |
|
01-Apr-2020 |
wangjian <wangjian161@huawei.com> |
ocfs2: roll back the reference count modification of the parent directory if an error occurs Under some conditions, the directory cannot be deleted. The specific scenarios are as follows: (for example, /mnt/ocfs2 is the mount point) 1. Create the /mnt/ocfs2/p_dir directory. At this time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal to 2. 2. During the process of creating the /mnt/ocfs2/p_dir/s_dir directory, if the call to the inc_nlink function in ocfs2_mknod succeeds, the functions such as ocfs2_init_acl, ocfs2_init_security_set, and ocfs2_dentry_attach_lock fail. At this time, the i_nlink corresponding to the inode of the /mnt/ocfs2/p_dir directory is equal to 3, but /mnt/ocfs2/p_dir/s_dir is not added to the /mnt/ocfs2/p_dir directory entry. 3. Delete the /mnt/ocfs2/p_dir directory (rm -rf /mnt/ocfs2/p_dir). At this time, it is found that the i_nlink corresponding to the inode corresponding to the /mnt/ocfs2/p_dir directory is equal to 3. Therefore, the /mnt/ocfs2/p_dir directory cannot be deleted. Signed-off-by: Jian wang <wangjian161@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/a44f6666-bbc4-405e-0e6c-0f4e922eeef6@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
25b69918 |
|
30-Jan-2020 |
wangyan <wangyan122@huawei.com> |
ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction For the uniform format, we use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction Link: http://lkml.kernel.org/r/6ff9a312-5f7d-0e27-fb51-bc4e062fcd97@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
225dcadf |
|
23-Sep-2019 |
zhengbin <zhengbin13@huawei.com> |
fs/ocfs2/namei.c: remove set but not used variables Fixes gcc '-Wunused-but-set-variable' warning: fs/ocfs2/namei.c: In function ocfs2_create_inode_in_orphan: fs/ocfs2/namei.c:2503:23: warning: variable di set but not used [-Wunused-but-set-variable] Link: http://lkml.kernel.org/r/1566522588-63786-2-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Changwei Ge <chge@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
328970de |
|
23-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 84 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
64202a21 |
|
07-Jun-2018 |
Salvatore Mesoraca <s.mesoraca16@gmail.com> |
ocfs2: drop a VLA in ocfs2_orphan_del() Avoid a VLA by using a real constant expression instead of a variable. The compiler should be able to optimize the original code and avoid using an actual VLA. Anyway this change is useful because it will avoid a false positive with -Wvla, it might also help the compiler generating better code. Link: http://lkml.kernel.org/r/1520970710-19732-1-git-send-email-s.mesoraca16@gmail.com Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d324cd4c |
|
05-Apr-2018 |
piaojun <piaojun@huawei.com> |
ocfs2: use 'oi' instead of 'OCFS2_I()' We could use 'oi' instead of 'OCFS2_I()' to make code more elegant. Link: http://lkml.kernel.org/r/5A7020FE.5050906@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cc56c33e |
|
11-Dec-2017 |
Jeff Layton <jlayton@kernel.org> |
ocfs2: convert to new i_version API Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
c62c38f6 |
|
12-Dec-2016 |
Deepa Dinamani <deepa.kernel@gmail.com> |
ocfs2: replace CURRENT_TIME macro CURRENT_TIME is not y2038 safe. Use y2038 safe ktime_get_real_seconds() here for timestamps. struct heartbeat_block's hb_seq and deletetion time are already 64 bits wide and accommodate times beyond y2038. Also use y2038 safe ktime_get_real_ts64() for on disk inode timestamps. These are also wide enough to accommodate time64_t. Link: http://lkml.kernel.org/r/1475365298-29236-1-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fd50ecad |
|
29-Sep-2016 |
Andreas Gruenbacher <agruenba@redhat.com> |
vfs: Remove {get,set,remove}xattr inode operations These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
078cd827 |
|
14-Sep-2016 |
Deepa Dinamani <deepa.kernel@gmail.com> |
fs: Replace CURRENT_TIME with current_time() for inode timestamps CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2773bf00 |
|
27-Sep-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
fs: rename "rename2" i_op to "rename" Generated patch: sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2` sed -i "s/\brename2\b/rename/g" `git grep -wl rename2` Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1cd66c93 |
|
27-Sep-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
fs: make remaining filesystems use .rename2 This is trivial to do: - add flags argument to foo_rename() - check if flags is zero - assign foo_rename() to .rename2 instead of .rename This doesn't mean it's impossible to support RENAME_NOREPLACE for these filesystems, but it is not trivial, like for local filesystems. RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible for a file to be created on one host while it is overwritten by rename on another host). Filesystems converted: 9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs. After this, we can get rid of the duplicate interfaces for rename. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Howells <dhowells@redhat.com> [AFS] Acked-by: Mike Marshall <hubcap@omnibond.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Mark Fasheh <mfasheh@suse.com>
|
#
c25a1e06 |
|
12-May-2016 |
Junxiao Bi <junxiao.bi@oracle.com> |
ocfs2: fix posix_acl_create deadlock Commit 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") refactored code to use posix_acl_create. The problem with this function is that it is not mindful of the cluster wide inode lock making it unsuitable for use with ocfs2 inode creation with ACLs. For example, when used in ocfs2_mknod, this function can cause deadlock as follows. The parent dir inode lock is taken when calling posix_acl_create -> get_acl -> ocfs2_iop_get_acl which takes the inode lock again. This can cause deadlock if there is a blocked remote lock request waiting for the lock to be downconverted. And same deadlock happened in ocfs2_reflink. This fix is to revert back using ocfs2_init_acl. Fixes: 702e5bc68ad2 ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5955102c |
|
22-Jan-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
074a6c65 |
|
14-Jan-2016 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: access orphan dinode before delete entry in ocfs2_orphan_del In ocfs2_orphan_del, currently it finds and deletes entry first, and then access orphan dir dinode. This will have a problem once ocfs2_journal_access_di fails. In this case, entry will be removed from orphan dir, but in deed the inode hasn't been deleted successfully. In other words, the file is missing but not actually deleted. So we should access orphan dinode first like unlink and rename. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
72865d92 |
|
14-Jan-2016 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: clean up redundant NULL check before iput Since iput will take care the NULL check itself, NULL check before calling it is redundant. So clean them up. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
854ee2e9 |
|
11-Dec-2015 |
Junxiao Bi <junxiao.bi@oracle.com> |
ocfs2: fix SGID not inherited issue Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an issue, SGID of sub dir was not inherited from its parents dir. It is because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but is overwritten by "mode" which don't have SGID set later. Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
21fc61c7 |
|
16-Nov-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
don't put symlink bodies in pagecache into highmem kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8f1eb487 |
|
20-Nov-2015 |
Junxiao Bi <junxiao.bi@oracle.com> |
ocfs2: fix umask ignored issue New created file's mode is not masked with umask, and this makes umask not work for ocfs2 volume. Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b1529a41 |
|
05-Nov-2015 |
alex chen <alex.chen@huawei.com> |
ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error In ocfs2_mknod_locked if '__ocfs2_mknod_locke d' returns an error, we should reclaim the inode successfully claimed above, otherwise, the inode never be reused. The case is described below: ocfs2_mknod ocfs2_mknod_locked ocfs2_claim_new_inode Successfully claim the inode __ocfs2_mknod_locked ocfs2_journal_access_di Failed because of -ENOMEM or other reasons, the inode lockres has not been initialized yet. iput(inode) ocfs2_evict_inode ocfs2_delete_inode ocfs2_inode_lock ocfs2_inode_lock_full_nested __ocfs2_cluster_lock Return -EINVAL because of the inode lockres has not been initialized. So the following operations are not performed ocfs2_wipe_inode ocfs2_remove_inode ocfs2_free_dinode ocfs2_free_suballoc_bits Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
30edc43c |
|
05-Nov-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: do not include dio entry in case of orphan scan dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do not include it when doing normal orphan scan to reduce contention. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
928dda1f |
|
04-Sep-2015 |
Yiwen Jiang <jiangyiwen@huawei.com> |
ocfs2: fix a tiny case that inode can not removed When running dirop_fileop_racer we found a case that inode can not removed. Two nodes, say Node A and Node B, mount the same ocfs2 volume. Create two dirs /race/1/ and /race/2/ in the filesystem. Node A Node B rm -r /race/2/ mv /race/1/ /race/2/ call ocfs2_unlink(), get the EX mode of /race/2/ wait for B unlock /race/2/ decrease i_nlink of /race/2/ to 0, and add inode of /race/2/ into orphan dir, unlock /race/2/ got EX mode of /race/2/. because /race/1/ is dir, so inc i_nlink of /race/2/ and update into disk, unlock /race/2/ because i_nlink of /race/2/ is not zero, this inode will always remain in orphan dir This patch fixes this case by test whether i_nlink of new dir is zero. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
807a7907 |
|
04-Sep-2015 |
jiangyiwen <jiangyiwen@huawei.com> |
ocfs2: set filesytem read-only when ocfs2_delete_entry failed. In ocfs2_rename, it will lead to an inode with two entried(old and new) if ocfs2_delete_entry(old) failed. Thus, filesystem will be inconsistent. The case is described below: ocfs2_rename -> ocfs2_start_trans -> ocfs2_add_entry(new) -> ocfs2_delete_entry(old) -> __ocfs2_journal_access *failed* because of -ENOMEM -> ocfs2_commit_trans So filesystem should be set to read-only at the moment. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3cb2ec43 |
|
04-Sep-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: adjust code to match locking/unlocking order Unlocking order in ocfs2_unlink and ocfs2_rename mismatches the corresponding locking order, although it won't cause issues, adjust the code so that it looks more reasonable. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
512f62ac |
|
04-Sep-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix race between dio and recover orphan During direct io the inode will be added to orphan first and then deleted from orphan. There is a race window that the orphan entry will be deleted twice and thus trigger the BUG when validating OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan. ocfs2_direct_IO_write ... ocfs2_add_inode_to_orphan >>>>>>>> race window. 1) another node may rm the file and then down, this node take care of orphan recovery and clear flag OCFS2_DIO_ORPHANED_FL. 2) since rw lock is unlocked, it may race with another orphan recovery and append dio. ocfs2_del_inode_from_orphan So take inode mutex lock when recovering orphans and make rw unlock at the end of aio write in case of append dio. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9c89fe0a |
|
14-Jul-2015 |
Jan Kara <jack@suse.com> |
ocfs2: Handle error from dquot_initialize() dquot_initialize() can now return error. Handle it where possible. Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Jan Kara <jack@suse.com>
|
#
ab1ba021 |
|
24-Jun-2015 |
Fabian Frederick <fabf@skynet.be> |
ocfs2: use swap() in ocfs2_double_lock() Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cf1776a9 |
|
24-Jun-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix a tiny race when truncate dio orohaned entry Once dio crashed it will leave an entry in orphan dir. And orphan scan will take care of the clean up. There is a tiny race case that the same entry will be truncated twice and then trigger the BUG in ocfs2_del_inode_from_orphan. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2b0143b5 |
|
17-Mar-2015 |
David Howells <dhowells@redhat.com> |
VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
62f8b1f0 |
|
14-Apr-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: use ENOENT instead of EEXIST when get system file fails When ocfs2_get_system_file_inode fails, it is obscure to set the return value to -EEXIST. So change it to -ENOENT. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e38a5739 |
|
14-Apr-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: use actual name length when find entry in ocfs2_orphan_del() If the namelen is 20 and name only has actual length 16, it will fail in ocfs2_find_entry because of mismatch. So use actual name length when find entry. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4813962b |
|
16-Feb-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: wait for orphan recovery first once append O_DIRECT write crash If one node has crashed with orphan entry leftover, another node which do append O_DIRECT write to the same file will override the i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If this case happens, we let it wait for orphan recovery first. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
06ee5c75 |
|
16-Feb-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: add functions to add and remove inode in orphan dir Add functions to add inode to orphan dir and remove inode in orphan dir. Here we do not call ocfs2_prepare_orphan_dir and ocfs2_orphan_add directly. Because append O_DIRECT will add inode to orphan two and may result in more than one orphan entry for the same inode. [akpm@linux-foundation.org: avoid dynamic stack allocation] Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Cc: Fengguang Wu <fengguang.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
53dc20b9 |
|
08-Jan-2015 |
Xue jiufei <xuejiufei@huawei.com> |
ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file In ocfs2_link(), the parent directory inode passed to function ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of new_dentry not old_dentry. We should get old_dir from old_dentry and lookup old_dentry in old_dir in case another node remove the old dentry. With this change, hard linking works again, when paths are relative with at least one subdirectory. This is how the problem was reproducable: # mkdir a # mkdir b # touch a/test # ln a/test b/test ln: failed to create hard link `b/test' => `a/test': No such file or directory However when creating links in the same dir, it worked well. Now the link gets created. Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()") Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Szabo Aron - UBIT <aron@ubit.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Tested-by: Aron Szabo <aron@ubit.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d3556bab |
|
29-Oct-2014 |
Richard Weinberger <richard@nod.at> |
ocfs2: fix d_splice_alias() return code checking d_splice_alias() can return a valid dentry, NULL or an ERR_PTR. Currently the code checks not for ERR_PTR and will cuase an oops in ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL(). Signed-off-by: Richard Weinberger <richard@nod.at> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
595297a8 |
|
23-Jun-2014 |
jiangyiwen <jiangyiwen@huawei.com> |
ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod When the call to ocfs2_add_entry() failed in ocfs2_symlink() and ocfs2_mknod(), iput() will not be called during dput(dentry) because no d_instantiate(), and this will lead to umount hung. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f7a14f32 |
|
23-Jun-2014 |
Yiwen Jiang <jiangyiwen@huawei.com> |
ocfs2: fix a tiny race when running dirop_fileop_racer When running dirop_fileop_racer we found a dead lock case. 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 is less than the inode number of dir race. Node A Node B mv /race/16/1 /race/ right after Node A has got the EX mode of /race/16/, and tries to get EX mode of /race ls /race/16/ In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead lock happens. This patch fixes this case by locking in ancestor order before trying inode number order. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5fb1beb0 |
|
23-Jun-2014 |
alex chen <alex.chen@huawei.com> |
ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename() There are two files a and b in dir /mnt/ocfs2. node A node B mv a b In ocfs2_rename(), after calling ocfs2_orphan_add(), the inode of file b will be added into orphan dir. If ocfs2_update_entry() fails, ocfs2_rename return error and mv operation fails. But file b still exists in the parent dir. ocfs2_queue_orphan_scan -> ocfs2_queue_recovery_completion -> ocfs2_complete_recovery -> ocfs2_recover_orphans The inode of the file b will be put with iput(). ocfs2_evict_inode -> ocfs2_delete_inode -> ocfs2_wipe_inode -> ocfs2_remove_inode OCFS2_VALID_FL in the inode i_flags will be cleared. The file b still can be accessed on node B. ls /mnt/ocfs2 When first read the file b with ocfs2_read_inode_block(). It will validate the inode using ocfs2_validate_inode_block(). Because OCFS2_VALID_FL not set in the inode i_flags, so the file system will be readonly. So we should add inode into orphan dir after updating entry in ocfs2_rename(). Signed-off-by: alex.chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6fdb702d |
|
03-Apr-2014 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: call ocfs2_update_inode_fsync_trans when updating any inode Ensure that ocfs2_update_inode_fsync_trans() is called any time we touch an inode in a given transaction. This is a follow-on to the previous patch to reduce lock contention and deadlocking during an fsync operation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Wengang <wen.gang.wang@oracle.com> Cc: Greg Marsden <greg.marsden@oracle.com> Cc: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f81c2015 |
|
03-Apr-2014 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
ocfs2: fix panic on kfree(xattr->name) Commit 9548906b2bb7 ('xattr: Constify ->name member of "struct xattr"') missed that ocfs2 is calling kfree(xattr->name). As a result, kernel panic occurs upon calling kfree(xattr->name) because xattr->name refers static constant names. This patch removes kfree(xattr->name) from ocfs2_mknod() and ocfs2_symlink(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Tariq Saeed <tariq.x.saeed@oracle.com> Tested-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
466e68c4 |
|
03-Apr-2014 |
Xue jiufei <xuejiufei@huawei.com> |
ocfs2: __ocfs2_mknod_locked should return error when ocfs2_create_new_inode_locks() failed When ocfs2_create_new_inode_locks() return error, inode open lock may not be obtainted for this inode. So other nodes can remove this file and free dinode when inode still remain in memory on this node, which is not correct and may trigger BUG. So __ocfs2_mknod_locked should return error when ocfs2_create_new_inode_locks() failed. Node_1 Node_2 create fileA, call ocfs2_mknod() -> ocfs2_get_init_inode(), allocate inodeA -> ocfs2_claim_new_inode(), claim dinode(dinodeA) -> call ocfs2_create_new_inode_locks(), create open lock failed, return error -> __ocfs2_mknod_locked return success unlink fileA try open lock succeed, and free dinodeA create another file, call ocfs2_mknod() -> ocfs2_get_init_inode(), allocate inodeB -> ocfs2_claim_new_inode(), as Node_2 had freed dinodeA, so claim dinodeA and update generation for dinodeA call __ocfs2_drop_dl_inodes()->ocfs2_delete_inode() to free inodeA, and finally triggers BUG on(inode->i_generation != le32_to_cpu(fe->i_generation)) in function ocfs2_inode_lock_update(). Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2931cdcb |
|
03-Apr-2014 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file Currently, ocfs2_sync_file grabs i_mutex and forces the current journal transaction to complete. This isn't terribly efficient, since sync_file really only needs to wait for the last transaction involving that inode to complete, and this doesn't require i_mutex. Therefore, implement the necessary bits to track the newest tid associated with an inode, and teach sync_file to wait for that instead of waiting for everything in the journal to commit. Furthermore, only issue the flush request to the drive if jbd2 hasn't already done so. This also eliminates the deadlock between ocfs2_file_aio_write() and ocfs2_sync_file(). aio_write takes i_mutex then calls ocfs2_aiodio_wait() to wait for unaligned dio writes to finish. However, if that dio completion involves calling fsync, then we can get into trouble when some ocfs2_sync_file tries to take i_mutex. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0e048316 |
|
10-Feb-2014 |
Xue jiufei <xuejiufei@huawei.com> |
ocfs2: check existence of old dentry in ocfs2_link() System call linkat first calls user_path_at(), check the existence of old dentry, and then calls vfs_link()->ocfs2_link() to do the actual work. There may exist a race when Node A create a hard link for file while node B rm it. Node A Node B user_path_at() ->ocfs2_lookup(), find old dentry exist rm file, add inode say inodeA to orphan_dir call ocfs2_link(),create a hard link for inodeA. rm the link, add inodeA to orphan_dir again When orphan_scan work start, it calls ocfs2_queue_orphans() to do the main work. It first tranverses entrys in orphan_dir, linking all inodes in this orphan_dir to a list look like this: inodeA->inodeB->...->inodeA When tranvering this list, it will fall into loop, calling iput() again and again. And finally trigger BUG_ON(inode->i_state & I_CLEAR). Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f5b25855 |
|
27-Jan-2014 |
Xiaowei.Hu <xiaowei.hu@oracle.com> |
ocfs2: do not log ENOENT in unlink() Suppress log message like this: (open_delete,8328,0):ocfs2_unlink:951 ERROR: status = -2 Orabug:17445485 Signed-off-by: Xiaowei Hu <xiaowei.hu@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
702e5bc6 |
|
20-Dec-2013 |
Christoph Hellwig <hch@infradead.org> |
ocfs2: use generic posix ACL infrastructure This contains some major refactoring for the create path so that inodes are created with the right mode to start with instead of fixing it up later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7391a294 |
|
12-Nov-2013 |
Rui Xiang <rui.xiang@huawei.com> |
ocfs2: return ENOMEM when sb_getblk() fails The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So return ENOMEM instead when it fails. [joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b5a8bb71 |
|
03-Jul-2013 |
Younger Liu <younger.liu@huawei.com> |
ocfs2: fix readonly issue in ocfs2_unlink() While deleting a file with ocfs2_unlink(), there is a bug in this function. This bug will result in filesystem read-only. After calling ocfs2_orphan_add(), the file which will be deleted is added into orphan dir. If ocfs2_delete_entry() fails, the file still exists in the parent dir. And this scenario introduces a conflict of metadata. If a file is added into orphan dir, when we put inode of the file with iput(), the inode i_flags is setted (~OCFS2_VALID_FL) in ocfs2_remove_inode(), and then write back to disk. But as previously mentioned, the file still exists in the parent dir. On other nodes, the file can be still accessed. When first read the file with ocfs2_read_blocks() from disk, It will check and avalidate inode using ocfs2_validate_inode_block(). So File system will be readonly because the inode is invalid. In other words, the inode i_flags has been set (~OCFS2_VALID_FL). [akpm@linux-foundation.org: cleanups] [jeff.liu@oracle.com: s/inode_is_unlinkable/ocfs2_inode_is_unlinkable/] Signed-off-by: Younger Liu <younger.liu@huawei.com> Signed-off-by: Jensen <shencanquan@huawei.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ea45466a |
|
03-Jul-2013 |
Younger Liu <younger.liu@huawei.com> |
ocfs2: need rollback when journal_access failed in ocfs2_orphan_add() While adding a file into orphan dir in ocfs2_orphan_add(), it calls __ocfs2_add_entry() before ocfs2_journal_access_di(). If ocfs2_journal_access_di() failed, the file is added into orphan dir, and orphan dir dinode updated, but file dinode has not been updated. Accordingly, the data is not consistent between file dinode and orphan dir. So, need to call ocfs2_journal_access_di() before __ocfs2_add_entry(), and if ocfs2_journal_access_di() failed, orphan_fe and orphan_dir_inode->i_nlink need rollback. This bug was added by 3939fda4 ("Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir."). Signed-off-by: Younger Liu <younger.liu@huawei.com> Acked-by: Jeff Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
13eb9887 |
|
03-Jul-2013 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: should not use le32_add_cpu to set ocfs2_dinode i_flags If we use le32_add_cpu to set ocfs2_dinode i_flags, it may lead to the corresponding flag corrupted. So we should change it to bitwise and/or operation. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: shencanquan <shencanquan@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e0991271 |
|
12-Jun-2013 |
Goldwyn Rodrigues <rgoldwyn@gmail.com> |
fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory While removing a non-empty directory, the kernel dumps a message: (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39 Suppress the error message from being printed in the dmesg so users don't panic. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7869e590 |
|
12-Jun-2013 |
Xiaowei.Hu <xiaowei.hu@oracle.com> |
ocfs2: ocfs2_prep_new_orphaned_file() should return ret If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir, ocfs2_prep_new_orphaned_file will release the inode_ac, then when the caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to a NULL ocfs2_alloc_context struct in the following functions. A kernel panic happens. Signed-off-by: "Xiaowei.Hu" <xiaowei.hu@oracle.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2c034176 |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
ocfs2: Convert uid and gids between in core and on disk inodes Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
ebfc3b49 |
|
10-Jun-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
don't pass nameidata to ->create() boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
00cd8dd3 |
|
10-Jun-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
stop passing nameidata to ->lookup() Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ea022dfb |
|
03-May-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
ocfs: simplify symlink handling seeing that "fast" symlinks still get allocation + copy, we might as well simply switch them to pagecache-based variant of ->follow_link(); just need an appropriate ->readpage() for them... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
847c9db5 |
|
12-Feb-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() unfortunately, nlink_t may be smaller than 32 bits and ->i_nlink on ocfs2 can grow up to 0xffffffff; storing it in nlink_t variable will lose upper bits on such architectures. Needs to be made u32, until we get kernel-side nlink_t uniformly 32bit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
67697cbd |
|
26-Jul-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
ocfs2: propagate umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1a67aafb |
|
25-Jul-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
switch ->mknod() to umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4acdaf27 |
|
25-Jul-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
switch ->create() to umode_t vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
18bb1db3 |
|
25-Jul-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
switch vfs_mkdir() and ->mkdir() to umode_t vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
bfe86848 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
filesystems: add set_nlink() Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
6d6b77f1 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
filesystems: add missing nlink wrappers Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
#
a7732b05 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
ocfs2: remove unnecessary nlink setting alloc_inode() initializes i_nlink to 1. Remove unnecessary re-initialization. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Joel Becker <jlbec@evilplan.org> CC: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
4e34e719 |
|
23-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
fs: take the ACL checks to common code Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
178ea735 |
|
20-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
kill check_acl callback of generic_permission() its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7ca57363 |
|
24-May-2011 |
Sage Weil <sage@newdream.net> |
ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir Ocfs2 has no issues with lingering references to unlinked directory inodes. CC: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e4eaac06 |
|
24-May-2011 |
Sage Weil <sage@newdream.net> |
vfs: push dentry_unhash on rename_dir into file systems Only a few file systems need this. Start by pushing it down into each rename method (except gfs2 and xfs) so that it can be dealt with on a per-fs basis. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
79bf7c73 |
|
24-May-2011 |
Sage Weil <sage@newdream.net> |
vfs: push dentry_unhash on rmdir into file systems Only a few file systems need this. Start by pushing it down into each fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs basis. This does not change behavior for any in-tree file systems. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
8990e44a |
|
23-Feb-2011 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Remove masklog ML_NAMEI. Remove mlog(0) from fs/ocfs2/namei.c and the masklog NAMEI finally. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
|
#
c1e8d35e |
|
07-Mar-2011 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Remove EXIT from masklog. mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
|
#
ef6b689b |
|
20-Feb-2011 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Remove ENTRY from masklog. ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
|
#
2a7dba39 |
|
01-Feb-2011 |
Eric Paris <eparis@redhat.com> |
fs/vfs/security: pass last path component to LSM on inode creation SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>
|
#
ba87167c |
|
17-Dec-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
switch ocfs2, close races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
fb045adb |
|
06-Jan-2011 |
Nick Piggin <npiggin@kernel.dk> |
fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
|
#
1e6d9153 |
|
20-Dec-2010 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Release buffer_head in case of error in ocfs2_double_lock. In ocfs2_double_lock, when ocfs2_inode_lock for inode1 fails, we just unlock inode2 and return without releasing buffer we get from inode_lock(inode2). The good thing is that it is freed by the only caller ocfs2_rename when it exits. But I don't think this is a right way for error handling. We should free the buffer_head we get in ocfs2_double_lock before exit so that the caller doesn't need to take care of it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
7de9c6ee |
|
23-Oct-2010 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5e98d492 |
|
28-Jun-2010 |
Goldwyn Rodrigues <rgoldwyn@gmail.com> |
Track negative entries v3 Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
97b8f4a9 |
|
13-Aug-2010 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Fix orphan add in ocfs2_create_inode_in_orphan ocfs2_create_inode_in_orphan() is used by reflink to create the newly reflinked inode simultaneously in the orphan dir. This allows us to easily handle partially-reflinked files during recovery cleanup. We have a problem though - the orphan dir stringifies inode # to determine a unique name under which the orphan entry dirent can be created. Since ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir before it can allocate the inode, we currently call into the orphan code: /* * We give the orphan dir the root blkno to fake an orphan name, * and allocate enough space for our insertion. */ status = ocfs2_prepare_orphan_dir(osb, &orphan_dir, osb->root_blkno, orphan_name, &orphan_insert); Using osb->root_blkno might work fine on unindexed directories, but the orphan dir can have an index. When it has that index, the above code fails to allocate the proper index entry. Later, when we try to remove the file from the orphan dir (using the actual inode #), the reflink operation will fail. To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the newly split out orphan and inode alloc code to figure out what the inode block number will be (once allocated) and then prepare the orphan dir from that data. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
dd43bcde |
|
13-Aug-2010 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: split out ocfs2_prepare_orphan_dir() into locking and prep functions We do this because ocfs2_create_inode_in_orphan() wants to order locking of the orphan dir with respect to locking of the inode allocator *before* making any changes to the directory. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
021960ca |
|
13-Aug-2010 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: split out inode alloc code from ocfs2_mknod_locked Do this by splitting the bulk of the function away from the inode allocation code at the very tom of ocfs2_mknod_locked(). Existing callers don't need to change and won't see any difference. The new function created, __ocfs2_mknod_locked() will be used shortly. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
75fe0a24 |
|
04-Mar-2010 |
Dmitry Monakhov <dmonakhov@openvz.org> |
ocfs2: replace inode uid,gid,mode initialization with helper function Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
547ba7c8 |
|
10-May-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Block signals for mkdir/link/symlink/O_CREAT. Once file or link creation gets going, it can't be interrupted by a signal. They're not idempotent. This blocks signals in ocfs2_mknod(), ocfs2_link(), and ocfs2_symlink() once we start actually changing things. ocfs2_mknod() covers mknod(), creat(), mkdir(), and open(O_CREAT). Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
ec20cec7 |
|
19-Mar-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Make ocfs2_journal_dirty() void. jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
a9743fcd |
|
23-Apr-2010 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod() If we get a failure during creation of an inode we'll allow the orphan code to remove the inode, which is correct. However, we need to ensure that we don't get any errors after the call to ocfs2_add_entry(), otherwise we could leave a dangling directory reference. The solution is simple - in both cases, all I had to do was move ocfs2_dentry_attach_lock() above the ocfs2_add_entry() call. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
062d3403 |
|
22-Apr-2010 |
Li Dongyang <lidongyang@novell.com> |
ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod, so we can kill the inode in case of error. [ Fixed up comment style -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
ab41fdc8 |
|
22-Apr-2010 |
Li Dongyang <lidongyang@novell.com> |
ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR when we get an error after allocating one, so that we can kill the inode. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
d4cd1871 |
|
22-Apr-2010 |
Li Dongyang <lidongyang@novell.com> |
ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code Currently in the error path of ocfs2_symlink and ocfs2_mknod, we just call iput with the inode we failed with, but the inode wipe code will complain because we don't add the inode to orphan dir. One solution would be to lock the orphan dir during the entire transaction, but that's too heavy for a rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR which tells the inode wipe code that it won't find this inode in the orphan dir. [ Merge fixes and comment style cleanups -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
3939fda4 |
|
18-Mar-2010 |
Tristan Ye <tristan.ye@oracle.com> |
Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir. Currently, some callers were missing to journal the dirty inode after adding it to orphan dir. Now we're going to journal such modifications within the ocfs2_orphan_add() itself, It's safe to do so, though some existing caller may duplicate this, and it makes the logic look more straightforward anyway. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
2b6cb576 |
|
25-Mar-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Set suballoc_loc on allocated metadata. Get the suballoc_loc from ocfs2_claim_new_inode() or ocfs2_claim_metadata(). Store it on the appropriate field of the block we just allocated. Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
1ed9b777 |
|
05-May-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: ocfs2_claim_*() don't need an ocfs2_super argument. They all take an ocfs2_alloc_context, which has the allocation inode. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
871a2931 |
|
03-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
dquot: cleanup dquot initialize routine Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
907f4554 |
|
03-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
dquot: move dquot initialization responsibility into the filesystem Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straight forward move into the methods, but for truncate and open it's a bit more complicated. For truncate we currently only call vfs_dq_init for the sys_truncate case because open already takes care of it for ftruncate and open(O_TRUNC) - the new code causes an additional vfs_dq_init for those which is harmless. For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories. Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
63936dda |
|
03-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
dquot: cleanup inode allocation / freeing routines Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the lowlevel dquot_alloc_inode / dqout_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
5dd4056d |
|
03-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
dquot: cleanup space allocation / freeing routines Get rid of the alloc_space, free_space, reserve_space, claim_space and release_rsv dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Move shared logic into the common __dquot_alloc_space, dquot_claim_space_nodirty and __dquot_free_space low-level methods, and rationalize the wrappers around it to move as much as possible code into the common block for CONFIG_QUOTA vs not. Also rename all these helpers to be named dquot_* instead of vfs_dq_*. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
10cf1a02 |
|
17-Dec-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Set i_nlink properly during reflink. We create a file in orphan dir for reflink so that if there is any error, we don't create any wrong dentry in the dir. But actually the file in orphan dir should be i_nlink = 0 so that it can be replayed and freed successfully. This patch first set i_nlink to 0 when creating the file in orphan dir and then set it to 1(reflink now only works for regular file) when we move it to the dest dir. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
c7d260af |
|
17-Dec-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Add reflinked file's inode to inode hash eariler. We used to add reflinked file's inode to inode hash when we add it to the dest dir. But actually there is a race. Consider the following sequence. 1. reflink happens and create the inode in orphan dir. 2. reflink thread is scheduled out because of some io. 3. recovery begins to work and calls ocfs2_recover_orphans. It calls ocfs2_iget and get a new inode and i_count = 1. It calls iput then and delete inode. the buffer's uptodate state is cleared. This patch move insert_inode_hash to the create function so that it can be found by step 3 and prevented from deleting because i_count > 1. This resolves the bug http://oss.oracle.com/bugzilla/show_bug.cgi?id=1183. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
55f4946e |
|
17-Dec-2009 |
Tristan Ye <tristan.ye@oracle.com> |
Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode? Let userspace have a chance to get the extent info of a directory just like extN did. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
bc13d347 |
|
17-Aug-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Create reflinked file in orphan dir. reflink is a very complicated process, so it can't be integrated into one transaction. So if the system panic in the operation, we may leave a unfinished inode in the destication directory. So we will try to create an inode in orphan_dir first, reflink it to the src file and then move it to the destication file in the end. In that way we won't be afraid of any corruption during the reflink. This patch adds 2 functions for orphan_dir operation: 1. Create a new inode in orphand dir. 2. Move an inode to a target dir. Note: fsck.ocfs2 should work for us to remove the unfinished file in the orphan_dir. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
19bd341f |
|
17-Aug-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Use proper parameter for some inode operation. In order to make the original function more suitable for reflink, we modify the following inode operations. Both are tiny. 1. ocfs2_mknod_locked only use dentry for mlog, so move it to the caller so that reflink can use it without dentry. 2. ocfs2_prepare_orphan_dir only want inode to get its ip_blkno. So use ip_blkno instead. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
0cf2f763 |
|
12-Feb-2009 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Pass struct ocfs2_caching_info to the journal functions. The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
8cb471e8 |
|
10-Feb-2009 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Take the inode out of the metadata read/write paths. We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
cb25797d |
|
04-Jun-2009 |
Jan Kara <jack@suse.cz> |
ocfs2: Add lockdep annotations Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
7e31a966 |
|
28-Apr-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2/trivial: Remove unused variable in ocfs2_rename. With indexed dir enabled, now we use ocfs2_dir_lookup_result to wrap all the bh used for dir. So remove the 2 unused variables. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
13821151 |
|
24-Feb-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Optimize inode allocation by remembering last group In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes will tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes force a large number of seeks. So we add ip_last_used_group in core directory inodes which records the last used allocation group. Another field named ip_last_used_slot is also added in case inode stealing happens. When claiming new inode, we passed in directory's inode so that the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
b80b549c |
|
18-Feb-2009 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: re-order ocfs2_empty_dir checks ocfs2_empty_dir() is far more expensive than checking link count. Since both need to be checked at the same time, we can improve performance by checking link count first. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
198a1ca3 |
|
20-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Increase max links count Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
|
#
4ed8a6bb |
|
24-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Store dir index records inline Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized directories (less than about 250 entries). The inline root is automatically turned into a root block with extents if the directory size increases beyond it's capacity. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
|
#
9b7895ef |
|
12-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Add a name indexed b-tree to directory inodes This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
|
#
4a12ca3a |
|
12-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Introduce dir lookup helper struct Many directory manipulation calls pass around a tuple of dirent, and it's containing buffer_head. Dir indexing has a bit more state, but instead of adding yet more arguments to functions, we introduce 'struct ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but future patches will add more state. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
|
#
d9ae49d6 |
|
04-Mar-2009 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: tweak to get the maximum inline data size with xattr Replace max_inline_data with max_inline_data_with_xattr to ensure it correct when xattr inlined. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
13723d00 |
|
17-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
a90714c1 |
|
09-Oct-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
b657c95c |
|
13-Nov-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Wrap inode block reads in a dedicated function. The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
89c38bd0 |
|
13-Nov-2008 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: add ocfs2_init_acl in mknod We need to get the parent directories acls and let the new child inherit it. To this, we add additional calculations for data/metadata allocation. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
534eaddd |
|
13-Nov-2008 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: add ocfs2_init_security in during file create Security attributes must be set when creating a new inode. We do this in three steps. - First, get security xattr's name and value by security_operation - Calculate and reserve the meta data and clusters needed by this security xattr before starting transaction - Finally, we set it before add_entry Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
f5d36202 |
|
13-Nov-2008 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: move new inode allocation out of the transaction Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
b19c2a3b |
|
13-Nov-2008 |
David Howells <dhowells@redhat.com> |
CRED: Wrap task credential accesses in the OCFS2 filesystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: ocfs2-devel@oss.oracle.com Signed-off-by: James Morris <jmorris@namei.org>
|
#
b99835c1 |
|
20-Oct-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails We forgot to set i_nlink to 0 when returning due to error from ocfs2_mknod_locked() and thus inode was not properly released via ocfs2_delete_inode() (e.g. claimed space was not released). Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
87cfa004 |
|
20-Oct-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Fix checking of return value of new_inode() new_inode() does not return ERR_PTR() but NULL in case of failure. Correct checking of the return value. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
0fcaa56a |
|
09-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Simplify ocfs2_read_block() More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
31d33073 |
|
09-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Require an inode for ocfs2_read_block(s)(). Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
a81cb88b |
|
07-Oct-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Don't check for NULL before brelse() This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
|
#
cf1d6c76 |
|
18-Aug-2008 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: Add extended attribute support This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
0eb8d47e |
|
18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Make high level btree extend code generic Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
5dabd695 |
|
21-Feb-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Improve rename locking ocfs2_rename() was being too aggressive with the rename lock - we only need it for certain forms of directory rename. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
4d0ddb2c |
|
05-Mar-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Add inode stealing for ocfs2_reserve_new_inode Inode allocation is modified to look in other nodes allocators during extreme out of space situations. We retry our own slot when space is freed back to the global bitmap, or whenever we've allocated more than 1024 inodes from another slot. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
5fa0613e |
|
10-Jan-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Silence false lockdep warnings Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
e63aecb6 |
|
18-Oct-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
34d024f8 |
|
24-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remove mount/unmount votes The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
e325a88f |
|
31-Oct-2007 |
Srinivas Eeda <srinivas.eeda@oracle.com> |
ocfs2: fix rename vs unlink race If another node unlinks the destination while ocfs2_rename() is waiting on a cluster lock, ocfs2_rename() simply logs an error and continues. This causes a crash because the renaming node is now trying to delete a non-existent inode. The correct solution is to return -ENOENT. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
5b6a3a2b |
|
13-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Write support for directories with inline data Create all new directories with OCFS2_INLINE_DATA_FL and the inline data bytes formatted as an empty directory. Inode size field reflects the actual amount of inline data available, which makes searching for dirent space very similar to the regular directory search. Inline-data directories are automatically pushed out to extents on any insert request which is too large for the available space. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
38760e24 |
|
11-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Rename cleanups ocfs2_rename() does direct manipulation of the dirent it's gotten back from a directory search. Wrap this manipulation inside of a function so that we can transparently change directory update behavior in the future. As an added bonus, this gets rid of an ugly macro. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
be94d117 |
|
11-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Provide convenience function for ino lookup A couple paths which needed to just match a parent dir + name pair to an inode number were a bit messy because they had to deal with ocfs2_find_files_on_disk() which returns a larger number of values. Provide a convenience function, ocfs2_lookup_ino_from_name() which internalizes all the extra accounting. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
316f4b9f |
|
07-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Move directory manipulation code into dir.c The code for adding, removing, deleting directory entries was splattered all over namei.c. I'd rather have this all centralized, so that it's easier to make changes for inline dir data, and eventually indexed directories. None of the code in any of the functions was changed. I only removed the static keyword from some prototypes so that they could be exported. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
480214d7 |
|
06-Aug-2007 |
Sunil Mushran <sunil.musran@oracle.com> |
ocfs2: Fix rename/extend race If one process is extending a file while another is renaming it, there exists a window when rename could flush the old inode's stale i_size to disk. This patch recognizes the fact that rename is only updating the old inode's ctime, so it ensures only that value is flushed to disk. Signed-off-by: Sunil Mushran <sunil.musran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
2ae99a60 |
|
09-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Support creation of unwritten extents This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
1ca1a111 |
|
27-Apr-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: fix sparse warnings in fs/ocfs2 None of these are actually harmful, but the noise makes looking for real problems difficult. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
8110b073 |
|
22-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix up i_blocks calculation to know about holes Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
4f902c37 |
|
09-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix extent lookup to return true size of holes Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
49cb8d2d |
|
09-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Read from an unwritten extent returns zeros Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
363041a5 |
|
17-Jan-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: temporarily remove extent map caching The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
dcd0538f |
|
16-Jan-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: sparse b-tree support Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
50008630 |
|
20-Mar-2007 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: Remove delete inode vote Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
a9f5f707 |
|
26-Apr-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: filter more error prints We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
1b3c3714 |
|
17-Feb-2007 |
Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de> |
Fix typos concerning hierarchy heirarchical, hierachical -> hierarchical heirarchy, hierachy -> hierarchy Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
92e1d5be |
|
12-Feb-2007 |
Arjan van de Ven <arjan@linux.intel.com> |
[PATCH] mark struct inode_operations const 2 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
592282cf |
|
02-Jan-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Directory c/mtime update fixes ocfs2 wasn't updating c/mtime on directories during dirent creation/deletion. Fix ocfs2_unlink(), ocfs2_rename() and __ocfs2_add_entry() by adding the proper code to update the struct inode and push the change out to disk. This helps rename/unlink on nfs exported file systems in particular as those clients compare directory time values to avoid a full re-reading a directory which hasn't changed. ocfs2_rename() loses some superfluous error handling as a result of this patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
c271c5c2 |
|
05-Dec-2006 |
Sunil Mushran <Sunil.Mushran@oracle.com> |
ocfs2: local mounts This allows users to format an ocfs2 file system with a special flag, OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it will not use any cluster services, nor will it require a cluster configuration, thus acting like a 'local' file system. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
d38eb8db |
|
26-Nov-2006 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: implement i_op->permission Implement .permission() in ocfs2_file_iops, ocfs2_special_file_iops and ocfs2_dir_iops. This helps us avoid some multi-node races with mode change and vfs operations. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
1fabe148 |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t This is mostly a search and replace as ocfs2_journal_handle is now no more than a container for a handle_t pointer. ocfs2_commit_trans() becomes very straight forward, and we remove some out of date comments / code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
65eff9cc |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: remove handle argument to ocfs2_start_trans() All callers either pass in NULL directly, or a local variable that is already set to NULL. The internals of ocfs2_start_trans() get a nice cleanup as a result. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
02dc1af4 |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: pass ocfs2_super * into ocfs2_commit_trans() This sets us up to remove handle->journal. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
4bcec184 |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: remove unused handle argument from ocfs2_meta_lock_full() Now that this is unused and all callers pass NULL, we can safely remove it. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
85b9e783 |
|
06-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Don't allocate handle early in ocfs2_rename() It isn't used until ocfs2_start_trans() anyway. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
da5cbf2f |
|
06-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't use handle for locking in allocation functions Instead we record our state on the allocation context structure which all callers already know about and lifetime correctly. This means the reservation functions don't need a handle passed in any more, and we can also take it off the alloc context. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
8d5596c6 |
|
06-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_rename() Take and drop the locks directly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
6d8fc40e |
|
06-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_symlink() Take and drop the locks directly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
30a4f5e8 |
|
06-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_unlink() Take and drop the locks directly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
5098c27b |
|
05-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't pass handle to ocfs2_meta_lock() in orphan dir code Take and drop the locks directly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
123a9643 |
|
05-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't pass handle to ocfs2_meta_lock() in ocfs2_link() Take and drop the locks directly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
e3a82138 |
|
05-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't pass handle to ocfs2_meta_lock() in ocfs2_mknod() Take and drop the locks directly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
711a40fc |
|
11-Oct-2006 |
Sunil Mushran <sunil.mushran@oracle.com> |
ocfs2: remove spurious d_count check in ocfs2_rename() This was causing some folks to incorrectly get -EBUSY during rename. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
17ff7856 |
|
01-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
[PATCH] r/o bind mounts: clean up OCFS2 nlink handling OCFS2 does some operations on i_nlink, then reverts them if some of its operations fail to complete. This does not fit in well with the drop_nlink() logic where we expect i_nlink to stay at zero once it gets there. So, delay all of the nlink operations until we're sure that the operations have completed. Also, introduce a small helper to check whether an inode has proper "unlinkable" i_nlink counts no matter whether it is a directory or regular inode. This patch is broken out from the others because it does contain some logical changes. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
d8c76e6f |
|
01-Oct-2006 |
Dave Hansen <haveblue@us.ibm.com> |
[PATCH] r/o bind mount prepwork: inc_nlink() helper This is mostly included for parity with dec_nlink(), where we will have some more hooks. This one should stay pretty darn straightforward for now. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9a53c3a7 |
|
01-Oct-2006 |
Dave Hansen <haveblue@us.ibm.com> |
[PATCH] r/o bind mounts: unlink: monitor i_nlink When a filesystem decrements i_nlink to zero, it means that a write must be performed in order to drop the inode from the filesystem. We're shortly going to have keep filesystems from being remounted r/o between the time that this i_nlink decrement and that write occurs. So, add a little helper function to do the decrements. We'll tie into it in a bit to note when i_nlink hits zero. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
24c19ef4 |
|
22-Sep-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remove i_generation from inode lock names OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode on and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
0027dd5b |
|
21-Sep-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock() We can't use LKM_LOCAL for new dentry locks because an unlink and subsequent re-create of a name/inode pair may result in the lock still being mastered somewhere in the cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
379dfe9d |
|
08-Sep-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Hook rest of the file system into dentry locking API Actually replace the vote calls with the new dentry operations. Make any necessary adjustments to get the scheme to work. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
aa958874 |
|
21-Apr-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: implement directory read-ahead Uptodate.c now knows about read-ahead buffers. Use some more aggressive logic in ocfs2_readdir(). The two functions which currently use directory read-ahead are ocfs2_find_entry() and ocfs2_readdir(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
0f62de2c |
|
31-Aug-2006 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: Fix directory link count checks in ocfs2_link() Remove the redundant "i_nlink >= OCFS2_LINK_MAX" check and adds an unlinked directory check in ocfs2_link(). Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
a663e305 |
|
09-Aug-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: move nlink check in ocfs2_mknod() The dir nlink check in ocfs2_mknod() was being done outside of the cluster lock, which means we could have been checking against a stale version of the inode. Fix this by doing the check after the cluster lock instead. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
5515eff8 |
|
26-Mar-2006 |
Andrew Morton <akpm@osdl.org> |
[PATCH] 2tb-files-add-blkcnt_t-fixes Cc: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b0697053 |
|
03-Mar-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't use MLF* in the file system Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
ccd979bd |
|
15-Dec-2005 |
Mark Fasheh <mark.fasheh@oracle.com> |
[PATCH] OCFS2: The Second Oracle Cluster Filesystem The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
|