#
8a924db2 |
|
02-Oct-2023 |
Stefan Berger <stefanb@linux.ibm.com> |
fs: Pass AT_GETATTR_NOSEC flag to getattr interface function When vfs_getattr_nosec() calls a filesystem's getattr interface function then the 'nosec' should propagate into this function so that vfs_getattr_nosec() can again be called from the filesystem's gettattr rather than vfs_getattr(). The latter would add unnecessary security checks that the initial vfs_getattr_nosec() call wanted to avoid. Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass with the new getattr_flags parameter to the getattr interface function. In overlayfs and ecryptfs use this flag to determine which one of the two functions to call. In a recent code change introduced to IMA vfs_getattr_nosec() ended up calling vfs_getattr() in overlayfs, which in turn called security_inode_getattr() on an exiting process that did not have current->fs set anymore, which then caused a kernel NULL pointer dereference. With this change the call to security_inode_getattr() can be avoided, thus avoiding the NULL pointer dereference. Reported-by: <syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com> Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version") Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Tyler Hicks <code@tyhicks.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Christian Brauner <brauner@kernel.org> Co-developed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Link: https://lore.kernel.org/r/20231002125733.1251467-1-stefanb@linux.vnet.ibm.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
420a62dd |
|
10-Oct-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: Move xattr support to new xattrs.c file This moves the code from super.c and inode.c, and makes ovl_xattr_get/set() static. This is in preparation for doing more work on xattrs support. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
#
162d0644 |
|
19-Jul-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: reorder ovl_want_write() after ovl_inode_lock() Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
#
4ddbd0f1 |
|
04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
overlayfs: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-58-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
f01d0889 |
|
21-May-2023 |
Andrea Righi <andrea.righi@canonical.com> |
ovl: make consistent use of OVL_FS() Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a struct super_block. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
#
16aac5ad |
|
23-Apr-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: support encoding non-decodable file handles When all layers support file handles, we support encoding non-decodable file handles (a.k.a. fid) even with nfs_export=off. When file handles do not need to be decoded, we do not need to copy up redirected lower directories on encode, and we encode also non-indexed upper with lower file handle, so fid will not change on copy up. This enables reporting fanotify events with file handles on overlayfs with default config/mount options. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
#
913e9928 |
|
07-Aug-2023 |
Jeff Layton <jlayton@kernel.org> |
fs: drop the timespec64 argument from update_time Now that all of the update_time operations are prepared for it, we can drop the timespec64 argument from the update_time operation. Do that and remove it from some associated functions like inode_update_time and inode_needs_update_time. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
dcb399de |
|
24-May-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: pass ovl_fs to xino helpers Internal ovl methods should use ovl_fs and not sb as much as possible. Use a constant_table to translate from enum xino mode to string in preperation for new mount api option parsing. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
#
41665644 |
|
02-Apr-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: prepare for lazy lookup of lowerdata inode Make the code handle the case of numlower > 1 and missing lowerdata dentry gracefully. Missing lowerdata dentry is an indication for lazy lookup of lowerdata and in that case the lowerdata_redirect path is stored in ovl_inode. Following commits will defer lookup and perform the lazy lookup on access. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
2b21da92 |
|
26-Apr-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: prepare to store lowerdata redirect for lazy lowerdata lookup Prepare to allow ovl_lookup() to leave the last entry in a non-dir lowerstack empty to signify lazy lowerdata lookup. In this case, ovl_lookup() stores the redirect path from metacopy to lowerdata in ovl_inode, which is going to be used later to perform the lazy lowerdata lookup. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ab1eb5ff |
|
01-Apr-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: deduplicate lowerdata and lowerstack[] The ovl_inode contains a copy of lowerdata in lowerstack[], so the lowerdata inode member can be removed. Use accessors ovl_lowerdata*() to get the lowerdata whereever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ac900ed4 |
|
01-Apr-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: deduplicate lowerpath and lowerstack[] The ovl_inode contains a copy of lowerpath in lowerstack[0], so the lowerpath member can be removed. Use accessor ovl_lowerpath() to get the lowerpath whereever the member was accessed directly. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0af950f5 |
|
07-Apr-2023 |
Amir Goldstein <amir73il@gmail.com> |
ovl: move ovl_entry into ovl_inode The lower stacks of all the ovl inode aliases should be identical and there is redundant information in ovl_entry and ovl_inode. Move lowerstack into ovl_inode and keep only the OVL_E_FLAGS per overlay dentry. Following patches will deduplicate redundant ovl_inode fields. Note that for pure upper and negative dentries, OVL_E(dentry) may be NULL now, so it is imporatnt to use the ovl_numlower() accessor. Reviewed-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
f4e19e59 |
|
16-May-2023 |
Zhihao Cheng <chengzhihao1@huawei.com> |
ovl: fix null pointer dereference in ovl_get_acl_rcu() Following process: P1 P2 path_openat link_path_walk may_lookup inode_permission(rcu) ovl_permission acl_permission_check check_acl get_cached_acl_rcu ovl_get_inode_acl realinode = ovl_inode_real(ovl_inode) drop_cache __dentry_kill(ovl_dentry) iput(ovl_inode) ovl_destroy_inode(ovl_inode) dput(oi->__upperdentry) dentry_kill(upperdentry) dentry_unlink_inode upperdentry->d_inode = NULL ovl_inode_upper upperdentry = ovl_i_dentry_upper(ovl_inode) d_inode(upperdentry) // returns NULL IS_POSIXACL(realinode) // NULL pointer dereference , will trigger an null pointer dereference at realinode: [ 205.472797] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 205.476701] CPU: 2 PID: 2713 Comm: ls Not tainted 6.3.0-12064-g2edfa098e750-dirty #1216 [ 205.478754] RIP: 0010:do_ovl_get_acl+0x5d/0x300 [ 205.489584] Call Trace: [ 205.489812] <TASK> [ 205.490014] ovl_get_inode_acl+0x26/0x30 [ 205.490466] get_cached_acl_rcu+0x61/0xa0 [ 205.490908] generic_permission+0x1bf/0x4e0 [ 205.491447] ovl_permission+0x79/0x1b0 [ 205.491917] inode_permission+0x15e/0x2c0 [ 205.492425] link_path_walk+0x115/0x550 [ 205.493311] path_lookupat.isra.0+0xb2/0x200 [ 205.493803] filename_lookup+0xda/0x240 [ 205.495747] vfs_fstatat+0x7b/0xb0 Fetch a reproducer in [Link]. Use the helper ovl_i_path_realinode() to get realinode and then do non-nullptr checking. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217404 Fixes: 332f606b32b6 ("ovl: enable RCU'd ->get_acl()") Cc: <stable@vger.kernel.org> # v5.15 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1a73f5b8 |
|
16-May-2023 |
Zhihao Cheng <chengzhihao1@huawei.com> |
ovl: fix null pointer dereference in ovl_permission() Following process: P1 P2 path_lookupat link_path_walk inode_permission ovl_permission ovl_i_path_real(inode, &realpath) path->dentry = ovl_i_dentry_upper(inode) drop_cache __dentry_kill(ovl_dentry) iput(ovl_inode) ovl_destroy_inode(ovl_inode) dput(oi->__upperdentry) dentry_kill(upperdentry) dentry_unlink_inode upperdentry->d_inode = NULL realinode = d_inode(realpath.dentry) // return NULL inode_permission(realinode) inode->i_sb // NULL pointer dereference , will trigger an null pointer dereference at realinode: [ 335.664979] BUG: kernel NULL pointer dereference, address: 0000000000000002 [ 335.668032] CPU: 0 PID: 2592 Comm: ls Not tainted 6.3.0 [ 335.669956] RIP: 0010:inode_permission+0x33/0x2c0 [ 335.678939] Call Trace: [ 335.679165] <TASK> [ 335.679371] ovl_permission+0xde/0x320 [ 335.679723] inode_permission+0x15e/0x2c0 [ 335.680090] link_path_walk+0x115/0x550 [ 335.680771] path_lookupat.isra.0+0xb2/0x200 [ 335.681170] filename_lookup+0xda/0x240 [ 335.681922] vfs_statx+0xa6/0x1f0 [ 335.682233] vfs_fstatat+0x7b/0xb0 Fetch a reproducer in [Link]. Use the helper ovl_i_path_realinode() to get realinode and then do non-nullptr checking. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217405 Fixes: 4b7791b2e958 ("ovl: handle idmappings in ovl_permission()") Cc: <stable@vger.kernel.org> # v5.19 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
4d7ca409 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port vfs{g,u}id helpers to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
9452e93e |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port privilege checking helpers to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
01beba79 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port inode_owner_or_capable() to mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
4609e1f1 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->permission() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
8782a9ae |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->fileattr_set() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
13e83a49 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->set_acl() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
77435322 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->get_acl() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
b74d24f7 |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->getattr() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
c1632a0f |
|
12-Jan-2023 |
Christian Brauner <brauner@kernel.org> |
fs: port ->setattr() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
5b52aebe |
|
03-Nov-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: call posix_acl_release() after error checking The current placement of posix_acl_release() in ovl_set_or_remove_acl() means it can be called on an error pointer instead of actual acls. Fix this by moving the posix_acl_release() call after the error handling. Fixes: 0e641857322f ("ovl: implement set acl method") # mainline only Reported-by: syzbot+3f6ef1c4586bb6fd1f61@syzkaller.appspotmail.com Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
31acceb9 |
|
22-Sep-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: use posix acl api Now that posix acls have a proper api us it to copy them. All filesystems that can serve as lower or upper layers for overlayfs have gained support for the new posix acl api in previous patches. So switch all internal overlayfs codepaths for copying posix acls to the new posix acl api. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
0e641857 |
|
22-Sep-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: implement set acl method The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. In order to build a type safe posix api around get and set acl we need all filesystem to implement get and set acl. Now that we have added get and set acl inode operations that allow easy access to the dentry we give overlayfs it's own get and set acl inode operations. The set acl inode operation is duplicates most of the ovl posix acl xattr handler. The main difference being that the set acl inode operation relies on the new posix acl api. Once the vfs has been switched over the custom posix acl xattr handler will be removed completely. Note, until the vfs has been switched to the new posix acl api this patch is a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
6c0a8bfb |
|
22-Sep-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: implement get acl method The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. In order to build a type safe posix api around get and set acl we need all filesystem to implement get and set acl. Now that we have added get and set acl inode operations that allow easy access to the dentry we give overlayfs it's own get and set acl inode operations. Since overlayfs is a stacking filesystem it will use the newly added posix acl api when retrieving posix acls from the relevant layer. Since overlayfs can also be mounted on top of idmapped layers. If idmapped layers are used overlayfs must take the layer's idmapping into account after it retrieved the posix acls from the relevant layer. Note, until the vfs has been switched to the new posix acl api this patch is a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
cac2f8b8 |
|
22-Sep-2022 |
Christian Brauner <brauner@kernel.org> |
fs: rename current get acl method The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. The current inode operation for getting posix acls takes an inode argument but various filesystems (e.g., 9p, cifs, overlayfs) need access to the dentry. In contrast to the ->set_acl() inode operation we cannot simply extend ->get_acl() to take a dentry argument. The ->get_acl() inode operation is called from: acl_permission_check() -> check_acl() -> get_acl() which is part of generic_permission() which in turn is part of inode_permission(). Both generic_permission() and inode_permission() are called in the ->permission() handler of various filesystems (e.g., overlayfs). So simply passing a dentry argument to ->get_acl() would amount to also having to pass a dentry argument to ->permission(). We should avoid this unnecessary change. So instead of extending the existing inode operation rename it from ->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that passes a dentry argument and which filesystems that need access to the dentry can implement instead of ->get_inode_acl(). Filesystems like cifs which allow setting and getting posix acls but not using them for permission checking during lookup can simply not implement ->get_inode_acl(). This is intended to be a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
2d343087 |
|
04-Aug-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
overlayfs: constify path Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
abfcf55d |
|
16-Aug-2022 |
Christian Brauner <brauner@kernel.org> |
acl: handle idmapped mounts for idmapped filesystems Ensure that POSIX ACLs checking, getting, and setting works correctly for filesystems mountable with a filesystem idmapping ("fs_idmapping") that want to support idmapped mounts ("mnt_idmapping"). Note that no filesystems mountable with an fs_idmapping do yet support idmapped mounts. This is required infrastructure work to unblock this. As we explained in detail in [1] the fs_idmapping is irrelevant for getxattr() and setxattr() when mapping the ACL_{GROUP,USER} {g,u}ids stored in the uapi struct posix_acl_xattr_entry in posix_acl_fix_xattr_{from,to}_user(). But for acl_permission_check() and posix_acl_{g,s}etxattr_idmapped_mnt() the fs_idmapping matters. acl_permission_check(): During lookup POSIX ACLs are retrieved directly via i_op->get_acl() and are returned via the kernel internal struct posix_acl which contains e_{g,u}id members of type k{g,u}id_t that already take the fs_idmapping into acccount. For example, a POSIX ACL stored with u4 on the backing store is mapped to k10000004 in the fs_idmapping. The mnt_idmapping remaps the POSIX ACL to k20000004. In order to do that the fs_idmapping needs to be taken into account but that doesn't happen yet (Again, this is a counterfactual currently as fuse doesn't support idmapped mounts currently. It's just used as a convenient example.): fs_idmapping: u0:k10000000:r65536 mnt_idmapping: u0:v20000000:r65536 ACL_USER: k10000004 acl_permission_check() -> check_acl() -> get_acl() -> i_op->get_acl() == fuse_get_acl() -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...) { k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */, u4 /* ACL_USER */); } -> posix_acl_permission() { -1 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */, &init_user_ns, k10000004); vfsuid_eq_kuid(-1, k10000004 /* caller_fsuid */) } In order to correctly map from the fs_idmapping into mnt_idmapping we require the relevant fs_idmaping to be passed: acl_permission_check() -> check_acl() -> get_acl() -> i_op->get_acl() == fuse_get_acl() -> posix_acl_from_xattr(u0:k10000000:r65536 /* fs_idmapping */, ...) { k10000004 = make_kuid(u0:k10000000:r65536 /* fs_idmapping */, u4 /* ACL_USER */); } -> posix_acl_permission() { v20000004 = make_vfsuid(u0:v20000000:r65536 /* mnt_idmapping */, u0:k10000000:r65536 /* fs_idmapping */, k10000004); vfsuid_eq_kuid(v20000004, k10000004 /* caller_fsuid */) } The initial_idmapping is only correct for the current situation because all filesystems that currently support idmapped mounts do not support being mounted with an fs_idmapping. Note that ovl_get_acl() is used to retrieve the POSIX ACLs from the relevant lower layer and the lower layer's mnt_idmapping needs to be taken into account and so does the fs_idmapping. See 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") for more details. For posix_acl_{g,s}etxattr_idmapped_mnt() it is not as obvious why the fs_idmapping matters as it is for acl_permission_check(). Especially because it doesn't matter for posix_acl_fix_xattr_{from,to}_user() (See [1] for more context.). Because posix_acl_{g,s}etxattr_idmapped_mnt() operate on the uapi struct posix_acl_xattr_entry which contains {g,u}id_t values and thus give the impression that the fs_idmapping is irrelevant as at this point appropriate {g,u}id_t values have seemlingly been generated. As we've stated multiple times this assumption is wrong and in fact the uapi struct posix_acl_xattr_entry is taking idmappings into account depending at what place it is operated on. posix_acl_getxattr_idmapped_mnt() When posix_acl_getxattr_idmapped_mnt() is called the values stored in the uapi struct posix_acl_xattr_entry are mapped according to the fs_idmapping. This happened when they were read from the backing store and then translated from struct posix_acl into the uapi struct posix_acl_xattr_entry during posix_acl_to_xattr(). In other words, the fs_idmapping matters as the values stored as {g,u}id_t in the uapi struct posix_acl_xattr_entry have been generated by it. So we need to take the fs_idmapping into account during make_vfsuid() in posix_acl_getxattr_idmapped_mnt(). posix_acl_setxattr_idmapped_mnt() When posix_acl_setxattr_idmapped_mnt() is called the values stored as {g,u}id_t in uapi struct posix_acl_xattr_entry are intended to be the values that ultimately get turned back into a k{g,u}id_t in posix_acl_from_xattr() (which turns the uapi struct posix_acl_xattr_entry into the kernel internal struct posix_acl). In other words, the fs_idmapping matters as the values stored as {g,u}id_t in the uapi struct posix_acl_xattr_entry are intended to be the values that will be undone in the fs_idmapping when writing to the backing store. So we need to take the fs_idmapping into account during from_vfsuid() in posix_acl_setxattr_idmapped_mnt(). Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Fixes: 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") Cc: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Link: https://lore.kernel.org/r/20220816113514.43304-1-brauner@kernel.org
|
#
ded53656 |
|
27-Jul-2022 |
Yang Xu <xuyang2018.jy@fujitsu.com> |
ovl: improve ovl_get_acl() if POSIX ACL support is off Provide a proper stub for the !CONFIG_FS_POSIX_ACL case. Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1aa5fef5 |
|
06-Jul-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: handle idmappings in ovl_get_acl() During permission checking overlayfs will call ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() -> get_acl() /* on the underlying filesystem */ -> inode->i_op->get_acl() == /*lower filesystem callback */ -> posix_acl_permission() passing through the get_acl() request to the underlying filesystem. Before returning these values to the VFS we need to take the idmapping of the relevant layer into account and translate any ACL_{GROUP,USER} values according to the idmapped mount. We cannot alter the ACLs returned from the relevant layer directly as that would alter the cached values filesystem wide for the lower filesystem. Instead we can clone the ACLs and then apply the relevant idmapping of the layer. This is obviously only relevant when idmapped layers are used. Link: https://lore.kernel.org/r/20220708090134.385160-4-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: linux-unionfs@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
#
8bc0095d |
|
03-Apr-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: handle idmappings in ovl_xattr_{g,s}et() When retrieving xattrs from the upper or lower layers take the relevant mount's idmapping into account. We rely on the previously introduced ovl_i_path_real() helper to retrieve the relevant path. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
4b7791b2 |
|
03-Apr-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: handle idmappings in ovl_permission() Use the previously introduced ovl_i_path_real() helper to retrieve the relevant upper or lower path and take the mount's idmapping into account for the lower layer permission check. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
2878dffc |
|
03-Apr-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: use ovl_copy_{real,upper}attr() wrappers When copying inode attributes from the upper or lower layer to ovl inodes we need to take the upper or lower layer's mount's idmapping into account. In a lot of places we call ovl_copyattr() only on upper inodes and in some we call it on either upper or lower inodes. Split this into two separate helpers. The first one should only be called on upper inodes and is thus called ovl_copy_upperattr(). The second one can be called on upper or lower inodes. We add ovl_copy_realattr() for this task. The new helper makes use of the previously added ovl_i_path_real() helper. This is needed to support idmapped base layers with overlay. When overlay copies the inode information from an upper or lower layer to the relevant overlay inode it will apply the idmapping of the upper or lower layer when doing so. The ovl inode ownership will thus always correctly reflect the ownership of the idmapped upper or lower layer. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ffa5723c |
|
03-Apr-2022 |
Amir Goldstein <amir73il@gmail.com> |
ovl: store lower path in ovl_inode Create some ovl_i_* helpers to get real path from ovl inode. Instead of just stashing struct inode for the lower layer we stash struct path for the lower layer. The helpers allow to retrieve a struct path for the relevant upper or lower layer. This will be used when retrieving information based on struct inode when copying up inode attributes from upper or lower inodes to ovl inodes and when checking permissions in ovl_permission() in following patches. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
50db8d02 |
|
03-Apr-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: handle idmappings for layer fileattrs Take the upper mount's idmapping into account when setting fileattrs on the upper layer. This is needed to support idmapped base layers with overlay. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
dad7017a |
|
03-Apr-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: use ovl_path_getxattr() wrapper Add a helper that allows to retrieve ovl xattrs from either lower or upper layers. To stop passing mnt and dentry separately everywhere use struct path which more accurately reflects the tight coupling between mount and dentry in this helper. Swich over all places to pass a path argument that can operate on either upper or lower layers. This is needed to support idmapped base layers with overlayfs. Some helpers are always called with an upper dentry, which is now utilized by these helpers to create the path. Make this usage explicit by renaming the argument to "upperdentry" and by renaming the function as well in some cases. Also add a check in ovl_do_getxattr() to catch misuse of these functions. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a15506ea |
|
03-Apr-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: use ovl_do_notify_change() wrapper Introduce ovl_do_notify_change() as a simple wrapper around notify_change() to support idmapped layers. The helper mirrors other ovl_do_*() helpers that operate on the upper layers. When changing ownership of an upper object the intended ownership needs to be mapped according to the upper layer's idmapping. This mapping is the inverse to the mapping applied when copying inode information from an upper layer to the corresponding overlay inode. So e.g., when an upper mount maps files that are stored on-disk as owned by id 1001 to 1000 this means that calling stat on this object from an idmapped mount will report the file as being owned by id 1000. Consequently in order to change ownership of an object in this filesystem so it appears as being owned by id 1000 in the upper idmapped layer it needs to store id 1001 on disk. The mnt mapping helpers take care of this. All idmapping helpers are nops when no idmapped base layers are used. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
c914c0e2 |
|
03-Apr-2022 |
Amir Goldstein <amir73il@gmail.com> |
ovl: use wrappers to all vfs_*xattr() calls Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs and use helpers ovl_do_*xattr() to access generic xattrs. This is a preparatory patch for using idmapped base layers with overlay. Note that a few of those places called vfs_*xattr() calls directly to reduce the amount of debug output. But as Miklos pointed out since overlayfs has been stable for quite some time the debug output isn't all that relevant anymore and the additional debug in all locations was actually quite helpful when developing this patch series. Cc: <linux-unionfs@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5b0a414d |
|
04-Nov-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: fix filattr copy-up failure This regression can be reproduced with ntfs-3g and overlayfs: mkdir lower upper work overlay dd if=/dev/zero of=ntfs.raw bs=1M count=2 mkntfs -F ntfs.raw mount ntfs.raw lower touch lower/file.txt mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work - overlay mv overlay/file.txt overlay/file2.txt mv fails and (misleadingly) prints mv: cannot move 'overlay/file.txt' to a subdirectory of itself, 'overlay/file2.txt' The reason is that ovl_copy_fileattr() is triggered due to S_NOATIME being set on all inodes (by fuse) regardless of fileattr. ovl_copy_fileattr() tries to retrieve file attributes from lower file, but that fails because filesystem does not support this ioctl (this should fail with ENOTTY, but ntfs-3g return EINVAL instead). This failure is propagated to origial operation (in this case rename) that triggered the copy-up. The fix is to ignore ENOTTY and EINVAL errors from fileattr_get() in copy up. This also requires turning the internal ENOIOCTLCMD into ENOTTY. As a further measure to prevent unnecessary failures, only try the fileattr_get/set on upper if there are any flags to copy up. Side note: a number of filesystems set S_NOATIME (and sometimes other inode flags) irrespective of fileattr flags. This causes unnecessary calls during copy up, which might lead to a performance issue, especially if latency is high. To fix this, the kernel would need to differentiate between the two cases. E.g. introduce SB_NOATIME_UPDATE, a per-sb variant of S_NOATIME. SB_NOATIME doesn't work, because that's interpreted as "filesystem doesn't store an atime attribute" Reported-and-tested-by: Kevin Locke <kevin@kevinlocke.name> Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Cc: <stable@vger.kernel.org> # v5.15 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
332f606b |
|
18-Aug-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: enable RCU'd ->get_acl() Overlayfs does not cache ACL's (to avoid double caching). Instead it just calls the underlying filesystem's i_op->get_acl(), which will return the cached value, if possible. In rcu path walk, however, get_cached_acl_rcu() is employed to get the value from the cache, which will fail on overlayfs resulting in dropping out of rcu walk mode. This can result in a big performance hit in certain situations. Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which indicates pass-through) Reported-by: garyhuang <zjh.20052005@163.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0cad6246 |
|
18-Aug-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: add rcu argument to ->get_acl() callback Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d8991e86 |
|
09-Mar-2021 |
Chengguang Xu <cgxu519@mykernel.net> |
ovl: update ctime when changing fileattr Currently we keep size, mode and times of overlay inode as the same as upper inode, so should update ctime when changing file attribution as well. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b71759ef |
|
24-Apr-2021 |
Chengguang Xu <cgxu519@mykernel.net> |
ovl: skip checking lower file's i_writecount on truncate It is possible that a directory tree is shared between multiple overlay instances as a lower layer. In this case when one instance executes a file residing on the lower layer, the other instance denies a truncate(2) call on this file. This only happens for truncate(2) and not for open(2) with the O_TRUNC flag. Fix this interference and inconsistency by removing the preliminary i_writecount check before copy-up. This means that unlike on normal filesystems truncate(argv[0]) will now succeed. If this ever causes a regression in a real world use case this needs to be revisited. One way to fix this properly would be to keep a correct i_writecount in the overlay inode, but that is difficult due to memory mapping code only dealing with the real file/inode. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
096a218a |
|
18-Jun-2021 |
Amir Goldstein <amir73il@gmail.com> |
ovl: consistent behavior for immutable/append-only inodes When a lower file has immutable/append-only fileattr flags, the behavior of overlayfs post copy up is inconsistent. Immediattely after copy up, ovl inode still has the S_IMMUTABLE/S_APPEND inode flags copied from lower inode, so vfs code still treats the ovl inode as immutable/append-only. After ovl inode evict or mount cycle, the ovl inode does not have these inode flags anymore. We cannot copy up the immutable and append-only fileattr flags, because immutable/append-only inodes cannot be linked and because overlayfs will not be able to set overlay.* xattr on the upper inodes. Instead, if any of the fileattr flags of interest exist on the lower inode, we store them in overlay.protattr xattr on the upper inode and we read the flags from xattr on lookup and on fileattr_get(). This gives consistent behavior post copy up regardless of inode eviction from cache. When user sets new fileattr flags, we update or remove the overlay.protattr xattr. Storing immutable/append-only fileattr flags in an xattr instead of upper fileattr also solves other non-standard behavior issues - overlayfs can now copy up children of "ovl-immutable" directories and lower aliases of "ovl-immutable" hardlinks. Reported-by: Chengguang Xu <cgxu519@mykernel.net> Link: https://lore.kernel.org/linux-unionfs/20201226104618.239739-1-cgxu519@mykernel.net/ Link: https://lore.kernel.org/linux-unionfs/20210210190334.1212210-5-amir73il@gmail.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
72db8211 |
|
18-Jun-2021 |
Amir Goldstein <amir73il@gmail.com> |
ovl: copy up sync/noatime fileattr flags When a lower file has sync/noatime fileattr flags, the behavior of overlayfs post copy up is inconsistent. Immediately after copy up, ovl inode still has the S_SYNC/S_NOATIME inode flags copied from lower inode, so vfs code still treats the ovl inode as sync/noatime. After ovl inode evict or mount cycle, the ovl inode does not have these inode flags anymore. To fix this inconsistency, try to copy the fileattr flags on copy up if the upper fs supports the fileattr_set() method. This gives consistent behavior post copy up regardless of inode eviction from cache. We cannot copy up the immutable/append-only inode flags in a similar manner, because immutable/append-only inodes cannot be linked and because overlayfs will not be able to set overlay.* xattr on the upper inodes. Those flags will be addressed by a followup patch. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
f48bbfb2 |
|
12-Mar-2021 |
Bhaskar Chowdhury <unixbhaskar@gmail.com> |
ovl: trivial typo fixes in the file inode.c s/peresistent/persistent/ s/xatts/xattrs/ s/annotaion/annotation/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
c68e7ec5 |
|
22-Feb-2021 |
youngjun <her0gyugyu@gmail.com> |
ovl: remove ovl_map_dev_ino() return value ovl_map_dev_ino() always returns success. Remove unnecessary return value. Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
66dbfabf |
|
07-Apr-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: stack fileattr ops Add stacking for the fileattr operations. Add hack for calling security_file_ioctl() for now. Probably better to have a pair of specific hooks for these operations. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
549c7297 |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
fs: make helpers idmap mount aware Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
c7c7a1a1 |
|
21-Jan-2021 |
Tycho Andersen <tycho@tycho.pizza> |
xattr: handle idmapped mounts When interacting with extended attributes the vfs verifies that the caller is privileged over the inode with which the extended attribute is associated. For posix access and posix default extended attributes a uid or gid can be stored on-disk. Let the functions handle posix extended attributes on idmapped mounts. If the inode is accessed through an idmapped mount we need to map it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. This has no effect for e.g. security xattrs since they don't store uids or gids and don't perform permission checks on them like posix acls do. Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
2f221d6f |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
attr: handle idmapped mounts When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
47291baa |
|
21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
namei: make permission helpers idmapped mount aware The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
554677b9 |
|
28-Jan-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: perform vfs_getxattr() with mounter creds The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr exist on a lower layer file that is to be removed. If the xattr does not exist, then no need to copy up the file. This call of vfs_getxattr() wasn't wrapped in credential override, and this is probably okay. But for consitency wrap this instance as well. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
2d2f2d73 |
|
14-Dec-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: user xattr Optionally allow using "user.overlay." namespace instead of "trusted.overlay." This is necessary for overlayfs to be able to be mounted in an unprivileged namepsace. Make the option explicit, since it makes the filesystem format be incompatible. Disable redirect_dir and metacopy options, because these would allow privilege escalation through direct manipulation of the "user.overlay.redirect" or "user.overlay.metacopy" xattrs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
|
#
c11faf32 |
|
24-Jun-2020 |
Chengguang Xu <cgxu519@mykernel.net> |
ovl: fix incorrect extent info in metacopy case In metacopy case, we should use ovl_inode_realdata() instead of ovl_inode_real() to get real inode which has data, so that we can get correct information of extentes in ->fiemap operation. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
8f6ee74c |
|
02-Sep-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: rearrange ovl_can_list() ovl_can_list() should return false for overlay private xattrs. Since currently these use the "trusted.overlay." prefix, they will always match the "trusted." prefix as well, hence the test for being non-trusted will not trigger. Prepare for using the "user.overlay." namespace by moving the test for private xattr before the test for non-trusted. This patch doesn't change behavior. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
610afc0b |
|
02-Sep-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: pass ovl_fs down to functions accessing private xattrs This paves the way for optionally using the "user.overlay." xattr namespace. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
26150ab5 |
|
02-Sep-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: drop flags argument from ovl_do_setxattr() All callers pass zero flags to ovl_do_setxattr(). So drop this argument. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d5dc7486 |
|
02-Sep-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: use ovl_do_getxattr() for private xattr Use the convention of calling ovl_do_foo() for operations which are overlay specific. This patch is a no-op, and will have significance for supporting "user.overlay." xattr namespace. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
74c6e384 |
|
04-Jun-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: make oip->index bool ovl_get_inode() uses oip->index as a bool value, not as a pointer. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
08f4c7c8 |
|
04-Jun-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: add accessor for ofs->upper_mnt Next patch will remove ofs->upper_mnt, so add an accessor function for this field. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
45dd052e |
|
23-May-2020 |
Christoph Hellwig <hch@lst.de> |
fs: handle FIEMAP_FLAG_SYNC in fiemap_prep By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is handled once instead of duplicated, but can still be done under fs locks, like xfs/iomap intended with its duplicate handling. Also make sure the error value of filemap_write_and_wait is propagated to user space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
10c5db28 |
|
23-May-2020 |
Christoph Hellwig <hch@lst.de> |
fs: move the fiemap definitions out of fs.h No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
28166ab3 |
|
01-Jun-2020 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: initialize OVL_UPPERDATA in ovl_lookup() Currently ovl_get_inode() initializes OVL_UPPERDATA flag and for that it has to call ovl_check_metacopy_xattr() and check if metacopy xattr is present or not. yangerkun reported sometimes underlying filesystem might return -EIO and in that case error handling path does not cleanup properly leading to various warnings. Run generic/461 with ext4 upper/lower layer sometimes may trigger the bug as below(linux 4.19): [ 551.001349] overlayfs: failed to get metacopy (-5) [ 551.003464] overlayfs: failed to get inode (-5) [ 551.004243] overlayfs: cleanup of 'd44/fd51' failed (-5) [ 551.004941] overlayfs: failed to get origin (-5) [ 551.005199] ------------[ cut here ]------------ [ 551.006697] WARNING: CPU: 3 PID: 24674 at fs/inode.c:1528 iput+0x33b/0x400 ... [ 551.027219] Call Trace: [ 551.027623] ovl_create_object+0x13f/0x170 [ 551.028268] ovl_create+0x27/0x30 [ 551.028799] path_openat+0x1a35/0x1ea0 [ 551.029377] do_filp_open+0xad/0x160 [ 551.029944] ? vfs_writev+0xe9/0x170 [ 551.030499] ? page_counter_try_charge+0x77/0x120 [ 551.031245] ? __alloc_fd+0x160/0x2a0 [ 551.031832] ? do_sys_open+0x189/0x340 [ 551.032417] ? get_unused_fd_flags+0x34/0x40 [ 551.033081] do_sys_open+0x189/0x340 [ 551.033632] __x64_sys_creat+0x24/0x30 [ 551.034219] do_syscall_64+0xd5/0x430 [ 551.034800] entry_SYSCALL_64_after_hwframe+0x44/0xa9 One solution is to improve error handling and call iget_failed() if error is encountered. Amir thinks that this path is little intricate and there is not real need to check and initialize OVL_UPPERDATA in ovl_get_inode(). Instead caller of ovl_get_inode() can initialize this state. And this will avoid double checking of metacopy xattr lookup in ovl_lookup() and ovl_get_inode(). OVL_UPPERDATA is inode flag. So I was little concerned that initializing it outside ovl_get_inode() might have some races. But this is one way transition. That is once a file has been fully copied up, it can't go back to metacopy file again. And that seems to help avoid races. So as of now I can't see any races w.r.t OVL_UPPERDATA being set wrongly. So move settingof OVL_UPPERDATA inside the callers of ovl_get_inode(). ovl_obtain_alias() already does it. So only two callers now left are ovl_lookup() and ovl_instantiate(). Reported-by: yangerkun <yangerkun@huawei.com> Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
15fd2ea9 |
|
22-Apr-2020 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: clear ATTR_OPEN from attr->ia_valid As of now during open(), we don't pass bunch of flags to underlying filesystem. O_TRUNC is one of these. Normally this is not a problem as VFS calls ->setattr() with zero size and underlying filesystem sets file size to 0. But when overlayfs is running on top of virtiofs, it has an optimization where it does not send setattr request to server if dectects that truncation is part of open(O_TRUNC). It assumes that server already zeroed file size as part of open(O_TRUNC). fuse_do_setattr() { if (attr->ia_valid & ATTR_OPEN) { /* * No need to send request to userspace, since actual * truncation has already been done by OPEN. But still * need to truncate page cache. */ } } IOW, fuse expects O_TRUNC to be passed to it as part of open flags. But currently overlayfs does not pass O_TRUNC to underlying filesystem hence fuse/virtiofs breaks. Setup overlayfs on top of virtiofs and following does not zero the file size of a file is either upper only or has already been copied up. fd = open(foo.txt, O_TRUNC | O_WRONLY); There are two ways to fix this. Either pass O_TRUNC to underlying filesystem or clear ATTR_OPEN from attr->ia_valid so that fuse ends up sending a SETATTR request to server. Miklos is concerned that O_TRUNC might have side affects so it is better to clear ATTR_OPEN for now. Hence this patch clears ATTR_OPEN from attr->ia_valid. I found this problem while running unionmount-testsuite. With this patch, unionmount-testsuite passes with overlayfs on top of virtiofs. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Fixes: bccece1ead36 ("ovl: allow remote upper") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
e67f0216 |
|
22-Apr-2020 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: clear ATTR_FILE from attr->ia_valid ovl_setattr() can be passed an attr which has ATTR_FILE set and attr->ia_file is a file pointer to overlay file. This is done in open(O_TRUNC) path. We should either replace with attr->ia_file with underlying file object or clear ATTR_FILE so that underlying filesystem does not end up using overlayfs file object pointer. There are no good use cases yet so for now clear ATTR_FILE. fuse seems to be one user which can use this. But it can work even without this. So it is not mandatory to pass ATTR_FILE to fuse. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Fixes: bccece1ead36 ("ovl: allow remote upper") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
926e94d7 |
|
21-Feb-2020 |
Amir Goldstein <amir73il@gmail.com> |
ovl: enable xino automatically in more cases So far, with xino=auto, we only enable xino if we know that all underlying filesystem use 32bit inode numbers. When users configure overlay with xino=auto, they already declare that they are ready to handle 64bit inode number from overlay. It is a very common case, that underlying filesystem uses 64bit ino, but rarely or never uses the high inode number bits (e.g. tmpfs, xfs). Leaving it for the users to declare high ino bits are unused with xino=on is not a recipe for many users to enjoy the benefits of xino. There appears to be very little reason not to enable xino when users declare xino=auto even if we do not know how many bits underlying filesystem uses for inode numbers. In the worst case of xino bits overflow by real inode number, we already fall back to the non-xino behavior - real inode number with unique pseudo dev or to non persistent inode number and overlay st_dev (for directories). The only annoyance from auto enabling xino is that xino bits overflow emits a warning to kmsg. Suppress those warnings unless users explicitly asked for xino=on, suggesting that they expected high ino bits to be unused by underlying filesystem. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
dfe51d47 |
|
21-Feb-2020 |
Amir Goldstein <amir73il@gmail.com> |
ovl: avoid possible inode number collisions with xino=on When xino feature is enabled and a real directory inode number overflows the lower xino bits, we cannot map this directory inode number to a unique and persistent inode number and we fall back to the real inode st_ino and overlay st_dev. The real inode st_ino with high bits may collide with a lower inode number on overlay st_dev that was mapped using xino. To avoid possible collision with legitimate xino values, map a non persistent inode number to a dedicated range in the xino address space. The dedicated range is created by adding one more bit to the number of reserved high xino bits. We could have added just one more fsid, but that would have had the undesired effect of changing persistent overlay inode numbers on kernel or require more complex xino mapping code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
4d314f78 |
|
21-Feb-2020 |
Amir Goldstein <amir73il@gmail.com> |
ovl: use a private non-persistent ino pool There is no reason to deplete the system's global get_next_ino() pool for overlay non-persistent inode numbers and there is no reason at all to allocate non-persistent inode numbers for non-directories. For non-directories, it is much better to leave i_ino the same as real i_ino, to be consistent with st_ino/d_ino. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a5a84682 |
|
09-Feb-2020 |
Chengguang Xu <cgxu519@mykernel.net> |
ovl: fix a typo in comment Fix a typo in comment. (annonate -> annotate) Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
62c832ed |
|
19-Nov-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: simplify i_ino initialization Move i_ino initialization to ovl_inode_init() to avoid the dance of setting i_ino in ovl_fill_inode() sometimes on the first call and sometimes on the seconds call. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
735c907d |
|
19-Nov-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix out of date comment and unreachable code ovl_inode_update() is no longer called from create object code path. Fixes: 01b39dcc9568 ("ovl: use inode_insert5() to hash a newly...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
300b124f |
|
19-Nov-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix value of i_ino for lower hardlink corner case Commit 6dde1e42f497 ("ovl: make i_ino consistent with st_ino in more cases"), relaxed the condition nfs_export=on in order to set the value of i_ino to xino map of real ino. Specifically, it also relaxed the pre-condition that index=on for consistent i_ino. This opened the corner case of lower hardlink in ovl_get_inode(), which calls ovl_fill_inode() with ino=0 and then ovl_init_inode() is called to set i_ino to lower real ino without the xino mapping. Pass the correct values of ino;fsid in this case to ovl_fill_inode(), so it can initialize i_ino correctly. Fixes: 6dde1e42f497 ("ovl: make i_ino consistent with st_ino in more ...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b7bf9908 |
|
14-Jan-2020 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix corner case of non-constant st_dev;st_ino On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs, but if lower layer is on the same fs and upper layer, it has no pseudo_dev assigned. In this overlay layers setup: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the st_dev;st_ino values of the real lower inode before copy up and the st_dev;st_ino values of the real upper inode after copy up. Fix this inconsitency by assigning a unique pseudo_dev also for upper fs, that will be used as st_dev value along with the lower inode st_dev for overlay inodes in the case above. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
07f1e596 |
|
14-Jan-2020 |
Amir Goldstein <amir73il@gmail.com> |
ovl: generalize the lower_fs[] array Rename lower_fs[] array to fs[], extend its size by one and use index fsid (instead of fsid-1) to access the fs[] array. Initialize fs[0] with upper fs values. fsid 0 is reserved even with lower only overlay, so fs[0] remains null in this case. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0f831ec8 |
|
16-Nov-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: simplify ovl_same_sb() helper No code uses the sb returned from this helper, so make it retrun a boolean and rename it to ovl_same_fs(). The xino mode is irrelevant when all layers are on same fs, so instead of describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino enabled. Create a new helper ovl_same_dev(), to use instead of the common check for (ovl_same_fs() || xinobits). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b1f9d385 |
|
21-Dec-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: use ovl_inode_lock in ovl_llseek() In ovl_llseek() we use the overlay inode rwsem to protect against concurrent modifications to real file f_pos, because we copy the overlay file f_pos to/from the real file f_pos. This caused a lockdep warning of locking order violation when the ovl_llseek() operation was called on a lower nested overlay layer while the upper layer fs sb_writers is held (with patch improving copy-up efficiency for big sparse file). Use the internal ovl_inode_lock() instead of the overlay inode rwsem in those cases. It is meant to be used for protecting against concurrent changes to overlay inode internal state changes. The locking order rules are documented to explain this case. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1bd0a3ae |
|
16-Dec-2019 |
lijiazi <jqqlijiazi@gmail.com> |
ovl: use pr_fmt auto generate prefix Use pr_fmt auto generate "overlayfs: " prefix. Signed-off-by: lijiazi <lijiazi@xiaomi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9c6d8f13 |
|
17-Nov-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix corner case of non-unique st_dev;st_ino On non-samefs overlay without xino, non pure upper inodes should use a pseudo_dev assigned to each unique lower fs and pure upper inodes use the real upper st_dev. It is fine for an overlay pure upper inode to use the same st_dev;st_ino values as the real upper inode, because the content of those two different filesystem objects is always the same. In this case, however: - two filesystems, A and B - upper layer is on A - lower layer 1 is also on A - lower layer 2 is on B Non pure upper overlay inode, whose origin is in layer 1 will have the same st_dev;st_ino values as the real lower inode. This may result with a false positive results of 'diff' between the real lower and copied up overlay inode. Fix this by using the upper st_dev;st_ino values in this case. This breaks the property of constant st_dev;st_ino across copy up of this case. This breakage will be fixed by a later patch. Fixes: 5148626b806a ("ovl: allocate anon bdev per unique lower fs") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5c2e9f34 |
|
29-Aug-2019 |
Mark Salyzyn <salyzyn@android.com> |
ovl: filter of trusted xattr results in audit When filtering xattr list for reading, presence of trusted xattr results in a security audit log. However, if there is other content no errno will be set, and if there isn't, the errno will be -ENODATA and not -EPERM as is usually associated with a lack of capability. The check does not block the request to list the xattrs present. Switch to ns_capable_noaudit to reflect a more appropriate check. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Cc: linux-security-module@vger.kernel.org Cc: kernel-team@android.com Cc: stable@vger.kernel.org # v3.18+ Fixes: a082c6f680da ("ovl: filter trusted xattr for non-admin") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d2912cb1 |
|
04-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
6dde1e42 |
|
09-Jun-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: make i_ino consistent with st_ino in more cases Relax the condition that overlayfs supports nfs export, to require that i_ino is consistent with st_ino/d_ino. It is enough to require that st_ino and d_ino are consistent. This fixes the failure of xfstest generic/504, due to mismatch of st_ino to inode number in the output of /proc/locks. Fixes: 12574a9f4c9c ("ovl: consistent i_ino for non-samefs with xino") Cc: <stable@vger.kernel.org> # v4.19 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
146d62e5 |
|
18-Apr-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: detect overlapping layers Overlapping overlay layers are not supported and can cause unexpected behavior, but overlayfs does not currently check or warn about these configurations. User is not supposed to specify the same directory for upper and lower dirs or for different lower layers and user is not supposed to specify directories that are descendants of each other for overlay layers, but that is exactly what this zysbot repro did: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Moving layer root directories into other layers while overlayfs is mounted could also result in unexpected behavior. This commit places "traps" in the overlay inode hash table. Those traps are dummy overlay inodes that are hashed by the layers root inodes. On mount, the hash table trap entries are used to verify that overlay layers are not overlapping. While at it, we also verify that overlay layers are not overlapping with directories "in-use" by other overlay instances as upperdir/workdir. On lookup, the trap entries are used to verify that overlay layers root inodes have not been moved into other layers after mount. Some examples: $ ./run --ov --samefs -s ... ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt mount -o bind base/lower lower mount -o bind base/upper upper mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w) $ umount mnt $ mount -t overlay none mnt ... -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w [ 94.434900] overlayfs: overlapping upperdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w [ 151.350132] overlayfs: conflicting lowerdir path mount: none is already mounted or mnt busy $ mount -t overlay none mnt ... -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w [ 201.205045] overlayfs: overlapping lowerdir path mount: mount overlay on mnt failed: Too many levels of symbolic links $ mount -t overlay none mnt ... -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w $ mv base/upper/0/ base/lower/ $ find mnt/0 mnt/0 mnt/0/w find: 'mnt/0/w/work': Too many levels of symbolic links find: 'mnt/0/u': Too many levels of symbolic links Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
acf3062a |
|
28-Mar-2019 |
Amir Goldstein <amir73il@gmail.com> |
ovl: relax WARN_ON() for overlapping layers use case This nasty little syzbot repro: https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000 Creates overlay mounts where the same directory is both in upper and lower layers. Simplified example: mkdir foo work mount -t overlay none foo -o"lowerdir=.,upperdir=foo,workdir=work" The repro runs several threads in parallel that attempt to chdir into foo and attempt to symlink/rename/exec/mkdir the file bar. The repro hits a WARN_ON() I placed in ovl_instantiate(), which suggests that an overlay inode already exists in cache and is hashed by the pointer of the real upper dentry that ovl_create_real() has just created. At the point of the WARN_ON(), for overlay dir inode lock is held and upper dir inode lock, so at first, I did not see how this was possible. On a closer look, I see that after ovl_create_real(), because of the overlapping upper and lower layers, a lookup by another thread can find the file foo/bar that was just created in upper layer, at overlay path foo/foo/bar and hash the an overlay inode with the new real dentry as lower dentry. This is possible because the overlay directory foo/foo is not locked and the upper dentry foo/bar is in dcache, so ovl_lookup() can find it without taking upper dir inode shared lock. Overlapping layers is considered a wrong setup which would result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON() and leave a pr_warn() instead to cover all cases of failure to get an overlay inode. The error returned from failure to insert new inode to cache with inode_insert5() was changed to -EEXIST, to distinguish from the error -ENOMEM returned on failure to get/allocate inode with iget5_locked(). Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com Fixes: 01b39dcc9568 ("ovl: use inode_insert5() to hash a newly...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ec7ba118 |
|
04-Dec-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
Revert "ovl: relax permission checking on underlying layers" This reverts commit 007ea44892e6fa963a0876a979e34890325c64eb. The commit broke some selinux-testsuite cases, and it looks like there's no straightforward fix keeping the direction of this patch, so revert for now. The original patch was trying to fix the consistency of permission checks, and not an observed bug. So reverting should be safe. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
007ea448 |
|
26-Oct-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: relax permission checking on underlying layers Make permission checking more consistent: - special files don't need any access check on underling fs - exec permission check doesn't need to be performed on underlying fs Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
69383c59 |
|
25-Sep-2018 |
Wei Yongjun <weiyongjun1@huawei.com> |
ovl: make symbol 'ovl_aops' static Fixes the following sparse warning: fs/overlayfs/inode.c:507:39: warning: symbol 'ovl_aops' was not declared. Should it be static? Fixes: 5b910bd615ba ("ovl: fix GPF in swapfile_activate of file from overlayfs over xfs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5b910bd6 |
|
27-Aug-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix GPF in swapfile_activate of file from overlayfs over xfs Since overlayfs implements stacked file operations, the underlying filesystems are not supposed to be exposed to the overlayfs file, whose f_inode is an overlayfs inode. Assigning an overlayfs file to swap_file results in an attempt of xfs code to dereference an xfs_inode struct from an ovl_inode pointer: CPU: 0 PID: 2462 Comm: swapon Not tainted 4.18.0-xfstests-12721-g33e17876ea4e #3402 RIP: 0010:xfs_find_bdev_for_inode+0x23/0x2f Call Trace: xfs_iomap_swapfile_activate+0x1f/0x43 __se_sys_swapon+0xb1a/0xee9 Fix this by not assigning the real inode mapping to f_mapping, which will cause swapon() to return an error (-EINVAL). Although it makes sense not to allow setting swpafile on an overlayfs file, some users may depend on it, so we may need to fix this up in the future. Keeping f_mapping pointing to overlay inode mapping will cause O_DIRECT open to fail. Fix this by installing ovl_aops with noop_direct_IO in overlay inode mapping. Keeping f_mapping pointing to overlay inode mapping will cause other a_ops related operations to fail (e.g. readahead()). Those will be fixed by follow up patches. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: f7c72396d0de ("ovl: add O_DIRECT support") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
80d34810 |
|
27-Aug-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: respect FIEMAP_FLAG_SYNC flag Stacked overlayfs fiemap operation broke xfstests that test delayed allocation (with "_test_generic_punch -d"), because ovl_fiemap() failed to write dirty pages when requested. Fixes: 9e142c4102db ("ovl: add ovl_fiemap()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
997336f2 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Do not do metadata only copy-up for truncate operation truncate should copy up full file (and not do metacopy only), otherwise it will be broken. For example, use truncate to increase size of a file so that any read beyong existing size will return null bytes. If we don't copy up full file, then we end up opening lower file and read from it only reads upto the old size (and not new size after truncate). Hence to avoid such situations, copy up data as well when file size changes. So far it was being done by d_real(O_WRONLY) call in truncate() path. Now that patch has been reverted. So force full copy up in ovl_setattr() if size of file is changing. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a00c2d59 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Add an inode flag OVL_CONST_INO Add an ovl_inode flag OVL_CONST_INO. This flag signifies if inode number will remain constant over copy up or not. This flag does not get updated over copy up and remains unmodifed after setting once. Next patch in the series will make use of this flag. It will basically figure out if dentry is of type ORIGIN or not. And this can be derived by this flag. ORIGIN = (upperdentry && ovl_test_flag(OVL_CONST_INO, inode)). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
2664bd08 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Store lower data inode in ovl_inode Right now ovl_inode stores inode pointer for lower inode. This helps with quickly getting lower inode given overlay inode (ovl_inode_lower()). Now with metadata only copy-up, we can have metacopy inode in middle layer as well and inode containing data can be different from ->lower. I need to be able to open the real file in ovl_open_realfile() and for that I need to quickly find the lower data inode. Hence store lower data inode also in ovl_inode. Also provide an helper ovl_inode_lowerdata() to access this field. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
67d756c2 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Fix ovl_getattr() to get number of blocks from lower If an inode has been copied up metadata only, then we need to query the number of blocks from lower and fill up the stat->st_blocks. We need to be careful about races where we are doing stat on one cpu and data copy up is taking place on other cpu. We want to return stat->st_blocks either from lower or stable upper and not something in between. Hence, ovl_has_upperdata() is called first to figure out whether block reporting will take place from lower or upper. We now support metacopy dentries in middle layer. That means number of blocks reporting needs to come from lowest data dentry and this could be different from lower dentry. Hence we end up making a separate vfs_getxattr() call for metacopy dentries to get number of blocks. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9d3dfea3 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry This patch modifies ovl_lookup() and friends to lookup metacopy dentries. It also allows for presence of metacopy dentries in lower layer. During lookup, check for presence of OVL_XATTR_METACOPY and if not present, set OVL_UPPERDATA bit in flags. We don't support metacopy feature with nfs_export. So in nfs_export code, we set OVL_UPPERDATA flag set unconditionally if upper inode exists. Do not follow metacopy origin if we find a metacopy only inode and metacopy feature is not enabled for that mount. Like redirect, this can have security implications where an attacker could hand craft upper and try to gain access to file on lower which it should not have to begin with. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
027065b7 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Use out_err instead of out_nomem Right now we use goto out_nomem which assumes error code is -ENOMEM. But there are other errors returned like -ESTALE as well. So instead of out_nomem, use out_err which will do ERR_PTR(err). That way one can put error code in err and jump to out_err. This just code reorganization and no change of functionality. I am about to add more code and this organization helps laying more code and error paths on top of it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d6eac039 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Move the copy up helpers to copy_up.c Right now two copy up helpers are in inode.c. Amir suggested it might be better to move these to copy_up.c. There will one more related function which will come in later patch. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9cec54c8 |
|
11-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Initialize ovl_inode->redirect in ovl_get_inode() ovl_inode->redirect is an inode property and should be initialized in ovl_get_inode() only when we are adding a new inode to cache. If inode is already in cache, it is already initialized and we should not be touching ovl_inode->redirect field. As of now this is not a problem as redirects are used only for directories which don't share inode. But soon I want to use redirects for regular files also and there it can become an issue. Hence, move ->redirect initialization in ovl_get_inode(). Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9e142c41 |
|
18-Jul-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: add ovl_fiemap() Implement stacked fiemap(). Need to split inode operations for regular file (which has fiemap) and special file (which doesn't have fiemap). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d1d04ef8 |
|
18-Jul-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: stack file ops Implement file operations on a regular overlay file. The underlying file is opened separately and cached in ->private_data. It might be worth making an exception for such files when accounting in nr_file to confirm to userspace expectations. We are only adding a small overhead (248bytes for the struct file) since the real inode and dentry are pinned by overlayfs anyway. This patch doesn't have any effect, since the vfs will use d_real() to find the real underlying file to open. The patch at the end of the series will actually enable this functionality. AV: make it use open_with_fake_path(), don't mess with override_creds SzM: still need to mess with override_creds() until no fs uses current_cred() in their open method. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
46e5d0a3 |
|
18-Jul-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: copy up file size as well Copy i_size of the underlying inode to the overlay inode in ovl_copyattr(). This is in preparation for stacking I/O operations on overlay files. This patch shouldn't have any observable effect. Remove stale comment from ovl_setattr() [spotted by Vivek Goyal]. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5812160e |
|
18-Jul-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
Revert "Revert "ovl: get_write_access() in truncate"" This reverts commit 31c3a7069593b072bd57192b63b62f9a7e994e9a. Re-add functionality dealing with i_writecount on truncate to overlayfs. This patch shouldn't have any observable effects, since we just re-assert the writecout that vfs_truncate() already got for us. This is in preparation for moving overlay functionality out of the VFS. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d9854c87 |
|
18-Jul-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: copy up times Copy up mtime and ctime to overlay inode after times in real object are modified. Be careful not to dirty cachelines when not necessary. This is in preparation for moving overlay functionality out of the VFS. This patch shouldn't have any observable effect. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
95582b00 |
|
08-May-2018 |
Deepa Dinamani <deepa.kernel@gmail.com> |
vfs: change inode times to use struct timespec64 struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
|
#
01b39dcc |
|
11-May-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: use inode_insert5() to hash a newly created inode Currently, there is a small window where ovl_obtain_alias() can race with ovl_instantiate() and create two different overlay inodes with the same underlying real non-dir non-hardlink inode. The race requires an adversary to guess the file handle of the yet to be created upper inode and decode the guessed file handle after ovl_creat_real(), but before ovl_instantiate(). This race does not affect overlay directory inodes, because those are decoded via ovl_lookup_real() and not with ovl_obtain_alias(). This patch fixes the race, by using inode_insert5() to add a newly created inode to cache. If the newly created inode apears to already exist in cache (hashed by the same real upper inode), we instantiate the dentry with the old inode and drop the new inode, instead of silently not hashing the new inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ac6a52eb |
|
08-May-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Pass argument to ovl_get_inode() in a structure ovl_get_inode() right now has 5 parameters. Soon this patch series will add 2 more and suddenly argument list starts looking too long. Hence pass arguments to ovl_get_inode() in a structure and it looks little cleaner. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
12574a9f |
|
16-Mar-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: consistent i_ino for non-samefs with xino When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, set i_ino value to the same value as st_ino for nfsd readdirplus validator. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
e487d889 |
|
07-Nov-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: constant st_ino for non-samefs with xino On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5148626b |
|
28-Mar-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: allocate anon bdev per unique lower fs Instead of allocating an anonymous bdev per lower layer, allocate one anonymous bdev per every unique lower fs that is different than upper fs. Every unique lower fs is assigned an fsid > 0 and the number of unique lower fs are stored in ofs->numlowerfs. The assigned fsid is stored in the lower layer struct and will be used also for inode number multiplexing. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
da309e8c |
|
08-Nov-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: factor out ovl_map_dev_ino() helper A helper for ovl_getattr() to map the values of st_dev and st_ino according to constant st_ino rules. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
8f35cf51 |
|
11-Apr-2018 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: cleanup ovl_update_time() No need to mess with an alias, the upperdentry can be retrieved directly from the overlay inode. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0471a9cd |
|
20-Mar-2018 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: cleanup setting OVL_INDEX Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9f99e50d |
|
11-Apr-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: set lower layer st_dev only if setting lower st_ino For broken hardlinks, we do not return lower st_ino, so we should also not return lower pseudo st_dev. Fixes: a0c5ad307ac0 ("ovl: relax same fs constraint for constant st_ino") Cc: <stable@vger.kernel.org> #v4.15 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
695b46e7 |
|
15-Mar-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: set i_ino to the value of st_ino for NFS export Eddie Horng reported that readdir of an overlayfs directory that was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN. The reason is that while preparing the response for readdirplus, nfsd checks inside encode_entryplus_baggage() that a child dentry's inode number matches the value of d_ino returns by overlayfs readdir iterator. Because the overlayfs inodes use arbitrary inode numbers that are not correlated with the values of st_ino/d_ino, NFSv3 falls back to not encoding d_type. Although this is an allowed behavior, we can fix it for the case of all overlayfs layers on the same underlying filesystem. When NFS export is enabled and d_ino is consistent with st_ino (samefs), set the same value also to i_ino in ovl_fill_inode() for all overlayfs inodes, nfsd readdirplus sanity checks will pass. ovl_fill_inode() may be called from ovl_new_inode(), before real inode was created with ino arg 0. In that case, i_ino will be updated to real upper inode i_ino on ovl_inode_init() or ovl_inode_update(). Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Fixes: 8383f1748829 ("ovl: wire up NFS export operations") Cc: <stable@vger.kernel.org> #v4.16 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
764baba8 |
|
04-Feb-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: hash non-dir by lower inode for fsnotify Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify") fixed an issue of inotify watch on directory that stops getting events after dropping dentry caches. A similar issue exists for non-dir non-upper files, for example: $ mkdir -p lower upper work merged $ touch lower/foo $ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper none merged $ inotifywait merged/foo & $ echo 2 > /proc/sys/vm/drop_caches $ cat merged/foo inotifywait doesn't get the OPEN event, because ovl_lookup() called from 'cat' allocates a new overlay inode and does not reuse the watched inode. Fix this by hashing non-dir overlay inodes by lower real inode in the following cases that were not hashed before this change: - A non-upper overlay mount - A lower non-hardlink when index=off A helper ovl_hash_bylower() was added to put all the logic and documentation about which real inode an overlay inode is hashed by into one place. The issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
4b91c30a |
|
18-Jan-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: lookup connected ancestor of dir in inode cache Decoding a dir file handle requires walking backward up to layer root and for lower dir also checking the index to see if any of the parents have been copied up. Lookup overlay ancestor dentry in inode/dentry cache by decoded real parents to shortcut looking up all the way back to layer root. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
7a9dadef |
|
10-Jul-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: hash non-indexed dir by upper inode for NFS export Non-indexed upper dirs are encoded as upper file handles. When NFS export is enabled, hash non-indexed directory inodes by upper inode, so we can find them in inode cache using the decoded upper inode. When NFS export is disabled, directories are not indexed on copy up, so hash non-indexed directory inodes by origin inode, the same hash key that is used before copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9436a1a3 |
|
24-Dec-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: decode lower file handles of unlinked but open files Lookup overlay inode in cache by origin inode, so we can decode a file handle of an open file even if the index has a whiteout index entry to mark this overlay inode was unlinked. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
aa3ff3c1 |
|
15-Oct-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: copy up of disconnected dentries With NFS export, some operations on decoded file handles (e.g. open, link, setattr, xattr_set) may call copy up with a disconnected non-dir. In this case, we will copy up lower inode to index dir without linking it to upper dir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0aceb53e |
|
12-Dec-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: do not pass overlay dentry to ovl_get_inode() This is needed for using ovl_get_inode() for decoding file handles for NFS export. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
86eaa130 |
|
21-Nov-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: unbless lower st_ino of unverified origin On a malformed overlay, several redirected dirs can point to the same dir on a lower layer. This presents a similar challenge as broken hardlinks, because different objects in the overlay can return the same st_ino/st_dev pair from stat(2). For broken hardlinks, we do not provide constant st_ino on copy up to avoid this inconsistency. When NFS export feature is enabled, apply the same logic to files and directories with unverified lower origin. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
31747eda |
|
14-Jan-2018 |
Amir Goldstein <amir73il@gmail.com> |
ovl: hash directory inodes for fsnotify fsnotify pins a watched directory inode in cache, but if directory dentry is released, new lookup will allocate a new dentry and a new inode. Directory events will be notified on the new inode, while fsnotify listener is watching the old pinned inode. Hash all directory inodes to reuse the pinned inode on lookup. Pure upper dirs are hashes by real upper inode, merge and lower dirs are hashed by real lower inode. The reference to lower inode was being held by the lower dentry object in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry may drop lower inode refcount to zero. Add a refcount on behalf of the overlay inode to prevent that. As a by-product, hashing directory inodes also detects multiple redirected dirs to the same lower dir and uncovered redirected dir target on and returns -ESTALE on lookup. The reported issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com>
|
#
a0c5ad30 |
|
31-Oct-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: relax same fs constraint for constant st_ino For the case of all layers not on the same fs, return the copy up origin inode st_dev/st_ino for non-dir from stat(2). This guaranties constant st_dev/st_ino for non-dir across copy up. Like the same fs case, st_ino of non-dir is also persistent. If the st_dev/st_ino for copied up object would have been the same as that of the real underlying lower file, running diff on underlying lower file and overlay copied up file would result in diff reporting that the two files are equal when in fact, they may have different content. Therefore, unlike the same fs case, st_dev is not persistent because it uses the unique anonymous bdev allocated for the lower layer. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ba1e563c |
|
24-Jul-2017 |
Chandan Rajendra <chandan@linux.vnet.ibm.com> |
ovl: return anonymous st_dev for lower inodes For non-samefs setup, to make sure that st_dev/st_ino pair is unique across the system, we return a unique anonymous st_dev for stat(2) of lower layer inode. A following patch is going to fix constant st_dev/st_ino across copy up by returning origin st_dev/st_ino for copied up objects. If the st_dev/st_ino for copied up object would have been the same as that of the real underlying lower file, running diff on underlying lower file and overlay copied up file would result in diff reporting that the 2 files are equal when in fact, they may have different content. [amir: simplify ovl_get_pseudo_dev() split from allocate anonymous bdev patch] Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ee023c30 |
|
30-Oct-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: move include of ovl_entry.h into overlayfs.h Most overlayfs c files already explicitly include ovl_entry.h to use overlay entry struct definitions and upcoming changes are going to require even more c files to include this header. All overlayfs c files include overlayfs.h and overlayfs.h itself refers to some structs defined in ovl_entry.h, so it seems more logic to include ovl_entry.h from overlayfs.h than from c files. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b79e05aa |
|
25-Jun-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: no direct iteration for dir with origin xattr If a non-merge dir in an overlay mount has an overlay.origin xattr, it means it was once an upper merge dir, which may contain whiteouts and then the lower dir was removed under it. Do not iterate real dir directly in this case to avoid exposing whiteouts. [SzM] Set OVL_WHITEOUT for all merge directories as well. [amir] A directory that was just copied up does not have the OVL_WHITEOUTS flag. We need to set it to fix merge dir iteration. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
4eae06de |
|
27-Oct-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: lockdep annotate of nested OVL_I(inode)->lock This fixes a lockdep splat when mounting a nested overlayfs. Fixes: a015dafcaf5b ("ovl: use ovl_inode mutex to synchronize...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
6eaf0111 |
|
12-Oct-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix EIO from lookup of non-indexed upper Commit fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink") attempt to avoid the condition of non-indexed upper inode with lower hardlink as origin. If this condition is found, lookup returns EIO. The protection of commit mentioned above does not cover the case of lower that is not a hardlink when it is copied up (with either index=off/on) and then lower is hardlinked while overlay is offline. Changes to lower layer while overlayfs is offline should not result in unexpected behavior, so a permanent EIO error after creating a link in lower layer should not be considered as correct behavior. This fix replaces EIO error with success in cases where upper has origin but no index is found, or index is found that does not match upper inode. In those cases, lookup will not fail and the returned overlay inode will be hashed by upper inode instead of by lower origin inode. Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
939ae4ef |
|
11-Sep-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fix false positive ESTALE on lookup Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin") verifies that the origin lower inode stored in the overlayfs inode matched the inode of a copy up origin dentry found by lookup. There is a false positive result in that check when lower fs does not support file handles and copy up origin cannot be followed by file handle at lookup time. The false negative happens when finding an overlay inode in cache on a copied up overlay dentry lookup. The overlay inode still 'remembers' the copy up origin inode, but the copy up origin dentry is not available for verification. Relax the check in case copy up origin dentry is not available. Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
6787341a |
|
27-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: check snprintf return Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1d88f183 |
|
20-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: fix xattr get and set with selinux inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which depends on dentry->d_inode, but d_inode is null and not initialized yet at this point resulting in an Oops. Fix by getting the upperdentry info from the inode directly in this case. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 09d8b586731b ("ovl: move __upperdentry to ovl_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
caf70cb2 |
|
21-Jun-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: cleanup orphan index entries index entry should live only as long as there are upper or lower hardlinks. Cleanup orphan index entries on mount and when dropping the last overlay inode nlink. When about to cleanup or link up to orphan index and the index inode nlink > 1, admit that something went wrong and adjust overlay nlink to index inode nlink - 1 to prevent it from dropping below zero. This could happen when adding lower hardlinks underneath a mounted overlay and then trying to unlink them. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5f8415d6 |
|
20-Jun-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: persistent overlay inode nlink for indexed inodes With inodes index enabled, an overlay inode nlink counts the union of upper and non-covered lower hardlinks. During the lifetime of a non-pure upper inode, the following nlink modifying operations can happen: 1. Lower hardlink copy up 2. Upper hardlink created, unlinked or renamed over 3. Lower hardlink whiteout or renamed over For the first, copy up case, the union nlink does not change, whether the operation succeeds or fails, but the upper inode nlink may change. Therefore, before copy up, we store the union nlink value relative to the lower inode nlink in the index inode xattr trusted.overlay.nlink. For the second, upper hardlink case, the union nlink should be incremented or decremented IFF the operation succeeds, aligned with nlink change of the upper inode. Therefore, before link/unlink/rename, we store the union nlink value relative to the upper inode nlink in the index inode. For the last, lower cover up case, we simplify things by preceding the whiteout or cover up with copy up. This makes sure that there is an index upper inode where the nlink xattr can be stored before the copied up upper entry is unlink. Return the overlay inode nlinks for indexed upper inodes on stat(2). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
59be0971 |
|
20-Jun-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: implement index dir copy up Implement a copy up method for non-dir objects using index dir to prevent breaking lower hardlinks on copy up. This method requires that the inodes index dir feature was enabled and that all underlying fs support file handle encoding/decoding. On the first lower hardlink copy up, upper file is created in index dir, named after the hex representation of the lower origin inode file handle. On the second lower hardlink copy up, upper file is found in index dir, by the same lower handle key. On either case, the upper indexed inode is then linked to the copy up upper path. The index entry remains linked for future lower hardlink copy up and for lower to upper inode map, that is needed for exporting overlayfs to NFS. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b9ac5c27 |
|
04-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: hash overlay non-dir inodes by copy up origin Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
359f392c |
|
21-Jun-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: lookup index entry for copy up origin When inodes index feature is enabled, lookup in indexdir for the index entry of lower real inode or copy up origin inode. The index entry name is the hex representation of the lower inode file handle. If the index dentry in negative, then either no lower aliases have been copied up yet, or aliases have been copied up in older kernels and are not indexed. If the index dentry for a copy up origin inode is positive, but points to an inode different than the upper inode, then either the upper inode has been copied up and not indexed or it was indexed, but since then index dir was cleared. Either way, that index cannot be used to indentify the overlay inode. If a positive dentry that matches the upper inode was found, then it is safe to use the copy up origin st_ino for upper hardlinks, because all indexed upper hardlinks are represented by the same overlay inode as the copy up origin. Set the INDEX type flag on an indexed upper dentry. A non-upper dentry may also have a positive index from copy up of another lower hardlink. This situation will be handled by following patches. Index lookup is going to be used to prevent breaking hardlinks on copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
13c72075 |
|
04-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: move impure to ovl_inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
09d8b586 |
|
04-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: move __upperdentry to ovl_inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
25b7713a |
|
04-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: use i_private only as a key Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
e6d2ebdd |
|
04-Jul-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: simplify getting inode Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a082c6f6 |
|
29-May-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: filter trusted xattr for non-admin Filesystems filter out extended attributes in the "trusted." domain for unprivlieged callers. Overlay calls underlying filesystem's method with elevated privs, so need to do the filtering in overlayfs too. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5b712091 |
|
05-May-2017 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: merge getattr for dir and nondir Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
72b608f0 |
|
23-Apr-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: constant st_ino/st_dev across copy up When all layers are on the same underlying filesystem, let stat(2) return st_dev/st_ino values of the copy up origin inode if it is known. This results in constant st_ino/st_dev representation of files in an overlay mount before and after copy up. When the underlying filesystem support NFS exportfs, the result is also persistent st_ino/st_dev representation before and after mount cycle. Lower hardlinks are broken on copy up to different upper files, so we cannot use the lower origin st_ino for those different files, even for the same fs case. When all overlay layers are on the same fs, use overlay st_dev for non-dirs to get the correct result from du -x. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b1eaa950 |
|
10-Jan-2017 |
Amir Goldstein <amir73il@gmail.com> |
ovl: lockdep annotate of nested stacked overlayfs inode lock An overlayfs instance can be the lower layer of another overlayfs instance. This setup triggers a lockdep splat of possible recursive locking of sb->s_type->i_mutex_key in iterate_dir(). Trimmed snip: [ INFO: possible recursive locking detected ] bash/2468 is trying to acquire lock: &sb->s_type->i_mutex_key#14, at: iterate_dir+0x7d/0x15c but task is already holding lock: &sb->s_type->i_mutex_key#14, at: iterate_dir+0x7d/0x15c One problem observed with this splat is that ovl_new_inode() does not call lockdep_annotate_inode_mutex_key() to annotate the dir inode lock as &sb->s_type->i_mutex_dir_key like other fs do. The other problem is that the 2 nested levels of overlayfs inode lock are annotated using the same key, which is the cause of the false positive lockdep warning. Fix this by annotating overlayfs inode lock in ovl_fill_inode() according to stack level of the super block instance and use different key for dir vs. non-dir like other fs do. Here is an edited snip from /proc/lockdep_chains after iterate_dir() of nested overlayfs: [...] &ovl_i_mutex_dir_key[depth] (stack_depth=2) [...] &ovl_i_mutex_dir_key[depth]#2 (stack_depth=1) [...] &type->i_mutex_dir_key (stack_depth=0) Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a528d35e |
|
31-Jan-2017 |
David Howells <dhowells@redhat.com> |
statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares*[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5b825c3a |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9aba6521 |
|
12-Nov-2016 |
Amir Goldstein <amir73il@gmail.com> |
ovl: fold ovl_copy_up_truncate() into ovl_copy_up() This removes code duplication. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
ca4c8a3a |
|
16-Dec-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: treat special files like a regular fs No sense in opening special files on the underlying layers, they work just as well if opened on the overlay. Side effect is that it's no longer possible to connect one side of a pipe opened on overlayfs with the other side opened on the underlying layer. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
31c3a706 |
|
12-Oct-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
Revert "ovl: get_write_access() in truncate" This reverts commit 03bea60409328de54e4ff7ec41672e12a9cb0908. Commit 4d0c5ba2ff79 ("vfs: do get_write_access() on upper layer of overlayfs") makes the writecount checks inside overlayfs superfluous, the file is already copied up and write access acquired on the upper inode when ovl_setattr is called with ATTR_SIZE. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
dfeef688 |
|
09-Dec-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: remove ".readlink = generic_readlink" assignments If .readlink == NULL implies generic_readlink(). Generated by: to_del="\.readlink.*=.*generic_readlink" for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
b93d4a0e |
|
31-Oct-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: fix get_acl() on tmpfs tmpfs doesn't have ->get_acl() because it only uses cached acls. This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer of the overlay. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Cc: <stable@vger.kernel.org> # v4.8
|
#
7764235b |
|
04-Oct-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: use vfs_get_link() Resulting in a complete removal of a function basically implementing the inverse of vfs_readlink(). As a bonus, now the proper security hook is also called. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
78a3fa4f |
|
04-Oct-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: use generic_readlink All filesystems that are backers for overlayfs would also use generic_readlink(). Move this logic to the overlay itself, which is a nice cleanup. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
fd50ecad |
|
29-Sep-2016 |
Andreas Gruenbacher <agruenba@redhat.com> |
vfs: Remove {get,set,remove}xattr inode operations These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
31051c85 |
|
26-May-2016 |
Jan Kara <jack@suse.cz> |
fs: Give dentry to inode_change_ok() instead of inode inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
8eac98b8 |
|
06-Sep-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: during copy up, switch to mounter's creds early Now, we have the notion that copy up of a file is done with the creds of mounter of overlay filesystem (as opposed to task). Right now before we switch creds, we do some vfs_getattr() operations in the context of task and that itself can fail. We should do that getattr() using the creds of mounter instead. So this patch switches to mounter's creds early during copy up process so that even vfs_getattr() is done with mounter's creds. Do not call revert_creds() unless we have already called ovl_override_creds(). [Reported by Arnd Bergmann] Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
7cb35119 |
|
01-Sep-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: listxattr: use strnlen() Be defensive about what underlying fs provides us in the returned xattr list buffer. If it's not properly null terminated, bail out with a warning insead of BUG. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
|
#
0eb45fc3 |
|
22-Aug-2016 |
Andreas Gruenbacher <agruenba@redhat.com> |
ovl: Switch to generic_getxattr Now that overlayfs has xattr handlers for iop->{set,remove}xattr, use those same handlers for iop->getxattr as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0e585ccc |
|
22-Aug-2016 |
Andreas Gruenbacher <agruenba@redhat.com> |
ovl: Switch to generic_removexattr Commit d837a49bd57f ("ovl: fix POSIX ACL setting") switches from iop->setxattr from ovl_setxattr to generic_setxattr, so switch from ovl_removexattr to generic_removexattr as well. As far as permission checking goes, the same rules should apply in either case. While doing that, rename ovl_setxattr to ovl_xattr_set to indicate that this is not an iop->setxattr implementation and remove the unused inode argument. Move ovl_other_xattr_set above ovl_own_xattr_set so that they match the order of handlers in ovl_xattr_handlers. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
fe2b7595 |
|
22-Aug-2016 |
Andreas Gruenbacher <agruenba@redhat.com> |
ovl: Fix OVL_XATTR_PREFIX Make sure ovl_own_xattr_handler only matches attribute names starting with "overlay.", not "overlayXXX". Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
2a3a2a3f |
|
01-Sep-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: don't cache acl on overlay layer Some operations (setxattr/chmod) can make the cached acl stale. We either need to clear overlay's acl cache for the affected inode or prevent acl caching on the overlay altogether. Preventing caching has the following advantages: - no double caching, less memory used - overlay cache doesn't go stale when fs clears it's own cache Possible disadvantage is performance loss. If that becomes a problem get_acl() can be optimized for overlayfs. This patch disables caching by pre setting i_*acl to a value that - has bit 0 set, so is_uncached_acl() will return true - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it The constant -3 was chosen for this purpose. Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
5201dc44 |
|
01-Sep-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: use cached acl on underlying layer Instead of calling ->get_acl() directly, use get_acl() to get the cached value. We will have the acl cached on the underlying inode anyway, because we do permission checking on the both the overlay and the underlying fs. So, since we already have double caching, this improves performance without any cost. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0956254a |
|
08-Aug-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: don't copy up opaqueness When a copy up of a directory occurs which has the opaque xattr set, the xattr remains in the upper directory. The immediate behavior with overlayfs is that the upper directory is not treated as opaque, however after a remount the opaque flag is used and upper directory is treated as opaque. This causes files created in the lower layer to be hidden when using multiple lower directories. Fix by not copying up the opaque flag. To reproduce: ----8<---------8<---------8<---------8<---------8<---------8<---- mkdir -p l/d/s u v w mnt mount -t overlay overlay -olowerdir=l,upperdir=u,workdir=w mnt rm -rf mnt/d/ mkdir -p mnt/d/n umount mnt mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt touch mnt/d/foo umount mnt mount -t overlay overlay -olowerdir=u:l,upperdir=v,workdir=w mnt ls mnt/d ----8<---------8<---------8<---------8<---------8<---------8<---- output should be: "foo n" Reported-by: Derek McGowan <dmcg@drizz.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=151291 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
|
#
500cac3c |
|
13-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: append MAY_READ when diluting write checks Right now we remove MAY_WRITE/MAY_APPEND bits from mask if realfile is on lower/. This is done as files on lower will never be written and will be copied up. But to copy up a file, mounter should have MAY_READ permission otherwise copy up will fail. So set MAY_READ in mask when MAY_WRITE is reset. Dan Walsh noticed this when he did access(lowerfile, W_OK) and it returned True (context mounts) but when he tried to actually write to file, it failed as mounter did not have permission on lower file. [SzM] don't set MAY_READ if only MAY_APPEND is set without MAY_WRITE; this won't trigger a copy-up. Reported-by: Dan Walsh <dwalsh@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
e29841a0 |
|
13-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: dilute permission checks on lower only if not special file Right now if file is on lower/, we remove MAY_WRITE/MAY_APPEND bits from mask as lower/ will never be written and file will be copied up. But this is not true for special files. These files are not copied up and are opened in place. So don't dilute the checks for these types of files. Reported-by: Dan Walsh <dwalsh@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d837a49b |
|
28-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: fix POSIX ACL setting Setting POSIX ACL needs special handling: 1) Some permission checks are done by ->setxattr() which now uses mounter's creds ("ovl: do operations on underlying file system in mounter's context"). These permission checks need to be done with current cred as well. 2) Setting ACL can fail for various reasons. We do not need to copy up in these cases. In the mean time switch to using generic_setxattr. [Arnd Bergmann] Fix link error without POSIX ACL. posix_acl_from_xattr() doesn't have a 'static inline' implementation when CONFIG_FS_POSIX_ACL is disabled, and I could not come up with an obvious way to do it. This instead avoids the link error by defining two sets of ACL operations and letting the compiler drop one of the two at compile time depending on CONFIG_FS_POSIX_ACL. This avoids all references to the ACL code, also leading to smaller code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
51f7e52d |
|
28-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: share inode for hard link Inode attributes are copied up to overlay inode (uid, gid, mode, atime, mtime, ctime) so generic code using these fields works correcty. If a hard link is created in overlayfs separate inodes are allocated for each link. If chmod/chown/etc. is performed on one of the links then the inode belonging to the other ones won't be updated. This patch attempts to fix this by sharing inodes for hard links. Use inode hash (with real inode pointer as a key) to make sure overlay inodes are shared for hard links on upper. Hard links on lower are still split (which is not user observable until the copy-up happens, see Documentation/filesystems/overlayfs.txt under "Non-standard behavior"). The inode is only inserted in the hash if it is non-directoy and upper. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
39b681f8 |
|
28-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: store real inode pointer in ->i_private To get from overlay inode to real inode we currently use 'struct ovl_entry', which has lifetime connected to overlay dentry. This is okay, since each overlay dentry had a new overlay inode allocated. Following patch will break that assumption, so need to leave out ovl_entry. This patch stores the real inode directly in i_private, with the lowest bit used to indicate whether the inode is upper or lower. Lifetime rules remain, using ovl_inode_real() must only be done while caller holds ref on overlay dentry (and hence on real dentry), or within RCU protected regions. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a999d7e1 |
|
28-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: permission: return ECHILD instead of ENOENT The error is due to RCU and is temporary. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d719e8f2 |
|
28-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: update atime on upper Fix atime update logic in overlayfs. This patch adds an i_op->update_time() handler to overlayfs inodes. This forwards atime updates to the upper layer only. No atime updates are done on lower layers. Remove implicit atime updates to underlying files and directories with O_NOATIME. Remove explicit atime update in ovl_readlink(). Clear atime related mnt flags from cloned upper mount. This means atime updates are controlled purely by overlayfs mount options. Reported-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
9c630ebe |
|
28-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: simplify permission checking The fact that we always do permission checking on the overlay inode and clear MAY_WRITE for checking access to the lower inode allows cruft to be removed from ovl_permission(). 1) "default_permissions" option effectively did generic_permission() on the overlay inode with i_mode, i_uid and i_gid updated from underlying filesystem. This is what we do by default now. It did the update using vfs_getattr() but that's only needed if the underlying filesystem can change (which is not allowed). We may later introduce a "paranoia_mode" that verifies that mode/uid/gid are not changed. 2) splitting out the IS_RDONLY() check from inode_permission() also becomes unnecessary once we remove the MAY_WRITE from the lower inode check. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
754f8cb7 |
|
01-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: do not require mounter to have MAY_WRITE on lower Now we have two levels of checks in ovl_permission(). overlay inode is checked with the creds of task while underlying inode is checked with the creds of mounter. Looks like mounter does not have to have WRITE access to files on lower/. So remove the MAY_WRITE from access mask for checks on underlying lower inode. This means task should still have the MAY_WRITE permission on lower inode and mounter is not required to have MAY_WRITE. It also solves the problem of read only NFS mounts being used as lower. If __inode_permission(lower_inode, MAY_WRITE) is called on read only NFS, it fails. By resetting MAY_WRITE, check succeeds and case of read only NFS shold work with overlay without having to specify any special mount options (default permission). Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1175b6b8 |
|
01-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: do operations on underlying file system in mounter's context Given we are now doing checks both on overlay inode as well underlying inode, we should be able to do checks and operations on underlying file system using mounter's context. So modify all operations to do checks/operations on underlying dentry/inode in the context of mounter. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
c0ca3d70 |
|
01-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: modify ovl_permission() to do checks on two inodes Right now ovl_permission() calls __inode_permission(realinode), to do permission checks on real inode and no checks are done on overlay inode. Modify it to do checks both on overlay inode as well as underlying inode. Checks on overlay inode will be done with the creds of calling task while checks on underlying inode will be done with the creds of mounter. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
39a25b2b |
|
01-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: define ->get_acl() for overlay inodes Now we are planning to do DAC permission checks on overlay inode itself. And to make it work, we will need to make sure we can get acls from underlying inode. So define ->get_acl() for overlay inodes and this in turn calls into underlying filesystem to get acls, if any. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
58ed4e70 |
|
25-May-2016 |
Andreas Gruenbacher <agruenba@redhat.com> |
ovl: store ovl_entry in inode->i_private for all inodes Previously this was only done for directory inodes. Doing so for all inodes makes for a nice cleanup in ovl_permission at zero cost. Inodes are not shared for hard links on the overlay, so this works fine. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
07a2daab |
|
01-Jul-2016 |
Vivek Goyal <vgoyal@redhat.com> |
ovl: Copy up underlying inode's ->i_mode to overlay inode Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
|
#
b99c2d91 |
|
04-Jul-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: handle ATTR_KILL* Before 4bacc9c9234c ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>
|
#
2d902671 |
|
30-Jun-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: merge .d_select_inode() into .d_real() The two methods essentially do the same: find the real dentry/inode belonging to an overlay dentry. The difference is in the usage: vfs_open() uses ->d_select_inode() and expects the function to perform copy-up if necessary based on the open flags argument. file_dentry() uses ->d_real() passing in the overlay dentry as well as the underlying inode. vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real() with a zero inode would have worked just as well here. This patch merges the functionality of ->d_select_inode() into ->d_real() by adding an 'open_flags' argument to the latter. [Al Viro] Make the signature of d_real() match that of ->d_real() again. And constify the inode argument, while we are at it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
03bea604 |
|
29-Jun-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: get_write_access() in truncate When truncating a file we should check write access on the underlying inode. And we should do so on the lower file as well (before copy-up) for consistency. Original patch and test case by Aihua Zhang. - - >o >o - - test.c - - >o >o - - #include <stdio.h> #include <errno.h> #include <unistd.h> int main(int argc, char *argv[]) { int ret; ret = truncate(argv[0], 4096); if (ret != -1) { fprintf(stderr, "truncate(argv[0]) should have failed\n"); return 1; } if (errno != ETXTBSY) { perror("truncate(argv[0])"); return 1; } return 0; } - - >o >o - - >o >o - - >o >o - - Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
|
#
a4859d75 |
|
29-Jun-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: fix dentry leak for default_permissions When using the 'default_permissions' mount option, ovl_permission() on non-directories was missing a dput(alias), resulting in "BUG Dentry still in use". Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 8d3095f4ad47 ("ovl: default permissions") Cc: <stable@vger.kernel.org> # v4.5+
|
#
b581755b |
|
06-Jun-2016 |
Miklos Szeredi <mszeredi@redhat.com> |
ovl: xattr filter fix a) ovl_need_xattr_filter() is wrong, we can have multiple lower layers overlaid, all of which (except the lowest one) honouring the "trusted.overlay.opaque" xattr. So need to filter everything except the bottom and the pure-upper layer. b) we no longer can assume that inode is attached to dentry in get/setxattr. This patch unconditionally filters private xattrs to fix both of the above. Performance impact for get/removexattrs is likely in the noise. For listxattrs it might be measurable in pathological cases, but I very much hope nobody cares. If they do, we'll fix it then. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: b96809173e94 ("security_d_instantiate(): move to the point prior to attaching dentry to inode")
|
#
3767e255 |
|
27-May-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
switch ->setxattr() to passing dentry and inode separately smack ->d_instantiate() uses ->setxattr(), so to be able to call it before we'd hashed the new dentry and attached it to inode, we need ->setxattr() instances getting the inode as an explicit argument rather than obtaining it from dentry. Similar change for ->getxattr() had been done in commit ce23e64. Unlike ->getxattr() (which is used by both selinux and smack instances of ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately it got missed back then. Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ce23e640 |
|
10-Apr-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
->getxattr(): pass dentry and inode as separate arguments Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b81de061 |
|
31-Jan-2016 |
Konstantin Khlebnikov <koct9i@gmail.com> |
ovl: copy new uid/gid into overlayfs runtime inode Overlayfs must update uid/gid after chown, otherwise functions like inode_owner_or_capable() will check user against stale uid. Catched by xfstests generic/087, it chowns file and calls utimes. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>
|
#
5955102c |
|
22-Jan-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
fceef393 |
|
29-Dec-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
switch ->get_link() to delayed_call, kill ->put_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
cf9a6784 |
|
11-Dec-2015 |
Miklos Szeredi <miklos@szeredi.hu> |
ovl: setattr: check permissions before copy-up Without this copy-up of a file can be forced, even without actually being allowed to do anything on the file. [Arnd Bergmann] include <linux/pagemap.h> for PAGE_CACHE_SIZE (used by MAX_LFS_FILESIZE definition). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>
|
#
6b255391 |
|
17-Nov-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
replace ->follow_link() with new method that could stay in RCU mode new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0f7ff2da |
|
05-Dec-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
ovl: get rid of the dead code left from broken (and disabled) optimizations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
acff81ec |
|
04-Dec-2015 |
Miklos Szeredi <miklos@szeredi.hu> |
ovl: fix permission checking for setattr [Al Viro] The bug is in being too enthusiastic about optimizing ->setattr() away - instead of "copy verbatim with metadata" + "chmod/chown/utimes" (with the former being always safe and the latter failing in case of insufficient permissions) it tries to combine these two. Note that copyup itself will have to do ->setattr() anyway; _that_ is where the elevated capabilities are right. Having these two ->setattr() (one to set verbatim copy of metadata, another to do what overlayfs ->setattr() had been asked to do in the first place) combined is where it breaks. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8d3095f4 |
|
12-Oct-2015 |
Miklos Szeredi <miklos@szeredi.hu> |
ovl: default permissions Add mount option "default_permissions" to alter the way permissions are calculated. Without this option and prior to this patch permissions were calculated by underlying lower or upper filesystem. With this option the permissions are calculated by overlayfs based on the file owner, group and mode bits. This has significance for example when a read-only exported NFS filesystem is used as a lower layer. In this case the underlying NFS filesystem will reply with EROFS, in which case all we know is that the filesystem is read-only. But that's not what we are interested in, we are interested in whether the access would be allowed if the filesystem wasn't read-only; the server doesn't tell us that, and would need updating at various levels, which doesn't seem practicable. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
|
#
1c8a47df |
|
12-Oct-2015 |
Miklos Szeredi <miklos@szeredi.hu> |
ovl: fix open in stacked overlay If two overlayfs filesystems are stacked on top of each other, then we need recursion in ovl_d_select_inode(). I guess d_backing_inode() is supposed to do that. But currently it doesn't and that functionality is open coded in vfs_open(). This is now copied into ovl_d_select_inode() to fix this regression. Reported-by: Alban Crequy <alban.crequy@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...") Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+
|
#
9391dd00 |
|
12-Jul-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
fix a braino in ovl_d_select_inode() when opening a directory we want the overlayfs inode, not one from the topmost layer. Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4bacc9c9 |
|
18-Jun-2015 |
David Howells <dhowells@redhat.com> |
overlayfs: Make f_path always point to the overlay and f_inode to the underlay Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f25801ee |
|
18-Jun-2015 |
David Howells <dhowells@redhat.com> |
overlay: Call ovl_drop_write() earlier in ovl_dentry_open() Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open() as we've done the copy up for which we needed the freeze-write lock by that point. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5f2c4179 |
|
07-May-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
switch ->put_link() from dentry to inode only one instance looks at that argument at all; that sole exception wants inode rather than dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6e77137b |
|
02-May-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
don't pass nameidata to ->follow_link() its only use is getting passed to nd_jump_link(), which can obtain it from current->nameidata Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
680baacb |
|
02-May-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
new ->follow_link() and ->put_link() calling conventions a) instead of storing the symlink body (via nd_set_link()) and returning an opaque pointer later passed to ->put_link(), ->follow_link() _stores_ that opaque pointer (into void * passed by address by caller) and returns the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic symlinks) and pointer to symlink body for normal symlinks. Stored pointer is ignored in all cases except the last one. Storing NULL for opaque pointer (or not storing it at all) means no call of ->put_link(). b) the body used to be passed to ->put_link() implicitly (via nameidata). Now only the opaque pointer is. In the cases when we used the symlink body to free stuff, ->follow_link() now should store it as opaque pointer in addition to returning it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
3188b295 |
|
22-Mar-2015 |
NeilBrown <neilb@suse.de> |
ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link ovl_follow_link current calls ->put_link on an error path. However ->put_link is about to change in a way that it will be impossible to call it from ovl_follow_link. So rearrange the code to avoid the need for that error path. Specifically: move the kmalloc() call before the ->follow_link() call to the subordinate filesystem. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
cead89bb |
|
24-Nov-2014 |
hujianyang <hujianyang@huawei.com> |
ovl: Use macros to present ovl_xattr This patch adds two macros: OVL_XATTR_PRE_NAME and OVL_XATTR_PRE_LEN to present ovl_xattr name prefix and its length. Also, a new macro OVL_XATTR_OPAQUE is introduced to replace old *ovl_opaque_xattr*. Fix the length of "trusted.overlay." to *16*. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
#
1ba38725 |
|
26-Nov-2014 |
hujianyang <hujianyang@huawei.com> |
ovl: Cleanup redundant blank lines This patch removes redundant blanks lines in overlayfs. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
#
1afaba1e |
|
12-Dec-2014 |
Miklos Szeredi <mszeredi@suse.cz> |
ovl: make path-type a bitmap OVL_PATH_PURE_UPPER -> __OVL_PATH_UPPER | __OVL_PATH_PURE OVL_PATH_UPPER -> __OVL_PATH_UPPER OVL_PATH_MERGE -> __OVL_PATH_UPPER | __OVL_PATH_MERGE OVL_PATH_LOWER -> 0 Multiple R/O layers will allow __OVL_PATH_MERGE without __OVL_PATH_UPPER. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
#
52148463 |
|
20-Nov-2014 |
Miklos Szeredi <mszeredi@suse.cz> |
ovl: fix race in private xattr checks Xattr operations can race with copy up. This does not matter as long as we consistently fiter out "trunsted.overlay.opaque" attribute on upper directories. Previously we checked parent against OVL_PATH_MERGE. This is too general, and prone to race with copy-up. I.e. we found the parent to be on the lower layer but ovl_dentry_real() would return the copied-up dentry, possibly with the "opaque" attribute. So instead use ovl_path_real() and decide to filter the attributes based on the actual type of the dentry we'll use. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
#
e9be9d5e |
|
23-Oct-2014 |
Miklos Szeredi <mszeredi@suse.cz> |
overlay filesystem Overlayfs allows one, usually read-write, directory tree to be overlaid onto another, read-only directory tree. All modifications go to the upper, writable layer. This type of mechanism is most often used for live CDs but there's a wide variety of other uses. The implementation differs from other "union filesystem" implementations in that after a file is opened all operations go directly to the underlying, lower or upper, filesystems. This simplifies the implementation and allows native performance in these cases. The dentry tree is duplicated from the underlying filesystems, this enables fast cached lookups without adding special support into the VFS. This uses slightly more memory than union mounts, but dentries are relatively small. Currently inodes are duplicated as well, but it is a possible optimization to share inodes for non-directories. Opening non directories results in the open forwarded to the underlying filesystem. This makes the behavior very similar to union mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file descriptors). Usage: mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay The following cotributions have been folded into this patch: Neil Brown <neilb@suse.de>: - minimal remount support - use correct seek function for directories - initialise is_real before use - rename ovl_fill_cache to ovl_dir_read Felix Fietkau <nbd@openwrt.org>: - fix a deadlock in ovl_dir_read_merged - fix a deadlock in ovl_remove_whiteouts Erez Zadok <ezk@fsl.cs.sunysb.edu> - fix cleanup after WARN_ON Sedat Dilek <sedat.dilek@googlemail.com> - fix up permission to confirm to new API Robin Dong <hao.bigrat@gmail.com> - fix possible leak in ovl_new_inode - create new inode in ovl_link Andy Whitcroft <apw@canonical.com> - switch to __inode_permission() - copy up i_uid/i_gid from the underlying inode AV: - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(), lack of check for udir being the parent of upper, dropping and regaining the lock on udir (which would require _another_ check for parent being right). - bogus d_drop() in copyup and rename [fix from your mail] - copyup/remove and copyup/rename races [fix from your mail] - ovl_dir_fsync() leaving ERR_PTR() in ->realfile - ovl_entry_free() is pointless - it's just a kfree_rcu() - fold ovl_do_lookup() into ovl_lookup() - manually assigning ->d_op is wrong. Just use ->s_d_op. [patches picked from Miklos]: * copyup/remove and copyup/rename races * bogus d_drop() in copyup and rename Also thanks to the following people for testing and reporting bugs: Jordi Pujol <jordipujolp@gmail.com> Andy Whitcroft <apw@canonical.com> Michal Suchanek <hramrach@centrum.cz> Felix Fietkau <nbd@openwrt.org> Erez Zadok <ezk@fsl.cs.sunysb.edu> Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|