
History log of /linux-master/fs/exportfs/expfs.c
Revision	Date	Author	Comments
# 42c3732f	30-Dec-2023	Chuck Lever <chuck.lever@oracle.com>	fs: Create a generic is_dot_dotdot() utility De-duplicate the same functionality in several places by hoisting the is_dot_dotdot() utility function into linux/fs.h. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
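The shared helper recognises the two special directory entries from a name pointer plus a length, which is what directory-iteration callbacks receive (the name need not be NUL-terminated). Below is a minimal userspace sketch modeled on that check. It is an illustration, not a copy of the linux/fs.h code:

```c
#include <stdbool.h>
#include <stddef.h>

/* True only for exactly "." or "..". Only the first len bytes of name are examined. */
static bool is_dot_dotdot(const char *name, size_t len)
{
	if (len == 0 || name[0] != '.')
		return false;
	return len == 1 || (len == 2 && name[1] == '.');
}
```

Because the length bounds the comparison, `is_dot_dotdot("..x", 2)` is true: the function only looks at the first two bytes.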
# 9473c445	28-Dec-2023	Trond Myklebust <trond.myklebust@hammerspace.com>	exportfs: fix the fallback implementation of the get_name export operation The fallback implementation for the get_name export operation uses readdir() to try to match the inode number to a filename. That filename is then used together with lookup_one() to produce a dentry. A problem arises when we match the '.' or '..' entries, since that causes lookup_one() to fail. This has sometimes been seen to occur for filesystems that violate POSIX requirements around uniqueness of inode numbers, something that is common for snapshot directories. This patch just ensures that we skip '.' and '..' rather than allowing a match. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/linux-nfs/CAOQ4uxiOZobN76OKB-VBNXWeFKVwLW_eK5QtthGyYzWU9mjb7Q@mail.gmail.com/ Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
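The fallback scans the parent directory for an entry whose inode number matches the child. Below is a simplified sketch of that per-entry matching step. `struct name_search` and `match_entry()` are hypothetical stand-ins for the kernel's getdents callback state and actor. The callback returns true to keep iterating, following the convention from 25885a35. It skips "." and ".." even when their inode numbers collide with the target, as they can in snapshot directories:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 255

/* Hypothetical search state: the inode we want a name for, and the result. */
struct name_search {
	unsigned long long ino;
	char name[TOY_NAME_MAX + 1];
	bool found;
};

static bool is_dot_or_dotdot(const char *name, size_t len)
{
	return len && name[0] == '.' && (len == 1 || (len == 2 && name[1] == '.'));
}

/* Per-entry callback: returns true to continue, false once a usable match is recorded. */
static bool match_entry(struct name_search *s, const char *name, size_t len,
			unsigned long long ino)
{
	if (ino != s->ino || len > TOY_NAME_MAX || is_dot_or_dotdot(name, len))
		return true;	/* not a usable match: keep scanning */
	memcpy(s->name, name, len);
	s->name[len] = '\0';
	s->found = true;
	return false;		/* found it: stop the directory walk */
}
```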
# d9e5d922	26-Oct-2023	Amir Goldstein <amir73il@gmail.com>	fs: fix build error with CONFIG_EXPORTFS=m or not defined Many of the filesystems that call the generic exportfs helpers do not select the EXPORTFS config. Move generic_encode_ino32_fh() to libfs.c, same as generic_fh_to_*() to avoid having to fix all those config dependencies. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310262151.renqMvme-lkp@intel.com/ Fixes: dfaf653dc415 ("exportfs: make ->encode_fh() a mandatory method for NFS export") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231026204540.143217-1-amir73il@gmail.com Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
# 64343119	23-Oct-2023	Amir Goldstein <amir73il@gmail.com>	exportfs: support encoding non-decodeable file handles by default AT_HANDLE_FID was added as an API for name_to_handle_at() that requests the encoding of a file id, which is not intended to be decoded. This file id is used by fanotify to describe objects in events. So far, overlayfs is the only filesystem that supports encoding non-decodeable file ids, by providing export_operations with an ->encode_fh() method and without a ->decode_fh() method. Add support for encoding non-decodeable file ids to all the filesystems that do not provide export_operations, by encoding a file id of type FILEID_INO64_GEN from { i_ino, i_generation }. A filesystem that does not support NFS export can opt out of encoding non-decodeable file ids for fanotify by defining an empty export_operations struct (i.e. with a NULL ->encode_fh() method). This allows the use of fanotify events with file ids on filesystems like 9p which do not support NFS export to bring fanotify in feature parity with inotify on those filesystems. Note that fanotify also requires that the filesystems report a non-null fsid. Currently, many simple filesystems that have support for inotify (e.g. debugfs, tracefs, sysfs) report a null fsid, so they still cannot be used with fanotify in file id reporting mode. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-5-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
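The default encoding packs { i_ino, i_generation } into a FILEID_INO64_GEN handle. The opt-out depends on one distinction: a filesystem may have no export_operations at all, or an export_operations struct whose ->encode_fh() is NULL. Below is a userspace model of that decision. The `toy_*` types and `encode_fid()` are hypothetical stand-ins for the kernel structures. The constants are believed to match include/linux/exportfs.h (FILEID_INO64_GEN = 0x81, FILEID_INVALID = 0xff), but verify them against the header:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILEID_INO64_GEN 0x81	/* assumed to mirror the kernel value */
#define FILEID_INVALID   0xff

struct toy_inode { uint64_t i_ino; uint32_t i_generation; };

/* Hypothetical stand-in for struct export_operations. */
struct toy_export_ops {
	int (*encode_fh)(const struct toy_inode *inode, uint32_t *fh, int *max_len);
};

/* Pack { ino, gen } into three 32-bit words; report the needed length if too short. */
static int encode_ino64_fid(const struct toy_inode *inode, uint32_t *fh, int *max_len)
{
	if (*max_len < 3) {
		*max_len = 3;
		return FILEID_INVALID;
	}
	memcpy(fh, &inode->i_ino, sizeof(inode->i_ino));	/* words 0-1, host order */
	fh[2] = inode->i_generation;				/* word 2 */
	*max_len = 3;
	return FILEID_INO64_GEN;
}

/* No ops at all: use the generic fid. Ops with a NULL ->encode_fh: the fs opted out. */
static int encode_fid(const struct toy_export_ops *nop, const struct toy_inode *inode,
		      uint32_t *fh, int *max_len)
{
	if (!nop)
		return encode_ino64_fid(inode, fh, max_len);
	if (!nop->encode_fh)
		return -EOPNOTSUPP;
	return nop->encode_fh(inode, fh, max_len);
}
```

Treating "no struct" and "empty struct" differently lets a filesystem such as 9p get fids for free, while a filesystem that defines an empty struct explicitly declines them.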
# e21fc203	23-Oct-2023	Amir Goldstein <amir73il@gmail.com>	exportfs: make ->encode_fh() a mandatory method for NFS export Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
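The FILEID_INO32_GEN* layout produced by the renamed helper has two 32-bit words (inode number, generation). When the parent is included, it grows to four words. Below is a userspace model with a hypothetical `toy_inode32` type and `toy_encode_ino32_fh()` function. The type values (1, 2) and FILEID_INVALID (0xff) are believed to match include/linux/exportfs.h:

```c
#include <stddef.h>
#include <stdint.h>

#define FILEID_INO32_GEN        1	/* assumed kernel values */
#define FILEID_INO32_GEN_PARENT 2
#define FILEID_INVALID          0xff

struct toy_inode32 { uint32_t i_ino; uint32_t i_generation; };

/* fh layout: [ino, gen] or [ino, gen, parent_ino, parent_gen]; *max_len counts 32-bit words. */
static int toy_encode_ino32_fh(const struct toy_inode32 *inode, uint32_t *fh,
			       int *max_len, const struct toy_inode32 *parent)
{
	int need = parent ? 4 : 2;

	if (*max_len < need) {
		*max_len = need;	/* tell the caller how much room is required */
		return FILEID_INVALID;
	}
	fh[0] = inode->i_ino;
	fh[1] = inode->i_generation;
	if (parent) {
		fh[2] = parent->i_ino;
		fh[3] = parent->i_generation;
	}
	*max_len = need;
	return parent ? FILEID_INO32_GEN_PARENT : FILEID_INO32_GEN;
}
```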
# 66c62769	23-Oct-2023	Amir Goldstein <amir73il@gmail.com>	exportfs: add helpers to check if filesystem can encode/decode file handles The logic of whether filesystem can encode/decode file handles is open coded in many places. In preparation to changing the logic, move the open coded logic into inline helpers. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
# 7afdc0c9	31-Jul-2023	Zhu Wang <wangzhu9@huawei.com>	exportfs: remove kernel-doc warnings in exportfs Remove kernel-doc warning in exportfs: fs/exportfs/expfs.c:395: warning: Function parameter or member 'parent' not described in 'exportfs_encode_inode_fh' Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 3e327154	05-Aug-2023	Linus Torvalds <torvalds@linux-foundation.org>	vfs: get rid of old '->iterate' directory operation All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got converted to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
# 304e9c83	02-May-2023	Amir Goldstein <amir73il@gmail.com>	exportfs: add explicit flag to request non-decodeable file handles So far, all callers of exportfs_encode_inode_fh(), except for fsnotify's show_mark_fhandle(), check that filesystem can decode file handles, but we would like to add more callers that do not require a file handle that can be decoded. Introduce a flag to explicitly request a file handle that may not be decoded later and a wrapper exportfs_encode_fid() that sets this flag and convert show_mark_fhandle() to use the new wrapper. This will be used to allow adding fanotify support to filesystems that do not support NFS export. Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-3-amir73il@gmail.com>
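Combining this flag with the bit-flags conversion in b5287827, the helpers from 66c62769 and the default-fid change in 64343119 reduces the capability check to three small predicates. Below is a sketch of that logic. `toy_export_ops` and the `can_*` names are hypothetical. The flag values (EXPORT_FH_CONNECTABLE = 0x1, EXPORT_FH_FID = 0x2) are believed to follow include/linux/exportfs.h and should be checked there:

```c
#include <stdbool.h>
#include <stddef.h>

#define EXPORT_FH_CONNECTABLE 0x1	/* also encode the parent so the dentry can be reconnected */
#define EXPORT_FH_FID         0x2	/* file id only; need not be decodeable */

/* Hypothetical stand-in for struct export_operations. */
struct toy_export_ops {
	int (*encode_fh)(void);
	void *(*fh_to_dentry)(void);
};

/* A fid can be encoded unless the fs supplied ops with no ->encode_fh (opt-out). */
static bool can_encode_fid(const struct toy_export_ops *nop)
{
	return !nop || nop->encode_fh;
}

/* Decoding requires real NFS export support. */
static bool can_decode_fh(const struct toy_export_ops *nop)
{
	return nop && nop->fh_to_dentry;
}

static bool can_encode_fh(const struct toy_export_ops *nop, int fh_flags)
{
	if (fh_flags & EXPORT_FH_FID)
		return can_encode_fid(nop);
	/* A regular handle is only useful if it can later be decoded. */
	return can_decode_fh(nop);
}
```

In this model, a wrapper like exportfs_encode_fid() only has to pass EXPORT_FH_FID. Filesystems without export_operations then qualify, while the decodeable path still requires ->fh_to_dentry.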
# b5287827	02-May-2023	Amir Goldstein <amir73il@gmail.com>	exportfs: change connectable argument to bit flags Convert the bool connectable argument into a bit flags argument and define the EXPORT_FS_CONNECTABLE flag as a requested property of the file handle. We are going to add a flag for requesting non-decodeable file handles. Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-2-amir73il@gmail.com>
# 4609e1f1	12-Jan-2023	Christian Brauner <brauner@kernel.org>	fs: port ->permission() to pass mnt_idmap Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
# 427505ff	21-Oct-2022	David Disseldorp <ddiss@suse.de>	exportfs: use pr_debug for unreachable debug statements expfs.c has a bunch of dprintk statements which are unusable due to: #define dprintk(fmt, args...) do{}while(0) Use pr_debug so that they can be enabled dynamically. Also make some minor changes to the debug statements to fix some incorrect types, and remove __func__ which can be handled by dynamic debug separately. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 25885a35	16-Aug-2022	Al Viro <viro@zeniv.linux.org.uk>	Change calling conventions for filldir_t filldir_t instances (directory iterators callbacks) used to return 0 for "OK, keep going" or -E... for "stop". Note that it's NOT how the error values are reported - the rules for those are callback-dependent and ->iterate{,_shared}() instances only care about zero vs. non-zero (look at emit_dir() and friends). So let's just return bool ("should we keep going?") - it's less confusing that way. The choice between "true means keep going" and "true means stop" is bikesheddable; we have two groups of callbacks - do something for everything in directory, until we run into problem and find an entry in directory and do something to it. The former tended to use 0/-E... conventions - -E<something> on failure. The latter tended to use 0/1, 1 being "stop, we are done". The callers treated anything non-zero as "stop", ignoring which non-zero value did they get. "true means stop" would be more natural for the second group; "true means keep going" - for the first one. I tried both variants and the things like if allocation failed something = -ENOMEM; return true; just looked unnatural and asking for trouble. [folded suggestion from Matthew Wilcox <willy@infradead.org>] Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
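The new convention is easiest to see from the iterator's side. The iterator calls the actor once per entry and stops as soon as the actor returns false. The error value travels through the caller's own context, not through the return value. Below is a toy sketch of that pattern. `toy_entry`, `toy_filldir_t`, `toy_iterate()` and `collect_actor()` are hypothetical and do not model the kernel's dir_context API:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_entry { const char *name; unsigned long long ino; };

/* Actor returns true for "keep going", false for "stop". */
typedef bool (*toy_filldir_t)(void *ctx, const struct toy_entry *e);

/* Returns how many entries were offered to the actor before it asked to stop. */
static size_t toy_iterate(const struct toy_entry *ents, size_t n,
			  toy_filldir_t actor, void *ctx)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!actor(ctx, &ents[i]))
			return i + 1;
	return n;
}

/* "Do something for every entry" actor: records an error in its context and stops. */
struct collect_ctx { size_t count, limit; int err; };

static bool collect_actor(void *p, const struct toy_entry *e)
{
	struct collect_ctx *c = p;

	(void)e;
	if (c->count == c->limit) {
		c->err = -ENOMEM;	/* stand-in for "allocation failed" */
		return false;
	}
	c->count++;
	return true;
}
```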
# 3a761d72	03-Apr-2022	Christian Brauner <brauner@kernel.org>	exportfs: support idmapped mounts Make the two locations where exportfs helpers check permission to lookup a given inode idmapped mount aware by switching it to the lookup_one() helper. This is a bugfix for the open_by_handle_at() system call which doesn't take idmapped mounts into account currently. It's not tied to a specific commit so we'll just Cc stable. In addition this is required to support idmapped base layers in overlay. The overlay filesystem uses exportfs to encode and decode file handles for its index=on mount option and when nfs_export=on. Cc: <stable@vger.kernel.org> Cc: <linux-fsdevel@vger.kernel.org> Tested-by: Giuseppe Scrivano <gscrivan@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
# d045465f	30-Nov-2020	Trond Myklebust <trond.myklebust@hammerspace.com>	exportfs: Add a function to return the raw output from fh_to_dentry() In order to allow nfsd to accept return values that are not acceptable to overlayfs and others, add a new function. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 581ae686	08-Nov-2019	Al Viro <viro@zeniv.linux.org.uk>	race in exportfs_decode_fh() On Sat, Nov 02, 2019 at 06:08:42PM +0000, Al Viro wrote: > It is converging to a reasonably small and understandable surface, actually, > most of that being in core pathname resolution. Two big piles of nightmares > left to review - overlayfs and (somewhat surprisingly) setxattr call chains, > the latter due to IMA/EVM/LSM insanity... Oh, lovely - in exportfs_decode_fh() we have this: err = exportfs_get_name(mnt, target_dir, nbuf, result); if (!err) { inode_lock(target_dir->d_inode); nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf)); inode_unlock(target_dir->d_inode); if (!IS_ERR(nresult)) { if (nresult->d_inode) { dput(result); result = nresult; } else dput(nresult); } } We have derived the parent from fhandle, we have a disconnected dentry for child, we go look for the name. We even find it. Now, we want to look it up. And some bastard goes and unlinks it, just as we are trying to lock the parent. We do a lookup, and get a negative dentry. Then we unlock the parent... and some other bastard does e.g. mkdir with the same name. OK, nresult->d_inode is not NULL (anymore). It has fuck-all to do with the original fhandle (different inumber, etc.) but we happily accept it. Even better, we have no barriers between our check and nresult becoming positive. IOW, having observed non-NULL ->d_inode doesn't give us enough - e.g. we might still see the old ->d_flags value, from back when ->d_inode used to be NULL. On something like alpha we also have no promises that we'll observe anything about the fields of nresult->d_inode, but ->d_flags alone is enough for fun. The callers can't e.g. expect d_is_reg() et.al. to match the reality. This is obviously bogus. And the fix is obvious: check that nresult->d_inode is equal to result->d_inode before unlocking the parent. 
Note that we'd already had the original result and all of its aliases rejected by the 'acceptable' predicate, so if nresult doesn't supply us a better alias, we are SOL. Does anyone see objections to the following patch? Christoph, that seems to be your code; am I missing something subtle here? AFAICS, that goes back to 2007 or so... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
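The fix tightens the acceptance test. The old test asked only whether the looked-up dentry was positive. The new one asks whether it is positive and refers to the same inode that the file handle decoded to, and the comparison is made while the parent is still locked. Below is a toy illustration of the two predicates using hypothetical `toy_*` structures. Locking and memory ordering are deliberately left out:

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_inode { unsigned long i_ino; };
struct toy_dentry { struct toy_inode *d_inode; };

/* Old check: any positive dentry was accepted, even one created by a racing mkdir. */
static bool accept_old(const struct toy_dentry *result, const struct toy_dentry *nresult)
{
	(void)result;
	return nresult->d_inode != NULL;
}

/* Fixed check: the new alias must point at the inode the file handle decoded to. */
static bool accept_fixed(const struct toy_dentry *result, const struct toy_dentry *nresult)
{
	return nresult->d_inode != NULL && nresult->d_inode == result->d_inode;
}
```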
# a2ece088	08-Nov-2019	Al Viro <viro@zeniv.linux.org.uk>	exportfs_decode_fh(): negative pinned may become positive without the parent locked Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# ec23eb54	26-Jul-2019	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>	docs: fs: convert docs without extension to ReST There are 3 remaining files without an extension inside the fs docs dir. Manually convert them to ReST. In the case of the nfs/exporting.rst file, as the nfs docs aren't ported yet, I opted to convert and add a :orphan: there, which should be removed when it gets added into a nfs-specific part of the fs documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
# 09c434b8	19-May-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Add SPDX license identifier for more missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 2084ac6c	23-Nov-2018	Pan Bian <bianpan2016@163.com>	exportfs: do not read dentry after free The function dentry_connected calls dput(dentry) to drop the previously acquired reference to dentry. In this case, dentry can be released. After that, IS_ROOT(dentry) checks the condition (dentry == dentry->d_parent), which may result in a use-after-free bug. This patch directly compares dentry with its parent obtained before dropping the reference. Fixes: a056cc8934c("exportfs: stop retrying once we race with rename/remove") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
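The fix relies on a general pattern: capture what you need from an object before dropping the reference that might free it. Below is a userspace sketch with a hypothetical refcounted `toy_dentry`. It goes one step further than the kernel fix. The kernel saves the parent pointer and compares pointer values after dput(). This sketch evaluates the root test (a dentry that is its own parent) before the put, because in ISO C even comparing a pointer to freed memory is undefined:

```c
#include <stdbool.h>
#include <stdlib.h>

struct toy_dentry {
	int refcount;
	struct toy_dentry *parent;	/* points to itself for a root/disconnected dentry */
};

static void toy_dput(struct toy_dentry *d)
{
	if (--d->refcount == 0)
		free(d);
}

/* Drop a reference and report whether d was a root, without touching d afterwards. */
static bool put_and_was_root(struct toy_dentry *d)
{
	bool was_root = (d->parent == d);	/* read while d is certainly alive */

	toy_dput(d);				/* d may be freed here */
	return was_root;
}
```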
# 909e22e0	18-Nov-2018	YueHaibing <yuehaibing@huawei.com>	exportfs: fix 'passing zero to ERR_PTR()' warning Fix a static code checker warning: fs/exportfs/expfs.c:171 reconnect_one() warn: passing zero to 'ERR_PTR' The error path for lookup_one_len_unlocked failure should set err to PTR_ERR. Fixes: bbf7a8a3562f ("exportfs: move most of reconnect_path to helper function") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
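The warning points at a real hazard in pointer-encoded errors: ERR_PTR(0) is just NULL. If a failure path forgets to set err, it returns something IS_ERR() does not recognise as an error. Below is a userspace re-creation of the three helpers, modeled on include/linux/err.h with MAX_ERRNO = 4095, together with a buggy and a fixed error path. `lookup_stub()` and the `reconnect_*` functions are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup that always fails with -ENOENT. */
static void *lookup_stub(void) { return ERR_PTR(-ENOENT); }

static void *reconnect_buggy(void)
{
	long err = 0;
	void *tmp = lookup_stub();

	if (IS_ERR(tmp))
		return ERR_PTR(err);	/* bug: err was never set, so this returns NULL */
	return tmp;
}

static void *reconnect_fixed(void)
{
	long err = 0;
	void *tmp = lookup_stub();

	if (IS_ERR(tmp)) {
		err = PTR_ERR(tmp);	/* propagate the real error */
		return ERR_PTR(err);
	}
	return tmp;
}
```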
# 8a22efa1	09-Mar-2018	Amir Goldstein <amir73il@gmail.com>	ovl: do not try to reconnect a disconnected origin dentry On lookup of non directory, we try to decode the origin file handle stored in upper inode. The origin file handle is supposed to be decoded to a disconnected non-dir dentry, which is fine, because we only need the lower inode of a copy up origin. However, if the origin file handle somehow turns out to be a directory we pay the expensive cost of reconnecting the directory dentry, only to get a mismatched file type and drop the dentry. Optimize this case by explicitly opting out of reconnecting the dentry. Opting-out of reconnect is done by passing a NULL acceptable callback to exportfs_decode_fh(). While the case described above is a strange corner case that does not really need to be optimized, the API added for this optimization will be used by a following patch to optimize a more common case of decoding an overlayfs file handle. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
# a528d35e	31-Jan-2017	David Howells <dhowells@redhat.com>	statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). 
And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). 
(Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were too ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabel might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations.
buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. 
The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
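For orientation, here is a caller of the interface as it was eventually merged. glibc 2.28 and later declare a statx() wrapper in <sys/stat.h> when _GNU_SOURCE is defined. The merged argument order is dirfd, pathname, flags, mask, buffer, and a few field types differ from the draft struct quoted above (for example, tv_nsec is unsigned). This is a minimal sketch that assumes Linux 4.11+ and glibc 2.28+. `show_statx()` is a hypothetical helper. It checks stx_mask before trusting stx_btime, because the birth time is only meaningful when the filesystem reports it:

```c
#define _GNU_SOURCE
#include <fcntl.h>	/* AT_FDCWD, AT_SYMLINK_NOFOLLOW, AT_STATX_SYNC_AS_STAT */
#include <stdio.h>
#include <sys/stat.h>	/* statx(), struct statx, STATX_* (glibc >= 2.28) */

/* Query basic stats plus birth time for path; returns 0 on success, -1 on error. */
static int show_statx(const char *path)
{
	struct statx stx;

	if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW | AT_STATX_SYNC_AS_STAT,
		  STATX_BASIC_STATS | STATX_BTIME, &stx) != 0) {
		perror("statx");
		return -1;
	}
	printf("%s: ino=%llu size=%llu nlink=%u\n", path,
	       (unsigned long long)stx.stx_ino,
	       (unsigned long long)stx.stx_size, (unsigned)stx.stx_nlink);
	if (stx.stx_mask & STATX_BTIME)
		printf("  btime=%lld.%09u\n", (long long)stx.stx_btime.tv_sec,
		       (unsigned)stx.stx_btime.tv_nsec);
	else
		printf("  btime not reported by this filesystem\n");
	return 0;
}
```

Calling `show_statx(".")` prints the current directory's inode number, size and link count. It prints a birth time only where the filesystem provides one.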
# 5b825c3a	02-Feb-2017	Ingo Molnar <mingo@kernel.org>	sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> Add #include <linux/cred.h> dependencies to all .c files that rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 09bb8bff	03-Aug-2016	NeilBrown <neilb@suse.com>	exportfs: be careful to only return expected errors. When nfsd calls fh_to_dentry, it expects ESTALE or ENOMEM as errors. In particular it can be tempting to return ENOENT, but this is not handled well by nfsd. Rather than requiring strict adherence to error codes by filesystems, treat all unexpected error codes the same as ESTALE. This is safest. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
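The policy can be written as a function over error codes: errors nfsd knows how to handle pass through, and everything else collapses into ESTALE. This sketch follows the commit message's description (ESTALE and ENOMEM preserved). It does not necessarily reproduce the exact set of codes the current kernel lets through. `sanitize_fh_error()` is hypothetical:

```c
#include <errno.h>

/* err is 0 (success) or a negative errno from a filesystem's fh_to_dentry path. */
static int sanitize_fh_error(int err)
{
	switch (err) {
	case 0:
	case -ESTALE:
	case -ENOMEM:
		return err;	/* expected by nfsd: pass through */
	default:
		return -ESTALE;	/* e.g. -ENOENT, which nfsd would mishandle */
	}
}
```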
# 61922694	20-Apr-2016	Al Viro <viro@zeniv.linux.org.uk>	introduce a parallel variant of ->iterate() New method: ->iterate_shared(). Same arguments as in ->iterate(), called with the directory locked only shared. Once all filesystems switch, the old one will be gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 383d4e8a	14-Apr-2016	Al Viro <viro@zeniv.linux.org.uk>	reconnect_one(): use lookup_one_len_unlocked() ... and explain the non-obvious logics in case when lookup yields a different dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 5955102c	22-Jan-2016	Al Viro <viro@zeniv.linux.org.uk>	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# e36cb0b8	28-Jan-2015	David Howells <dhowells@redhat.com>	VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided the code isn't in a filesystem that expects d_inode to be NULL if the dirent *really is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 946e51f2	26-Oct-2014	Al Viro <viro@zeniv.linux.org.uk>	move d_rcu from overlapping d_child to overlapping d_alias Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# ac7576f4	30-Oct-2014	Miklos Szeredi <miklos@szeredi.hu>	vfs: make first argument of dir_context.actor typed Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 00f01791	04-Jun-2014	Fabian Frederick <fabf@skynet.be>	fs/exportfs/expfs.c: kernel-doc warning fixes Fix two typos in function comments. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# f27c9298	17-Oct-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: fix quadratic behavior in filehandle lookup Suppose we're given the filehandle for a directory whose closest ancestor in the dcache is its Nth ancestor. The main loop in reconnect_path searches for an IS_ROOT ancestor of target_dir, reconnects that ancestor to its parent, then recommences the search for an IS_ROOT ancestor from target_dir. This behavior is quadratic in N. And there's really no need to restart the search from target_dir each time: once a directory has been looked up, it won't become IS_ROOT again. So instead of starting from target_dir each time, we can continue where we left off. This simplifies the code and improves performance on very deep directory hierarchies. (I can't think of any reason anyone should need hierarchies a hundred or more deep, but the performance improvement may be valuable if only to limit damage in case of abuse.) Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
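The complexity argument in f27c9298 can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's reconnect_path(): the chain of directories is a plain array, "reconnecting" only sets a flag, and the names reset(), reconnect_restart(), reconnect_continue() and linked[] are hypothetical. It also assumes, as the commit message states, that a node never becomes IS_ROOT again once it has been reconnected. Each function counts how many parent-to-parent climbs it performs when the closest connected ancestor is N levels above the target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code. Node i's parent is node i - 1,
 * and node 0 stands in for the nearest ancestor already connected in the
 * dcache. linked[i] records whether node i has been attached to its parent;
 * an unlinked node plays the role of an IS_ROOT (disconnected) dentry. */
#define MAX_DEPTH 1024

static bool linked[MAX_DEPTH + 1];

/* Start with every node below node 0 disconnected. */
static void reset(int n)
{
    assert(n >= 1 && n <= MAX_DEPTH);
    for (int i = 0; i <= n; i++)
        linked[i] = false;
    linked[0] = true;
}

/* Old strategy: after each reconnect, restart the upward search at the
 * target. Returns the number of parent-to-parent climbs performed. */
static long reconnect_restart(int target)
{
    long climbs = 0;

    for (;;) {
        int d = target;

        /* Walk up past nodes that are already linked to their parents. */
        while (d > 0 && linked[d]) {
            d--;
            climbs++;
        }
        if (d == 0)
            return climbs;  /* reached the connected ancestor */
        linked[d] = true;   /* "reconnect" the IS_ROOT node we found */
    }
}

/* New strategy: reconnect each node and continue from its parent instead
 * of going back to the target. Relies on the assumption that a reconnected
 * node stays connected. */
static long reconnect_continue(int target)
{
    long climbs = 0;

    for (int d = target; d > 0; d--) {
        linked[d] = true;   /* reconnect (a no-op if already linked) */
        climbs++;           /* one climb to the parent */
    }
    return climbs;
}
```

With N = 100, calling reset(100) and then reconnect_restart(100) returns 5050 climbs (100 * 101 / 2), while reset(100) followed by reconnect_continue(100) returns 100, which matches the quadratic-versus-linear difference the commit describes.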
# efbf201f	17-Oct-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: better variable name Replace another unhelpful acronym. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# bbf7a8a3	17-Oct-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: move most of reconnect_path to helper function Also replace 3 easily-confused three-letter acronyms by more helpful variable names. Just cleanup, no change in functionality, with one exception: the dentry_connected() check in the "out_reconnected" case will now only check the ancestors of the current dentry instead of checking all the way from target_dir. Since we've already verified connectivity up to this dentry, that should be sufficient. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# e4b70ebe	16-Oct-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: eliminate unused "noprogress" counter Note this counter is now being set to 0 on every pass through the loop, so it no longer serves any useful purpose. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# a056cc89	16-Oct-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: stop retrying once we race with rename/remove There are two places here where we could race with a rename or remove: - We could find the parent, but then be removed or renamed away from that parent directory before finding our name in that directory. - We could find the parent, and find our name in that parent, but then be renamed or removed before we look ourselves up by that name in that parent. In both cases the concurrent rename or remove will take care of reconnecting the directory that we're currently examining. Our target directory should then also be connected. Check this and clear DISCONNECTED in these cases instead of looping around again. Note: we do need to check that this actually happened if we want to be robust in the face of corrupted filesystems: a corrupted filesystem could just return a completely wrong parent, and we want to fail with an error in that case before starting to clear DISCONNECTED on non-DISCONNECTED filesystems. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 0dbc018a	09-Sep-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: clear DISCONNECTED on all parents sooner Once we've found any connected parent, we know all our parents are connected; that's true even if there's a concurrent rename. May as well clear them all at once and be done with it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 78cee9a8	22-Oct-2013	J. Bruce Fields <bfields@redhat.com>	exportfs: more detailed comment for path_reconnect Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 854ff5ca	16-Oct-2013	Christoph Hellwig <hch@lst.de>	exportfs: BUG_ON in crazy corner case This would indicate a nasty bug in the dcache and has never triggered in the past 10 years as far as I know. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>