Searched +hist:4 +hist:b6f5d20 (Results 1 - 25 of 106) sorted by path
/linux-master/fs/affs/ | ||
H A D | dir.c | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | affs.h | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff ed4433d7 Mon Feb 27 15:27:49 MST 2017 Fabian Frederick <fabf@skynet.be> fs/affs: make affs exportable Add standard functions making AFFS work with NFS. Functions based on ext4 implementation. Tested on loop device. Link: http://lkml.kernel.org/r/20170109191208.6085-4-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff d5de9fd5 Mon Feb 27 15:27:46 MST 2017 Fabian Frederick <fabf@skynet.be> fs/affs: add validation block function Avoid repeating 4 times the same calculation. Link: http://lkml.kernel.org/r/20170109191208.6085-3-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 66f8f509 Thu Apr 05 23:40:50 MDT 2012 Al Viro <viro@zeniv.linux.org.uk> affs: bury unused macros ... unused since 2.4.4. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4acdaf27 Mon Jul 25 23:42:34 MDT 2011 Al Viro <viro@zeniv.linux.org.uk> switch ->create() to umode_t vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | file.c | diff 34113026 Wed Jul 12 21:55:08 MDT 2023 Matthew Wilcox (Oracle) <willy@infradead.org> affs: convert data read and write to use folios We still need to convert to/from folios in write_begin & write_end to fit the API, but this removes a lot of calls to old page-based functions, removing many hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20230713035512.4139457-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Acked-by: David Sterba <dsterba@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.com> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff 0af57378 Mon Jun 28 20:36:12 MDT 2021 Christoph Hellwig <hch@lst.de> mm: require ->set_page_dirty to be explicitly wired up Remove the CONFIG_BLOCK default to __set_page_dirty_buffers and just wire that method up for the missing instances. [hch@lst.de: ecryptfs: add a ->set_page_dirty cludge] Link: https://lkml.kernel.org/r/20210624125250.536369-1-hch@lst.de Link: https://lkml.kernel.org/r/20210614061512.3966143-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tyler Hicks <code@tyhicks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff a80f2d22 Mon Apr 24 14:13:10 MDT 2017 Fabian Frederick <fabf@skynet.be> fs/affs: bugfix: Write files greater than page size on OFS Previous AFFS patch fixed OFS write operations but unveiled another bug: files greater than 4KB are being created with a wrong size resulting in errors like the following: dd if=/dev/zero of=file bs=4097 count=1 cp file /mnt/affs/ cp: error writing '/mnt/affs/file': Bad address Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 92cab82b Fri Dec 12 17:57:55 MST 2014 Fabian Frederick <fabf@skynet.be> fs/affs/file.c: remove obsolete pagesize check linux kernel doesn't manage page sizes below 4kb. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 63259326 Wed Sep 11 15:25:48 MDT 2013 Dan Carpenter <dan.carpenter@oracle.com> affs: use loff_t in affs_truncate() It seems pretty unlikely that AFFS supports files over 4GB but we may as well leave use loff_t just for cleanness sake instead of truncating it to 32 bits. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/fs/bfs/ | ||
H A D | bfs.h | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4e29d50a Mon Jul 05 06:15:01 MDT 2010 Artem Bityutskiy <Artem.Bityutskiy@nokia.com> BFS: clean up the superblock usage BFS is a very simple FS and its superblocks contains only static information and is never changed. However, the BFS code for some misterious reasons marked its buffer head as dirty from time to time, but nothing in that buffer was ever changed. This patch removes all the BFS superblock manipulation, simply because it is not needed. It removes: 1. The si_sbh filed from 'struct bfs_sb_info' because it is not needed. We only need to read the SB once on mount to get the start of data blocks and the FS size. After this, we can forget about the SB. 2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because it is never changed. 3. The '->sync_fs()' method because there is nothing to sync (inodes are synched by VFS). 4. The '->write_super()' method, again, because the SB is never changed. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4e29d50a Mon Jul 05 06:15:01 MDT 2010 Artem Bityutskiy <Artem.Bityutskiy@nokia.com> BFS: clean up the superblock usage BFS is a very simple FS and its superblocks contains only static information and is never changed. However, the BFS code for some misterious reasons marked its buffer head as dirty from time to time, but nothing in that buffer was ever changed. This patch removes all the BFS superblock manipulation, simply because it is not needed. It removes: 1. The si_sbh filed from 'struct bfs_sb_info' because it is not needed. We only need to read the SB once on mount to get the start of data blocks and the FS size. After this, we can forget about the SB. 2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because it is never changed. 3. The '->sync_fs()' method because there is nothing to sync (inodes are synched by VFS). 4. The '->write_super()' method, again, because the SB is never changed. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | dir.c | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4acdaf27 Mon Jul 25 23:42:34 MDT 2011 Al Viro <viro@zeniv.linux.org.uk> switch ->create() to umode_t vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | file.c | diff 0af57378 Mon Jun 28 20:36:12 MDT 2021 Christoph Hellwig <hch@lst.de> mm: require ->set_page_dirty to be explicitly wired up Remove the CONFIG_BLOCK default to __set_page_dirty_buffers and just wire that method up for the missing instances. [hch@lst.de: ecryptfs: add a ->set_page_dirty cludge] Link: https://lkml.kernel.org/r/20210624125250.536369-1-hch@lst.de Link: https://lkml.kernel.org/r/20210614061512.3966143-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tyler Hicks <code@tyhicks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4e29d50a Mon Jul 05 06:15:01 MDT 2010 Artem Bityutskiy <Artem.Bityutskiy@nokia.com> BFS: clean up the superblock usage BFS is a very simple FS and its superblocks contains only static information and is never changed. However, the BFS code for some misterious reasons marked its buffer head as dirty from time to time, but nothing in that buffer was ever changed. This patch removes all the BFS superblock manipulation, simply because it is not needed. It removes: 1. The si_sbh filed from 'struct bfs_sb_info' because it is not needed. We only need to read the SB once on mount to get the start of data blocks and the FS size. After this, we can forget about the SB. 2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because it is never changed. 3. The '->sync_fs()' method because there is nothing to sync (inodes are synched by VFS). 4. The '->write_super()' method, again, because the SB is never changed. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4e29d50a Mon Jul 05 06:15:01 MDT 2010 Artem Bityutskiy <Artem.Bityutskiy@nokia.com> BFS: clean up the superblock usage BFS is a very simple FS and its superblocks contains only static information and is never changed. However, the BFS code for some misterious reasons marked its buffer head as dirty from time to time, but nothing in that buffer was ever changed. This patch removes all the BFS superblock manipulation, simply because it is not needed. It removes: 1. The si_sbh filed from 'struct bfs_sb_info' because it is not needed. We only need to read the SB once on mount to get the start of data blocks and the FS size. After this, we can forget about the SB. 2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because it is never changed. 3. The '->sync_fs()' method because there is nothing to sync (inodes are synched by VFS). 4. The '->write_super()' method, again, because the SB is never changed. Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/fs/efs/ | ||
H A D | dir.c | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/fs/ramfs/ | ||
H A D | internal.h | diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/include/linux/ | ||
H A D | msdos_fs.h | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4de151d8 Tue Mar 21 16:13:35 MST 2006 Alexey Dobriyan <adobriyan@gmail.com> It's UTF-8 Fix some comments to "UTF-8". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> |
/linux-master/drivers/misc/ibmasm/ | ||
H A D | ibmasmfs.c | diff 4a2ef475 Wed Oct 04 12:51:53 MDT 2023 Jeff Layton <jlayton@kernel.org> ibmasm: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-6-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> diff 5a0e3ad6 Wed Mar 24 02:04:11 MDT 2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/fs/9p/ | ||
H A D | v9fs_vfs.h | diff 5e3cc1ee Wed Jan 23 23:35:13 MST 2019 Hou Tao <houtao1@huawei.com> 9p: use inode->i_lock to protect i_size_write() under 32-bit Use inode->i_lock to protect i_size_write(), else i_size_read() in generic_fillattr() may loop infinitely in read_seqcount_begin() when multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl() simultaneously under 32-bit SMP environment, and a soft lockup will be triggered as show below: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217] Modules linked in: CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system PC is at generic_fillattr+0x104/0x108 LR is at 0xec497f00 pc : [<802b8898>] lr : [<ec497f00>] psr: 200c0013 sp : ec497e20 ip : ed608030 fp : ec497e3c r10: 00000000 r9 : ec497f00 r8 : ed608030 r7 : ec497ebc r6 : ec497f00 r5 : ee5c1550 r4 : ee005780 r3 : 0000052d r2 : 00000000 r1 : ec497f00 r0 : ed608030 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: ac48006a DAC: 00000051 CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system Backtrace: [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24) [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc) [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20) [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8) [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380) [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0) [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64) [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c) [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc) [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48) [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240) [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44) [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4) [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88) [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98) [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4) [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c) [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48) [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec) [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78) [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28) [dominique.martinet@cea.fr: updated comment to not refer to a function in another subsystem] Link: http://lkml.kernel.org/r/20190124063514.8571-2-houtao1@huawei.com Cc: stable@vger.kernel.org Fixes: 7549ae3e81cc ("9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.") Reported-by: Xing Gaopeng <xingaopeng@huawei.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> diff 5e3cc1ee Wed Jan 23 23:35:13 MST 2019 Hou Tao <houtao1@huawei.com> 9p: use inode->i_lock to protect i_size_write() under 32-bit Use inode->i_lock to protect i_size_write(), else i_size_read() in generic_fillattr() may loop infinitely in read_seqcount_begin() when multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl() simultaneously under 32-bit SMP environment, and a soft lockup will be triggered as show below: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217] Modules linked in: CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system PC is at generic_fillattr+0x104/0x108 LR is at 0xec497f00 pc : [<802b8898>] lr : [<ec497f00>] psr: 200c0013 sp : ec497e20 ip : ed608030 fp : ec497e3c r10: 00000000 r9 : ec497f00 r8 : ed608030 r7 : ec497ebc r6 : ec497f00 r5 : ee5c1550 r4 : ee005780 r3 : 0000052d r2 : 00000000 r1 : ec497f00 r0 : ed608030 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: ac48006a DAC: 00000051 CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system Backtrace: [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24) [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc) [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20) [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8) [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380) [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0) [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64) [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c) [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc) [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48) [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240) [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44) [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4) [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88) [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98) [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4) [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c) [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48) [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec) [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78) [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28) [dominique.martinet@cea.fr: updated comment to not refer to a function in another subsystem] Link: http://lkml.kernel.org/r/20190124063514.8571-2-houtao1@huawei.com Cc: stable@vger.kernel.org Fixes: 7549ae3e81cc ("9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.") Reported-by: Xing Gaopeng <xingaopeng@huawei.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff a099027c Mon Sep 27 00:04:24 MDT 2010 M. Mohan Kumar <mohan@in.ibm.com> 9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff a099027c Mon Sep 27 00:04:24 MDT 2010 M. Mohan Kumar <mohan@in.ibm.com> 9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff a099027c Mon Sep 27 00:04:24 MDT 2010 M. Mohan Kumar <mohan@in.ibm.com> 9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff a099027c Mon Sep 27 00:04:24 MDT 2010 M. Mohan Kumar <mohan@in.ibm.com> 9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff a099027c Mon Sep 27 00:04:24 MDT 2010 M. Mohan Kumar <mohan@in.ibm.com> 9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff a099027c Mon Sep 27 00:04:24 MDT 2010 M. Mohan Kumar <mohan@in.ibm.com> 9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> |
H A D | vfs_dir.c | diff eee4a119 Wed May 03 01:49:25 MDT 2023 Dominique Martinet <asmadeus@codewreck.org> 9p: fix ignored return value in v9fs_dir_release retval from filemap_fdatawrite was immediately overwritten by the following p9_fid_put: preserve any error in fdatawrite if there was any first. This fixes the following scan-build warning: fs/9p/vfs_dir.c:220:4: warning: Value stored to 'retval' is never read [deadcode.DeadStores] retval = filemap_fdatawrite(inode->i_mapping); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 89c58cb395ec ("fs/9p: fix error reporting in v9fs_dir_release") Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org> diff 24e42e32 Wed Nov 18 02:06:42 MST 2020 David Howells <dhowells@redhat.com> 9p: Use fscache indexing rewrite and reenable caching Change the 9p filesystem to take account of the changes to fscache's indexing rewrite and reenable caching in 9p. The following changes have been made: (1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole. (2) The session cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). That takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume. For 9p, I've made it render the volume name string as: "9p,<devname>,<cachetag>" where the cachetag is replaced by the aname if it wasn't supplied. This probably needs rethinking a bit as the aname can have slashes in it. It might be better to hash the cachetag and use the hash or I could substitute commas for the slashes or something. (3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times. fscache_acquire_cookie() is passed the same keying and coherency information as before. (4) The functions to set/reset/flush cookies are removed and fscache_use_cookie() and fscache_unuse_cookie() are used instead. fscache_use_cookie() is passed a flag to indicate if the cookie is opened for writing. fscache_unuse_cookie() is passed updates for the metadata if we changed it (ie. if the file was opened for writing). These are called when the file is opened or closed. (5) wait_on_page_bit[_killable]() is replaced with the specific wait functions for the bits waited upon. (6) I've got rid of some of the 9p-specific cache helper functions and called things like fscache_relinquish_cookie() directly as they'll optimise away if v9fs_inode_cookie() returns an unconditional NULL (which will be the case if CONFIG_9P_FSCACHE=n). (7) v9fs_vfs_setattr() is made to call fscache_resize() to change the size of the cache object. Notes: (A) We should call fscache_invalidate() if we detect that the server's copy of a file got changed by a third party, but I don't know where to do that. We don't need to do that when allocating the cookie as we get a check-and-invalidate when we initially bind to the cache object. (B) The copy-to-cache-on-writeback side of things will be handled in separate patch. Changes ======= ver #3: - Canonicalise the cookie key and coherency data to make them endianness-independent. ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - fscache_acquire_volume() now returns errors. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: v9fs-developer@lists.sourceforge.net cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819664645.215744.1555314582005286846.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906975017.143852.3459573173204394039.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967178512.1823006.17377493641569138183.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021573143.640689.3977487095697717967.stgit@warthog.procyon.org.uk/ # v4 diff 4b8e9923 Tue Aug 19 18:17:38 MDT 2014 Al Viro <viro@zeniv.linux.org.uk> 9p: switch to %p[dD] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 348b5901 Sat Aug 06 13:16:59 MDT 2011 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> net/9p: Convert net/9p protocol dumps to tracepoints This helps in more control over debugging. root@qemu-img-64:~# ls /pass/123 ls: cannot access /pass/123: No such file or directory root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | ls-1536 [001] 70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1) 000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01 010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00 ls-1536 [001] 70.928587: <stack trace> => trace_9p_protocol_dump => p9pdu_finalize => p9_client_rpc => p9_client_walk => v9fs_vfs_lookup => d_alloc_and_lookup => walk_component => path_lookupat ls-1536 [000] 70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1) 000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00 010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00 ls-1536 [000] 70.929697: <stack trace> => trace_9p_protocol_dump => p9_client_rpc => p9_client_walk => v9fs_vfs_lookup => d_alloc_and_lookup => walk_component => path_lookupat => do_path_lookup Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff b165d601 Fri Oct 22 11:13:12 MDT 2010 Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com> 9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff 7751bdb3 Fri Jun 04 07:41:26 MDT 2010 Sripathi Kodi <sripathik@in.ibm.com> 9p: readdir implementation for 9p2000.L This patch implements the kernel part of readdir() implementation for 9p2000.L Change from V3: Instead of inode, server now sends qids for each dirent SYNOPSIS size[4] Treaddir tag[2] fid[4] offset[8] count[4] size[4] Rreaddir tag[2] count[4] data[count] DESCRIPTION The readdir request asks the server to read the directory specified by 'fid' at an offset specified by 'offset' and return as many dirent structures as possible that fit into count bytes. Each dirent structure is laid out as follows. qid.type[1] the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word. qid.vers[4] version number for given path qid.path[8] the file server's unique identification for the file offset[8] offset into the next dirent. type[1] type of this directory entry. name[256] name of this directory entry. This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L. This function sends P9_TREADDIR command to the server. In response the server sends a buffer filled with dirent structures. This is different from the existing v9fs_dir_readdir() call which receives stat structures from the server. This results in significant speedup of readdir() on large directories. For example, doing 'ls >/dev/null' on a directory with 10000 files on my laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds with the new readdir. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff 7751bdb3 Fri Jun 04 07:41:26 MDT 2010 Sripathi Kodi <sripathik@in.ibm.com> 9p: readdir implementation for 9p2000.L This patch implements the kernel part of readdir() implementation for 9p2000.L Change from V3: Instead of inode, server now sends qids for each dirent SYNOPSIS size[4] Treaddir tag[2] fid[4] offset[8] count[4] size[4] Rreaddir tag[2] count[4] data[count] DESCRIPTION The readdir request asks the server to read the directory specified by 'fid' at an offset specified by 'offset' and return as many dirent structures as possible that fit into count bytes. Each dirent structure is laid out as follows. qid.type[1] the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word. qid.vers[4] version number for given path qid.path[8] the file server's unique identification for the file offset[8] offset into the next dirent. type[1] type of this directory entry. name[256] name of this directory entry. This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L. This function sends P9_TREADDIR command to the server. In response the server sends a buffer filled with dirent structures. This is different from the existing v9fs_dir_readdir() call which receives stat structures from the server. This results in significant speedup of readdir() on large directories. For example, doing 'ls >/dev/null' on a directory with 10000 files on my laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds with the new readdir. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff 7751bdb3 Fri Jun 04 07:41:26 MDT 2010 Sripathi Kodi <sripathik@in.ibm.com> 9p: readdir implementation for 9p2000.L This patch implements the kernel part of readdir() implementation for 9p2000.L Change from V3: Instead of inode, server now sends qids for each dirent SYNOPSIS size[4] Treaddir tag[2] fid[4] offset[8] count[4] size[4] Rreaddir tag[2] count[4] data[count] DESCRIPTION The readdir request asks the server to read the directory specified by 'fid' at an offset specified by 'offset' and return as many dirent structures as possible that fit into count bytes. Each dirent structure is laid out as follows. qid.type[1] the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word. qid.vers[4] version number for given path qid.path[8] the file server's unique identification for the file offset[8] offset into the next dirent. type[1] type of this directory entry. name[256] name of this directory entry. This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L. This function sends P9_TREADDIR command to the server. In response the server sends a buffer filled with dirent structures. This is different from the existing v9fs_dir_readdir() call which receives stat structures from the server. This results in significant speedup of readdir() on large directories. For example, doing 'ls >/dev/null' on a directory with 10000 files on my laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds with the new readdir. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff 7751bdb3 Fri Jun 04 07:41:26 MDT 2010 Sripathi Kodi <sripathik@in.ibm.com> 9p: readdir implementation for 9p2000.L This patch implements the kernel part of readdir() implementation for 9p2000.L Change from V3: Instead of inode, server now sends qids for each dirent SYNOPSIS size[4] Treaddir tag[2] fid[4] offset[8] count[4] size[4] Rreaddir tag[2] count[4] data[count] DESCRIPTION The readdir request asks the server to read the directory specified by 'fid' at an offset specified by 'offset' and return as many dirent structures as possible that fit into count bytes. Each dirent structure is laid out as follows. qid.type[1] the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word. qid.vers[4] version number for given path qid.path[8] the file server's unique identification for the file offset[8] offset into the next dirent. type[1] type of this directory entry. name[256] name of this directory entry. This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L. This function sends P9_TREADDIR command to the server. In response the server sends a buffer filled with dirent structures. This is different from the existing v9fs_dir_readdir() call which receives stat structures from the server. This results in significant speedup of readdir() on large directories. For example, doing 'ls >/dev/null' on a directory with 10000 files on my laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds with the new readdir. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff 7751bdb3 Fri Jun 04 07:41:26 MDT 2010 Sripathi Kodi <sripathik@in.ibm.com> 9p: readdir implementation for 9p2000.L This patch implements the kernel part of readdir() implementation for 9p2000.L Change from V3: Instead of inode, server now sends qids for each dirent SYNOPSIS size[4] Treaddir tag[2] fid[4] offset[8] count[4] size[4] Rreaddir tag[2] count[4] data[count] DESCRIPTION The readdir request asks the server to read the directory specified by 'fid' at an offset specified by 'offset' and return as many dirent structures as possible that fit into count bytes. Each dirent structure is laid out as follows. qid.type[1] the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word. qid.vers[4] version number for given path qid.path[8] the file server's unique identification for the file offset[8] offset into the next dirent. type[1] type of this directory entry. name[256] name of this directory entry. This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L. This function sends P9_TREADDIR command to the server. In response the server sends a buffer filled with dirent structures. This is different from the existing v9fs_dir_readdir() call which receives stat structures from the server. This results in significant speedup of readdir() on large directories. For example, doing 'ls >/dev/null' on a directory with 10000 files on my laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds with the new readdir. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> diff 7751bdb3 Fri Jun 04 07:41:26 MDT 2010 Sripathi Kodi <sripathik@in.ibm.com> 9p: readdir implementation for 9p2000.L This patch implements the kernel part of readdir() implementation for 9p2000.L Change from V3: Instead of inode, server now sends qids for each dirent SYNOPSIS size[4] Treaddir tag[2] fid[4] offset[8] count[4] size[4] Rreaddir tag[2] count[4] data[count] DESCRIPTION The readdir request asks the server to read the directory specified by 'fid' at an offset specified by 'offset' and return as many dirent structures as possible that fit into count bytes. Each dirent structure is laid out as follows. qid.type[1] the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word. qid.vers[4] version number for given path qid.path[8] the file server's unique identification for the file offset[8] offset into the next dirent. type[1] type of this directory entry. name[256] name of this directory entry. This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L. This function sends P9_TREADDIR command to the server. In response the server sends a buffer filled with dirent structures. This is different from the existing v9fs_dir_readdir() call which receives stat structures from the server. This results in significant speedup of readdir() on large directories. For example, doing 'ls >/dev/null' on a directory with 10000 files on my laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds with the new readdir. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> |
H A D | vfs_file.c | diff 4eb31178 Sun Mar 26 19:53:10 MDT 2023 Eric Van Hensbergen <ericvh@kernel.org> fs/9p: Rework cache modes and add new options to Documentation Switch cache modes to a bit-mask and use legacy cache names as shortcuts. Update documentation to include information on both shortcuts and bitmasks. This patch also fixes missing guards related to fscache. Update the documentation for new mount flags and cache modes. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org> diff 24e42e32 Wed Nov 18 02:06:42 MST 2020 David Howells <dhowells@redhat.com> 9p: Use fscache indexing rewrite and reenable caching Change the 9p filesystem to take account of the changes to fscache's indexing rewrite and reenable caching in 9p. The following changes have been made: (1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole. (2) The session cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). That takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume. For 9p, I've made it render the volume name string as: "9p,<devname>,<cachetag>" where the cachetag is replaced by the aname if it wasn't supplied. This probably needs rethinking a bit as the aname can have slashes in it. It might be better to hash the cachetag and use the hash or I could substitute commas for the slashes or something. (3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times. fscache_acquire_cookie() is passed the same keying and coherency information as before. (4) The functions to set/reset/flush cookies are removed and fscache_use_cookie() and fscache_unuse_cookie() are used instead. fscache_use_cookie() is passed a flag to indicate if the cookie is opened for writing. fscache_unuse_cookie() is passed updates for the metadata if we changed it (ie. if the file was opened for writing). These are called when the file is opened or closed. (5) wait_on_page_bit[_killable]() is replaced with the specific wait functions for the bits waited upon. (6) I've got rid of some of the 9p-specific cache helper functions and called things like fscache_relinquish_cookie() directly as they'll optimise away if v9fs_inode_cookie() returns an unconditional NULL (which will be the case if CONFIG_9P_FSCACHE=n). (7) v9fs_vfs_setattr() is made to call fscache_resize() to change the size of the cache object. Notes: (A) We should call fscache_invalidate() if we detect that the server's copy of a file got changed by a third party, but I don't know where to do that. We don't need to do that when allocating the cookie as we get a check-and-invalidate when we initially bind to the cache object. (B) The copy-to-cache-on-writeback side of things will be handled in separate patch. Changes ======= ver #3: - Canonicalise the cookie key and coherency data to make them endianness-independent. ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - fscache_acquire_volume() now returns errors. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: v9fs-developer@lists.sourceforge.net cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819664645.215744.1555314582005286846.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906975017.143852.3459573173204394039.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967178512.1823006.17377493641569138183.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021573143.640689.3977487095697717967.stgit@warthog.procyon.org.uk/ # v4 diff 78525c74 Wed Aug 11 02:49:13 MDT 2021 David Howells <dhowells@redhat.com> netfs, 9p, afs, ceph: Use folios Convert the netfs helper library to use folios throughout, convert the 9p and afs filesystems to use folios in their file I/O paths and convert the ceph filesystem to use just enough folios to compile. With these changes, afs passes -g quick xfstests. Changes ======= ver #5: - Got rid of folio_end{io,_read,_write}() and inlined the stuff it does instead (Willy decided he didn't want this after all). ver #4: - Fixed a bug in afs_redirty_page() whereby it didn't set the next page index in the loop and returned too early. - Simplified a check in v9fs_vfs_write_folio_locked()[1]. - Undid a change to afs_symlink_readpage()[1]. - Used offset_in_folio() in afs_write_end()[1]. - Changed from using page_endio() to folio_end{io,_read,_write}()[1]. ver #2: - Add 9p foliation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: kafs-testing@auristor.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5 diff eb497943 Tue Nov 02 02:29:55 MDT 2021 David Howells <dhowells@redhat.com> 9p: Convert to using the netfs helper lib to do reads and caching Convert the 9p filesystem to use the netfs helper lib to handle readpage, readahead and write_begin, converting those into a common issue_op for the filesystem itself to handle. The netfs helper lib also handles reading from fscache if a cache is available, and interleaving reads from both sources. This change also switches from the old fscache I/O API to the new one, meaning that fscache no longer keeps track of netfs pages and instead does async DIO between the backing files and the 9p file pagecache. As a part of this change, the handling of PG_fscache changes. It now just means that the cache has a write I/O operation in progress on a page (PG_locked is used for a read I/O op). Note that this is a cut-down version of the fscache rewrite and does not change any of the cookie and cache coherency handling. Changes ======= ver #4: - Rebase on top of folios. - Don't use wait_on_page_bit_killable(). ver #3: - v9fs_req_issue_op() needs to terminate the subrequest. - v9fs_write_end() needs to call SetPageUptodate() a bit more often. - It's not CONFIG_{AFS,V9FS}_FSCACHE[1] - v9fs_init_rreq() should take a ref on the p9_fid and the cleanup should drop it [from Dominique Martinet]. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Dominique Martinet <asmadeus@codewreck.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YUm+xucHxED+1MJp@codewreck.org/ [1] Link: https://lore.kernel.org/r/163162772646.438332.16323773205855053535.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/163189109885.2509237.7153668924503399173.stgit@warthog.procyon.org.uk/ # rfc v2 Link: https://lore.kernel.org/r/163363943896.1980952.1226527304649419689.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163551662876.1877519.14706391695553204156.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163584179557.4023316.11089762304657644342.stgit@warthog.procyon.org.uk # rebase on folio Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> diff f5f7ab16 Sun Oct 04 12:04:22 MDT 2020 Matthew Wilcox (Oracle) <willy@infradead.org> 9P: Cast to loff_t before multiplying On 32-bit systems, this multiplication will overflow for files larger than 4GB. Link: http://lkml.kernel.org/r/20201004180428.14494-2-willy@infradead.org Cc: stable@vger.kernel.org Fixes: fb89b45cdfdc ("9P: introduction of a new cache=mmap model.") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> diff 5e3cc1ee Wed Jan 23 23:35:13 MST 2019 Hou Tao <houtao1@huawei.com> 9p: use inode->i_lock to protect i_size_write() under 32-bit Use inode->i_lock to protect i_size_write(), else i_size_read() in generic_fillattr() may loop infinitely in read_seqcount_begin() when multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl() simultaneously under 32-bit SMP environment, and a soft lockup will be triggered as show below: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217] Modules linked in: CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system PC is at generic_fillattr+0x104/0x108 LR is at 0xec497f00 pc : [<802b8898>] lr : [<ec497f00>] psr: 200c0013 sp : ec497e20 ip : ed608030 fp : ec497e3c r10: 00000000 r9 : ec497f00 r8 : ed608030 r7 : ec497ebc r6 : ec497f00 r5 : ee5c1550 r4 : ee005780 r3 : 0000052d r2 : 00000000 r1 : ec497f00 r0 : ed608030 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: ac48006a DAC: 00000051 CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system Backtrace: [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24) [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc) [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20) [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8) [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380) [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0) [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64) [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c) [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc) [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48) [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240) [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44) [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4) [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88) [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98) [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4) [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c) [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48) [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec) [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78) [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28) [dominique.martinet@cea.fr: updated comment to not refer to a function in another subsystem] Link: http://lkml.kernel.org/r/20190124063514.8571-2-houtao1@huawei.com Cc: stable@vger.kernel.org Fixes: 7549ae3e81cc ("9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.") Reported-by: Xing Gaopeng <xingaopeng@huawei.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> diff 5e3cc1ee Wed Jan 23 23:35:13 MST 2019 Hou Tao <houtao1@huawei.com> 9p: use inode->i_lock to protect i_size_write() under 32-bit Use inode->i_lock to protect i_size_write(), else i_size_read() in generic_fillattr() may loop infinitely in read_seqcount_begin() when multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl() simultaneously under 32-bit SMP environment, and a soft lockup will be triggered as show below: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217] Modules linked in: CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system PC is at generic_fillattr+0x104/0x108 LR is at 0xec497f00 pc : [<802b8898>] lr : [<ec497f00>] psr: 200c0013 sp : ec497e20 ip : ed608030 fp : ec497e3c r10: 00000000 r9 : ec497f00 r8 : ed608030 r7 : ec497ebc r6 : ec497f00 r5 : ee5c1550 r4 : ee005780 r3 : 0000052d r2 : 00000000 r1 : ec497f00 r0 : ed608030 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: ac48006a DAC: 00000051 CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system Backtrace: [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24) [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc) [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20) [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8) [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380) [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0) [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64) [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c) [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc) [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48) [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240) [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44) [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4) [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88) [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98) [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4) [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c) [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48) [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec) [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78) [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28) [dominique.martinet@cea.fr: updated comment to not refer to a function in another subsystem] Link: http://lkml.kernel.org/r/20190124063514.8571-2-houtao1@huawei.com Cc: stable@vger.kernel.org Fixes: 7549ae3e81cc ("9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.") Reported-by: Xing Gaopeng <xingaopeng@huawei.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> diff 9d5b86ac Sun Jul 16 08:28:22 MDT 2017 Benjamin Coddington <bcodding@redhat.com> fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be atomic with the stateid update", NFSv4 has been inserting locks in rpciod worker context. The result is that the file_lock's fl_nspid is the kworker's pid instead of the original userspace pid. The fl_nspid is only used to represent the namespaced virtual pid number when displaying locks or returning from F_GETLK. There's no reason to set it for every inserted lock, since we can usually just look it up from fl_pid. So, instead of looking up and holding struct pid for every lock, let's just look up the virtual pid number from fl_pid when it is needed. That means we can remove fl_nspid entirely. The translaton and presentation of fl_pid should handle the following four cases: 1 - F_GETLK on a remote file with a remote lock: In this case, the filesystem should determine the l_pid to return here. Filesystems should indicate that the fl_pid represents a non-local pid value that should not be translated by returning an fl_pid <= 0. 2 - F_GETLK on a local file with a remote lock: This should be the l_pid of the lock manager process, and translated. 3 - F_GETLK on a remote file with a local lock, and 4 - F_GETLK on a local file with a local lock: These should be the translated l_pid of the local locking process. Fuse was already doing the correct thing by translating the pid into the caller's namespace. With this change we must update fuse to translate to init's pid namespace, so that the locks API can then translate from init's pid namespace into the pid namespace of the caller. With this change, the locks API will expect that if a filesystem returns a remote pid as opposed to a local pid for F_GETLK, that remote pid will be <= 0. This signifies that the pid is remote, and the locks API will forego translating that pid into the pid namespace of the local calling process. Finally, we convert remote filesystems to present remote pids using negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote pid returned for F_GETLK lock requests. Since local pids will never be larger than PID_MAX_LIMIT (which is currently defined as <= 4 million), but pid_t is an unsigned int, we should have plenty of room to represent remote pids with negative numbers if we assume that remote pid numbers are similarly limited. If this is not the case, then we run the risk of having a remote pid returned for which there is also a corresponding local pid. This is a problem we have now, but this patch should reduce the chances of that occurring, while also returning those remote pid numbers, for whatever that may be worth. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> diff 9d5b86ac Sun Jul 16 08:28:22 MDT 2017 Benjamin Coddington <bcodding@redhat.com> fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be atomic with the stateid update", NFSv4 has been inserting locks in rpciod worker context. The result is that the file_lock's fl_nspid is the kworker's pid instead of the original userspace pid. The fl_nspid is only used to represent the namespaced virtual pid number when displaying locks or returning from F_GETLK. There's no reason to set it for every inserted lock, since we can usually just look it up from fl_pid. So, instead of looking up and holding struct pid for every lock, let's just look up the virtual pid number from fl_pid when it is needed. That means we can remove fl_nspid entirely. The translaton and presentation of fl_pid should handle the following four cases: 1 - F_GETLK on a remote file with a remote lock: In this case, the filesystem should determine the l_pid to return here. Filesystems should indicate that the fl_pid represents a non-local pid value that should not be translated by returning an fl_pid <= 0. 2 - F_GETLK on a local file with a remote lock: This should be the l_pid of the lock manager process, and translated. 3 - F_GETLK on a remote file with a local lock, and 4 - F_GETLK on a local file with a local lock: These should be the translated l_pid of the local locking process. Fuse was already doing the correct thing by translating the pid into the caller's namespace. With this change we must update fuse to translate to init's pid namespace, so that the locks API can then translate from init's pid namespace into the pid namespace of the caller. With this change, the locks API will expect that if a filesystem returns a remote pid as opposed to a local pid for F_GETLK, that remote pid will be <= 0. This signifies that the pid is remote, and the locks API will forego translating that pid into the pid namespace of the local calling process. Finally, we convert remote filesystems to present remote pids using negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote pid returned for F_GETLK lock requests. Since local pids will never be larger than PID_MAX_LIMIT (which is currently defined as <= 4 million), but pid_t is an unsigned int, we should have plenty of room to represent remote pids with negative numbers if we assume that remote pid numbers are similarly limited. If this is not the case, then we run the risk of having a remote pid returned for which there is also a corresponding local pid. This is a problem we have now, but this patch should reduce the chances of that occurring, while also returning those remote pid numbers, for whatever that may be worth. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> diff b403f0e3 Wed Jun 29 02:54:23 MDT 2016 Miklos Szeredi <mszeredi@redhat.com> 9p: use file_dentry() v9fs may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. In this case it's a NULL pointer dereference in p9_fid_create(). Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
/linux-master/fs/adfs/ | ||
H A D | adfs.h | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | dir.c | diff 4a0a88b6 Mon Dec 09 04:10:06 MST 2019 Russell King <rmk+kernel@armlinux.org.uk> fs/adfs: dir: improve compiler coverage in adfs_dir_update Get rid of the ifdef, using IS_ENABLED() instead to detect whether the code should be callable. This allows the compiler to always parse the following code, reducing the chances of errors being missed. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | file.c | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/fs/afs/ | ||
H A D | dir.c | diff 453924de Wed Nov 08 06:57:42 MST 2023 David Howells <dhowells@redhat.com> afs: Overhaul invalidation handling to better support RO volumes Overhaul the third party-induced invalidation handling, making use of the previously added volume-level event counters (cb_scrub and cb_ro_snapshot) that are now being parsed out of the VolSync record returned by the fileserver in many of its replies. This allows better handling of RO (and Backup) volumes. Since these are snapshot of a RW volume that are updated atomically simultantanously across all servers that host them, they only require a single callback promise for the entire volume. The currently upstream code assumes that RO volumes operate in the same manner as RW volumes, and that each file has its own individual callback - which means that it does a status fetch for *every* file in a RO volume, whether or not the volume got "released" (volume callback breaks can occur for other reasons too, such as the volumeserver taking ownership of a volume from a fileserver). To this end, make the following changes: (1) Change the meaning of the volume's cb_v_break counter so that it is now a hint that we need to issue a status fetch to work out the state of a volume. cb_v_break is incremented by volume break callbacks and by server initialisation callbacks. (2) Add a second counter, cb_v_check, to the afs_volume struct such that if this differs from cb_v_break, we need to do a check. When the check is complete, cb_v_check is advanced to what cb_v_break was at the start of the status fetch. (3) Move the list of mmap'd vnodes to the volume and trigger removal of PTEs that map to files on a volume break rather than on a server break. (4) When a server reinitialisation callback comes in, use the server-to-volume reverse mapping added in a preceding patch to iterate over all the volumes using that server and clear the volume callback promises for that server and the general volume promise as a whole to trigger reanalysis. (5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE (TIME64_MIN) value in the cb_expires_at field, reducing the number of checks we need to make. (6) Change afs_check_validity() to quickly see if various event counters have been incremented or if the vnode or volume callback promise is due to expire/has expired without making any changes to the state. That is now left to afs_validate() as this may get more complicated in future as we may have to examine server records too. (7) Overhaul afs_validate() so that it does a single status fetch if we need to check the state of either the vnode or the volume - and do so under appropriate locking. The function does the following steps: (A) If the vnode/volume is no longer seen as valid, then we take the vnode validation lock and, if the volume promise has expired, the volume check lock also. The latter prevents redundant checks being made to find out if a new version of the volume got released. (B) If a previous RPC call found that the volsync changed unexpectedly or that a RO volume was updated, then we unmap all PTEs pointing to the file to stop mmap being used for access. (C) If the vnode is still seen to be of uncertain validity, then we perform an FS.FetchStatus RPC op to jointly update the volume status and the vnode status. This assessment is done as part of parsing the reply: If the RO volume creation timestamp advances, cb_ro_snapshot is incremented; if either the creation or update timestamps changes in an unexpected way, the cb_scrub counter is incremented If the Data Version returned doesn't match the copy we have locally, then we ask for the pagecache to be zapped. This takes care of handling RO update. (D) If cb_scrub differs between volume and vnode, the vnode's pagecache is zapped and the vnode's cb_scrub is updated unless the file is marked as having been deleted. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 874c8ca1 Thu Jun 09 14:46:04 MDT 2022 David Howells <dhowells@redhat.com> netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 874c8ca1 Thu Jun 09 14:46:04 MDT 2022 David Howells <dhowells@redhat.com> netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff b4280812 Thu Apr 29 04:18:12 MDT 2021 Jiapeng Chong <jiapeng.chong@linux.alibaba.com> afs: Remove redundant assignment to ret Variable ret is set to -ENOENT and -ENOMEM but this value is never read as it is overwritten or not used later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: fs/afs/dir.c:2014:4: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]. fs/afs/dir.c:659:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]. [DH made the following modifications: - In afs_rename(), -ENOMEM should be placed in op->error instead of ret, rather than the assignment being removed entirely. afs_put_operation() will pick it up from there and return it. - If afs_sillyrename() fails, its error code should be placed in op->error rather than in ret also. ] Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/1619691492-83866-1-git-send-email-jiapeng.chong@linux.alibaba.com Link: https://lore.kernel.org/r/162609465444.3133237.7562832521724298900.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162610729052.3408253.17364333638838151299.stgit@warthog.procyon.org.uk/ # v2 diff 366911cd Wed Dec 23 03:39:57 MST 2020 David Howells <dhowells@redhat.com> afs: Fix directory entry size calculation The number of dirent records used by an AFS directory entry should be calculated using the assumption that there is a 16-byte name field in the first block, rather than a 20-byte name field (which is actually the case). This miscalculation is historic and effectively standard, so we have to use it. The calculation we need to use is: 1 + (((strlen(name) + 1) + 15) >> 5) where we are adding one to the strlen() result to account for the NUL termination. Fix this by the following means: (1) Create an inline function to do the calculation for a given name length. (2) Use the function to calculate the number of records used for a dirent in afs_dir_iterate_block(). Use this to move the over-end check out of the loop since it only needs to be done once. Further use this to only go through the loop for the 2nd+ records composing an entry. The only test there now is for if the record is allocated - and we already checked the first block at the top of the outer loop. (3) Add a max name length check in afs_dir_iterate_block(). (4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function from (1) to calculate the number of blocks rather than doing it incorrectly themselves. Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> diff 3f649ab7 Wed Jun 03 14:09:38 MDT 2020 Kees Cook <keescook@chromium.org> treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> diff 3f649ab7 Wed Jun 03 14:09:38 MDT 2020 Kees Cook <keescook@chromium.org> treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> diff b6489a49 Mon Jun 15 10:36:58 MDT 2020 David Howells <dhowells@redhat.com> afs: Fix silly rename Fix AFS's silly rename by the following means: (1) Set the destination directory in afs_do_silly_rename() so as to avoid misbehaviour and indicate that the directory data version will increment by 1 so as to avoid warnings about unexpected changes in the DV. Also indicate that the ctime should be updated to avoid xfstest grumbling. (2) Note when the server indicates that a directory changed more than we expected (AFS_OPERATION_DIR_CONFLICT), indicating a conflict with a third party change, checking on successful completion of unlink and rename. The problem is that the FS.RemoveFile RPC op doesn't report the status of the unlinked file, though YFS.RemoveFile2 does. This can be mitigated by the assumption that if the directory DV cranked by exactly 1, we can be sure we removed one link from the file; further, ordinarily in AFS, files cannot be hardlinked across directories, so if we reduce nlink to 0, the file is deleted. However, if the directory DV jumps by more than 1, we cannot know if a third party intervened by adding or removing a link on the file we just removed a link from. The same also goes for any vnode that is at the destination of the FS.Rename RPC op. (3) Make afs_vnode_commit_status() apply the nlink drop inside the cb_lock section along with the other attribute updates if ->op_unlinked is set on the descriptor for the appropriate vnode. (4) Issue a follow up status fetch to the unlinked file in the event of a third party conflict that makes it impossible for us to know if we actually deleted the file or not. (5) Provide a flag, AFS_VNODE_SILLY_DELETED, to make afs_getattr() lie to the user about the nlink of a silly deleted file so that it appears as 0, not 1. Found with the generic/035 and generic/084 xfstests. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> diff da8d0755 Sat Jun 13 12:34:59 MDT 2020 David Howells <dhowells@redhat.com> afs: Concoct ctimes The in-kernel afs filesystem ignores ctime because the AFS fileserver protocol doesn't support ctimes. This, however, causes various xfstests to fail. Work around this by: (1) Setting ctime to attr->ia_ctime in afs_setattr(). (2) Not ignoring ATTR_MTIME_SET, ATTR_TIMES_SET and ATTR_TOUCH settings. (3) Setting the ctime from the server mtime when on the target file when creating a hard link to it. (4) Setting the ctime on directories from their revised mtimes when renaming/moving a file. Found by the generic/221 and generic/309 xfstests. Signed-off-by: David Howells <dhowells@redhat.com> diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept Turn the afs_operation struct into the main way that most fileserver operations are managed. Various things are added to the struct, including the following: (1) All the parameters and results of the relevant operations are moved into it, removing corresponding fields from the afs_call struct. afs_call gets a pointer to the op. (2) The target volume is made the main focus of the operation, rather than the target vnode(s), and a bunch of op->vnode->volume are made op->volume instead. (3) Two vnode records are defined (op->file[]) for the vnode(s) involved in most operations. The vnode record (struct afs_vnode_param) contains: - The vnode pointer. - The fid of the vnode to be included in the parameters or that was returned in the reply (eg. FS.MakeDir). - The status and callback information that may be returned in the reply about the vnode. - Callback break and data version tracking for detecting simultaneous third-parth changes. (4) Pointers to dentries to be updated with new inodes. (5) An operations table pointer. The table includes pointers to functions for issuing AFS and YFS-variant RPCs, handling the success and abort of an operation and handling post-I/O-lock local editing of a directory. To make this work, the following function restructuring is made: (A) The rotation loop that issues calls to fileservers that can be found in each function that wants to issue an RPC (such as afs_mkdir()) is extracted out into common code, in a new file called fs_operation.c. (B) The rotation loops, such as the one in afs_mkdir(), are replaced with a much smaller piece of code that allocates an operation, sets the parameters and then calls out to the common code to do the actual work. (C) The code for handling the success and failure of an operation are moved into operation functions (as (5) above) and these are called from the core code at appropriate times. (D) The pseudo inode getting stuff used by the dynamic root code is moved over into dynroot.c. (E) struct afs_iget_data is absorbed into the operation struct and afs_iget() expects to be given an op pointer and a vnode record. (F) Point (E) doesn't work for the root dir of a volume, but we know the FID in advance (it's always vnode 1, unique 1), so a separate inode getter, afs_root_iget(), is provided to special-case that. (G) The inode status init/update functions now also take an op and a vnode record. (H) The RPC marshalling functions now, for the most part, just take an afs_operation struct as their only argument. All the data they need is held there. The result delivery functions write their answers there as well. (I) The call is attached to the operation and then the operation core does the waiting. And then the new operation code is, for the moment, made to just initialise the operation, get the appropriate vnode I/O locks and do the same rotation loop as before. This lays the foundation for the following changes in the future: (*) Overhauling the rotation (again). (*) Support for asynchronous I/O, where the fileserver rotation must be done asynchronously also. Signed-off-by: David Howells <dhowells@redhat.com> |
H A D | internal.h | diff 495f2ae9 Wed Oct 18 02:24:01 MDT 2023 David Howells <dhowells@redhat.com> afs: Fix fileserver rotation Fix the fileserver rotation so that it doesn't use RTT as the basis for deciding which server and address to use as this doesn't necessarily give a good indication of the best path. Instead, use the configurable preference list in conjunction with whatever probes have succeeded at the time of looking. To this end, make the following changes: (1) Keep an array of "server states" to track what addresses we've tried on each server and move the waitqueue entries there that we'll need for probing. (2) Each afs_server_state struct is made to pin the corresponding server's endpoint state rather than the afs_operation struct carrying a pin on the server we're currently looking at. (3) Drop the server list preference; we now always rescan the server list. (4) afs_wait_for_probes() now uses the server state list to guide it in what it waits for (and to provide the waitqueue entries) and returns an indication of whether we'd got a response, run out of responsive addresses or the endpoint state had been superseded and we need to restart the iteration. (5) Call afs_get_address_preferences*() occasionally to refresh the preference values. (6) When picking a server, scan the addresses of the servers for which we have as-yet untested communications, looking for the highest priority one and use that instead of trying all the addresses for a particular server in ascending-RTT order. (7) When a Busy or Offline state is seen across all available servers, do a short sleep. (8) If we detect that we accessed a future RO volume version whilst it is undergoing replication, reissue the op against the older version until at least half of the servers are replicated. (9) Whilst RO replication is ongoing, increase the frequency of Volume Location server checks for that volume to every ten minutes instead of hourly. Also add a tracepoint to track progress through the rotation algorithm. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 453924de Wed Nov 08 06:57:42 MST 2023 David Howells <dhowells@redhat.com> afs: Overhaul invalidation handling to better support RO volumes Overhaul the third party-induced invalidation handling, making use of the previously added volume-level event counters (cb_scrub and cb_ro_snapshot) that are now being parsed out of the VolSync record returned by the fileserver in many of its replies. This allows better handling of RO (and Backup) volumes. Since these are snapshot of a RW volume that are updated atomically simultantanously across all servers that host them, they only require a single callback promise for the entire volume. The currently upstream code assumes that RO volumes operate in the same manner as RW volumes, and that each file has its own individual callback - which means that it does a status fetch for *every* file in a RO volume, whether or not the volume got "released" (volume callback breaks can occur for other reasons too, such as the volumeserver taking ownership of a volume from a fileserver). To this end, make the following changes: (1) Change the meaning of the volume's cb_v_break counter so that it is now a hint that we need to issue a status fetch to work out the state of a volume. cb_v_break is incremented by volume break callbacks and by server initialisation callbacks. (2) Add a second counter, cb_v_check, to the afs_volume struct such that if this differs from cb_v_break, we need to do a check. When the check is complete, cb_v_check is advanced to what cb_v_break was at the start of the status fetch. (3) Move the list of mmap'd vnodes to the volume and trigger removal of PTEs that map to files on a volume break rather than on a server break. (4) When a server reinitialisation callback comes in, use the server-to-volume reverse mapping added in a preceding patch to iterate over all the volumes using that server and clear the volume callback promises for that server and the general volume promise as a whole to trigger reanalysis. (5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE (TIME64_MIN) value in the cb_expires_at field, reducing the number of checks we need to make. (6) Change afs_check_validity() to quickly see if various event counters have been incremented or if the vnode or volume callback promise is due to expire/has expired without making any changes to the state. That is now left to afs_validate() as this may get more complicated in future as we may have to examine server records too. (7) Overhaul afs_validate() so that it does a single status fetch if we need to check the state of either the vnode or the volume - and do so under appropriate locking. The function does the following steps: (A) If the vnode/volume is no longer seen as valid, then we take the vnode validation lock and, if the volume promise has expired, the volume check lock also. The latter prevents redundant checks being made to find out if a new version of the volume got released. (B) If a previous RPC call found that the volsync changed unexpectedly or that a RO volume was updated, then we unmap all PTEs pointing to the file to stop mmap being used for access. (C) If the vnode is still seen to be of uncertain validity, then we perform an FS.FetchStatus RPC op to jointly update the volume status and the vnode status. This assessment is done as part of parsing the reply: If the RO volume creation timestamp advances, cb_ro_snapshot is incremented; if either the creation or update timestamps changes in an unexpected way, the cb_scrub counter is incremented If the Data Version returned doesn't match the copy we have locally, then we ask for the pagecache to be zapped. This takes care of handling RO update. (D) If cb_scrub differs between volume and vnode, the vnode's pagecache is zapped and the vnode's cb_scrub is updated unless the file is marked as having been deleted. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 16069e13 Sun Nov 05 09:11:07 MST 2023 David Howells <dhowells@redhat.com> afs: Parse the VolSync record in the reply of a number of RPC ops A number of fileserver RPC operations return a VolSync record as part of their reply that gives some information about the state of the volume being accessed, including: (1) A volume Creation timestamp. For an RW volume, this is the time at which the volume was created; if it changes, the RW volume was presumably restored from a backup and all cached data should be scrubbed as Data Version numbers could regress on the files in the volume. For an RO volume, this is the time it was last snapshotted from the RW volume. It is expected to advance each time this happens; if it regresses, cached data should be scrubbed. (2) A volume Update timestamp (Auristor only). For an RW volume, this is updated any time any change is made to a volume or its contents. If it regresses, all cached data must be scrubbed. For an RO volume, this is a copy of the RW volume's Update timestamp at the point of snapshotting. It can be used as a version number when checking to see if a callback on a RO volume was due to a snapshot. If it regresses, all cached data must be scrubbed. but this is currently not made use of by the in-kernel afs filesystem. Make the afs filesystem use this by: (1) Add an update time field to the afs_volsync struct and use a value of TIME64_MIN in both that and the creation time to indicate that they are unset. (2) Add creation and update time fields to the afs_volume struct and use this to track the two timestamps. (3) Add a volsync_lock mutex to the afs_volume struct to control modification access for when we detect a change in these values. (3) Add a 'pre-op volsync' struct to the afs_operation struct to record the state of the volume tracking before the op. (4) Add a new counter, cb_scrub, to the afs_volume struct to count events that require all data to be scrubbed. A copy is placed in the afs_vnode struct (inode) and if they no longer match, a scrub takes place. (5) When the result of an operation is being parsed, parse the VolSync data too, if it is provided. Note that the two timestamps are handled separately, since they don't work in quite the same way. - If the afs_volume tracking is unset, just set it and do nothing else. - If the result timestamps are the same as the ones in afs_volume, do nothing. - If the timestamps regress, increment cb_scrub if not already done so. - If the creation timestamp on a RW volume changes, increment cb_scrub if not already done so. - If the creation timestamp on a RO volume advances, update the server list and see if the current server has been excluded, if so reissue the op. Once over half of the replication sites have been updated, increment cb_ro_snapshot to indicate updates may be required and switch over to excluding unupdated replication sites. - If the creation timestamp on a Backup volume advances, just increment cb_ro_snapshot to trigger updates. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 72904d7b Wed Oct 18 17:55:11 MDT 2023 David Howells <dhowells@redhat.com> rxrpc, afs: Allow afs to pin rxrpc_peer objects Change rxrpc's API such that: (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an rxrpc_peer record for a remote address and a corresponding function, rxrpc_kernel_put_peer(), is provided to dispose of it again. (2) When setting up a call, the rxrpc_peer object used during a call is now passed in rather than being set up by rxrpc_connect_call(). For afs, this meenat passing it to rxrpc_kernel_begin_call() rather than the full address (the service ID then has to be passed in as a separate parameter). (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can get a pointer to the transport address for display purposed, and another, rxrpc_kernel_remote_srx(), to gain a pointer to the full rxrpc address. (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(), is then altered to take a peer. This now returns the RTT or -1 if there are insufficient samples. (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer(). (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a peer the caller already has. This allows the afs filesystem to pin the rxrpc_peer records that it is using, allowing faster lookups and pointer comparisons rather than comparing sockaddr_rxrpc contents. It also makes it easier to get hold of the RTT. The following changes are made to afs: (1) The addr_list struct's addrs[] elements now hold a peer struct pointer and a service ID rather than a sockaddr_rxrpc. (2) When displaying the transport address, rxrpc_kernel_remote_addr() is used. (3) The port arg is removed from afs_alloc_addrlist() since it's always overridden. (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may now return an error that must be handled. (5) afs_find_server() now takes a peer pointer to specify the address. (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{} now do peer pointer comparison rather than address comparison. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 72904d7b Wed Oct 18 17:55:11 MDT 2023 David Howells <dhowells@redhat.com> rxrpc, afs: Allow afs to pin rxrpc_peer objects Change rxrpc's API such that: (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an rxrpc_peer record for a remote address and a corresponding function, rxrpc_kernel_put_peer(), is provided to dispose of it again. (2) When setting up a call, the rxrpc_peer object used during a call is now passed in rather than being set up by rxrpc_connect_call(). For afs, this meenat passing it to rxrpc_kernel_begin_call() rather than the full address (the service ID then has to be passed in as a separate parameter). (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can get a pointer to the transport address for display purposed, and another, rxrpc_kernel_remote_srx(), to gain a pointer to the full rxrpc address. (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(), is then altered to take a peer. This now returns the RTT or -1 if there are insufficient samples. (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer(). (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a peer the caller already has. This allows the afs filesystem to pin the rxrpc_peer records that it is using, allowing faster lookups and pointer comparisons rather than comparing sockaddr_rxrpc contents. It also makes it easier to get hold of the RTT. The following changes are made to afs: (1) The addr_list struct's addrs[] elements now hold a peer struct pointer and a service ID rather than a sockaddr_rxrpc. (2) When displaying the transport address, rxrpc_kernel_remote_addr() is used. (3) The port arg is removed from afs_alloc_addrlist() since it's always overridden. (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may now return an error that must be handled. (5) afs_find_server() now takes a peer pointer to specify the address. (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{} now do peer pointer comparison rather than address comparison. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 874c8ca1 Thu Jun 09 14:46:04 MDT 2022 David Howells <dhowells@redhat.com> netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 874c8ca1 Thu Jun 09 14:46:04 MDT 2022 David Howells <dhowells@redhat.com> netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bc899ee1 Tue Jun 29 15:37:05 MDT 2021 David Howells <dhowells@redhat.com> netfs: Add a netfs inode context Add a netfs_i_context struct that should be included in the network filesystem's own inode struct wrapper, directly after the VFS's inode struct, e.g.: struct my_inode { struct { /* These must be contiguous */ struct inode vfs_inode; struct netfs_i_context netfs_ctx; }; }; The netfs_i_context struct so far contains a single field for the network filesystem to use - the cache cookie: struct netfs_i_context { ... struct fscache_cookie *cache; }; Three functions are provided to help with this: (1) void netfs_i_context_init(struct inode *inode, const struct netfs_request_ops *ops); Initialise the netfs context and set the operations. (2) struct netfs_i_context *netfs_i_context(struct inode *inode); Find the netfs context from the VFS inode. (3) struct inode *netfs_inode(struct netfs_i_context *ctx); Find the VFS inode from the netfs context. Changes ======= ver #4) - Fix netfs_is_cache_enabled() to check cookie->cache_priv to see if a cache is present[3]. - Fix netfs_skip_folio_read() to zero out all of the page, not just some of it[3]. ver #3) - Split out the bit to move ceph cap-getting on readahead into ceph_init_request()[1]. - Stick in a comment to the netfs inode structs indicating the contiguity requirements[2]. ver #2) - Adjust documentation to match. - Use "#if IS_ENABLED()" in netfs_i_cookie(), not "#ifdef". - Move the cap check from ceph_readahead() to ceph_init_request() to be called from netfslib. - Remove ceph_readahead() and use netfs_readahead() directly instead. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/8af0d47f17d89c06bbf602496dd845f2b0bf25b3.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/beaf4f6a6c2575ed489adb14b257253c868f9a5c.camel@kernel.org/ [2] Link: https://lore.kernel.org/r/3536452.1647421585@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/164622984545.3564931.15691742939278418580.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678213320.1200972.16807551936267647470.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692909854.2099075.9535537286264248057.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/306388.1647595110@warthog.procyon.org.uk/ # v4 diff 523d27cd Thu Feb 06 07:22:21 MST 2020 David Howells <dhowells@redhat.com> afs: Convert afs to use the new fscache API Change the afs filesystem to support the new afs driver. The following changes have been made: (1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole. There's also no longer a cell cookie. (2) The volume cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). This function takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume. For afs, I've made it render the volume name string as: "afs,<cell>,<volume_id>" and the coherency data is currently 0. (3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times. fscache_acquire_cookie() is passed the same keying and coherency information as before, except that these are now stored in big endian form instead of cpu endian. This makes the cache more copyable. (4) fscache_use_cookie() and fscache_unuse_cookie() are called when a file is opened or closed to prevent a cache file from being culled and to keep resources to hand that are needed to do I/O. fscache_use_cookie() is given an indication if the cache is likely to be modified locally (e.g. the file is open for writing). fscache_unuse_cookie() is given a coherency update if we had the file open for writing and will update that. (5) fscache_invalidate() is now given uptodate auxiliary data and a file size. It can also take a flag to indicate if this was due to a DIO write. This is wrapped into afs_fscache_invalidate() now for convenience. (6) fscache_resize() now gets called from the finalisation of afs_setattr(), and afs_setattr() does use/unuse of the cookie around the call to support this. (7) fscache_note_page_release() is called from afs_release_page(). (8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for PG_fscache to be cleared. Render the parts of the cookie key for an afs inode cookie as big endian. Changes ======= ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - fscache_acquire_volume() now returns errors. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> Tested-by: kafs-testing@auristor.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819661382.215744.1485608824741611837.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906970002.143852.17678518584089878259.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967174665.1823006.1301789965454084220.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021568841.640689.6684240152253400380.stgit@warthog.procyon.org.uk/ # v4 diff 78525c74 Wed Aug 11 02:49:13 MDT 2021 David Howells <dhowells@redhat.com> netfs, 9p, afs, ceph: Use folios Convert the netfs helper library to use folios throughout, convert the 9p and afs filesystems to use folios in their file I/O paths and convert the ceph filesystem to use just enough folios to compile. With these changes, afs passes -g quick xfstests. Changes ======= ver #5: - Got rid of folio_end{io,_read,_write}() and inlined the stuff it does instead (Willy decided he didn't want this after all). ver #4: - Fixed a bug in afs_redirty_page() whereby it didn't set the next page index in the loop and returned too early. - Simplified a check in v9fs_vfs_write_folio_locked()[1]. - Undid a change to afs_symlink_readpage()[1]. - Used offset_in_folio() in afs_write_end()[1]. - Changed from using page_endio() to folio_end{io,_read,_write}()[1]. ver #2: - Add 9p foliation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: kafs-testing@auristor.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5 |
H A D | mntpt.c | diff 88c853c3 Tue Jul 23 04:24:59 MDT 2019 David Howells <dhowells@redhat.com> afs: Fix cell refcounting by splitting the usage counter Management of the lifetime of afs_cell struct has some problems due to the usage counter being used to determine whether objects of that type are in use in addition to whether anyone might be interested in the structure. This is made trickier by cell objects being cached for a period of time in case they're quickly reused as they hold the result of a setup process that may be slow (DNS lookups, AFS RPC ops). Problems include the cached root volume from alias resolution pinning its parent cell record, rmmod occasionally hanging and occasionally producing assertion failures. Fix this by splitting the count of active users from the struct reference count. Things then work as follows: (1) The cell cache keeps +1 on the cell's activity count and this has to be dropped before the cell can be removed. afs_manage_cell() tries to exchange the 1 to a 0 with the cells_lock write-locked, and if successful, the record is removed from the net->cells. (2) One struct ref is 'owned' by the activity count. That is put when the active count is reduced to 0 (final_destruction label). (3) A ref can be held on a cell whilst it is queued for management on a work queue without confusing the active count. afs_queue_cell() is added to wrap this. (4) The queue's ref is dropped at the end of the management. This is split out into a separate function, afs_manage_cell_work(). (5) The root volume record is put after a cell is removed (at the final_destruction label) rather then in the RCU destruction routine. (6) Volumes hold struct refs, but aren't active users. (7) Both counts are displayed in /proc/net/afs/cells. There are some management function changes: (*) afs_put_cell() now just decrements the refcount and triggers the RCU destruction if it becomes 0. It no longer sets a timer to have the manager do this. (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease the active count. afs_unuse_cell() sets the management timer. (*) afs_queue_cell() is added to queue a cell with approprate refs. There are also some other fixes: (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL. (*) Make sure that candidate cells in lookups are properly destroyed rather than being simply kfree'd. This ensures the bits it points to are destroyed also. (*) afs_dec_cells_outstanding() is now called in cell destruction rather than at "final_destruction". This ensures that cell->net is still valid to the end of the destructor. (*) As a consequence of the previous two changes, move the increment of net->cells_outstanding that was at the point of insertion into the tree to the allocation routine to correctly balance things. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells <dhowells@redhat.com> diff 4d673da1 Mon Feb 05 23:26:30 MST 2018 David Howells <dhowells@redhat.com> afs: Support the AFS dynamic root Support the AFS dynamic root which is a pseudo-volume that doesn't connect to any server resource, but rather is just a root directory that dynamically creates mountpoint directories where the name of such a directory is the name of the cell. Such a mount can be created thus: mount -t afs none /afs -o dyn Dynamic root superblocks aren't shared except by bind mounts and propagation. Cell root volumes can then be mounted by referring to them by name, e.g.: ls /afs/grand.central.org/ ls /afs/.grand.central.org/ The kernel will upcall to consult the DNS if the address wasn't supplied directly. Signed-off-by: David Howells <dhowells@redhat.com> diff 5a0e3ad6 Wed Mar 24 02:04:11 MDT 2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> diff 4ac91378 Thu Feb 14 20:34:32 MST 2008 Jan Blunck <jblunck@suse.de> Embed a struct path into struct nameidata instead of nd->{dentry,mnt} This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 416351f2 Wed May 09 03:33:45 MDT 2007 David Howells <dhowells@redhat.com> AFS: AFS fixups Make some miscellaneous changes to the AFS filesystem: (1) Assert RCU barriers on module exit to make sure RCU has finished with callbacks in this module. (2) Correctly handle the AFS server returning a zero-length read. (3) Split out data zapping calls into one function (afs_zap_data). (4) Rename some afs_file_*() functions to afs_*() where they apply to non-regular files too. (5) Be consistent about the presentation of volume ID:vnode ID in debugging output. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | proc.c | diff 453924de Wed Nov 08 06:57:42 MST 2023 David Howells <dhowells@redhat.com> afs: Overhaul invalidation handling to better support RO volumes Overhaul the third party-induced invalidation handling, making use of the previously added volume-level event counters (cb_scrub and cb_ro_snapshot) that are now being parsed out of the VolSync record returned by the fileserver in many of its replies. This allows better handling of RO (and Backup) volumes. Since these are snapshot of a RW volume that are updated atomically simultantanously across all servers that host them, they only require a single callback promise for the entire volume. The currently upstream code assumes that RO volumes operate in the same manner as RW volumes, and that each file has its own individual callback - which means that it does a status fetch for *every* file in a RO volume, whether or not the volume got "released" (volume callback breaks can occur for other reasons too, such as the volumeserver taking ownership of a volume from a fileserver). To this end, make the following changes: (1) Change the meaning of the volume's cb_v_break counter so that it is now a hint that we need to issue a status fetch to work out the state of a volume. cb_v_break is incremented by volume break callbacks and by server initialisation callbacks. (2) Add a second counter, cb_v_check, to the afs_volume struct such that if this differs from cb_v_break, we need to do a check. When the check is complete, cb_v_check is advanced to what cb_v_break was at the start of the status fetch. (3) Move the list of mmap'd vnodes to the volume and trigger removal of PTEs that map to files on a volume break rather than on a server break. (4) When a server reinitialisation callback comes in, use the server-to-volume reverse mapping added in a preceding patch to iterate over all the volumes using that server and clear the volume callback promises for that server and the general volume promise as a whole to trigger reanalysis. (5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE (TIME64_MIN) value in the cb_expires_at field, reducing the number of checks we need to make. (6) Change afs_check_validity() to quickly see if various event counters have been incremented or if the vnode or volume callback promise is due to expire/has expired without making any changes to the state. That is now left to afs_validate() as this may get more complicated in future as we may have to examine server records too. (7) Overhaul afs_validate() so that it does a single status fetch if we need to check the state of either the vnode or the volume - and do so under appropriate locking. The function does the following steps: (A) If the vnode/volume is no longer seen as valid, then we take the vnode validation lock and, if the volume promise has expired, the volume check lock also. The latter prevents redundant checks being made to find out if a new version of the volume got released. (B) If a previous RPC call found that the volsync changed unexpectedly or that a RO volume was updated, then we unmap all PTEs pointing to the file to stop mmap being used for access. (C) If the vnode is still seen to be of uncertain validity, then we perform an FS.FetchStatus RPC op to jointly update the volume status and the vnode status. This assessment is done as part of parsing the reply: If the RO volume creation timestamp advances, cb_ro_snapshot is incremented; if either the creation or update timestamps changes in an unexpected way, the cb_scrub counter is incremented If the Data Version returned doesn't match the copy we have locally, then we ask for the pagecache to be zapped. This takes care of handling RO update. (D) If cb_scrub differs between volume and vnode, the vnode's pagecache is zapped and the vnode's cb_scrub is updated unless the file is marked as having been deleted. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 72904d7b Wed Oct 18 17:55:11 MDT 2023 David Howells <dhowells@redhat.com> rxrpc, afs: Allow afs to pin rxrpc_peer objects Change rxrpc's API such that: (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an rxrpc_peer record for a remote address and a corresponding function, rxrpc_kernel_put_peer(), is provided to dispose of it again. (2) When setting up a call, the rxrpc_peer object used during a call is now passed in rather than being set up by rxrpc_connect_call(). For afs, this meenat passing it to rxrpc_kernel_begin_call() rather than the full address (the service ID then has to be passed in as a separate parameter). (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can get a pointer to the transport address for display purposed, and another, rxrpc_kernel_remote_srx(), to gain a pointer to the full rxrpc address. (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(), is then altered to take a peer. This now returns the RTT or -1 if there are insufficient samples. (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer(). (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a peer the caller already has. This allows the afs filesystem to pin the rxrpc_peer records that it is using, allowing faster lookups and pointer comparisons rather than comparing sockaddr_rxrpc contents. It also makes it easier to get hold of the RTT. The following changes are made to afs: (1) The addr_list struct's addrs[] elements now hold a peer struct pointer and a service ID rather than a sockaddr_rxrpc. (2) When displaying the transport address, rxrpc_kernel_remote_addr() is used. (3) The port arg is removed from afs_alloc_addrlist() since it's always overridden. (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may now return an error that must be handled. (5) afs_find_server() now takes a peer pointer to specify the address. (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{} now do peer pointer comparison rather than address comparison. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 72904d7b Wed Oct 18 17:55:11 MDT 2023 David Howells <dhowells@redhat.com> rxrpc, afs: Allow afs to pin rxrpc_peer objects Change rxrpc's API such that: (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an rxrpc_peer record for a remote address and a corresponding function, rxrpc_kernel_put_peer(), is provided to dispose of it again. (2) When setting up a call, the rxrpc_peer object used during a call is now passed in rather than being set up by rxrpc_connect_call(). For afs, this meenat passing it to rxrpc_kernel_begin_call() rather than the full address (the service ID then has to be passed in as a separate parameter). (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can get a pointer to the transport address for display purposed, and another, rxrpc_kernel_remote_srx(), to gain a pointer to the full rxrpc address. (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(), is then altered to take a peer. This now returns the RTT or -1 if there are insufficient samples. (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer(). (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a peer the caller already has. This allows the afs filesystem to pin the rxrpc_peer records that it is using, allowing faster lookups and pointer comparisons rather than comparing sockaddr_rxrpc contents. It also makes it easier to get hold of the RTT. The following changes are made to afs: (1) The addr_list struct's addrs[] elements now hold a peer struct pointer and a service ID rather than a sockaddr_rxrpc. (2) When displaying the transport address, rxrpc_kernel_remote_addr() is used. (3) The port arg is removed from afs_alloc_addrlist() since it's always overridden. (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may now return an error that must be handled. (5) afs_find_server() now takes a peer pointer to specify the address. (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{} now do peer pointer comparison rather than address comparison. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org diff 88c853c3 Tue Jul 23 04:24:59 MDT 2019 David Howells <dhowells@redhat.com> afs: Fix cell refcounting by splitting the usage counter Management of the lifetime of afs_cell struct has some problems due to the usage counter being used to determine whether objects of that type are in use in addition to whether anyone might be interested in the structure. This is made trickier by cell objects being cached for a period of time in case they're quickly reused as they hold the result of a setup process that may be slow (DNS lookups, AFS RPC ops). Problems include the cached root volume from alias resolution pinning its parent cell record, rmmod occasionally hanging and occasionally producing assertion failures. Fix this by splitting the count of active users from the struct reference count. Things then work as follows: (1) The cell cache keeps +1 on the cell's activity count and this has to be dropped before the cell can be removed. afs_manage_cell() tries to exchange the 1 to a 0 with the cells_lock write-locked, and if successful, the record is removed from the net->cells. (2) One struct ref is 'owned' by the activity count. That is put when the active count is reduced to 0 (final_destruction label). (3) A ref can be held on a cell whilst it is queued for management on a work queue without confusing the active count. afs_queue_cell() is added to wrap this. (4) The queue's ref is dropped at the end of the management. This is split out into a separate function, afs_manage_cell_work(). (5) The root volume record is put after a cell is removed (at the final_destruction label) rather then in the RCU destruction routine. (6) Volumes hold struct refs, but aren't active users. (7) Both counts are displayed in /proc/net/afs/cells. There are some management function changes: (*) afs_put_cell() now just decrements the refcount and triggers the RCU destruction if it becomes 0. It no longer sets a timer to have the manager do this. (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease the active count. afs_unuse_cell() sets the management timer. (*) afs_queue_cell() is added to queue a cell with approprate refs. There are also some other fixes: (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL. (*) Make sure that candidate cells in lookups are properly destroyed rather than being simply kfree'd. This ensures the bits it points to are destroyed also. (*) afs_dec_cells_outstanding() is now called in cell destruction rather than at "final_destruction". This ensures that cell->net is still valid to the end of the destructor. (*) As a consequence of the previous two changes, move the increment of net->cells_outstanding that was at the point of insertion into the tree to the allocation routine to correctly balance things. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells <dhowells@redhat.com> diff 977e5f8e Fri Apr 17 10:31:26 MDT 2020 David Howells <dhowells@redhat.com> afs: Split the usage count on struct afs_server Split the usage count on the afs_server struct to have an active count that registers who's actually using it separately from the reference count on the object. This allows a future patch to dispatch polling probes without advancing the "unuse" time into the future each time we emit a probe, which would otherwise prevent unused server records from expiring. Included in this: (1) The latter part of afs_destroy_server() in which the RCU destruction of afs_server objects is invoked and the outstanding server count is decremented is split out into __afs_put_server(). (2) afs_put_server() now calls __afs_put_server() rather then setting the management timer. (3) The calls begun by afs_fs_give_up_all_callbacks() and afs_fs_get_capabilities() can now take a ref on the server record, so afs_destroy_server() can just drop its ref and needn't wait for the completion of these calls. They'll put the ref when they're done. (4) Because of (3), afs_fs_probe_done() no longer needs to wake up afs_destroy_server() with server->probe_outstanding. (5) afs_gc_servers can be simplified. It only needs to check if server->active is 0 rather than playing games with the refcount. (6) afs_manage_servers() can propose a server for gc if usage == 0 rather than if ref == 1. The gc is effected by (5). Signed-off-by: David Howells <dhowells@redhat.com> diff 0a5143f2 Fri Oct 19 17:57:57 MDT 2018 David Howells <dhowells@redhat.com> afs: Implement VL server rotation Track VL servers as independent entities rather than lumping all their addresses together into one set and implement server-level rotation by: (1) Add the concept of a VL server list, where each server has its own separate address list. This code is similar to the FS server list. (2) Use the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. (3) In the case of a legacy DNS resolver or an address list given directly through /proc/net/afs/cells, create a list containing just a dummy server record and attach all the addresses to that. (4) Implement a simple rotation policy, for the moment ignoring the priorities and weights assigned to the servers. (5) Show the address list through /proc/net/afs/<cell>/vlservers. This also displays the source and status of the data as indicated by the upcall. Signed-off-by: David Howells <dhowells@redhat.com> diff d2ddc776 Thu Nov 02 09:27:50 MDT 2017 David Howells <dhowells@redhat.com> afs: Overhaul volume and server record caching and fileserver rotation The current code assumes that volumes and servers are per-cell and are never shared, but this is not enforced, and, indeed, public cells do exist that are aliases of each other. Further, an organisation can, say, set up a public cell and a private cell with overlapping, but not identical, sets of servers. The difference is purely in the database attached to the VL servers. The current code will malfunction if it sees a server in two cells as it assumes global address -> server record mappings and that each server is in just one cell. Further, each server may have multiple addresses - and may have addresses of different families (IPv4 and IPv6, say). To this end, the following structural changes are made: (1) Server record management is overhauled: (a) Server records are made independent of cell. The namespace keeps track of them, volume records have lists of them and each vnode has a server on which its callback interest currently resides. (b) The cell record no longer keeps a list of servers known to be in that cell. (c) The server records are now kept in a flat list because there's no single address to sort on. (d) Server records are now keyed by their UUID within the namespace. (e) The addresses for a server are obtained with the VL.GetAddrsU rather than with VL.GetEntryByName, using the server's UUID as a parameter. (f) Cached server records are garbage collected after a period of non-use and are counted out of existence before purging is allowed to complete. This protects the work functions against rmmod. (g) The servers list is now in /proc/fs/afs/servers. (2) Volume record management is overhauled: (a) An RCU-replaceable server list is introduced. This tracks both servers and their coresponding callback interests. (b) The superblock is now keyed on cell record and numeric volume ID. (c) The volume record is now tied to the superblock which mounts it, and is activated when mounted and deactivated when unmounted. This makes it easier to handle the cache cookie without causing a double-use in fscache. (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU to get the server UUID list. (e) The volume name is updated if it is seen to have changed when the volume is updated (the update is keyed on the volume ID). (3) The vlocation record is got rid of and VLDB records are no longer cached. Sufficient information is stored in the volume record, though an update to a volume record is now no longer shared between related volumes (volumes come in bundles of three: R/W, R/O and backup). and the following procedural changes are made: (1) The fileserver cursor introduced previously is now fleshed out and used to iterate over fileservers and their addresses. (2) Volume status is checked during iteration, and the server list is replaced if a change is detected. (3) Server status is checked during iteration, and the address list is replaced if a change is detected. (4) The abort code is saved into the address list cursor and -ECONNABORTED returned in afs_make_call() if a remote abort happened rather than translating the abort into an error message. This allows actions to be taken depending on the abort code more easily. (a) If a VMOVED abort is seen then this is handled by rechecking the volume and restarting the iteration. (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is handled by sleeping for a short period and retrying and/or trying other servers that might serve that volume. A message is also displayed once until the condition has cleared. (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the moment. (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to see if it has been deleted; if not, the fileserver is probably indicating that the volume couldn't be attached and needs salvaging. (e) If statfs() sees one of these aborts, it does not sleep, but rather returns an error, so as not to block the umount program. (5) The fileserver iteration functions in vnode.c are now merged into their callers and more heavily macroised around the cursor. vnode.c is removed. (6) Operations on a particular vnode are serialised on that vnode because the server will lock that vnode whilst it operates on it, so a second op sent will just have to wait. (7) Fileservers are probed with FS.GetCapabilities before being used. This is where service upgrade will be done. (8) A callback interest on a fileserver is set up before an FS operation is performed and passed through to afs_make_call() so that it can be set on the vnode if the operation returns a callback. The callback interest is passed through to afs_iget() also so that it can be set there too. In general, record updating is done on an as-needed basis when we try to access servers, volumes or vnodes rather than offloading it to work items and special threads. Notes: (1) Pre AFS-3.4 servers are no longer supported, though this can be added back if necessary (AFS-3.4 was released in 1998). (2) VBUSY is retried forever for the moment at intervals of 1s. (3) /proc/fs/afs/<cell>/servers no longer exists. Signed-off-by: David Howells <dhowells@redhat.com> diff 8b2a464c Thu Nov 02 09:27:50 MDT 2017 David Howells <dhowells@redhat.com> afs: Add an address list concept Add an RCU replaceable address list structure to hold a list of server addresses. The list also holds the To this end: (1) A cell's VL server address list can be loaded directly via insmod or echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB or SRV records. (2) Anyone wanting to use a cell's VL server address must wait until the cell record comes online and has tried to obtain some addresses. (3) An FS server's address list, for the moment, has a single entry that is the key to the server list. This will change in the future when a server is instead keyed on its UUID and the VL.GetAddrsU operation is used. (4) An 'address cursor' concept is introduced to handle iteration through the address list. This is passed to the afs_make_call() as, in the future, stuff (such as abort code) that doesn't outlast the call will be returned in it. In the future, we might want to annotate the list with information about how each address fares. We might then want to propagate such annotations over address list replacement. Whilst we're at it, we allow IPv6 addresses to be specified in colon-delimited lists by enclosing them in square brackets. Signed-off-by: David Howells <dhowells@redhat.com> diff 989782dc Thu Nov 02 09:27:50 MDT 2017 David Howells <dhowells@redhat.com> afs: Overhaul cell database management Overhaul the way that the in-kernel AFS client keeps track of cells in the following manner: (1) Cells are now held in an rbtree to make walking them quicker and RCU managed (though this is probably overkill). (2) Cells now have a manager work item that: (A) Looks after fetching and refreshing the VL server list. (B) Manages cell record lifetime, including initialising and destruction. (B) Manages cell record caching whereby threads are kept around for a certain time after last use and then destroyed. (C) Manages the FS-Cache index cookie for a cell. It is not permitted for a cookie to be in use twice, so we have to be careful to not allow a new cell record to exist at the same time as an old record of the same name. (3) Each AFS network namespace is given a manager work item that manages the cells within it, maintaining a single timer to prod cells into updating their DNS records. This uses the reduce_timer() facility to make the timer expire at the soonest timed event that needs happening. (4) When a module is being unloaded, cells and cell managers are now counted out using dec_after_work() to make sure the module text is pinned until after the data structures have been cleaned up. (5) Each cell's VL server list is now protected by a seqlock rather than a semaphore. Signed-off-by: David Howells <dhowells@redhat.com> diff 4d9df986 Thu Nov 02 09:27:47 MDT 2017 David Howells <dhowells@redhat.com> afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr Keep and pass sockaddr_rxrpc addresses around rather than keeping and passing in_addr addresses to allow for the use of IPv6 and non-standard port numbers in future. This also allows the port and service_id fields to be removed from the afs_call struct. Signed-off-by: David Howells <dhowells@redhat.com> |
/linux-master/fs/autofs/ | ||
H A D | autofs_i.h | diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | root.c | diff 797a1d89 Mon Jun 12 04:45:19 MDT 2023 Jeff Layton <jlayton@kernel.org> autofs: set ctime as well when mtime changes on a dir When adding entries to a directory, POSIX generally requires that the ctime also be updated alongside the mtime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Ian Kent <raven@themaw.net> Message-Id: <20230612104524.17058-4-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> diff 5a0e3ad6 Wed Mar 24 02:04:11 MDT 2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
/linux-master/fs/ | ||
H A D | bad_inode.c | diff b2441318 Wed Nov 01 08:07:57 MDT 2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org> License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> diff a528d35e Tue Jan 31 09:46:22 MST 2017 David Howells <dhowells@redhat.com> statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares*[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4acdaf27 Mon Jul 25 23:42:34 MDT 2011 Al Viro <viro@zeniv.linux.org.uk> switch ->create() to umode_t vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> |
H A D | binfmt_misc.c | diff 21ca59b3 Wed Oct 27 16:31:14 MDT 2021 Christian Brauner <brauner@kernel.org> binfmt_misc: enable sandboxed mounts Enable unprivileged sandboxes to create their own binfmt_misc mounts. This is based on Laurent's work in [1] but has been significantly reworked to fix various issues we identified in earlier versions. While binfmt_misc can currently only be mounted in the initial user namespace, binary types registered in this binfmt_misc instance are available to all sandboxes (Either by having them installed in the sandbox or by registering the binary type with the F flag causing the interpreter to be opened right away). So binfmt_misc binary types are already delegated to sandboxes implicitly. However, while a sandbox has access to all registered binary types in binfmt_misc a sandbox cannot currently register its own binary types in binfmt_misc. This has prevented various use-cases some of which were already outlined in [1] but we have a range of issues associated with this (cf. [3]-[5] below which are just a small sample). Extend binfmt_misc to be mountable in non-initial user namespaces. Similar to other filesystem such as nfsd, mqueue, and sunrpc we use keyed superblock management. The key determines whether we need to create a new superblock or can reuse an already existing one. We use the user namespace of the mount as key. This means a new binfmt_misc superblock is created once per user namespace creation. Subsequent mounts of binfmt_misc in the same user namespace will mount the same binfmt_misc instance. We explicitly do not create a new binfmt_misc superblock on every binfmt_misc mount as the semantics for load_misc_binary() line up with the keying model. This also allows us to retrieve the relevant binfmt_misc instance based on the caller's user namespace which can be done in a simple (bounded to 32 levels) loop. Similar to the current binfmt_misc semantics allowing access to the binary types in the initial binfmt_misc instance we do allow sandboxes access to their parent's binfmt_misc mounts if they do not have created a separate binfmt_misc instance. Overall, this will unblock the use-cases mentioned below and in general will also allow to support and harden execution of another architecture's binaries in tight sandboxes. For instance, using the unshare binary it possible to start a chroot of another architecture and configure the binfmt_misc interpreter without being root to run the binaries in this chroot and without requiring the host to modify its binary type handlers. Henning had already posted a few experiments in the cover letter at [1]. But here's an additional example where an unprivileged container registers qemu-user-static binary handlers for various binary types in its separate binfmt_misc mount and is then seamlessly able to start containers with a different architecture without affecting the host: root [lxc monitor] /var/snap/lxd/common/lxd/containers f1 1000000 \_ /sbin/init 1000000 \_ /lib/systemd/systemd-journald 1000000 \_ /lib/systemd/systemd-udevd 1000100 \_ /lib/systemd/systemd-networkd 1000101 \_ /lib/systemd/systemd-resolved 1000000 \_ /usr/sbin/cron -f 1000103 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only 1000000 \_ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers 1000104 \_ /usr/sbin/rsyslogd -n -iNONE 1000000 \_ /lib/systemd/systemd-logind 1000000 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220 1000107 \_ dnsmasq --conf-file=/dev/null -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --liste 1000000 \_ [lxc monitor] /var/lib/lxc f1-s390x 1100000 \_ /usr/bin/qemu-s390x-static /sbin/init 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-journald 1100000 \_ /usr/bin/qemu-s390x-static /usr/sbin/cron -f 1100103 \_ /usr/bin/qemu-s390x-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-ac 1100000 \_ /usr/bin/qemu-s390x-static /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers 1100104 \_ /usr/bin/qemu-s390x-static /usr/sbin/rsyslogd -n -iNONE 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-logind 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/0 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/1 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/2 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/3 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-udevd [1]: https://lore.kernel.org/all/20191216091220.465626-1-laurent@vivier.eu [2]: https://discuss.linuxcontainers.org/t/binfmt-misc-permission-denied [3]: https://discuss.linuxcontainers.org/t/lxd-binfmt-support-for-qemu-static-interpreters [4]: https://discuss.linuxcontainers.org/t/3-1-0-binfmt-support-service-in-unprivileged-guest-requires-write-access-on-hosts-proc-sys-fs-binfmt-misc [5]: https://discuss.linuxcontainers.org/t/qemu-user-static-not-working-4-11 Link: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu (origin) Link: https://lore.kernel.org/r/20211028103114.2849140-2-brauner@kernel.org (v1) Cc: Sargun Dhillon <sargun@sargun.me> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Henning Schild <henning.schild@siemens.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Laurent Vivier <laurent@vivier.eu> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> --- /* v2 */ - Serge Hallyn <serge@hallyn.com>: - Use GFP_KERNEL_ACCOUNT for userspace triggered allocations when a new binary type handler is registered. - Christian Brauner <christian.brauner@ubuntu.com>: - Switch authorship to me. I refused to do that earlier even though Laurent said I should do so because I think it's genuinely bad form. But by now I have changed so many things that it'd be unfair to blame Laurent for any potential bugs in here. - Add more comments that explain what's going on. - Rename functions while changing them to better reflect what they are doing to make the code easier to understand. - In the first version when a specific binary type handler was removed either through a write to the entry's file or all binary type handlers were removed by a write to the binfmt_misc mount's status file all cleanup work happened during inode eviction. That includes removal of the relevant entries from entry list. While that works fine I disliked that model after thinking about it for a bit. Because it means that there was a window were someone has already removed a or all binary handlers but they could still be safely reached from load_misc_binary() when it has managed to take the read_lock() on the entries list while inode eviction was already happening. Again, that perfectly benign but it's cleaner to remove the binary handler from the list immediately meaning that ones the write to then entry's file or the binfmt_misc status file returns the binary type cannot be executed anymore. That gives stronger guarantees to the user. diff 21ca59b3 Wed Oct 27 16:31:14 MDT 2021 Christian Brauner <brauner@kernel.org> binfmt_misc: enable sandboxed mounts Enable unprivileged sandboxes to create their own binfmt_misc mounts. This is based on Laurent's work in [1] but has been significantly reworked to fix various issues we identified in earlier versions. While binfmt_misc can currently only be mounted in the initial user namespace, binary types registered in this binfmt_misc instance are available to all sandboxes (Either by having them installed in the sandbox or by registering the binary type with the F flag causing the interpreter to be opened right away). So binfmt_misc binary types are already delegated to sandboxes implicitly. However, while a sandbox has access to all registered binary types in binfmt_misc a sandbox cannot currently register its own binary types in binfmt_misc. This has prevented various use-cases some of which were already outlined in [1] but we have a range of issues associated with this (cf. [3]-[5] below which are just a small sample). Extend binfmt_misc to be mountable in non-initial user namespaces. Similar to other filesystem such as nfsd, mqueue, and sunrpc we use keyed superblock management. The key determines whether we need to create a new superblock or can reuse an already existing one. We use the user namespace of the mount as key. This means a new binfmt_misc superblock is created once per user namespace creation. Subsequent mounts of binfmt_misc in the same user namespace will mount the same binfmt_misc instance. We explicitly do not create a new binfmt_misc superblock on every binfmt_misc mount as the semantics for load_misc_binary() line up with the keying model. This also allows us to retrieve the relevant binfmt_misc instance based on the caller's user namespace which can be done in a simple (bounded to 32 levels) loop. Similar to the current binfmt_misc semantics allowing access to the binary types in the initial binfmt_misc instance we do allow sandboxes access to their parent's binfmt_misc mounts if they do not have created a separate binfmt_misc instance. Overall, this will unblock the use-cases mentioned below and in general will also allow to support and harden execution of another architecture's binaries in tight sandboxes. For instance, using the unshare binary it possible to start a chroot of another architecture and configure the binfmt_misc interpreter without being root to run the binaries in this chroot and without requiring the host to modify its binary type handlers. Henning had already posted a few experiments in the cover letter at [1]. But here's an additional example where an unprivileged container registers qemu-user-static binary handlers for various binary types in its separate binfmt_misc mount and is then seamlessly able to start containers with a different architecture without affecting the host: root [lxc monitor] /var/snap/lxd/common/lxd/containers f1 1000000 \_ /sbin/init 1000000 \_ /lib/systemd/systemd-journald 1000000 \_ /lib/systemd/systemd-udevd 1000100 \_ /lib/systemd/systemd-networkd 1000101 \_ /lib/systemd/systemd-resolved 1000000 \_ /usr/sbin/cron -f 1000103 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only 1000000 \_ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers 1000104 \_ /usr/sbin/rsyslogd -n -iNONE 1000000 \_ /lib/systemd/systemd-logind 1000000 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220 1000107 \_ dnsmasq --conf-file=/dev/null -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --liste 1000000 \_ [lxc monitor] /var/lib/lxc f1-s390x 1100000 \_ /usr/bin/qemu-s390x-static /sbin/init 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-journald 1100000 \_ /usr/bin/qemu-s390x-static /usr/sbin/cron -f 1100103 \_ /usr/bin/qemu-s390x-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-ac 1100000 \_ /usr/bin/qemu-s390x-static /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers 1100104 \_ /usr/bin/qemu-s390x-static /usr/sbin/rsyslogd -n -iNONE 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-logind 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/0 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/1 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/2 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/3 115200,38400,9600 vt220 1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-udevd [1]: https://lore.kernel.org/all/20191216091220.465626-1-laurent@vivier.eu [2]: https://discuss.linuxcontainers.org/t/binfmt-misc-permission-denied [3]: https://discuss.linuxcontainers.org/t/lxd-binfmt-support-for-qemu-static-interpreters [4]: https://discuss.linuxcontainers.org/t/3-1-0-binfmt-support-service-in-unprivileged-guest-requires-write-access-on-hosts-proc-sys-fs-binfmt-misc [5]: https://discuss.linuxcontainers.org/t/qemu-user-static-not-working-4-11 Link: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu (origin) Link: https://lore.kernel.org/r/20211028103114.2849140-2-brauner@kernel.org (v1) Cc: Sargun Dhillon <sargun@sargun.me> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Henning Schild <henning.schild@siemens.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Laurent Vivier <laurent@vivier.eu> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> --- /* v2 */ - Serge Hallyn <serge@hallyn.com>: - Use GFP_KERNEL_ACCOUNT for userspace triggered allocations when a new binary type handler is registered. - Christian Brauner <christian.brauner@ubuntu.com>: - Switch authorship to me. I refused to do that earlier even though Laurent said I should do so because I think it's genuinely bad form. But by now I have changed so many things that it'd be unfair to blame Laurent for any potential bugs in here. - Add more comments that explain what's going on. - Rename functions while changing them to better reflect what they are doing to make the code easier to understand. - In the first version when a specific binary type handler was removed either through a write to the entry's file or all binary type handlers were removed by a write to the binfmt_misc mount's status file all cleanup work happened during inode eviction. That includes removal of the relevant entries from entry list. While that works fine I disliked that model after thinking about it for a bit. Because it means that there was a window were someone has already removed a or all binary handlers but they could still be safely reached from load_misc_binary() when it has managed to take the read_lock() on the entries list while inode eviction was already happening. Again, that perfectly benign but it's cleaner to remove the binary handler from the list immediately meaning that ones the write to then entry's file or the binfmt_misc status file returns the binary type cannot be executed anymore. That gives stronger guarantees to the user. diff 1c5976ef Wed Oct 27 16:31:13 MDT 2021 Christian Brauner <brauner@kernel.org> binfmt_misc: cleanup on filesystem umount Currently, registering a new binary type pins the binfmt_misc filesystem. Specifically, this means that as long as there is at least one binary type registered the binfmt_misc filesystem survives all umounts, i.e. the superblock is not destroyed. Meaning that a umount followed by another mount will end up with the same superblock and the same binary type handlers. This is a behavior we tend to discourage for any new filesystems (apart from a few special filesystems such as e.g. configfs or debugfs). A umount operation without the filesystem being pinned - by e.g. someone holding a file descriptor to an open file - should usually result in the destruction of the superblock and all associated resources. This makes introspection easier and leads to clearly defined, simple and clean semantics. An administrator can rely on the fact that a umount will guarantee a clean slate making it possible to reinitialize a filesystem. Right now all binary types would need to be explicitly deleted before that can happen. This allows us to remove the heavy-handed calls to simple_pin_fs() and simple_release_fs() when creating and deleting binary types. This in turn allows us to replace the current brittle pinning mechanism abusing dget() which has caused a range of bugs judging from prior fixes in [2] and [3]. The additional dget() in load_misc_binary() pins the dentry but only does so for the sake to prevent ->evict_inode() from freeing the node when a user removes the binary type and kill_node() is run. Which would mean ->interpreter and ->interp_file would be freed causing a UAF. This isn't really nicely documented nor is it very clean because it relies on simple_pin_fs() pinning the filesystem as long as at least one binary type exists. Otherwise it would cause load_misc_binary() to hold on to a dentry belonging to a superblock that has been shutdown. Replace that implicit pinning with a clean and simple per-node refcount and get rid of the ugly dget() pinning. A similar mechanism exists for e.g. binderfs (cf. [4]). All the cleanup work can now be done in ->evict_inode(). In a follow-up patch we will make it possible to use binfmt_misc in sandboxes. We will use the cleaner semantics where a umount for the filesystem will cause the superblock and all resources to be deallocated. In preparation for this apply the same semantics to the initial binfmt_misc mount. Note, that this is a user-visible change and as such a uapi change but one that we can reasonably risk. We've discussed this in earlier versions of this patchset (cf. [1]). The main user and provider of binfmt_misc is systemd. Systemd provides binfmt_misc via autofs since it is configurable as a kernel module and is used by a few exotic packages and users. As such a binfmt_misc mount is triggered when /proc/sys/fs/binfmt_misc is accessed and is only provided on demand. Other autofs on demand filesystems include EFI ESP which systemd umounts if the mountpoint stays idle for a certain amount of time. This doesn't apply to the binfmt_misc autofs mount which isn't touched once it is mounted meaning this change can't accidently wipe binary type handlers without someone having explicitly unmounted binfmt_misc. After speaking to systemd folks they don't expect this change to affect them. In line with our general policy, if we see a regression for systemd or other users with this change we will switch back to the old behavior for the initial binfmt_misc mount and have binary types pin the filesystem again. But while we touch this code let's take the chance and let's improve on the status quo. [1]: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu [2]: commit 43a4f2619038 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()" [3]: commit 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") [4]: commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices II") Link: https://lore.kernel.org/r/20211028103114.2849140-1-brauner@kernel.org (v1) Cc: Sargun Dhillon <sargun@sargun.me> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Henning Schild <henning.schild@siemens.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Laurent Vivier <laurent@vivier.eu> Cc: linux-fsdevel@vger.kernel.org Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> --- /* v2 */ - Christian Brauner <christian.brauner@ubuntu.com>: - Add more comments that explain what's going on. - Rename functions while changing them to better reflect what they are doing to make the code easier to understand. - In the first version when a specific binary type handler was removed either through a write to the entry's file or all binary type handlers were removed by a write to the binfmt_misc mount's status file all cleanup work happened during inode eviction. That includes removal of the relevant entries from entry list. While that works fine I disliked that model after thinking about it for a bit. Because it means that there was a window were someone has already removed a or all binary handlers but they could still be safely reached from load_misc_binary() when it has managed to take the read_lock() on the entries list while inode eviction was already happening. Again, that perfectly benign but it's cleaner to remove the binary handler from the list immediately meaning that ones the write to then entry's file or the binfmt_misc status file returns the binary type cannot be executed anymore. That gives stronger guarantees to the user. diff 1c5976ef Wed Oct 27 16:31:13 MDT 2021 Christian Brauner <brauner@kernel.org> binfmt_misc: cleanup on filesystem umount Currently, registering a new binary type pins the binfmt_misc filesystem. Specifically, this means that as long as there is at least one binary type registered the binfmt_misc filesystem survives all umounts, i.e. the superblock is not destroyed. Meaning that a umount followed by another mount will end up with the same superblock and the same binary type handlers. This is a behavior we tend to discourage for any new filesystems (apart from a few special filesystems such as e.g. configfs or debugfs). A umount operation without the filesystem being pinned - by e.g. someone holding a file descriptor to an open file - should usually result in the destruction of the superblock and all associated resources. This makes introspection easier and leads to clearly defined, simple and clean semantics. An administrator can rely on the fact that a umount will guarantee a clean slate making it possible to reinitialize a filesystem. Right now all binary types would need to be explicitly deleted before that can happen. This allows us to remove the heavy-handed calls to simple_pin_fs() and simple_release_fs() when creating and deleting binary types. This in turn allows us to replace the current brittle pinning mechanism abusing dget() which has caused a range of bugs judging from prior fixes in [2] and [3]. The additional dget() in load_misc_binary() pins the dentry but only does so for the sake to prevent ->evict_inode() from freeing the node when a user removes the binary type and kill_node() is run. Which would mean ->interpreter and ->interp_file would be freed causing a UAF. This isn't really nicely documented nor is it very clean because it relies on simple_pin_fs() pinning the filesystem as long as at least one binary type exists. Otherwise it would cause load_misc_binary() to hold on to a dentry belonging to a superblock that has been shutdown. Replace that implicit pinning with a clean and simple per-node refcount and get rid of the ugly dget() pinning. A similar mechanism exists for e.g. binderfs (cf. [4]). All the cleanup work can now be done in ->evict_inode(). In a follow-up patch we will make it possible to use binfmt_misc in sandboxes. We will use the cleaner semantics where a umount for the filesystem will cause the superblock and all resources to be deallocated. In preparation for this apply the same semantics to the initial binfmt_misc mount. Note, that this is a user-visible change and as such a uapi change but one that we can reasonably risk. We've discussed this in earlier versions of this patchset (cf. [1]). The main user and provider of binfmt_misc is systemd. Systemd provides binfmt_misc via autofs since it is configurable as a kernel module and is used by a few exotic packages and users. As such a binfmt_misc mount is triggered when /proc/sys/fs/binfmt_misc is accessed and is only provided on demand. Other autofs on demand filesystems include EFI ESP which systemd umounts if the mountpoint stays idle for a certain amount of time. This doesn't apply to the binfmt_misc autofs mount which isn't touched once it is mounted meaning this change can't accidently wipe binary type handlers without someone having explicitly unmounted binfmt_misc. After speaking to systemd folks they don't expect this change to affect them. In line with our general policy, if we see a regression for systemd or other users with this change we will switch back to the old behavior for the initial binfmt_misc mount and have binary types pin the filesystem again. But while we touch this code let's take the chance and let's improve on the status quo. [1]: https://lore.kernel.org/r/20191216091220.465626-2-laurent@vivier.eu [2]: commit 43a4f2619038 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()" [3]: commit 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") [4]: commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices II") Link: https://lore.kernel.org/r/20211028103114.2849140-1-brauner@kernel.org (v1) Cc: Sargun Dhillon <sargun@sargun.me> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Henning Schild <henning.schild@siemens.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Laurent Vivier <laurent@vivier.eu> Cc: linux-fsdevel@vger.kernel.org Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> --- /* v2 */ - Christian Brauner <christian.brauner@ubuntu.com>: - Add more comments that explain what's going on. - Rename functions while changing them to better reflect what they are doing to make the code easier to understand. - In the first version when a specific binary type handler was removed either through a write to the entry's file or all binary type handlers were removed by a write to the binfmt_misc mount's status file all cleanup work happened during inode eviction. That includes removal of the relevant entries from entry list. While that works fine I disliked that model after thinking about it for a bit. Because it means that there was a window were someone has already removed a or all binary handlers but they could still be safely reached from load_misc_binary() when it has managed to take the read_lock() on the entries list while inode eviction was already happening. Again, that perfectly benign but it's cleaner to remove the binary handler from the list immediately meaning that ones the write to then entry's file or the binfmt_misc status file returns the binary type cannot be executed anymore. That gives stronger guarantees to the user. diff e7850f4d Fri Mar 12 22:07:41 MST 2021 Lior Ribak <liorribak@gmail.com> binfmt_misc: fix possible deadlock in bm_register_write There is a deadlock in bm_register_write: First, in the begining of the function, a lock is taken on the binfmt_misc root inode with inode_lock(d_inode(root)). Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call open_exec on the user-provided interpreter. open_exec will call a path lookup, and if the path lookup process includes the root of binfmt_misc, it will try to take a shared lock on its inode again, but it is already locked, and the code will get stuck in a deadlock To reproduce the bug: $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register backtrace of where the lock occurs (#5): 0 schedule () at ./arch/x86/include/asm/current.h:15 1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992 2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213 3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222 4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355 5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783 6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177 7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366 8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396 9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913 10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948 11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682 12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF ", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603 13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF ", count=19) at fs/read_write.c:658 14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46 15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120 To solve the issue, the open_exec call is moved to before the write lock is taken by bm_register_write Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers") Signed-off-by: Lior Ribak <liorribak@gmail.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 51f39a1f Fri Dec 12 17:57:29 MST 2014 David Drysdale <drysdale@google.com> syscalls: implement execveat() system call This patchset adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. This patch (of 4): Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Based on patches by Meredydd Luff. Signed-off-by: David Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6b899c4e Wed Dec 10 16:52:08 MST 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: add comments & debug logs When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF.... binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 ........ binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF.... binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6b899c4e Wed Dec 10 16:52:08 MST 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: add comments & debug logs When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF.... binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 ........ binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF.... binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 6b899c4e Wed Dec 10 16:52:08 MST 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: add comments & debug logs When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It let's you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output: $ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register $ dmesg binfmt_misc: register: received 92 bytes binfmt_misc: register: delim: 0x3a {:} binfmt_misc: register: name: {qemu-foo} binfmt_misc: register: type: M (magic) binfmt_misc: register: offset: 0x0 binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\ binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00. binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00 binfmt_misc: register: mask[raw]: 00 . binfmt_misc: register: magic/mask length: 8 binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF.... binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 ........ binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF.... binfmt_misc: register: interpreter: {/usr/bin/qemu-foo} binfmt_misc: register: flag: P (preserve argv0) binfmt_misc: register: flag: O (open binary) binfmt_misc: register: flag: C (preserve creds) The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff bbaecc08 Mon Oct 13 16:52:03 MDT 2014 Mike Frysinger <vapier@gentoo.org> binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the *best* case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
/linux-master/fs/befs/ | ||
H A D | linuxvfs.c | diff 4c3897cc Sun Jul 03 09:29:44 MDT 2016 Luis de Bethencourt <luisbg@osg.samsung.com> befs: make consistent use of befs_error() befs_error() is used in potential errors that could happen in befs to provide informational log messages. befs_debug() is silent when CONFIG_BEFS_DEBUG=no, and very verbose when switched on, which is why it is used for general debugging but not for errors. Fix a few cases where the befs debug utility usage isn't following the expected pattern. To make sure we have consistent information in the logs. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> diff 4ba9b9d0 Wed Oct 17 00:25:51 MDT 2007 Christoph Lameter <clameter@sgi.com> Slab API: remove useless ctor parameter and reorder parameters Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 94f563c4 Sat Aug 05 01:14:55 MDT 2006 Diego Calleja <diegocg@gmail.com> [PATCH] Fix BeFS slab corruption In bugzilla #6941, Jens Kilian reported: "The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the end of a block of memory allocated via kmalloc(), leading to memory corruption. This happens only for filenames which are pure ASCII and a multiple of 4 bytes in length. [...] Without DEBUG_SLAB, this leads to further corruption and hard lockups; I believe this is the bug which has made kernels later than 2.6.8 unusable for me. (This must be due to changes in memory management, the bug has been in the BeFS driver since the time it was introduced (AFAICT).) Steps to reproduce: Create a directory (in BeOS, naturally :-) with files named, e.g., "1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find"" This patch implements the suggested fix. Credits to Jens Kilian for debugging the problem and finding the right fix. Signed-off-by: Diego Calleja <diegocg@gmail.com> Cc: Jens Kilian <jjk@acm.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6f5d20 Tue Mar 28 02:56:42 MST 2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4b6a9316 Fri Mar 24 04:16:05 MST 2006 Paul Jackson <pj@sgi.com> [PATCH] cpuset memory spread: slab cache filesystems Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD memory spreading. If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's in a cpuset with the 'memory_spread_slab' option enabled goes to allocate from such a slab cache, the allocations are spread evenly over all the memory nodes (task->mems_allowed) allowed to that task, instead of favoring allocation on the node local to the current cpu. The following inode and similar caches are marked SLAB_MEM_SPREAD: file cache ==== ===== fs/adfs/super.c adfs_inode_cache fs/affs/super.c affs_inode_cache fs/befs/linuxvfs.c befs_inode_cache fs/bfs/inode.c bfs_inode_cache fs/block_dev.c bdev_cache fs/cifs/cifsfs.c cifs_inode_cache fs/coda/inode.c coda_inode_cache fs/dquot.c dquot fs/efs/super.c efs_inode_cache fs/ext2/super.c ext2_inode_cache fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr fs/ext3/super.c ext3_inode_cache fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr fs/fat/cache.c fat_cache fs/fat/inode.c fat_inode_cache fs/freevxfs/vxfs_super.c vxfs_inode fs/hpfs/super.c hpfs_inode_cache fs/isofs/inode.c isofs_inode_cache fs/jffs/inode-v23.c jffs_fm fs/jffs2/super.c jffs2_i fs/jfs/super.c jfs_ip fs/minix/inode.c minix_inode_cache fs/ncpfs/inode.c ncp_inode_cache fs/nfs/direct.c nfs_direct_cache fs/nfs/inode.c nfs_inode_cache fs/ntfs/super.c ntfs_big_inode_cache_name fs/ntfs/super.c ntfs_inode_cache fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache fs/ocfs2/super.c ocfs2_inode_cache fs/proc/inode.c proc_inode_cache fs/qnx4/inode.c qnx4_inode_cache fs/reiserfs/super.c reiser_inode_cache fs/romfs/inode.c romfs_inode_cache fs/smbfs/inode.c smb_inode_cache fs/sysv/inode.c sysv_inode_cache fs/udf/super.c udf_inode_cache fs/ufs/super.c ufs_inode_cache net/socket.c sock_inode_cache net/sunrpc/rpc_pipe.c rpc_inode_cache The choice of which slab caches to so mark was quite simple. I marked those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache, inode_cache, and buffer_head, which were marked in a previous patch. Even though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same potentially large file system i/o related slab caches as we need for memory spreading. Given that the rule now becomes "wherever you would have used a SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain. Future file system writers will just copy one of the existing file system slab cache setups and tend to get it right without thinking. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> diff 4de151d8 Tue Mar 21 16:13:35 MST 2006 Alexey Dobriyan <adobriyan@gmail.com> It's UTF-8 Fix some comments to "UTF-8". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> |
Completed in 1419 milliseconds