#
e31f0a57 |
|
19-Sep-2023 |
Al Viro <viro@zeniv.linux.org.uk> |
procfs: make freeing proc_fs_info rcu-delayed makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns() is still synchronous, but that's not a problem - it does rcu-delay everything that needs to be) Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0ee44885 |
|
12-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86: Expose thread features in /proc/$PID/status Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. [Switched to CET, added to commit log] Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-37-rick.p.edgecombe%40intel.com
|
#
ef104443 |
|
16-May-2023 |
Arnd Bergmann <arnd@arndb.de> |
procfs: consolidate arch_report_meminfo declaration The arch_report_meminfo() function is provided by four architectures, with a __weak fallback in procfs itself. On architectures that don't have a custom version, the __weak version causes a warning because of the missing prototype. Remove the architecture specific prototypes and instead add one in linux/proc_fs.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for arch/x86 Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Message-Id: <20230516195834.551901-1-arnd@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
c3d96f69 |
|
03-Oct-2022 |
David Howells <dhowells@redhat.com> |
net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write() Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write(). Also provide a fallback for proc_create_net_data_write(). Fixes: 564def71765c ("proc: Add a way to make network proc files writable") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
359745d7 |
|
21-Jan-2022 |
Muchun Song <songmuchun@bytedance.com> |
proc: remove PDE_DATA() completely Remove PDE_DATA() completely and replace it with pde_data(). [akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c] [akpm@linux-foundation.org: now fix it properly] Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6dfbbae1 |
|
21-Jan-2022 |
Muchun Song <songmuchun@bytedance.com> |
fs: proc: store PDE()->data into inode->i_private PDE_DATA(inode) is introduced to get user private data and hide the layout of struct proc_dir_entry. The inode->i_private is used to do the same thing as well. Save a copy of user private data to inode-> i_private when proc inode is allocated. This means the user also can get their private data by inode->i_private. Introduce pde_data() to wrap inode->i_private so that we can remove PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of pde_data(). It will be easier if we decide to remove PDE_DATE() in the future. Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ae62fbe2 |
|
19-Jan-2022 |
Hans de Goede <hdegoede@redhat.com> |
proc: make the proc_create[_data]() stubs static inlines Change the proc_create[_data]() stubs which are used when CONFIG_PROC_FS is not set from #defines to a static inline stubs. This should fix clang -Werror builds failing due to errors like this: drivers/platform/x86/thinkpad_acpi.c:918:30: error: unused variable 'dispatch_proc_ops' [-Werror,-Wunused-const-variable] Fixing this in include/linux/proc_fs.h should ensure that the same issue is also fixed in any other drivers hitting the same -Werror issue. [akpm@linux-foundation.org: fix CONFIG_PROC_FS=n] [akpm@linux-foundation.org: fix arch/sparc/kernel/led.c] [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20211116131112.508304-1-hdegoede@redhat.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hans de Goede <hdegoede@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d4455fac |
|
06-May-2021 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: mandate ->proc_lseek in "struct proc_ops" Now that proc_ops are separate from file_operations and other operations it easy to check all instances to have ->proc_lseek hook and remove check in main code. Note: nonseekable_open() files naturally don't require ->proc_lseek. Garbage collect pde_lseek() function. [adobriyan@gmail.com: smoke test lseek()] Link: https://lkml.kernel.org/r/YG4OIhChOrVTPgdN@localhost.localdomain Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c6c75ded |
|
15-Dec-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: fix lookup in /proc/net subdirectories after setns(2) Commit 1fde6f21d90f ("proc: fix /proc/net/* after setns(2)") only forced revalidation of regular files under /proc/net/ However, /proc/net/ is unusual in the sense of /proc/net/foo handlers take netns pointer from parent directory which is old netns. Steps to reproduce: (void)open("/proc/net/sctp/snmp", O_RDONLY); unshare(CLONE_NEWNET); int fd = open("/proc/net/sctp/snmp", O_RDONLY); read(fd, &c, 1); Read will read wrong data from original netns. Patch forces lookup on every directory under /proc/net . Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain Fixes: 1da4d377f943 ("proc: revalidate misc dentries") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fd5a13f4 |
|
03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: add a read_iter method to proc proc_ops This will allow proc files to implement iter read semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f9c79272 |
|
23-Jul-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t This patch refactored target bpf_iter_init_seq_priv_t callback function to accept additional information. This will be needed in later patches for map element targets since a particular map should be passed to traverse elements for that particular map. In the future, other information may be passed to target as well, e.g., pid, cgroup id, etc. to customize the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
|
#
9d78edea |
|
18-May-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: proc_pid_ns takes super_block as an argument syzbot found that touch /proc/testfile causes NULL pointer dereference at tomoyo_get_local_path() because inode of the dentry is NULL. Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info directly. Since proc_pid_ns() can only work with inode, using it in the tomoyo_get_local_path() was wrong. To avoid creating more functions for getting proc_ns, change the argument type of the proc_pid_ns() function. Then, Tomoyo can use the existing super_block to get pid_ns. Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock") Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
303cc571 |
|
05-May-2020 |
Christian Brauner <christian.brauner@ubuntu.com> |
nsproxy: attach to namespaces via pidfds For quite a while we have been thinking about using pidfds to attach to namespaces. This patchset has existed for about a year already but we've wanted to wait to see how the general api would be received and adopted. Now that more and more programs in userspace have started using pidfds for process management it's time to send this one out. This patch makes it possible to use pidfds to attach to the namespaces of another process, i.e. they can be passed as the first argument to the setns() syscall. When only a single namespace type is specified the semantics are equivalent to passing an nsfd. That means setns(nsfd, CLONE_NEWNET) equals setns(pidfd, CLONE_NEWNET). However, when a pidfd is passed, multiple namespace flags can be specified in the second setns() argument and setns() will attach the caller to all the specified namespaces all at once or to none of them. Specifying 0 is not valid together with a pidfd. Here are just two obvious examples: setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET); setns(pidfd, CLONE_NEWUSER); Allowing to also attach subsets of namespaces supports various use-cases where callers setns to a subset of namespaces to retain privilege, perform an action and then re-attach another subset of namespaces. If the need arises, as Eric suggested, we can extend this patchset to assume even more context than just attaching all namespaces. His suggestion specifically was about assuming the process' root directory when setns(pidfd, 0) or setns(pidfd, SETNS_PIDFD) is specified. For now, just keep it flexible in terms of supporting subsets of namespaces but let's wait until we have users asking for even more context to be assumed. At that point we can add an extension. The obvious example where this is useful is a standard container manager interacting with a running container: pushing and pulling files or directories, injecting mounts, attaching/execing any kind of process, managing network devices all these operations require attaching to all or at least multiple namespaces at the same time. Given that nowadays most containers are spawned with all namespaces enabled we're currently looking at at least 14 syscalls, 7 to open the /proc/<pid>/ns/<ns> nsfds, another 7 to actually perform the namespace switch. With time namespaces we're looking at about 16 syscalls. (We could amortize the first 7 or 8 syscalls for opening the nsfds by stashing them in each container's monitor process but that would mean we need to send around those file descriptors through unix sockets everytime we want to interact with the container or keep on-disk state. Even in scenarios where a caller wants to join a particular namespace in a particular order callers still profit from batching other namespaces. That mostly applies to the user namespace but all container runtimes I found join the user namespace first no matter if it privileges or deprivileges the container similar to how unshare behaves.) With pidfds this becomes a single syscall no matter how many namespaces are supposed to be attached to. A decently designed, large-scale container manager usually isn't the parent of any of the containers it spawns so the containers don't die when it crashes or needs to update or reinitialize. This means that for the manager to interact with containers through pids is inherently racy especially on systems where the maximum pid number is not significicantly bumped. This is even more problematic since we often spawn and manage thousands or ten-thousands of containers. Interacting with a container through a pid thus can become risky quite quickly. Especially since we allow for an administrator to enable advanced features such as syscall interception where we're performing syscalls in lieu of the container. In all of those cases we use pidfds if they are available and we pass them around as stable references. Using them to setns() to the target process' namespaces is as reliable as using nsfds. Either the target process is already dead and we get ESRCH or we manage to attach to its namespaces but we can't accidently attach to another process' namespaces. So pidfds lend themselves to be used with this api. The other main advantage is that with this change the pidfd becomes the only relevant token for most container interactions and it's the only token we need to create and send around. Apart from significiantly reducing the number of syscalls from double digit to single digit which is a decent reason post-spectre/meltdown this also allows to switch to a set of namespaces atomically, i.e. either attaching to all the specified namespaces succeeds or we fail. If we fail we haven't changed a single namespace. There are currently three namespaces that can fail (other than for ENOMEM which really is not very interesting since we then have other problems anyway) for non-trivial reasons, user, mount, and pid namespaces. We can fail to attach to a pid namespace if it is not our current active pid namespace or a descendant of it. We can fail to attach to a user namespace because we are multi-threaded or because our current mount namespace shares filesystem state with other tasks, or because we're trying to setns() to the same user namespace, i.e. the target task has the same user namespace as we do. We can fail to attach to a mount namespace because it shares filesystem state with other tasks or because we fail to lookup the new root for the new mount namespace. In most non-pathological scenarios these issues can be somewhat mitigated. But there are cases where we're half-attached to some namespace and failing to attach to another one. I've talked about some of these problem during the hallway track (something only the pre-COVID-19 generation will remember) of Plumbers in Los Angeles in 2018(?). Even if all these issues could be avoided with super careful userspace coding it would be nicer to have this done in-kernel. Pidfds seem to lend themselves nicely for this. The other neat thing about this is that setns() becomes an actual counterpart to the namespace bits of unshare(). Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jann Horn <jannh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20200505140432.181565-3-christian.brauner@ubuntu.com
|
#
138d0be3 |
|
09-May-2020 |
Yonghong Song <yhs@fb.com> |
net: bpf: Add netlink and ipv6_route bpf_iter targets This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
|
#
e61bb8b3 |
|
19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: use named enums for better readability Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
6814ef2d |
|
19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: add option to mount only a pids subset This allows to hide all files and directories in the procfs that are not related to tasks. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
24a71ce5 |
|
19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option If "hidepid=4" mount option is set then do not instantiate pids that we can not ptrace. "hidepid=4" means that procfs should only contain pids that the caller can ptrace. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
fa10fed3 |
|
19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: allow to mount many instances of proc in one pid namespace This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes, and to allow that we have to modernize procfs internals. 1) The main aim of this work is to have on embedded systems one supervisor for apps. Right now we have some lightweight sandbox support, however if we create pid namespacess we have to manages all the processes inside too, where our goal is to be able to run a bunch of apps each one inside its own mount namespace without being able to notice each other. We only want to use mount namespaces, and we want procfs to behave more like a real mount point. 2) Linux Security Modules have multiple ptrace paths inside some subsystems, however inside procfs, the implementation does not guarantee that the ptrace() check which triggers the security_ptrace_check() hook will always run. We have the 'hidepid' mount option that can be used to force the ptrace_may_access() check inside has_pid_permissions() to run. The problem is that 'hidepid' is per pid namespace and not attached to the mount point, any remount or modification of 'hidepid' will propagate to all other procfs mounts. This also does not allow to support Yama LSM easily in desktop and user sessions. Yama ptrace scope which restricts ptrace and some other syscalls to be allowed only on inferiors, can be updated to have a per-task context, where the context will be inherited during fork(), clone() and preserved across execve(). If we support multiple private procfs instances, then we may force the ptrace_may_access() on /proc/<pids>/ to always run inside that new procfs instances. This will allow to specifiy on user sessions if we should populate procfs with pids that the user can ptrace or not. By using Yama ptrace scope, some restricted users will only be able to see inferiors inside /proc, they won't even be able to see their other processes. Some software like Chromium, Firefox's crash handler, Wine and others are already using Yama to restrict which processes can be ptracable. With this change this will give the possibility to restrict /proc/<pids>/ but more importantly this will give desktop users a generic and usuable way to specifiy which users should see all processes and which users can not. Side notes: * This covers the lack of seccomp where it is not able to parse arguments, it is easy to install a seccomp filter on direct syscalls that operate on pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls. With this change LSMs should be able to analyze open/read/write/close... In the new patch set version I removed the 'newinstance' option as suggested by Eric W. Biederman. Selftest has been added to verify new behavior. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d919b33d |
|
06-Apr-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: faster open/read/close with "permanent" files Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"... Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable in the event of disappearing /proc entries which usually happens if module is getting removed. Files like /proc/cpuinfo which never disappear simply do not need such protection. Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such "permanent" files. Enable "permanent" flag for /proc/cpuinfo /proc/kmsg /proc/modules /proc/slabinfo /proc/stat /proc/sysvipc/* /proc/swaps More will come once I figure out foolproof way to prevent out module authors from marking their stuff "permanent" for performance reasons when it is not. This should help with scalability: benchmark is "read /proc/cpuinfo R times by N threads scattered over the system". N R t, s (before) t, s (after) ----------------------------------------------------- 64 4096 1.582458 1.530502 -3.2% 256 4096 6.371926 6.125168 -3.9% 1024 4096 25.64888 24.47528 -4.6% Benchmark source: #include <chrono> #include <iostream> #include <thread> #include <vector> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN); int N; const char *filename; int R; int xxx = 0; int glue(int n) { cpu_set_t m; CPU_ZERO(&m); CPU_SET(n, &m); return sched_setaffinity(0, sizeof(cpu_set_t), &m); } void f(int n) { glue(n % NR_CPUS); while (*(volatile int *)&xxx == 0) { } for (int i = 0; i < R; i++) { int fd = open(filename, O_RDONLY); char buf[4096]; ssize_t rv = read(fd, buf, sizeof(buf)); asm volatile ("" :: "g" (rv)); close(fd); } } int main(int argc, char *argv[]) { if (argc < 4) { std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R "; return 1; } N = atoi(argv[1]); filename = argv[2]; R = atoi(argv[3]); for (int i = 0; i < NR_CPUS; i++) { if (glue(i) == 0) break; } std::vector<std::thread> T; T.reserve(N); for (int i = 0; i < N; i++) { T.emplace_back(f, i); } auto t0 = std::chrono::system_clock::now(); { *(volatile int *)&xxx = 1; for (auto& t: T) { t.join(); } } auto t1 = std::chrono::system_clock::now(); std::chrono::duration<double> dt = t1 - t0; std::cout << dt.count() << ' '; return 0; } P.S.: Explicit randomization marker is added because adding non-function pointer will silently disable structure layout randomization. [akpm@linux-foundation.org: coding style fixes] Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7bc3e6e5 |
|
19-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use a list of inodes to flush from proc Rework the flushing of proc to use a list of directory inodes that need to be flushed. The list is kept on struct pid not on struct task_struct, as there is a fixed connection between proc inodes and pids but at least for the case of de_thread the pid of a task_struct changes. This removes the dependency on proc_mnt which allows for different mounts of proc having different mount options even in the same pid namespace and this allows for the removal of proc_mnt which will trivially the first mount of proc to honor it's mount options. This flushing remains an optimization. The functions pid_delete_dentry and pid_revalidate ensure that ordinary dcache management will not attempt to use dentries past the point their respective task has died. When unused the shrinker will eventually be able to remove these dentries. There is a case in de_thread where proc_flush_pid can be called early for a given pid. Which winds up being safe (if suboptimal) as this is just an optiimization. Only pid directories are put on the list as the other per pid files are children of those directories and d_invalidate on the directory will get them as well. So that the pid can be used during flushing it's reference count is taken in release_task and dropped in proc_flush_pid. Further the call of proc_flush_pid is moved after the tasklist_lock is released in release_task so that it is certain that the pid has already been unhashed when flushing it taking place. This removes a small race where a dentry could recreated. As struct pid is supposed to be small and I need a per pid lock I reuse the only lock that currently exists in struct pid the the wait_pidfd.lock. The net result is that this adds all of this functionality with just a little extra list management overhead and a single extra pointer in struct pid. v2: Initialize pid->inodes. I somehow failed to get that initialization into the initial version of the patch. A boot failure was reported by "kernel test robot <lkp@intel.com>", and failure to initialize that pid->inodes matches all of the reported symptoms. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
d56c0d45 |
|
03-Feb-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: decouple proc from VFS with "struct proc_ops" Currently core /proc code uses "struct file_operations" for custom hooks, however, VFS doesn't directly call them. Every time VFS expands file_operations hook set, /proc code bloats for no reason. Introduce "struct proc_ops" which contains only those hooks which /proc allows to call into (open, release, read, write, ioctl, mmap, poll). It doesn't contain module pointer as well. Save ~184 bytes per usage: add/remove: 26/26 grow/shrink: 1/4 up/down: 1922/-6674 (-4752) Function old new delta sysvipc_proc_ops - 72 +72 ... config_gz_proc_ops - 72 +72 proc_get_inode 289 339 +50 proc_reg_get_unmapped_area 110 107 -3 close_pdeo 227 224 -3 proc_reg_open 289 284 -5 proc_create_data 60 53 -7 rt_cpu_seq_fops 256 - -256 ... default_affinity_proc_fops 256 - -256 Total: Before=5430095, After=5425343, chg -0.09% Link: http://lkml.kernel.org/r/20191225172228.GA13378@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9573e8f7 |
|
04-Dec-2019 |
Miaohe Lin <linmiaohe@huawei.com> |
include/linux/proc_fs.h: fix confusing macro arg name state_size and ops are in the wrong position. Link: http://lkml.kernel.org/r/20190910021747.11216-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
68bc30bb |
|
05-Jun-2019 |
Aubrey Li <aubrey.li@linux.intel.com> |
proc: Add /proc/<pid>/arch_status Exposing architecture specific per process information is useful for various reasons. An example is the AVX512 usage on x86 which is important for task placement for power/performance optimizations. Adding this information to the existing /prcc/pid/status file would be the obvious choise, but it has been agreed on that a explicit arch_status file is better in separating the generic and architecture specific information. [ tglx: Massage changelog ] Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: peterz@infradead.org Cc: hpa@zytor.com Cc: ak@linux.intel.com Cc: tim.c.chen@linux.intel.com Cc: dave.hansen@intel.com Cc: arjan@linux.intel.com Cc: adobriyan@gmail.com Cc: aubrey.li@intel.com Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Linux API <linux-api@vger.kernel.org> Link: https://lkml.kernel.org/r/20190606012236.9391-1-aubrey.li@linux.intel.com
|
#
3eb39f47 |
|
18-Nov-2018 |
Christian Brauner <christian@brauner.io> |
signal: add pidfd_send_signal() syscall The kill() syscall operates on process identifiers (pid). After a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. This issue has often surfaced and there has been a push to address this problem [1]. This patch uses file descriptors (fd) from proc/<pid> as stable handles on struct pid. Even if a pid is recycled the handle will not change. The fd can be used to send signals to the process it refers to. Thus, the new syscall pidfd_send_signal() is introduced to solve this problem. Instead of pids it operates on process fds (pidfd). /* prototype and argument /* long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags); /* syscall number 424 */ The syscall number was chosen to be 424 to align with Arnd's rework in his y2038 to minimize merge conflicts (cf. [25]). In addition to the pidfd and signal argument it takes an additional siginfo_t and flags argument. If the siginfo_t argument is NULL then pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo(). The flags argument is added to allow for future extensions of this syscall. It currently needs to be passed as 0. Failing to do so will cause EINVAL. /* pidfd_send_signal() replaces multiple pid-based syscalls */ The pidfd_send_signal() syscall currently takes on the job of rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a positive pid is passed to kill(2). It will however be possible to also replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended. /* sending signals to threads (tid) and process groups (pgid) */ Specifically, the pidfd_send_signal() syscall does currently not operate on process groups or threads. This is left for future extensions. In order to extend the syscall to allow sending signal to threads and process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and PIDFD_TYPE_TID) should be added. This implies that the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping in this api is a property of the flags argument not a property of the file descriptor (cf. [13]). Clarification for this has been requested by Eric (cf. [19]). When appropriate extensions through the flags argument are added then pidfd_send_signal() can additionally replace the part of kill(2) which operates on process groups as well as the tgkill(2) and rt_tgsigqueueinfo(2) syscalls. How such an extension could be implemented has been very roughly sketched in [14], [15], and [16]. However, this should not be taken as a commitment to a particular implementation. There might be better ways to do it. Right now this is intentionally left out to keep this patchset as simple as possible (cf. [4]). /* naming */ The syscall had various names throughout iterations of this patchset: - procfd_signal() - procfd_send_signal() - taskfd_send_signal() In the last round of reviews it was pointed out that given that if the flags argument decides the scope of the signal instead of different types of fds it might make sense to either settle for "procfd_" or "pidfd_" as prefix. The community was willing to accept either (cf. [17] and [18]). Given that one developer expressed strong preference for the "pidfd_" prefix (cf. [13]) and with other developers less opinionated about the name we should settle for "pidfd_" to avoid further bikeshedding. The "_send_signal" suffix was chosen to reflect the fact that the syscall takes on the job of multiple syscalls. It is therefore intentional that the name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer because it might imply that pidfd_send_signal() is a replacement for kill(2), and not the latter because it is a hassle to remember the correct spelling - especially for non-native speakers - and because it is not descriptive enough of what the syscall actually does. The name "pidfd_send_signal" makes it very clear that its job is to send signals. /* zombies */ Zombies can be signaled just as any other process. No special error will be reported since a zombie state is an unreliable state (cf. [3]). However, this can be added as an extension through the @flags argument if the need ever arises. /* cross-namespace signals */ The patch currently enforces that the signaler and signalee either are in the same pid namespace or that the signaler's pid namespace is an ancestor of the signalee's pid namespace. This is done for the sake of simplicity and because it is unclear to what values certain members of struct siginfo_t would need to be set to (cf. [5], [6]). /* compat syscalls */ It became clear that we would like to avoid adding compat syscalls (cf. [7]). The compat syscall handling is now done in kernel/signal.c itself by adding __copy_siginfo_from_user_generic() which lets us avoid compat syscalls (cf. [8]). It should be noted that the addition of __copy_siginfo_from_user_any() is caused by a bug in the original implementation of rt_sigqueueinfo(2) (cf. 12). With upcoming rework for syscall handling things might improve significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain any additional callers. /* testing */ This patch was tested on x64 and x86. /* userspace usage */ An asciinema recording for the basic functionality can be found under [9]. With this patch a process can be killed via: #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags) { #ifdef __NR_pidfd_send_signal return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags); #else return -ENOSYS; #endif } int main(int argc, char *argv[]) { int fd, ret, saved_errno, sig; if (argc < 3) exit(EXIT_FAILURE); fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); if (fd < 0) { printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]); exit(EXIT_FAILURE); } sig = atoi(argv[2]); printf("Sending signal %d to process %s\n", sig, argv[1]); ret = do_pidfd_send_signal(fd, sig, NULL, 0); saved_errno = errno; close(fd); errno = saved_errno; if (ret < 0) { printf("%s - Failed to send signal %d to process %s\n", strerror(errno), sig, argv[1]); exit(EXIT_FAILURE); } exit(EXIT_SUCCESS); } /* Q&A * Given that it seems the same questions get asked again by people who are * late to the party it makes sense to add a Q&A section to the commit * message so it's hopefully easier to avoid duplicate threads. * * For the sake of progress please consider these arguments settled unless * there is a new point that desperately needs to be addressed. Please make * sure to check the links to the threads in this commit message whether * this has not already been covered. */ Q-01: (Florian Weimer [20], Andrew Morton [21]) What happens when the target process has exited? A-01: Sending the signal will fail with ESRCH (cf. [22]). Q-02: (Andrew Morton [21]) Is the task_struct pinned by the fd? A-02: No. A reference to struct pid is kept. struct pid - as far as I understand - was created exactly for the reason to not require to pin struct task_struct (cf. [22]). Q-03: (Andrew Morton [21]) Does the entire procfs directory remain visible? Just one entry within it? A-03: The same thing that happens right now when you hold a file descriptor to /proc/<pid> open (cf. [22]). Q-04: (Andrew Morton [21]) Does the pid remain reserved? A-04: No. This patchset guarantees a stable handle not that pids are not recycled (cf. [22]). Q-05: (Andrew Morton [21]) Do attempts to signal that fd return errors? A-05: See {Q,A}-01. Q-06: (Andrew Morton [22]) Is there a cleaner way of obtaining the fd? Another syscall perhaps. A-06: Userspace can already trivially retrieve file descriptors from procfs so this is something that we will need to support anyway. Hence, there's no immediate need to add another syscalls just to make pidfd_send_signal() not dependent on the presence of procfs. However, adding a syscalls to get such file descriptors is planned for a future patchset (cf. [22]). Q-07: (Andrew Morton [21] and others) This fd-for-a-process sounds like a handy thing and people may well think up other uses for it in the future, probably unrelated to signals. Are the code and the interface designed to permit such future applications? A-07: Yes (cf. [22]). Q-08: (Andrew Morton [21] and others) Now I think about it, why a new syscall? This thing is looking rather like an ioctl? A-08: This has been extensively discussed. It was agreed that a syscall is preferred for a variety or reasons. Here are just a few taken from prior threads. Syscalls are safer than ioctl()s especially when signaling to fds. Processes are a core kernel concept so a syscall seems more appropriate. The layout of the syscall with its four arguments would require the addition of a custom struct for the ioctl() thereby causing at least the same amount or even more complexity for userspace than a simple syscall. The new syscall will replace multiple other pid-based syscalls (see description above). The file-descriptors-for-processes concept introduced with this syscall will be extended with other syscalls in the future. See also [22], [23] and various other threads already linked in here. Q-09: (Florian Weimer [24]) What happens if you use the new interface with an O_PATH descriptor? A-09: pidfds opened as O_PATH fds cannot be used to send signals to a process (cf. [2]). Signaling processes through pidfds is the equivalent of writing to a file. Thus, this is not an operation that operates "purely at the file descriptor level" as required by the open(2) manpage. See also [4]. /* References */ [1]: https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/ [2]: https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/ [3]: https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/ [4]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/ [5]: https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/ [6]: https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/ [7]: https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/ [8]: https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/ [9]: https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy [11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/ [12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/ [13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/ [14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/ [15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/ [16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/ [17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/ [18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/ [19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/ [20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/ [21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/ [22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/ [23]: https://lwn.net/Articles/773459/ [24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/ [25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/ Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Tycho Andersen <tycho@tycho.ws> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Aleksa Sarai <cyphar@cyphar.com>
|
#
891ae71d |
|
21-Aug-2018 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: spread "const" a bit Link: http://lkml.kernel.org/r/20180627200614.GB18434@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
564def71 |
|
18-May-2018 |
David Howells <dhowells@redhat.com> |
proc: Add a way to make network proc files writable Provide two extra functions, proc_create_net_data_write() and proc_create_net_single_write() that act like their non-write versions but also set a write method in the proc_dir_entry struct. An internal simple write function is provided that will copy its buffer and hand it to the pde->write() method if available (or give an error if not). The buffer may be modified by the write method. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
3617d949 |
|
13-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_net_single Variant of proc_create_data that directly take a seq_file show callback and deals with network namespaces in ->open and ->release. All callers of proc_create + single_open_net converted over, and single_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
c3506372 |
|
10-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_net{,_data} Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
3f3942ac |
|
15-May-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_single{,_data} Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
44414d82 |
|
24-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_seq_private Variant of proc_create_data that directly take a struct seq_operations argument + a private state size and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
fddda2b7 |
|
13-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_seq{,_data} Variants of proc_create{,_data} that directly take a struct seq_operations argument and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
76f668be |
|
15-May-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce a proc_pid_ns helper Factor out retrieving the per-sb pid namespaces from the sb private data into an easier to understand helper. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b2441318 |
|
01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
855d9765 |
|
08-Sep-2017 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: uninline proc_create() Save some code from ~320 invocations all clearing last argument. add/remove: 3/0 grow/shrink: 0/158 up/down: 45/-702 (-657) function old new delta proc_create - 17 +17 __ksymtab_proc_create - 16 +16 __kstrtab_proc_create - 12 +12 yam_init_driver 301 298 -3 ... cifs_proc_init 249 228 -21 via_fb_pci_probe 2304 2280 -24 Link: http://lkml.kernel.org/r/20170819094702.GA27864@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f97df70b |
|
14-Nov-2016 |
Seth Forshee <seth.forshee@canonical.com> |
xenfs: Use proc_create_mount_point() to create /proc/xen Mounting proc in user namespace containers fails if the xenbus filesystem is mounted on /proc/xen because this directory fails the "permanently empty" test. proc_create_mount_point() exists specifically to create such mountpoints in proc but is currently proc-internal. Export this interface to modules, then use it in xenbus when creating /proc/xen. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
c62cce2c |
|
24-Oct-2016 |
Andrey Vagin <avagin@openvz.org> |
net: add an ioctl to get a socket network namespace Each socket operates in a network namespace where it has been created, so if we want to dump and restore a socket, we have to know its network namespace. We have a socket_diag to get information about sockets, it doesn't report sockets which are not bound or connected. This patch introduces a new socket ioctl, which is called SIOCGSKNS and used to get a file descriptor for a socket network namespace. A task must have CAP_NET_ADMIN in a target network namespace to use this ioctl. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7790029 |
|
12-Sep-2014 |
Jeff Layton <jlayton@kernel.org> |
lockd: move lockd's grace period handling into its own module Currently, all of the grace period handling is part of lockd. Eventually though we'd like to be able to build v4-only servers, at which point we'll need to put all of this elsewhere. Move the code itself into fs/nfs_common and have it build a grace.ko module. Then, rejigger the Kconfig options so that both nfsd and lockd enable it automatically. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
|
#
647f010b |
|
04-Jun-2014 |
Andrew Morton <akpm@linux-foundation.org> |
init/main.c: remove an ifdef Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
59d8053f |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h Move non-public declarations and definitions from linux/proc_fs.h to fs/proc/internal.h. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c30480b9 |
|
12-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Make the PROC_I() and PDE() macros internal to procfs Make the PROC_I() and PDE() macros internal to procfs. This means making PDE_DATA() out of line. This could be made more optimal by storing PDE()->data into inode->i_private. Also provide a __PDE_DATA() that is inline and internal to procfs. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
a8ca16ea |
|
12-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Supply a function to remove a proc entry by PDE Supply a function (proc_remove()) to remove a proc entry (and any subtree rooted there) by proc_dir_entry pointer rather than by name and (optionally) root dir entry pointer. This allows us to eliminate all remaining pde->name accesses outside of procfs. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Grant Likely <grant.likely@linaro.or> cc: linux-acpi@vger.kernel.org cc: openipmi-developer@lists.sourceforge.net cc: devicetree-discuss@lists.ozlabs.org cc: linux-pci@vger.kernel.org cc: netdev@vger.kernel.org cc: netfilter-devel@vger.kernel.org cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4a520d27 |
|
12-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Supply an accessor for getting the data from a PDE's parent Supply an accessor function for getting the private data from the parent proc_dir_entry struct of the proc_dir_entry struct associated with an inode. ReiserFS, for instance, stores the super_block pointer in the proc directory it makes for that super_block, and a pointer to the respective seq_file show function in each of the proc files in that directory. This allows a reduction in the number of file_operations structs, open functions and seq_operations structs required. The problem otherwise is that each show function requires two pieces of data but only has storage for one per PDE (and this has no release function). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Jerry Chuang <jerry-chuang@realtek.com> cc: Maxim Mikityanskiy <maxtram95@gmail.com> cc: YAMANE Toshiaki <yamanetoshi@gmail.com> cc: linux-wireless@vger.kernel.org cc: linux-scsi@vger.kernel.org cc: devel@driverdev.osuosl.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
270b5ac2 |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Add proc_mkdir_data() Add proc_mkdir_data() to allow procfs directories to be created that are annotated at the time of creation with private data rather than doing this post-creation. This means no access is then required to the proc_dir_entry struct to set this. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Neela Syam Kolli <megaraidlinux@lsi.com> cc: Jerry Chuang <jerry-chuang@realtek.com> cc: linux-scsi@vger.kernel.org cc: devel@driverdev.osuosl.org cc: linux-wireless@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
34db8aaf |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h. Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as they're internal to procfs. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> cc: devicetree-discuss@lists.ozlabs.org cc: linux-arch@vger.kernel.org cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Jri Slaby <jslaby@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4abfd029 |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Move PDE_NET() to fs/proc/proc_net.c Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0bb80f24 |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Split the namespace stuff out into linux/proc_ns.h Split the proc namespace stuff out into linux/proc_ns.h. Signed-off-by: David Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: Serge E. Hallyn <serge.hallyn@ubuntu.com> cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
271a15ea |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Supply PDE attribute setting accessor functions Supply accessor functions to set attributes in proc_dir_entry structs. The following are supplied: proc_set_size() and proc_set_user(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> cc: linuxppc-dev@lists.ozlabs.org cc: linux-media@vger.kernel.org cc: netdev@vger.kernel.org cc: linux-wireless@vger.kernel.org cc: linux-pci@vger.kernel.org cc: netfilter-devel@vger.kernel.org cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2f96b8c1 |
|
11-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Split kcore bits from linux/procfs.h into linux/kcore.h Split kcore bits from linux/procfs.h into linux/kcore.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> cc: linux-mips@linux-mips.org cc: sparclinux@vger.kernel.org cc: x86@kernel.org cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
3cb5bf1b |
|
10-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Delete create_proc_read_entry() Delete create_proc_read_entry() as it no longer has any users. Also delete read_proc_t, write_proc_t, the read_proc member of the proc_dir_entry struct and the support functions that use them. This saves a pointer for every PDE allocated. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d0206fb5 |
|
09-Apr-2013 |
David Howells <dhowells@redhat.com> |
procfs: Mark create_proc_read_entry deprecated Mark create_proc_read_entry deprecated. proc_create[_data]() should be used instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
05c0ae21 |
|
04-Apr-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
try a saner locking for pde_opener... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
866ad9a7 |
|
03-Apr-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
procfs: preparations for remove_proc_entry() race fixes * leave ->proc_fops alone; make ->pde_users negative instead * trim pde_opener * move relevant code in fs/proc/inode.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
80e928f7 |
|
04-Apr-2013 |
David Howells <dhowells@redhat.com> |
proc: Kill create_proc_entry() Kill create_proc_entry() in favour of create_proc_read_entry(), proc_create() and proc_create_data(). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d9dda78b |
|
31-Mar-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
procfs: new helper - PDE_DATA(inode) The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ee21ed0a |
|
31-Mar-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
procfs: kill ->write_proc() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8ce584c7 |
|
30-Mar-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
procfs: add proc_remove_subtree() just what it sounds like; do that only to procfs subtrees you've created - doing that to something shared with another driver is not only antisocial, but might cause interesting races with proc_create() and its ilk. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4fc1a601 |
|
18-Feb-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
net: proc: fix build failed when procfs is not configured commit d4beaa66add8aebf83ab16d2fde4e4de8dac36df "net: proc: change proc_net_fops_create to proc_create" uses proc_create to replace proc_net_fops_create, when CONFIG_PROC isn't configured, some build error will occurs. net/packet/af_packet.c: In function 'packet_net_init': net/packet/af_packet.c:3831:48: error: 'packet_seq_fops' undeclared (first use in this function) net/packet/af_packet.c:3831:48: note: each undeclared identifier is reported only once for each function it appears in There may be other build fails like above,this patch change proc_create from function to macros when CONFIG_PROC is not configured,just like what proc_net_fops_create did before this commit. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2399059 |
|
17-Feb-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
net: proc: remove proc_net_remove proc_net_remove has been replaced by remove_proc_entry. we can remove it now. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4278c96 |
|
17-Feb-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
net: proc: remove proc_net_fops_create proc_net_fops_create has been replaced by proc_create, we can remove it now. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f73bc4d |
|
17-Jan-2013 |
Joe Millenbach <jmillenbach@gmail.com> |
tty: Added a CONFIG_TTY option to allow removal of TTY The option allows you to remove TTY and compile without errors. This saves space on systems that won't support TTY interfaces anyway. bloat-o-meter output is below. The bulk of this patch consists of Kconfig changes adding "depends on TTY" to various serial devices and similar drivers that require the TTY layer. Ideally, these dependencies would occur on a common intermediate symbol such as SERIO, but most drivers "select SERIO" rather than "depends on SERIO", and "select" does not respect dependencies. bloat-o-meter output comparing our previous minimal to new minimal by removing TTY. The list is filtered to not show removed entries with awk '$3 != "-"' as the list was very long. add/remove: 0/226 grow/shrink: 2/14 up/down: 6/-35356 (-35350) function old new delta chr_dev_init 166 170 +4 allow_signal 80 82 +2 static.__warned 143 142 -1 disallow_signal 63 62 -1 __set_special_pids 95 94 -1 unregister_console 126 121 -5 start_kernel 546 541 -5 register_console 593 588 -5 copy_from_user 45 40 -5 sys_setsid 128 120 -8 sys_vhangup 32 19 -13 do_exit 1543 1526 -17 bitmap_zero 60 40 -20 arch_local_irq_save 137 117 -20 release_task 674 652 -22 static.spin_unlock_irqrestore 308 260 -48 Signed-off-by: Joe Millenbach <jmillenbach@gmail.com> Reviewed-by: Jamey Sharp <jamey@minilop.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
138d22b5 |
|
17-Dec-2012 |
Cyrill Gorcunov <gorcunov@openvz.org> |
fs, epoll: add procfs fdinfo helper This allows us to print out eventpoll target file descriptor, events and data, the /proc/pid/fdinfo/fd consists of | pos: 0 | flags: 02 | tfd: 5 events: 1d data: ffffffffffffffff enabled: 1 [avagin@: fix for unitialized ret variable] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
98f842e6 |
|
15-Jun-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Usable inode numbers for the namespace file descriptors. Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
33d6dce6 |
|
17-Jun-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Generalize proc inode allocation Generalize the proc inode allocation so that it can be used without having to having to create a proc_dir_entry. This will allow namespace file descriptors to remain light weight entitities but still have the same inode number when the backing namespace is the same. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
cde1975b |
|
26-Jul-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Implent proc namespace operations This allows entering a user namespace, and the ability to store a reference to a user namespace with a bind mount. Addition of missing userns_ns_put in userns_install from Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
8823c079 |
|
07-Mar-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
vfs: Add setns support for the mount namespace setns support for the mount namespace is a little tricky as an arbitrary decision must be made about what to set fs->root and fs->pwd to, as there is no expectation of a relationship between the two mount namespaces. Therefore I arbitrarily find the root mount point, and follow every mount on top of it to find the top of the mount stack. Then I set fs->root and fs->pwd to that location. The topmost root of the mount stack seems like a reasonable place to be. Bind mount support for the mount namespace inodes has the possibility of creating circular dependencies between mount namespaces. Circular dependencies can result in loops that prevent mount namespaces from every being freed. I avoid creating those circular dependencies by adding a sequence number to the mount namespace and require all bind mounts be of a younger mount namespace into an older mount namespace. Add a helper function proc_ns_inode so it is possible to detect when we are attempting to bind mound a namespace inode. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
57e8391d |
|
07-Mar-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
pidns: Add setns support - Pid namespaces are designed to be inescapable so verify that the passed in pid namespace is a child of the currently active pid namespace or the currently active pid namespace itself. Allowing the currently active pid namespace is important so the effects of an earlier setns can be cancelled. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
dcb0f222 |
|
09-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert proc to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
7773fbc5 |
|
10-Jan-2012 |
Cyrill Gorcunov <gorcunov@openvz.org> |
procfs: make proc_get_link to use dentry instead of inode Prepare the ground for the next "map_files" patch which needs a name of a link file to analyse. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d161a13f |
|
24-Jul-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
switch procfs to umode_t use both proc_dir_entry ->mode and populating functions Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
09570f91 |
|
27-Jul-2011 |
David Howells <dhowells@redhat.com> |
proc: make struct proc_dir_entry::name a terminal array rather than a pointer Since __proc_create() appends the name it is given to the end of the PDE structure that it allocates, there isn't a need to store a name pointer. Instead we can just replace the name pointer with a terminal char array of _unspecified_ length. The compiler will simply append the string to statically defined variables of PDE type overlapping any hole at the end of the structure and, unlike specifying an explicitly _zero_ length array, won't give a warning if you try to statically initialise it with a string of more than zero length. Also, whilst we're at it: (1) Move namelen to end just prior to name and reduce it to a single byte (name shouldn't be longer than NAME_MAX). (2) Move pde_unload_lock two places further on so that if it's four bytes in size on a 64-bit machine, it won't cause an unused hole in the PDE struct. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
60063497 |
|
26-Jul-2011 |
Arun Sharma <asharma@fb.com> |
atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
38646013 |
|
26-May-2011 |
Jiri Slaby <jirislaby@kernel.org> |
mm: extract exe_file handling from procfs Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/. This was because exe_file was needed only for /proc/<pid>/exe. Since we will need the exe_file functionality also for core dumps (so core name can contain full binary path), built this functionality always into the kernel. To achieve that move that out of proc FS to the kernel/ where in fact it should belong. By doing that we can make dup_mm_exe_file static. Also we can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f2beb798 |
|
24-May-2011 |
Stephen Wilson <wilsons@start.ca> |
proc: make struct proc_maps_private truly private Now that mm/mempolicy.c is no longer implementing /proc/pid/numa_maps there is no need to export struct proc_maps_private to the world. Move it to fs/proc/internal.h instead. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f12a20fc |
|
17-May-2011 |
Randy Dunlap <randy.dunlap@oracle.com> |
procfs: add stub for proc_mkdir_mode() Provide a stub for proc_mkdir_mode() when CONFIG_PROC_FS is not enabled, just like the stub for proc_mkdir(). Fixes this linux-next build error: drivers/net/wireless/airo.c:4504: error: implicit declaration of function 'proc_mkdir_mode' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a00eaf11 |
|
07-Mar-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
ns proc: Add support for the ipc namespace Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
34482e89 |
|
07-Mar-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
ns proc: Add support for the uts namespace Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
13b6f576 |
|
07-Mar-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
ns proc: Add support for the network namespace. Implementing file descriptors for the network namespace is simple and straight forward. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
6b4e306a |
|
07-Mar-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
ns: proc files for namespace naming policy. Create files under /proc/<pid>/ns/ to allow controlling the namespaces of a process. This addresses three specific problems that can make namespaces hard to work with. - Namespaces require a dedicated process to pin them in memory. - It is not possible to use a namespace unless you are the child of the original creator. - Namespaces don't have names that userspace can use to talk about them. The namespace files under /proc/<pid>/ns/ can be opened and the file descriptor can be used to talk about a specific namespace, and to keep the specified namespace alive. A namespace can be kept alive by either holding the file descriptor open or bind mounting the file someplace else. aka: mount --bind /proc/self/ns/net /some/filesystem/path mount --bind /proc/self/fd/<N> /some/filesystem/path This allows namespaces to be named with userspace policy. It requires additional support to make use of these filedescriptors and that will be comming in the following patches. Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
312ec7e5 |
|
23-Mar-2011 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: make struct proc_dir_entry::namelen unsigned int 1. namelen is declared "unsigned short" which hints for "maybe space savings". Indeed in 2.4 struct proc_dir_entry looked like: struct proc_dir_entry { unsigned short low_ino; unsigned short namelen; Now, low_ino is "unsigned int", all savings were gone for a long time. "struct proc_dir_entry" is not that countless to worry about it's size, anyway. 2. converting from unsigned short to int/unsigned int can only create problems, we better play it safe. Space is not really conserved, because of natural alignment for the next field. sizeof(struct proc_dir_entry) remains the same. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
26562c59 |
|
22-Sep-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
kcore: register vmemmap range Benjamin Herrenschmidt <benh@kernel.crashing.org> pointed out that vmemmap range is not included in KCORE_RAM, KCORE_VMALLOC .... This adds KCORE_VMEMMAP if SPARSEMEM_VMEMMAP is used. By this, vmemmap can be readable via /proc/kcore Because it's not vmalloc area, vread/vwrite cannot be used. But the range is static against the memory layout, this patch handles vmemmap area by the same scheme with physical memory. This patch assumes SPARSEMEM_VMEMMAP range is not in VMALLOC range. It's correct now. [akpm@linux-foundation.org: fix typo] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c30bb2a2 |
|
22-Sep-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
kcore: add kclist types Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2ef43ec7 |
|
22-Sep-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
kcore: use usual list for kclist This patchset is for /proc/kcore. With this, - many per-arch hooks are removed. - /proc/kcore will know really valid physical memory area. - /proc/kcore will be aware of memory hotplug. - /proc/kcore will be architecture independent i.e. if an arch supports CONFIG_MMU, it can use /proc/kcore. (if the arch uses usual memory layout.) This patch: /proc/kcore uses its own list handling codes. It's better to use generic list codes. No changes in logic. just clean up. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3174c21b |
|
07-Apr-2009 |
Al Viro <viro@zeniv.linux.org.uk> |
Move junk from proc_fs.h to fs/proc/internal.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
99b76233 |
|
25-Mar-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc 2/2: remove struct proc_dir_entry::owner Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy as correctly noted at bug #12454. Someone can lookup entry with NULL ->owner, thus not pinning enything, and release it later resulting in module refcount underflow. We can keep ->owner and supply it at registration time like ->proc_fops and ->data. But this leaves ->owner as easy-manipulative field (just one C assignment) and somebody will forget to unpin previous/pin current module when switching ->owner. ->proc_fops is declared as "const" which should give some thoughts. ->read_proc/->write_proc were just fixed to not require ->owner for protection. rmmod'ed directories will be empty and return "." and ".." -- no harm. And directories with tricky enough readdir and lookup shouldn't be modular. We definitely don't want such modular code. Removing ->owner will also make PDE smaller. So, let's nuke it. Kudos to Jeff Layton for reminding about this, let's say, oversight. http://bugzilla.kernel.org/show_bug.cgi?id=12454 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
59c7572e |
|
06-Oct-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: remove fs/proc/proc_misc.c Now that everything was moved to their more or less expected places, apply rm(1). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
97ce5d6d |
|
06-Oct-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: move all /proc/kcore stuff to fs/proc/kcore.c Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
1b483a6a |
|
15-Sep-2008 |
Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> |
powerpc: Remove remains of /proc/ppc_htab commit 14cf11af6cf608eb8c23e989ddb17a715ddce109 ("powerpc: Merge enough to start building in arch/powerpc.") unwired /proc/ppc_htab, and commit 917f0af9e5a9ceecf9e72537fabb501254ba321d ("powerpc: Remove arch/ppc and include/asm-ppc") removed the rest of the /proc/ppc_htab support, but there are still a few references left. Kill them for good. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
9043476f |
|
15-Jul-2008 |
Al Viro <viro@zeniv.linux.org.uk> |
[PATCH] sanitize proc_sysctl * keep references to ctl_table_head and ctl_table in /proc/sys inodes * grab the former during operations, use the latter for access to entry if that succeeds * have ->d_compare() check if table should be seen for one who does lookup; that allows us to avoid flipping inodes - if we have the same name resolve to different things, we'll just keep several dentries and ->d_compare() will reject the wrong ones. * have ->lookup() and ->readdir() scan the table of our inode first, then walk all ctl_table_header and scan ->attached_by for those that are attached to our directory. * implement ->getattr(). * get rid of insane amounts of tree-walking * get rid of the need to know dentry in ->permission() and of the contortions induced by that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
881adb85 |
|
25-Jul-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: always do ->release Current two-stage scheme of removing PDE emphasizes one bug in proc: open rmmod remove_proc_entry close ->release won't be called because ->proc_fops were cleared. In simple cases it's small memory leak. For every ->open, ->release has to be done. List of openers is introduced which is traversed at remove_proc_entry() if neeeded. Discussions with Al long ago (sigh). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6e644c31 |
|
25-Jul-2008 |
Adrian Bunk <bunk@kernel.org> |
move proc_kmsg_operations to fs/proc/internal.h This patch moves the extern of struct proc_kmsg_operations to fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c so that the latter sees the former. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8086cd45 |
|
22-Jul-2008 |
Adrian Bunk <bunk@kernel.org> |
netns: make get_proc_net() static get_proc_net() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
57d3c64f |
|
12-Jun-2008 |
Ben Nizette <bn@niasdigital.com> |
proc_fs.h: move struct mm_struct forward-declaration Move the forward-declaration of struct mm_struct a little way up proc_fs.h. This fixes a bunch of "'struct mm_struct' declared inside parameter list" warnings with CONFIG_PROC_FS=n Signed-off-by: Ben Nizette <bn@niasdigital.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
59b74351 |
|
29-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
proc: introduce proc_create_data to setup de->data This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in the most part of the kernel code. The original OOPS is described in the commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc: Typical PDE creation code looks like: pde = create_proc_entry("foo", 0, NULL); if (pde) pde->proc_fops = &foo_proc_fops; Notice that PDE is first created, only then ->proc_fops is set up to final value. This is a problem because right after creation a) PDE is fully visible in /proc , and b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's possible to ->read without ->open (see one class of oopses below). The fix is new API called proc_create() which makes sure ->proc_fops are set up before gluing PDE to main tree. Typical new code looks like: pde = proc_create("foo", 0, NULL, &foo_proc_fops); if (!pde) return -ENOMEM; Fix most networking users for a start. In the long run, create_proc_entry() for regular files will go. In addition to this, proc_create_data is introduced to fix reading from proc without PDE->data. The race is basically the same as above. create_proc_entries is replaced in the entire kernel code as new method is also simply better. This patch: The problem is the same as for de->proc_fops. Right now PDE becomes visible without data set. So, the entry could be looked up without data. This, in most cases, will simply OOPS. proc_create_data call is created to address this issue. proc_create now becomes a wrapper around it. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Chris Mason <chris.mason@oracle.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Karsten Keil <kkeil@suse.de> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Neil Brown <neilb@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Osterlund <petero2@telia.com> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8731f14d |
|
29-Apr-2008 |
Alexey Dobriyan <adobriyan@sw.ru> |
proc: remove ->get_info infrastructure Now that last dozen or so users of ->get_info were removed, ditch it too. Everyone sane shouldd have switched to seq_file interface long ago. P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc is long-standing crap, BTW, thus a) put ->read_proc/->write_proc/read_proc_entry() users on death row, b) new such users should be rejected, c) everyone is encouraged to convert his favourite ->read_proc user or I'll do it, lazy bastards. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c74c120a |
|
29-Apr-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: remove proc_root from drivers Remove proc_root export. Creation and removal works well if parent PDE is supplied as NULL -- it worked always that way. So, one useless export removed and consistency added, some drivers created PDEs with &proc_root as parent but removed them as NULL and so on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
928b4d8c |
|
29-Apr-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: remove proc_root_driver Use creation by full path: "driver/foo". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
36a5aeb8 |
|
29-Apr-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: remove proc_root_fs Use creation by full path instead: "fs/foo". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9c37066d |
|
29-Apr-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: remove proc_bus Remove proc_bus export and variable itself. Using pathnames works fine and is slightly more understandable and greppable. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
925d1c40 |
|
29-Apr-2008 |
Matt Helsley <matthltc@us.ibm.com> |
procfs task exe symlink The kernel implements readlink of /proc/pid/exe by getting the file from the first executable VMA. Then the path to the file is reconstructed and reported as the result. Because of the VMA walk the code is slightly different on nommu systems. This patch avoids separate /proc/pid/exe code on nommu systems. Instead of walking the VMAs to find the first executable file-backed VMA we store a reference to the exec'd file in the mm_struct. That reference would prevent the filesystem holding the executable file from being unmounted even after unmapping the VMAs. So we track the number of VM_EXECUTABLE VMAs and drop the new reference when the last one is unmapped. This avoids pinning the mounted filesystem. [akpm@linux-foundation.org: improve comments] [yamamoto@valinux.co.jp: fix dup_mmap] Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com> Cc:"Eric W. Biederman" <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e9720acd |
|
07-Mar-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NET]: Make /proc/net a symlink on /proc/self/net (v3) Current /proc/net is done with so called "shadows", but current implementation is broken and has little chances to get fixed. The problem is that dentries subtree of /proc/net directory has fancy revalidation rules to make processes living in different net namespaces see different entries in /proc/net subtree, but currently, tasks see in the /proc/net subdir the contents of any other namespace, depending on who opened the file first. The proposed fix is to turn /proc/net into a symlink, which points to /proc/self/net, which in turn shows what previously was in /proc/net - the network-related info, from the net namespace the appropriate task lives in. # ls -l /proc/net lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net In other words - this behaves like /proc/mounts, but unlike "mounts", "net" is not a file, but a directory. Changes from v2: * Fixed discrepancy of /proc/net nlink count and selinux labeling screwup pointed out by Stephen. To get the correct nlink count the ->getattr callback for /proc/net is overridden to read one from the net->proc_net entry. To make selinux still work the net->proc_net entry is initialized properly, i.e. with the "net" name and the proc_net parent. Selinux fixes are Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Changes from v1: * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3dcd25f3 |
|
14-Feb-2008 |
Jan Blunck <jblunck@suse.de> |
d_path: Make proc_get_link() use a struct path argument proc_get_link() is always called with a dentry and a vfsmount from a struct path. Make proc_get_link() take it directly as an argument. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2d3a4e36 |
|
08-Feb-2008 |
Alexey Dobriyan <adobriyan@sw.ru> |
proc: fix ->open'less usage due to ->proc_fops flip Typical PDE creation code looks like: pde = create_proc_entry("foo", 0, NULL); if (pde) pde->proc_fops = &foo_proc_fops; Notice that PDE is first created, only then ->proc_fops is set up to final value. This is a problem because right after creation a) PDE is fully visible in /proc , and b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's possible to ->read without ->open (see one class of oopses below). The fix is new API called proc_create() which makes sure ->proc_fops are set up before gluing PDE to main tree. Typical new code looks like: pde = proc_create("foo", 0, NULL, &foo_proc_fops); if (!pde) return -ENOMEM; Fix most networking users for a start. In the long run, create_proc_entry() for regular files will go. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/block/sda/sda1/dev Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2) EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0 EIP is at mutex_lock_nested+0x75/0x25d EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570 ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000) Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb 00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246 00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550 Call Trace: [<c106f7ce>] seq_read+0x24/0x28a [<c106f7aa>] seq_read+0x0/0x28a [<c106f7ce>] seq_read+0x24/0x28a [<c106f7aa>] seq_read+0x0/0x28a [<c10818b8>] proc_reg_read+0x60/0x73 [<c1081858>] proc_reg_read+0x0/0x73 [<c105a34f>] vfs_read+0x6c/0x8b [<c105a6f3>] sys_read+0x3c/0x63 [<c10025f2>] sysenter_past_esp+0x5f/0xa5 [<c10697a7>] destroy_inode+0x24/0x33 ======================= INFO: lockdep is turned off. Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33 EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8 [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
df5f8314 |
|
08-Feb-2008 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: seqfile convert proc_pid_status to properly handle pid namespaces Currently we possibly lookup the pid in the wrong pid namespace. So seq_file convert proc_pid_status which ensures the proper pid namespaces is passed in. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: another build fix] [akpm@linux-foundation.org: s390 build fix] [akpm@linux-foundation.org: fix task_name() output] [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
be614086 |
|
08-Feb-2008 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: implement proc_single_file_operations Currently many /proc/pid files use a crufty precursor to the current seq_file api, and they don't have direct access to the pid_namespace or the pid of for which they are displaying data. So implement proc_single_file_operations to make the seq_file routines easy to use, and to give access to the full state of the pid of we are displaying data for. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f248dcb3 |
|
04-Feb-2008 |
Matt Mackall <mpm@selenic.com> |
maps4: move clear_refs code to task_mmu.c This puts all the clear_refs code where it belongs and probably lets things compile on MMU-less systems as well. Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e5d69b9f |
|
10-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[ATM]: Oops reading net/atm/arp cat /proc/net/atm/arp causes the NULL pointer dereference in the get_proc_net+0xc/0x3a. This happens as proc_get_net believes that the parent proc dir entry contains struct net. Fix this assumption for "net/atm" case. The problem is introduced by the commit c0097b07abf5f92ab135d024dd41bd2aada1512f from Eric W. Biederman/Daniel Lezcano. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a622f2d |
|
05-Dec-2007 |
Alexey Dobriyan <adobriyan@sw.ru> |
proc: fix proc_dir_entry refcounting Creating PDEs with refcount 0 and "deleted" flag has problems (see below). Switch to usual scheme: * PDE is created with refcount 1 * every de_get does +1 * every de_put() and remove_proc_entry() do -1 * once refcount reaches 0, PDE is freed. This elegantly fixes at least two following races (both observed) without introducing new locks, without abusing old locks, without spreading lock_kernel(): 1) PDE leak remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_read(&de->count) == 0) if (atomic_dec_and_test(&de->count)) if (de->deleted) /* also not taken! */ free_proc_entry(de); else de->deleted = 1; [refcount=0, deleted=1] 2) use after free remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_dec_and_test(&de->count)) if (atomic_read(&de->count) == 0) free_proc_entry(de); /* boom! */ if (de->deleted) free_proc_entry(de); BUG: unable to handle kernel paging request at virtual address 6b6b6b6b printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4) EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1 EIP is at strnlen+0x6/0x18 EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000) Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400 c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400 f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34 Call Trace: [<c10ac4f0>] vsnprintf+0x2ad/0x49b [<c10ac779>] vscnprintf+0x14/0x1f [<c1018e6b>] vprintk+0xc5/0x2f9 [<c10379f1>] handle_fasteoi_irq+0x0/0xab [<c1004f44>] do_IRQ+0x9f/0xb7 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b [<c100264e>] need_resched+0x1f/0x21 [<c10190ba>] printk+0x1b/0x1f [<c107c8ad>] de_put+0x3d/0x50 [<c107c8f8>] proc_delete_inode+0x38/0x41 [<c107c8c0>] proc_delete_inode+0x0/0x41 [<c1066298>] generic_delete_inode+0x5e/0xc6 [<c1065aa9>] iput+0x60/0x62 [<c1063c8e>] d_kill+0x2d/0x46 [<c1063fa9>] dput+0xdc/0xe4 [<c10571a1>] __fput+0xb0/0xcd [<c1054e49>] filp_close+0x48/0x4f [<c1055ee9>] sys_close+0x67/0xa5 [<c10026b6>] sysenter_past_esp+0x5f/0x85 ======================= Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9 EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44 Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds, module is already pinned and remove_proc_entry() can't happen => nobody can mark PDE deleted. Dummy proc root in netns code is not marked with refcount 1. AFAICS, we never get it, it's just for proper /proc/net removal. I double checked CLONE_NETNS continues to work. Patch survives many hours of modprobe/rmmod/cat loops without new bugs which can be attributed to refcounting. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2b1e300a |
|
01-Dec-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NETNS]: Fix /proc/net breakage Well I clearly goofed when I added the initial network namespace support for /proc/net. Currently things work but there are odd details visible to user space, even when we have a single network namespace. Since we do not cache proc_dir_entry dentries at the moment we can just modify ->lookup to return a different directory inode depending on the network namespace of the process looking at /proc/net, replacing the current technique of using a magic and fragile follow_link method. To accomplish that this patch: - introduces a shadow_proc method to allow different dentries to be returned from proc_lookup. - Removes the old /proc/net follow_link magic - Fixes a weakness in our not caching of proc generic dentries. As shadow_proc uses a task struct to decided which dentry to return we can go back later and fix the proc generic caching without modifying any code that uses the shadow_proc method. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
44656ba1 |
|
07-Nov-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[NET]: Kill proc_net_create() There are no more users. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f4e6433 |
|
19-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
pid namespaces: initialize the namespace's proc_mnt The namespace's proc_mnt must be kern_mount-ed to make this pointer always valid, independently of whether the user space mounted the proc or not. This solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL to not-NULL. The initialization is done after the init's pid is created and hashed to make proc_get_sb() finr it and get for root inode. Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the superblock holds the namespace we must explicitly break this circle to destroy all the stuff. This is done after the init of the namespace dies. Running a few steps forward - when init exits it will kill all its children, so no proc_mnt will be needed after its death. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
07543f5c |
|
19-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
pid namespaces: make proc have multiple superblocks - one for each namespace Each pid namespace have to be visible through its own proc mount. Thus we need to have per-namespace proc trees with their own superblocks. We cannot easily show different pid namespace via one global proc tree, since each pid refers to different tasks in different namespaces. E.g. pid 1 refers to the init task in the initial namespace and to some other task when seeing from another namespace. Moreover - pid, exisintg in one namespace may not exist in the other. This approach has one move advantage is that the tasks from the init namespace can see what tasks live in another namespace by reading entries from another proc tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
60347f67 |
|
19-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
pid namespaces: prepare proc_flust_task() to flush entries from multiple proc trees The first part is trivial - we just make the proc_flush_task() to operate on arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to it. The other change is more tricky: I moved the proc_flush_task() call in release_task() higher to address the following problem. When flushing task from many proc trees we need to know the set of ids (not just one pid) to find the dentries' names to flush. Thus we need to pass the task's pid to proc_flush_task() as struct pid is the only object that can provide all the pid numbers. But after __exit_signal() task has detached all his pids and this information is lost. This creates a tiny gap for proc_pid_lookup() to bring some dentries back to tree and keep them in hash (since pids are still alive before __exit_signal()) till the next shrink, but since proc_flush_task() does not provide a 100% guarantee that the dentries will be flushed, this is OK to do so. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
077130c0 |
|
13-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Fix race when opening a proc file while a network namespace is exiting. The problem: proc_net files remember which network namespace the are against but do not remember hold a reference count (as that would pin the network namespace). So we currently have a small window where the reference count on a network namespace may be incremented when opening a /proc file when it has already gone to zero. To fix this introduce maybe_get_net and get_proc_net. maybe_get_net increments the network namespace reference count only if it is greater then zero, ensuring we don't increment a reference count after it has gone to zero. get_proc_net handles all of the magic to go from a proc inode to the network namespace instance and call maybe_get_net on it. PROC_NET the old accessor is removed so that we don't get confused and use the wrong helper function. Then I fix up the callers to use get_proc_net and handle the case case where get_proc_net returns NULL. In that case I return -ENXIO because effectively the network namespace has already gone away so the files we are trying to access don't exist anymore. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
457c4cbc |
|
11-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76ceb2f9 |
|
10-Aug-2007 |
Alexey Dobriyan <adobriyan@gmail.com> |
Remove unused struct proc_dir_entry::set After /proc/sys rewrite it was left unused. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
786d7e16 |
|
16-Jul-2007 |
Alexey Dobriyan <adobriyan@sw.ru> |
Fix rmmod/read/write races in /proc entries Fix following races: =========================================== 1. Write via ->write_proc sleeps in copy_from_user(). Module disappears meanwhile. Or, more generically, system call done on /proc file, method supplied by module is called, module dissapeares meanwhile. pde = create_proc_entry() if (!pde) return -ENOMEM; pde->write_proc = ... open write copy_from_user pde = create_proc_entry(); if (!pde) { remove_proc_entry(); return -ENOMEM; /* module unloaded */ } *boom* ========================================== 2. bogo-revoke aka proc_kill_inodes() remove_proc_entry vfs_read proc_kill_inodes [check ->f_op validness] [check ->f_op->read validness] [verify_area, security permissions checks] ->f_op = NULL; if (file->f_op->read) /* ->f_op dereference, boom */ NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's see how this scheme behaves, then extend if needed for directories. Directories creators in /proc only set ->owner for them, so proxying for directories may be unneeded. NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write, ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release. If your in-tree module uses something else, yell on me. Full audit pending. [akpm@linux-foundation.org: build fix] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7695650a |
|
08-May-2007 |
Alexey Dobriyan <adobriyan@openvz.org> |
Fix race between proc_get_inode() and remove_proc_entry() proc_lookup remove_proc_entry =========== ================= lock_kernel(); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] [check refcount and free PDE] spin_unlock(&proc_subdir_lock); proc_get_inode: de_get(de); /* boom */ Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b813e931 |
|
06-May-2007 |
David Rientjes <rientjes@google.com> |
smaps: add clear_refs file to clear reference Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() is called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task. It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with echo 1 > /proc/pid/clear_refs and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference. For example, using an efficient Mozilla: accumulated time referenced memory ---------------- ----------------- 0 s 408 kB 1 s 408 kB 2 s 556 kB 3 s 1028 kB 4 s 872 kB 5 s 1956 kB 6 s 416 kB 7 s 1560 kB 8 s 2336 kB 9 s 1044 kB 10 s 416 kB This is a valuable tool to get an approximate measurement of the memory footprint for a task. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> [akpm@linux-foundation.org: build fixes] [mpm@selenic.com: rename for_each_pmd] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
77b14db5 |
|
14-Feb-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] sysctl: reimplement the sysctl proc support With this change the sysctl inodes can be cached and nothing needs to be done when removing a sysctl table. For a cost of 2K code we will save about 4K of static tables (when we remove de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or about half that on a 32bit arch. The speed feels about the same, even though we can now cache the sysctl dentries :( We get the core advantage that we don't need to have a 1 to 1 mapping between ctl table entries and proc files. Making it possible to have /proc/sys vary depending on the namespace you are in. The currently merged namespaces don't have an issue here but the network namespace under /proc/sys/net needs to have different directories depending on which network adapters are visible. By simply being a cache different directories being visible depending on who you are is trivial to implement. [akpm@osdl.org: fix uninitialised var] [akpm@osdl.org: fix ARM build] [bunk@stusta.de: make things static] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c5ef1c42 |
|
12-Feb-2007 |
Arjan van de Ven <arjan@linux.intel.com> |
[PATCH] mark struct inode_operations const 3 Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
20cdc894 |
|
02-Oct-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: modify proc_pident_lookup to be completely table driven Currently proc_pident_lookup gets the names and types from a table and then has a huge switch statement to get the inode and file operations it needs. That is silly and is becoming increasingly hard to maintain so I just put all of the information in the table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
dbf8685c |
|
27-Sep-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] NOMMU: Implement /proc/pid/maps for NOMMU Implement /proc/pid/maps for NOMMU by reading the vm_area_list attached to current->mm->context.vmlist. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e18fa700 |
|
24-Sep-2006 |
Jeff Garzik <jeff@garzik.org> |
Move several *_SUPER_MAGIC symbols to include/linux/magic.h. Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
13b41b09 |
|
26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: Use struct pid not struct task_ref Incrementally update my proc-dont-lock-task_structs-indefinitely patches so that they work with struct pid instead of struct task_ref. Mostly this is a straight 1-1 substitution. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
99f89551 |
|
26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: don't lock task_structs indefinitely Every inode in /proc holds a reference to a struct task_struct. If a directory or file is opened and remains open after the the task exits this pinning continues. With 8K stacks on a 32bit machine the amount pinned per file descriptor is about 10K. Normally I would figure a reasonable per user process limit is about 100 processes. With 80 processes, with a 1000 file descriptors each I can trigger the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless data. This patch replaces the struct task_struct pointer with a pointer to a struct task_ref which has a struct task_struct pointer. The so the pinning of dead tasks does not happen. The code now has to contend with the fact that the task may now exit at any time. Which is a little but not muh more complicated. With this change it takes about 1000 processes each opening up 1000 file descriptors before I can trigger the OOM killer. Much better. [mlp@google.com: task_mmu small fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Paul Jackson <pj@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Albert Cahalan <acahalan@gmail.com> Signed-off-by: Prasanna Meda <mlp@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
48e6484d |
|
26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: Rewrite the proc dentry flush on exit optimization To keep the dcache from filling up with dead /proc entries we flush them on process exit. However over the years that code has gotten hairy with a dentry_pointer and a lock in task_struct and misdocumented as a correctness feature. I have rewritten this code to look and see if we have a corresponding entry in the dcache and if so flush it on process exit. This removes the extra fields in the task_struct and allows me to trivially handle the case of a /proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
aed7a6c4 |
|
26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: Replace proc_inode.type with proc_inode.fd The sole renaming use of proc_inode.type is to discover the file descriptor number, so just store the file descriptor number and don't wory about processing this field. This removes any /proc limits on the maximum number of file descriptors, and clears the path to make the hard coded /proc inode numbers go away. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
62c4f0a2 |
|
25-Apr-2006 |
David Woodhouse <dwmw2@infradead.org> |
Don't include linux/config.h from anywhere else in include/ Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
#
80e8ff63 |
|
10-Apr-2006 |
Vivek Goyal <vgoyal@in.ibm.com> |
[PATCH] kdump proc vmcore size oveflow fix A couple of /proc/vmcore data structures overflow with 32bit systems having memory more than 4G. This patch fixes those. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
4b6f5d20 |
|
28-Mar-2006 |
Arjan van de Ven <arjan@infradead.org> |
[PATCH] Make most file operations structs in fs/ const This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
99ac48f5 |
|
28-Mar-2006 |
Arjan van de Ven <arjan@infradead.org> |
[PATCH] mark f_ops const in the inode Mark the f_ops members of inodes as const, as well as fix the ripple-through this causes by places that copy this f_ops and then "do stuff" with it. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
22e6c1b3 |
|
26-Mar-2006 |
Maneesh Soni <maneesh@in.ibm.com> |
[PATCH] Use loff_t for size in struct proc_dir_entry Change proc_dir_entry->size to be loff_t to represent files like /proc/vmcore for 32bit systems with more than 4G memory. Needed for seeing correct size for /proc/vmcore for 32-bit systems with > 4G RAM. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
64a07bd8 |
|
26-Mar-2006 |
Steven Rostedt <rostedt@goodmis.org> |
[PATCH] protect remove_proc_entry It has been discovered that the remove_proc_entry has a race in the removing of entries in the proc file system that are siblings. There's no protection around the traversing and removing of elements that belong in the same subdirectory. This subdirectory list is protected in other areas by the BKL. So the BKL was at first used to protect this area too, but unfortunately, remove_proc_entry may be called with spinlocks held. The BKL may schedule, so this was not a solution. The final solution was to add a new global spin lock to protect this list, called proc_subdir_lock. This lock now protects the list in remove_proc_entry, and I also went around looking for other areas that this list is modified and added this protection there too. Care must be taken since these locations call several functions that may also schedule. Since I don't see any location that these functions that modify the subdirectory list are called by interrupts, the irqsave/restore versions of the spin lock was _not_ used. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
898b5395 |
|
12-Jan-2006 |
Dave C Boutcher <sleddog@us.ibm.com> |
[PATCH] powerpc: Add/remove/update properties in /proc/device-tree Add support to the proc_device_tree file for removing and updating properties. Remove just removes the proc file, update changes the data pointer within the proc file. The remainder of the device-tree changes occur elsewhere. Signed-off-by: Dave Boutcher <sleddog@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
330d57fb |
|
04-Nov-2005 |
Al Viro <viro@zeniv.linux.org.uk> |
[PATCH] Fix sysctl unregistration oops (CVE-2005-2709) You could open the /proc/sys/net/ipv4/conf/<if>/<whatever> file, then wait for interface to go away, try to grab as much memory as possible in hope to hit the (kfreed) ctl_table. Then fill it with pointers to your function. Then do read from file you've opened and if you are lucky, you'll get it called as ->proc_handler() in kernel mode. So this is at least an Oops and possibly more. It does depend on an interface going away though, so less of a security risk than it would otherwise be. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
183d0202 |
|
06-Nov-2005 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[PATCH] ppc64: SMU partition recovery This patch adds the ability to the SMU driver to recover missing calibration partitions from the SMU chip itself. It also adds some dynamic mecanism to /proc/device-tree so that new properties are visible to userland. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
666bfddb |
|
25-Jun-2005 |
Vivek Goyal <vgoyal@in.ibm.com> |
[PATCH] kdump: Access dump file in elf format (/proc/vmcore) From: "Vivek Goyal" <vgoyal@in.ibm.com> o Support for /proc/vmcore interface. This interface exports elf core image either in ELF32 or ELF64 format, depending on the format in which elf headers have been stored by crashed kernel. o Added support for CONFIG_VMCORE config option. o Removed the dependency on /proc/kcore. From: "Eric W. Biederman" <ebiederm@xmission.com> This patch has been refactored to more closely match the prevailing style in the affected files. And to clearly indicate the dependency between /proc/kcore and proc/vmcore.c From: Hariprasad Nellitheertha <hari@in.ibm.com> This patch contains the code that provides an ELF format interface to the previous kernel's memory post kexec reboot. Signed off by Hariprasad Nellitheertha <hari@in.ibm.com> Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|