#
f9c0a39d |
|
28-Sep-2023 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: Only report padzero() errors when PROT_WRITE Errors with padzero() should be caught unless we're expecting a pathological (non-writable) segment. Report -EFAULT only when PROT_WRITE is present. Additionally add some more documentation to padzero(), elf_map(), and elf_load(). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Suggested-by: Eric Biederman <ebiederm@xmission.com> Tested-by: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20230929032435.2391507-5-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
d5ca24f6 |
|
28-Sep-2023 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: Use elf_load() for library While load_elf_library() is a libc5-ism, we can still replace most of its contents with elf_load() as well, further simplifying the code. Some historical context: - libc4 was a.out and used uselib (a.out support has been removed) - libc5 was ELF and used uselib (there may still be users) - libc6 is ELF and has never used uselib Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Suggested-by: Eric Biederman <ebiederm@xmission.com> Tested-by: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20230929032435.2391507-4-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
8b04d326 |
|
28-Sep-2023 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: Use elf_load() for interpreter Handle arbitrary memsz>filesz in interpreter ELF segments, instead of only supporting it in the last segment (which is expected to be the BSS). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Reported-by: Pedro Falcato <pedro.falcato@gmail.com> Closes: https://lore.kernel.org/lkml/20221106021657.1145519-1-pedro.falcato@gmail.com/ Tested-by: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20230929032435.2391507-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
8ed2ef21 |
|
28-Sep-2023 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: elf_bss no longer used by load_elf_binary() With the BSS handled generically via the new filesz/memsz mismatch handling logic in elf_load(), elf_bss no longer needs to be tracked. Drop the variable. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Suggested-by: Eric Biederman <ebiederm@xmission.com> Tested-by: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20230929032435.2391507-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
585a0186 |
|
28-Sep-2023 |
Eric W. Biederman <ebiederm@xmission.com> |
binfmt_elf: Support segments with 0 filesz and misaligned starts Implement a helper elf_load() that wraps elf_map() and performs all of the necessary work to ensure that when "memsz > filesz" the bytes described by "memsz > filesz" are zeroed. An outstanding issue is if the first segment has filesz 0, and has a randomized location. But that is the same as today. In this change I replaced an open coded padzero() that did not clear all of the way to the end of the page, with padzero() that does. I also stopped checking the return of padzero() as there is at least one known case where testing for failure is the wrong thing to do. It looks like binfmt_elf_fdpic may have the proper set of tests for when error handling can be safely completed. I found a couple of commits in the old history https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git, that look very interesting in understanding this code. commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail") commit c6e2227e4a3e ("[SPARC64]: Missing user access return value checks in fs/binfmt_elf.c and fs/compat.c") commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2") Looking at commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail"): > commit 39b56d902bf35241e7cba6cc30b828ed937175ad > Author: Pavel Machek <pavel@ucw.cz> > Date: Wed Feb 9 22:40:30 2005 -0800 > > [PATCH] binfmt_elf: clearing bss may fail > > So we discover that Borland's Kylix application builder emits weird elf > files which describe a non-writeable bss segment. > > So remove the clear_user() check at the place where we zero out the bss. I > don't _think_ there are any security implications here (plus we've never > checked that clear_user() return value, so whoops if it is a problem). > > Signed-off-by: Pavel Machek <pavel@suse.cz> > Signed-off-by: Andrew Morton <akpm@osdl.org> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> It seems pretty clear that binfmt_elf_fdpic with skipping clear_user() for non-writable segments and otherwise calling clear_user(), aka padzero(), and checking it's return code is the right thing to do. I just skipped the error checking as that avoids breaking things. And notably, it looks like Borland's Kylix died in 2005 so it might be safe to just consider read-only segments with memsz > filesz an error. Reported-by: Sebastian Ott <sebott@redhat.com> Reported-by: Thomas Weißschuh <linux@weissschuh.net> Closes: https://lkml.kernel.org/r/20230914-bss-alloc-v1-1-78de67d2c6dd@weissschuh.net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/r/87sf71f123.fsf@email.froward.int.ebiederm.org Tested-by: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20230929032435.2391507-1-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
8d7071af |
|
24-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
mm: always expand the stack with the mmap write lock held This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f440fa1a |
|
16-Jun-2023 |
Liam R. Howlett <Liam.Howlett@oracle.com> |
mm: make find_extend_vma() fail if write lock not held Make calls to extend_vma() and find_extend_vma() fail if the write lock is required. To avoid making this a flag-day event, this still allows the old read-locking case for the trivial situations, and passes in a flag to say "is it write-locked". That way write-lockers can say "yes, I'm being careful", and legacy users will continue to work in all the common cases until they have been fully converted to the new world order. Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
aa88054b |
|
22-Jun-2023 |
Baruch Siach <baruch@tkos.co.il> |
binfmt_elf: fix comment typo s/reset/regset/ Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/0b2967c4a4141875c493e835d5a6f8f2d19ae2d6.1687499804.git.baruch@tkos.co.il
|
#
60592fb6 |
|
11-May-2023 |
Fangrui Song <maskray@google.com> |
coredump, vmcore: Set p_align to 4 for PT_NOTE Tools like readelf/llvm-readelf use p_align to parse a PT_NOTE program header as an array of 4-byte entries or 8-byte entries. Currently, there are workarounds[1] in place for Linux to treat p_align==0 as 4. However, it would be more appropriate to set the correct alignment so that tools do not have to rely on guesswork. FreeBSD coredumps set p_align to 4 as well. [1]: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=82ed9683ec099d8205dc499ac84febc975235af6 [2]: https://reviews.llvm.org/D150022 Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230512022528.3430327-1-maskray@google.com
|
#
7435721a |
|
07-Mar-2023 |
Nick Alcock <nick.alcock@oracle.com> |
binfmt_elf: remove MODULE_LICENSE in non-modules Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
70e79866 |
|
28-Feb-2023 |
Alexey Dobriyan <adobriyan@gmail.com> |
ELF: fix all "Elf" typos ELF is acronym and therefore should be spelled in all caps. I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like being written in the first person. Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
317c8194 |
|
22-Nov-2022 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
rseq: Introduce feature size and alignment ELF auxiliary vector entries Export the rseq feature size supported by the kernel as well as the required allocation alignment for the rseq per-thread area to user-space through ELF auxiliary vector entries. This is part of the extensible rseq ABI. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-3-mathieu.desnoyers@efficios.com
|
#
19e183b5 |
|
22-Dec-2022 |
Catalin Marinas <catalin.marinas@arm.com> |
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} A subsequent fix for arm64 will use this parameter to parse the vma information from the snapshot created by dump_vma_snapshot() rather than traversing the vma list without the mmap_lock. Fixes: 6dd8b1a0b6cb ("arm64: mte: Dump the MTE tags in the core file") Cc: <stable@vger.kernel.org> # 5.18.x Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Seth Jenkins <sethjenkins@google.com> Suggested-by: Seth Jenkins <sethjenkins@google.com> Cc: Will Deacon <will@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221222181251.1345752-3-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
38ba2f11 |
|
04-Sep-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
[elf] get rid of get_note_info_size() it's trivial now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e92edb85 |
|
04-Sep-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
[elf] unify regset and non-regset cases The only real difference is in filling per-thread notes - getting the values of registers. And this is the only part that is worth an ifdef - we don't need to duplicate the logics regarding gathering threads, filling other notes, etc. It would've been hard to do back when regset-based variant had been introduced, mostly due to sharing bits and pieces of helpers with aout coredumps. As the result, too much had been duplicated and the copies had drifted away since then. Now it can be done cleanly... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e961d370 |
|
04-Sep-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
[elf][non-regset] use elf_core_copy_task_regs() for dumper as well elf_core_copy_regs() is equivalent to elf_core_copy_task_regs() of current on all architectures. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
bdbadfcc |
|
03-Sep-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
[elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) Don't bother with pointless macros - we are not sharing it with aout coredumps anymore. Just convert the underlying functions to the same arguments (nobody uses regs, actually) and call them elf_core_copy_task_fpregs(). And unexport the entire bunch, while we are at it. [added missing includes in arch/{csky,m68k,um}/kernel/process.c to avoid extra warnings about the lack of externs getting added to huge piles for those files. Pointless, but...] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
dc64cc12 |
|
14-Nov-2022 |
Bo Liu <liubo03@inspur.com> |
binfmt_elf: replace IS_ERR() with IS_ERR_VALUE() Avoid typecasts that are needed for IS_ERR() and use IS_ERR_VALUE() instead. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221115031757.2426-1-liubo03@inspur.com
|
#
ef20c513 |
|
19-Oct-2022 |
Rolf Eike Beer <eb@emlix.com> |
binfmt_elf: simplify error handling in load_elf_phdrs() The err variable was the same like retval, but capped to <= 0. This is the same as retval as elf_read() never returns positive values. Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/4137126.7Qn9TF0dmF@mobilepool36.emlix.com
|
#
cfc46ca4 |
|
19-Oct-2022 |
Rolf Eike Beer <eb@emlix.com> |
binfmt_elf: fix documented return value for load_elf_phdrs() This function has never returned anything but a plain NULL. Fixes: 6a8d38945cf4 ("binfmt_elf: Hoist ELF program header loading to a function") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/2359389.EDbqzprbEW@mobilepool36.emlix.com
|
#
8f6e3f9e |
|
18-Oct-2022 |
Kees Cook <keescook@chromium.org> |
binfmt: Fix whitespace issues Fix the annoying whitespace issues that have been following these files around for years. Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20221018071350.never.230-kees@kernel.org
|
#
4b0e21d6 |
|
08-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
[elf][regset] simplify thread list handling in fill_note_info() fill_note_info() iterates through the list of threads collected in mm->core_state->dumper, allocating a struct elf_thread_core_info instance for each and linking those into a list. We need the entry corresponding to current to be first in the resulting list, so the logics for list insertion is if it's for current or list is empty insert in the head else insert after the first element However, in mm->core_state->dumper the entry for current is guaranteed to be the first one. Which means that both parts of condition will be true on the first iteration and neither will be true on all subsequent ones. Taking the first iteration out of the loop simplifies things nicely... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
922ef161 |
|
04-Sep-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
[elf][regset] clean fill_note_info() a bit *info is already initialized... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
9a938eba |
|
07-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
kill coredump_params->regs it's always task_pt_regs(current) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
594d2a14 |
|
24-Oct-2022 |
Li Zetao <lizetao1@huawei.com> |
fs/binfmt_elf: Fix memory leak in load_elf_binary() There is a memory leak reported by kmemleak: unreferenced object 0xffff88817104ef80 (size 224): comm "xfs_admin", pid 47165, jiffies 4298708825 (age 1333.476s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 60 a8 b3 00 81 88 ff ff a8 10 5a 00 81 88 ff ff `.........Z..... backtrace: [<ffffffff819171e1>] __alloc_file+0x21/0x250 [<ffffffff81918061>] alloc_empty_file+0x41/0xf0 [<ffffffff81948cda>] path_openat+0xea/0x3d30 [<ffffffff8194ec89>] do_filp_open+0x1b9/0x290 [<ffffffff8192660e>] do_open_execat+0xce/0x5b0 [<ffffffff81926b17>] open_exec+0x27/0x50 [<ffffffff81a69250>] load_elf_binary+0x510/0x3ed0 [<ffffffff81927759>] bprm_execve+0x599/0x1240 [<ffffffff8192a997>] do_execveat_common.isra.0+0x4c7/0x680 [<ffffffff8192b078>] __x64_sys_execve+0x88/0xb0 [<ffffffff83bbf0a5>] do_syscall_64+0x35/0x80 If "interp_elf_ex" fails to allocate memory in load_elf_binary(), the program will take the "out_free_ph" error handing path, resulting in "interpreter" file resource is not released. Fix it by adding an error handing path "out_free_file", which will release the file resource when "interp_elf_ex" failed to allocate memory. Fixes: 0693ffebcfe5 ("fs/binfmt_elf.c: allocate less for static executable") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221024154421.982230-1-lizetao1@huawei.com
|
#
aeb79237 |
|
14-Apr-2022 |
Andrew Morton <akpm@linux-foundation.org> |
revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" Despite Mike's attempted fix (925346c129da117122), regressions reports continue: https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/ https://bugzilla.kernel.org/show_bug.cgi?id=215720 https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info So revert this patch. Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Kennelly <ckennelly@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fangrui Song <maskray@google.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
354e923d |
|
14-Apr-2022 |
Andrew Morton <akpm@linux-foundation.org> |
revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders" Commit 925346c129da11 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") was an attempt to fix regressions due to 9630f0d60fec5f ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE"). But regressionss continue to be reported: https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/ https://bugzilla.kernel.org/show_bug.cgi?id=215720 https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info This patch reverts the fix, so the original can also be reverted. Fixes: 925346c129da11 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dd664099 |
|
17-Mar-2022 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
binfmt_elf: Don't write past end of notes for regset gap In fill_thread_core_info() the ptrace accessible registers are collected to be written out as notes in a core file. The note array is allocated from a size calculated by iterating the user regset view, and counting the regsets that have a non-zero core_note_type. However, this only allows for there to be non-zero core_note_type at the end of the regset view. If there are any gaps in the middle, fill_thread_core_info() will overflow the note allocation, as it iterates over the size of the view and the allocation would be smaller than that. There doesn't appear to be any arch that has gaps such that they exceed the notes allocation, but the code is brittle and tries to support something it doesn't. It could be fixed by increasing the allocation size, but instead just have the note collecting code utilize the array better. This way the allocation can stay smaller. Even in the case of no arch's that have gaps in their regset views, this introduces a change in the resulting indicies of t->notes. It does not introduce any changes to the core file itself, because any blank notes are skipped in write_note_info(). In case, the allocation logic between fill_note_info() and fill_thread_core_info() ever diverges from the usage logic, warn and skip writing any notes that would overflow the array. This fix is derrived from an earlier one[0] by Yu-cheng Yu. [0] https://lore.kernel.org/lkml/20180717162502.32274-1-yu-cheng.yu@intel.com/ Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220317192013.13655-4-rick.p.edgecombe@intel.com
|
#
390031c9 |
|
08-Mar-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
coredump: Use the vma snapshot in fill_files_note Matthew Wilcox reported that there is a missing mmap_lock in file_files_note that could possibly lead to a user after free. Solve this by using the existing vma snapshot for consistency and to avoid the need to take the mmap_lock anywhere in the coredump code except for dump_vma_snapshot. Update the dump_vma_snapshot to capture vm_pgoff and vm_file that are neeeded by fill_files_note. Add free_vma_snapshot to free the captured values of vm_file. Reported-by: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20220131153740.2396974-1-willy@infradead.org Cc: stable@vger.kernel.org Fixes: a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
9ec7d323 |
|
30-Jan-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
coredump/elf: Pass coredump_params into fill_note_info Instead of individually passing cprm->siginfo and cprm->regs into fill_note_info pass all of struct coredump_params. This is preparation to allow fill_files_note to use the existing vma snapshot. Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
95c5436a |
|
07-Mar-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
coredump: Snapshot the vmas in do_coredump Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the individual coredump routines into do_coredump itself. This makes the code less error prone and easier to maintain. Make the vma snapshot available to the coredump routines in struct coredump_params. This makes it easier to change and update what is captures in the vma snapshot and will be needed for fixing fill_file_notes. Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
9e1a3ce0 |
|
23-Feb-2022 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: Introduce KUnit test Adds simple KUnit test for some binfmt_elf internals: specifically a regression test for the problem fixed by commit 8904d9cd90ee ("ELF: fix overflow in total mapping size calculation"). $ ./tools/testing/kunit/kunit.py run --arch x86_64 \ --kconfig_add CONFIG_IA32_EMULATION=y '*binfmt_elf' ... [19:41:08] ================== binfmt_elf (1 subtest) ================== [19:41:08] [PASSED] total_mapping_size_test [19:41:08] =================== [PASSED] binfmt_elf ==================== [19:41:08] ============== compat_binfmt_elf (1 subtest) =============== [19:41:08] [PASSED] total_mapping_size_test [19:41:08] ================ [PASSED] compat_binfmt_elf ================ [19:41:08] ============================================================ [19:41:08] Testing complete. Passed: 2, Failed: 0, Crashed: 0, Skipped: 0, Errors: 0 Cc: Eric Biederman <ebiederm@xmission.com> Cc: David Gow <davidgow@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Magnus Groß" <magnus.gross@rwth-aachen.de> Cc: kunit-dev@googlegroups.com Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> --- v1: https://lore.kernel.org/lkml/20220224054332.1852813-1-keescook@chromium.org v2: - improve commit log - fix comment URL (Daniel) - drop redundant KUnit Kconfig help info (Daniel) - note in Kconfig help that COMPAT builds add a compat test (David)
|
#
2b4bfbe0 |
|
27-Jan-2022 |
Akira Kawata <akirakawata1@gmail.com> |
fs/binfmt_elf: Refactor load_elf_binary function I delete load_addr because it is not used anymore. And I rename load_addr_set to first_pt_load because it is used only to capture the first iteration of the loop. Signed-off-by: Akira Kawata <akirakawata1@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127124014.338760-3-akirakawata1@gmail.com
|
#
0da1d500 |
|
27-Jan-2022 |
Akira Kawata <akirakawata1@gmail.com> |
fs/binfmt_elf: Fix AT_PHDR for unusual ELF files BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197921 As pointed out in the discussion of buglink, we cannot calculate AT_PHDR as the sum of load_addr and exec->e_phoff. : The AT_PHDR of ELF auxiliary vectors should point to the memory address : of program header. But binfmt_elf.c calculates this address as follows: : : NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff); : : which is wrong since e_phoff is the file offset of program header and : load_addr is the memory base address from PT_LOAD entry. : : The ld.so uses AT_PHDR as the memory address of program header. In normal : case, since the e_phoff is usually 64 and in the first PT_LOAD region, it : is the correct program header address. : : But if the address of program header isn't equal to the first PT_LOAD : address + e_phoff (e.g. Put the program header in other non-consecutive : PT_LOAD region), ld.so will try to read program header from wrong address : then crash or use incorrect program header. This is because exec->e_phoff is the offset of PHDRs in the file and the address of PHDRs in the memory may differ from it. This patch fixes the bug by calculating the address of program headers from PT_LOADs directly. Signed-off-by: Akira Kawata <akirakawata1@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127124014.338760-2-akirakawata1@gmail.com
|
#
d65bc29b |
|
13-Feb-2022 |
Alexey Dobriyan <adobriyan@gmail.com> |
binfmt: move more stuff undef CONFIG_COREDUMP struct linux_binfmt::core_dump and struct min_coredump::min_coredump are used under CONFIG_COREDUMP only. Shrink those embedded configs a bit. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/YglbIFyN+OtwVyjW@localhost.localdomain
|
#
10b19249 |
|
03-Oct-2021 |
Alexey Dobriyan <adobriyan@gmail.com> |
ELF: fix overflow in total mapping size calculation Kernel assumes that ELF program headers are ordered by mapping address, but doesn't enforce it. It is possible to make mapping size extremely huge by simply shuffling first and last PT_LOAD segments. As long as PT_LOAD segments do not overlap, it is silly to require sorting by v_addr anyway because mmap() doesn't care. Don't assume PT_LOAD segments are sorted and calculate min and max addresses correctly. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: "Magnus Groß" <magnus.gross@rwth-aachen.de> Link: https://lore.kernel.org/all/Yfqm7HbucDjPbES+@fractal.localdomain/ Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/lkml/YVmd7D0M6G%2FDcP4O@localhost.localdomain
|
#
439a8468 |
|
28-Feb-2022 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: Avoid total_mapping_size for ET_EXEC Partially revert commit 5f501d555653 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size" logic also to ET_EXEC. At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address contiguous (but _are_ file-offset contiguous). This would result in a giant mapping attempting to cover the entire span, including the virtual address range hole, and well beyond the size of the ELF file itself, causing the kernel to refuse to load it. For example: $ readelf -lW /usr/bin/gcc ... Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz ... ... LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ... LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ... ... ^^^^^^^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^ File offset range : 0x000000-0x00bb4c 0x00bb4c bytes Virtual address range : 0x4000000000000000-0x600000000000bcb0 0x200000000000bcb0 bytes Remove the total_mapping_size logic for ET_EXEC, which reduces the ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better than nothing), and retains it for ET_DYN. Ironically, this is the reverse of the problem that originally caused problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future work could restore full coverage if load_elf_binary() were to perform mappings in a separate phase from the loading (where it could resolve both overlaps and holes). Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Reported-by: matoro <matoro_bugzilla_kernel@matoro.tk> Fixes: 5f501d555653 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE") Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk> Link: https://lore.kernel.org/lkml/ce8af9c13bcea9230c7689f3c1e0e2cd@matoro.tk Tested-By: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/lkml/49182d0d-708b-4029-da5f-bc18603440a6@physik.fu-berlin.de Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
925346c1 |
|
11-Feb-2022 |
Mike Rapoport <rppt@kernel.org> |
fs/binfmt_elf: fix PT_LOAD p_align values for loaders Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong __data_start/_end pair" assertion from libgc after update to v5.17-rc1. Bisection pointed to commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") that fixed handling of static PIEs, but made the condition that guards load_bias calculation to exclude loader binaries. Restoring the check for presence of interpreter fixes the problem. Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H.J. Lu" <hjl.tools@gmail.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9630f0d6 |
|
19-Jan-2022 |
H.J. Lu <hjl.tools@gmail.com> |
fs/binfmt_elf: use PT_LOAD p_align values for static PIE Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values for suitable start address") which fixed PIE binaries built with -Wl,-z,max-page-size=0x200000, to cover static PIE binaries. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading. Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
95af469c |
|
19-Jan-2022 |
Yafang Shao <laoar.shao@gmail.com> |
fs/binfmt_elf: replace open-coded string copy with get_task_comm It is better to use get_task_comm() instead of the open coded string copy as we do in other places. struct elf_prpsinfo is used to dump the task information in userspace coredump or kernel vmcore. Below is the verification of vmcore, crash> ps PID PPID CPU TASK ST %MEM VSZ RSS COMM 0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0] > 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1] > 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2] > 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3] > 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4] > 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5] 0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6] > 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7] > 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8] > 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9] > 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10] > 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11] > 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12] > 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13] > 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14] > 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15] It works well as expected. Some comments are added to explain why we use the hard-coded 16. Link: https://lkml.kernel.org/r/20211120112738.45980-5-laoar.shao@gmail.com Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a43e5e3a |
|
08-Nov-2021 |
Alexey Dobriyan <adobriyan@gmail.com> |
ELF: simplify STACK_ALLOC macro "A -= B; A" is equivalent to "A -= B". Link: https://lkml.kernel.org/r/YVmcP256fRMqCwgK@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5f501d55 |
|
08-Nov-2021 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE Commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") reverted back to using MAP_FIXED to map ELF LOAD segments because it was found that the segments in some binaries overlap and can cause MAP_FIXED_NOREPLACE to fail. The original intent of MAP_FIXED_NOREPLACE in the ELF loader was to prevent the silent clobbering of an existing mapping (e.g. stack) by the ELF image, which could lead to exploitable conditions. Quoting commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map"), which originally introduced the use of MAP_FIXED_NOREPLACE in the loader: Both load_elf_interp and load_elf_binary rely on elf_map to map segments [to a specific] address and they use MAP_FIXED to enforce that. This is however [a] dangerous thing prone to silent data corruption which can be even exploitable. ... Let's take CVE-2017-1000253 as an example ... we could end up mapping [the executable] over the existing stack ... The [stack layout] issue has been fixed since then ... So we should be safe and any [similar] attack should be impractical. On the other hand this is just too subtle [an] assumption ... it can break quite easily and [be] hard to spot. ... Address this [weakness] by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one [instead of silently] clobbering it. Then processing ET_DYN binaries the loader already calculates a total size for the image when the first segment is mapped, maps the entire image, and then unmaps the remainder before the remaining segments are then individually mapped. To avoid the earlier problems (legitimate overlapping LOAD segments specified in the ELF), apply the same logic to ET_EXEC binaries as well. For both ET_EXEC and ET_DYN+INTERP use MAP_FIXED_NOREPLACE for the initial total size mapping and then use MAP_FIXED to build the final (possibly legitimately overlapping) mappings. For ET_DYN w/out INTERP, continue to map at a system-selected address in the mmap region. Link: https://lkml.kernel.org/r/20210916215947.3993776-1-keescook@chromium.org Link: https://lore.kernel.org/lkml/1595869887-23307-2-git-send-email-anthony.yznaga@oracle.com Co-developed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Chen Jingwen <chenjingwen6@huawei.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrei Vagin <avagin@openvz.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0258b5fd |
|
22-Sep-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
coredump: Limit coredumps to a single thread group Today when a signal is delivered with a handler of SIG_DFL whose default behavior is to generate a core dump not only that process but every process that shares the mm is killed. In the case of vfork this looks like a real world problem. Consider the following well defined sequence. if (vfork() == 0) { execve(...); _exit(EXIT_FAILURE); } If a signal that generates a core dump is received after vfork but before the execve changes the mm the process that called vfork will also be killed (as the mm is shared). Similarly if the execve fails after the point of no return the kernel delivers SIGSEGV which will kill both the exec'ing process and because the mm is shared the process that called vfork as well. As far as I can tell this behavior is a violation of people's reasonable expectations, POSIX, and is unnecessarily fragile when the system is low on memory. Solve this by making a userspace visible change to only kill a single process/thread group. This is possible because Jann Horn recently modified[1] the coredump code so that the mm can safely be modified while the coredump is happening. With LinuxThreads long gone I don't expect anyone to have a notice this behavior change in practice. To accomplish this move the core_state pointer from mm_struct to signal_struct, which allows different thread groups to coredump simultatenously. In zap_threads remove the work to kill anything except for the current thread group. v2: Remove core_state from the VM_BUG_ON_MM print to fix compile failure when CONFIG_DEBUG_VM is enabled. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> [1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133 Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
9b2f72cc |
|
28-Sep-2021 |
Chen Jingwen <chenjingwen6@huawei.com> |
elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings In commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") we still leave MAP_FIXED_NOREPLACE in place for load_elf_interp. Unfortunately, this will cause kernel to fail to start with: 1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already Failed to execute /init (error -17) The reason is that the elf interpreter (ld.so) has overlapping segments. readelf -l ld-2.31.so Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000002c94c 0x000000000002c94c R E 0x10000 LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0 0x00000000000021e8 0x0000000000002320 RW 0x10000 LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00 0x00000000000011ac 0x0000000000001328 RW 0x10000 The reason for this problem is the same as described in commit ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments"). Not only executable binaries, elf interpreters (e.g. ld.so) can have overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp. Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") Cc: <stable@vger.kernel.org> # v4.19 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4589ff7c |
|
23-Apr-2021 |
David Hildenbrand <david@redhat.com> |
binfmt: remove in-tree usage of MAP_DENYWRITE At exec time when we mmap the new executable via MAP_DENYWRITE we have it opened via do_open_execat() and already deny_write_access()'ed the file successfully. Once exec completes, we allow_write_acces(); however, we set mm->exe_file in begin_new_exec() via set_mm_exe_file() and also deny_write_access() as long as mm->exe_file remains set. We'll effectively deny write access to our executable via mm->exe_file until mm->exe_file is changed -- when the process is removed, on new exec, or via sys_prctl(PR_SET_MM_MAP/EXE_FILE). Let's remove all usage of MAP_DENYWRITE, it's no longer necessary for mm->exe_file. In case of an elf interpreter, we'll now only deny write access to the file during exec. This is somewhat okay, because the interpreter behaves (and sometime is) a shared library; all shared libraries, especially the ones loaded directly in user space like via dlopen() won't ever be mapped via MAP_DENYWRITE, because we ignore that from user space completely; these shared libraries can always be modified while mapped and executed. Let's only special-case the main executable, denying write access while being executed by a process. This can be considered a minor user space visible change. While this is a cleanup, it also fixes part of a problem reported with VM_DENYWRITE on overlayfs, as VM_DENYWRITE is effectively unused with this patch and will be removed next: "Overlayfs did not honor positive i_writecount on realfile for VM_DENYWRITE mappings." [1] [1] https://lore.kernel.org/r/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/ Reported-by: Chengguang Xu <cgxu519@mykernel.net> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: David Hildenbrand <david@redhat.com>
|
#
42be8b42 |
|
21-Apr-2021 |
David Hildenbrand <david@redhat.com> |
binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() uselib() is the legacy systemcall for loading shared libraries. Nowadays, applications use dlopen() to load shared libraries, completely implemented in user space via mmap(). For example, glibc uses MAP_COPY to mmap shared libraries. While this maps to MAP_PRIVATE | MAP_DENYWRITE on Linux, Linux ignores any MAP_DENYWRITE specification from user space in mmap. With this change, all remaining in-tree users of MAP_DENYWRITE use it to map an executable. We will be able to open shared libraries loaded via uselib() writable, just as we already can via dlopen() from user space. This is one step into the direction of removing MAP_DENYWRITE from the kernel. This can be considered a minor user space visible change. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: David Hildenbrand <david@redhat.com>
|
#
a4eec6a3 |
|
28-Jun-2021 |
David Hildenbrand <david@redhat.com> |
binfmt: remove in-tree usage of MAP_EXECUTABLE Ever since commit e9714acf8c43 ("mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas"), VM_EXECUTABLE is gone and MAP_EXECUTABLE is essentially completely ignored. Let's remove all usage of MAP_EXECUTABLE. [akpm@linux-foundation.org: fix blooper in fs/binfmt_aout.c. per David] Link: https://lkml.kernel.org/r/20210421093453.6904-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2f064a59 |
|
11-Jun-2021 |
Peter Zijlstra <peterz@infradead.org> |
sched: Change task_struct::state Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
|
#
d0f1088b |
|
08-Mar-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
coredump: don't bother with do_truncate() have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whichever comes first. And instead of playing with do_truncate() in the end, just write one NUL at the end of the last gap (if any). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2347961b |
|
28-Jan-2020 |
Laurent Vivier <laurent@vivier.eu> |
binfmt_misc: pass binfmt_misc flags to the interpreter It can be useful to the interpreter to know which flags are in use. For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument. This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled. Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: YunQiang Su <ysu@wavecomp.com> URL: http://bugs.debian.org/970460 Signed-off-by: Helge Deller <deller@gmx.de>
|
#
f2485a2d |
|
12-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
elf_prstatus: collect the common part (everything before pr_reg) into a struct Preparations to doing i386 compat elf_prstatus sanely - rather than duplicating the beginning of compat_elf_prstatus, take these fields into a separate structure (compat_elf_prstatus_common), so that it could be reused. Due to the incestous relationship between binfmt_elf.c and compat_binfmt_elf.c we need the same shape change done to native struct elf_prstatus, gathering the fields prior to pr_reg into a new structure (struct elf_prstatus_common). Fortunately, offset of pr_reg is always a multiple of 16 with no padding right before it, so it's possible to turn all the stuff prior to it into a single member without disturbing the layout. [build fix from Geert Uytterhoeven folded in] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8a00dd00 |
|
22-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID On 64bit architectures that support 32bit processes there are two possible layouts for NT_PRSTATUS note in ELF coredumps. For one thing, several fields are 64bit for native processes and 32bit for compat ones (pr_sigpend, etc.). For another, the register dump is obviously different - the size and number of registers are not going to be the same for 32bit and 64bit variants of processor. Usually that's handled by having two structures - elf_prstatus for native layout and compat_elf_prstatus for 32bit one. 32bit processes are handled by fs/compat_binfmt_elf.c, which defines a macro called 'elf_prstatus' that expands to compat_elf_prstatus. Then it includes fs/binfmt_elf.c, which makes all references to struct elf_prstatus to be textually replaced with struct compat_elf_prstatus. Ugly and somewhat brittle, but it works. However, amd64 is worse - there are _three_ possible layouts. One for native 64bit processes, another for i386 (32bit) processes and yet another for x32 (32bit address space with full 64bit registers). Both i386 and x32 processes are handled by fs/compat_binfmt_elf.c, with usual compat_binfmt_elf.c trickery. However, the layouts for i386 and x32 are not identical - they have the common beginning, but the register dump part (pr_reg) is bigger on x32. Worse, pr_reg is not the last field - it's followed by int pr_fpvalid, so that field ends up at different offsets for i386 and x32 layouts. Fortunately, there's not much code that cares about any of that - it's all encapsulated in fill_thread_core_info(). Since x32 variant is bigger, we define compat_elf_prstatus to match that layout. That way i386 processes have enough space to fit their layout into. Moreover, since these layouts are identical prior to pr_reg, we don't need to distinguish x32 and i386 cases when we are setting the fields prior to pr_reg. Filling pr_reg itself is done by calling ->get() method of appropriate regset, and that method knows what layout (and size) to use. We do need to distinguish x32 and i386 cases only for two things: setting ->pr_fpvalid (offset differs for x32 and i386) and choosing the right size for our note. The way it's done is Not Nice, for the lack of more accurate printable description. There are two macros (PRSTATUS_SIZE and SET_PR_FPVALID), that default essentially to sizeof(struct elf_prstatus) and (S)->pr_fpvalid = 1. On x86 asm/compat.h provides its own variants. Unfortunately, quite a few things go wrong there: * PRSTATUS_SIZE doesn't use the normal test for process being an x32 one; it compares the size reported by regset with the size of pr_reg. * it hardcodes the sizes of x32 and i386 variants (296 and 144 resp.), so if some change in includes leads to asm/compat.h pulled in by fs/binfmt_elf.c we are in trouble - it will end up using the size of x32 variant for 64bit processes. * it's in the wrong place; asm/compat.h couldn't define the structure for i386 layout, since it lacks quite a few types needed for it. Hardcoded sizes are largely due to that. The proper fix would be to have an explicitly defined i386 variant of structure and have PRSTATUS_SIZE/SET_PR_FPVALID check for TIF_X32 to choose the variant that should be used. Unfortunately, that requires some manipulations of headers; we'll do that later in the series, but for now let's go with the minimal variant - rename PRSTATUS_SIZE in asm/compat.h to COMPAT_PRSTATUS_SIZE, have fs/compat_binfmt_elf.c define PRSTATUS_SIZE to COMPAT_PRSTATUS_SIZE and use the normal TIF_X32 check in that macro. The size of i386 variant is kept hardcoded for now. Similar story for SET_PR_FPVALID. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c39ab6de |
|
25-Nov-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
coredump: Document coredump code exclusively used by cell spufs Oleg Nesterov recently asked[1] why is there an unshare_files in do_coredump. After digging through all of the callers of lookup_fd it turns out that it is arch/powerpc/platforms/cell/spufs/coredump.c:coredump_next_context that needs the unshare_files in do_coredump. Looking at the history[2] this code was also the only piece of coredump code that required the unshare_files when the unshare_files was added. Looking at that code it turns out that cell is also the only architecture that implements elf_coredump_extra_notes_size and elf_coredump_extra_notes_write. I looked at the gdb repo[3] support for cell has been removed[4] in binutils 2.34. Geoff Levand reports he is still getting questions on how to run modern kernels on the PS3, from people using 3rd party firmware so this code is not dead. According to Wikipedia the last PS3 shipped in Japan sometime in 2017. So it will probably be a little while before everyone's hardware dies. Add some comments briefly documenting the coredump code that exists only to support cell spufs to make it easier to understand the coredump code. Eventually the hardware will be dead, or their won't be userspace tools, or the coredump code will be refactored and it will be too difficult to update a dead architecture and these comments make it easy to tell where to pull to remove cell spufs support. [1] https://lkml.kernel.org/r/20201123175052.GA20279@redhat.com [2] 179e037fc137 ("do_coredump(): make sure that descriptor table isn't shared") [3] git://sourceware.org/git/binutils-gdb.git [4] abf516c6931a ("Remove Cell Broadband Engine debugging support"). Link: https://lkml.kernel.org/r/87h7pdnlzv.fsf_-_@x220.int.ebiederm.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
5e01fdff |
|
31-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
fs: Replace zero-length array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
9a29a671 |
|
03-Oct-2020 |
Gabriel Krisman Bertazi <krisman@collabora.com> |
elf: Expose ELF header on arch_setup_additional_pages() Like it is done for SET_PERSONALITY with ARM, which requires the ELF header to select correct personality parameters, x86 requires the headers when selecting which VDSO to load, instead of relying on the going-away TIF_IA32/X32 flags. Add an indirection macro to arch_setup_additional_pages(), that x86 can reimplement to receive the extra parameter just for ELF files. This requires no changes to other architectures, who can continue to use the original arch_setup_additional_pages for ELF and non-ELF binaries. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201004032536.1229030-8-krisman@collabora.com
|
#
bc3d7bf6 |
|
03-Oct-2020 |
Gabriel Krisman Bertazi <krisman@collabora.com> |
elf: Expose ELF header in compat_start_thread() Like it is done for SET_PERSONALITY with x86, which requires the ELF header to select correct personality parameters, x86 requires the headers on compat_start_thread() to choose starting CS for ELF32 binaries, instead of relying on the going-away TIF_IA32/X32 flags. Add an indirection macro to ELF invocations of START_THREAD, that x86 can reimplement to receive the extra parameter just for ELF files. This requires no changes to other architectures who don't need the header information, they can continue to use the original start_thread for ELF and non-ELF binaries, and it prevents affecting non-ELF code paths for x86. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201004032536.1229030-6-krisman@collabora.com
|
#
b2767d97 |
|
17-Oct-2020 |
Jann Horn <jannh@google.com> |
binfmt_elf: take the mmap lock around find_extend_vma() create_elf_tables() runs after setup_new_exec(), so other tasks can already access our new mm and do things like process_madvise() on it. (At the time I'm writing this commit, process_madvise() is not in mainline yet, but has been in akpm's tree for some time.) While I believe that there are currently no APIs that would actually allow another process to mess up our VMA tree (process_madvise() is limited to MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm under which no syscalls have been executed yet), this seems like an accident waiting to happen. Let's make sure that we always take the mmap lock around GUP paths as long as another process might be able to see the mm. (Yes, this diff looks suspicious because we drop the lock before doing anything with `vma`, but that's because we actually don't do anything with it apart from the NULL check.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michel Lespinasse <walken@google.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://lkml.kernel.org/r/CAG48ez1-PBCdv3y8pn-Ty-b+FmBSLwDuVKFSt8h7wARLy0dF-Q@mail.gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a07279c9 |
|
15-Oct-2020 |
Jann Horn <jannh@google.com> |
binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot In both binfmt_elf and binfmt_elf_fdpic, use a new helper dump_vma_snapshot() to take a snapshot of the VMA list (including the gate VMA, if we have one) while protected by the mmap_lock, and then use that snapshot instead of walking the VMA list without locking. An alternative approach would be to keep the mmap_lock held across the entire core dumping operation; however, keeping the mmap_lock locked while we may be blocked for an unbounded amount of time (e.g. because we're dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock blocks things like the ->release handler of userfaultfd, and we don't really want critical system daemons to grind to a halt just because someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or something like that. Since both the normal ELF code and the FDPIC ELF code need this functionality (and if any other binfmt wants to add coredump support in the future, they'd probably need it, too), implement this with a common helper in fs/coredump.c. A downside of this approach is that we now need a bigger amount of kernel memory per userspace VMA in the normal ELF case, and that we need O(n) kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't be terribly bad. There currently is a data race between stack expansion and anything that reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to mitigate that for core dumping, take the mmap_lock in write mode when taking a snapshot of the VMA hierarchy. (If we only took the mmap_lock in read mode, we could end up with a corrupted core dump if someone does get_user_pages_remote() concurrently. Not really a major problem, but taking the mmap_lock either way works here, so we might as well avoid the issue.) (This doesn't do anything about the existing data races with stack expansion in other mm code.) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
429a22e7 |
|
15-Oct-2020 |
Jann Horn <jannh@google.com> |
coredump: rework elf/elf_fdpic vma_dump_size() into common helper At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly different code to figure out which VMAs should be dumped, and if so, whether the dump should contain the entire VMA or just its first page. Eliminate duplicate code by reworking the binfmt_elf version into a generic core dumping helper in coredump.c. As part of that, change the heuristic for detecting executable/library header pages to check whether the inode is executable instead of looking at the file mode. This is less problematic in terms of locking because it lets us avoid get_user() under the mmap_sem. (And arguably it looks nicer and makes more sense in generic code.) Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA has been written to. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
afc63a97b |
|
15-Oct-2020 |
Jann Horn <jannh@google.com> |
coredump: refactor page range dumping into common helper Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of pages into the coredump file. Extract that logic into a common helper. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ce81bb25 |
|
15-Oct-2020 |
Chris Kennelly <ckennelly@google.com> |
fs/binfmt_elf: use PT_LOAD p_align values for suitable start address Patch series "Selecting Load Addresses According to p_align", v3. The current ELF loading mechancism provides page-aligned mappings. This can lead to the program being loaded in a way unsuitable for file-backed, transparent huge pages when handling PIE executables. While specifying -z,max-page-size=0x200000 to the linker will generate suitably aligned segments for huge pages on x86_64, the executable needs to be loaded at a suitably aligned address as well. This alignment requires the binary's cooperation, as distinct segments need to be appropriately paddded to be eligible for THP. For binaries built with increased alignment, this limits the number of bits usable for ASLR, but provides some randomization over using fixed load addresses/non-PIE binaries. This patch (of 2): The current ELF loading mechancism provides page-aligned mappings. This can lead to the program being loaded in a way unsuitable for file-backed, transparent huge pages when handling PIE executables. For binaries built with increased alignment, this limits the number of bits usable for ASLR, but provides some randomization over using fixed load addresses/non-PIE binaries. Tested by verifying program with -Wl,-z,max-page-size=0x200000 loading. [akpm@linux-foundation.org: fix max() warning] [ckennelly@google.com: augment comment] Link: https://lkml.kernel.org/r/20200821233848.3904680-2-ckennelly@google.com Signed-off-by: Chris Kennelly <ckennelly@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickens <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Link: https://lkml.kernel.org/r/20200820170541.1132271-1-ckennelly@google.com Link: https://lkml.kernel.org/r/20200820170541.1132271-2-ckennelly@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7a896028 |
|
12-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
kill elf_fpxregs_t all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not been defined on any architecture since 2010 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b4e9c954 |
|
01-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
introduction of regset ->get() wrappers, switching ELF coredumps to those Two new helpers: given a process and regset, dump into a buffer. regset_get() takes a buffer and size, regset_get_alloc() takes size and allocates a buffer. Return value in both cases is the amount of data actually dumped in case of success or -E... on error. In both cases the size is capped by regset->n * regset->size, so ->get() is called with offset 0 and size no more than what regset expects. binfmt_elf.c callers of ->get() are switched to using those; the other caller (copy_regset_to_user()) will need some preparations to switch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
852991dd |
|
04-Jun-2020 |
Anthony Iliopoulos <ailiop@suse.com> |
fs/binfmt_elf: remove redundant elf_map ifndef The ifndef was added a long time ago to support archs that would define their own mapping function. The last user was the metag arch which was removed from the tree, and as such there are no users left. Let's kill it. Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200402161543.4119-1-ailiop@suse.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
646e84de |
|
19-Feb-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
binfmt_elf: don't bother with __{put,copy_to}_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1d605416 |
|
27-May-2020 |
Alexander Potapenko <glider@google.com> |
fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info() KMSAN reported uninitialized data being written to disk when dumping core. As a result, several kilobytes of kmalloc memory may be written to the core file and then read by a non-privileged user. Reported-by: sam <sunhaoyl@outlook.com> Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com Link: https://github.com/google/kmsan/issues/76 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b8a61c9e |
|
14-May-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
exec: Generic execfd support Most of the support for passing the file descriptor of an executable to an interpreter already lives in the generic code and in binfmt_elf. Rework the fields in binfmt_elf that deal with executable file descriptor passing to make executable file descriptor passing a first class concept. Move the fd_install from binfmt_misc into begin_new_exec after the new creds have been installed. This means that accessing the file through /proc/<pid>/fd/N is able to see the creds for the new executable before allowing access to the new executables files. Performing the install of the executables file descriptor after the point of no return also means that nothing special needs to be done on error. The exiting of the process will close all of it's open files. Move the would_dump from binfmt_misc into begin_new_exec right after would_dump is called on the bprm->file. This makes it obvious this case exists and that no nesting of bprm->file is currently supported. In binfmt_misc the movement of fd_install into generic code means that it's special error exit path is no longer needed. Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
2388777a |
|
03-May-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
exec: Rename flush_old_exec begin_new_exec There is and has been for a very long time been a lot more going on in flush_old_exec than just flushing the old state. After the movement of code from setup_new_exec there is a whole lot more going on than just flushing the old executables state. Rename flush_old_exec to begin_new_exec to more accurately reflect what this function does. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
96ecee29 |
|
03-May-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
exec: Merge install_exec_creds into setup_new_exec The two functions are now always called one right after the other so merge them together to make future maintenance easier. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
d2530b43 |
|
04-May-2020 |
Christoph Hellwig <hch@lst.de> |
binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump There is no logic in elf_core_dump itself or in the various arch helpers called from it which use uaccess routines on kernel pointers except for the file writes thate are nicely encapsulated by using __kernel_write in dump_emit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
fa4751f4 |
|
04-May-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
binfmt_elf: remove the set_fs in fill_siginfo_note The code in binfmt_elf.c is differnt from the rest of the code that processes siginfo, as it sends siginfo from a kernel buffer to a file rather than from kernel memory to userspace buffers. To remove it's use of set_fs the code needs some different siginfo helpers. Add the helper copy_siginfo_to_external to copy from the kernel's internal siginfo layout to a buffer in the siginfo layout that userspace expects. Modify fill_siginfo_note to use copy_siginfo_to_external instead of set_fs and copy_siginfo_to_user. Update compat_binfmt_elf.c to use the previously added copy_siginfo_to_external32 to handle the compat case. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aa0d1564 |
|
06-Apr-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: don't free interpreter's ELF pheaders on common path Static executables don't need to free NULL pointer. It doesn't matter really because static executable is not common scenario but do it anyway out of pedantry. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219185330.GA4933@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0693ffeb |
|
06-Apr-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: allocate less for static executable PT_INTERP ELF header can be spared if executable is static. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219185012.GB4871@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c69bcc93 |
|
06-Apr-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: delete "loc" variable "loc" variable became just a wrapper for PT_INTERP ELF header after main ELF header was moved to "bprm->buf". Delete it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200219184847.GA4871@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
03911132 |
|
06-Apr-2020 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm/vma: replace all remaining open encodings with is_vm_hugetlb_page() This replaces all remaining open encodings with is_vm_hugetlb_page(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fe0f6766 |
|
16-Mar-2020 |
Dave Martin <Dave.Martin@arm.com> |
elf: Allow arch to tweak initial mmap prot flags An arch may want to tweak the mmap prot flags for an ELFexecutable's initial mappings. For example, arm64 is going to need to add PROT_BTI for executable pages in an ELF process whose executable is marked as using Branch Target Identification (an ARMv8.5-A control flow integrity feature). So that this can be done in a generic way, add a hook arch_elf_adjust_prot() to modify the prot flags as desired: arches can select CONFIG_HAVE_ELF_PROT and implement their own backend where necessary. By default, leave the prot flags unchanged. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
00e19cee |
|
16-Mar-2020 |
Dave Martin <Dave.Martin@arm.com> |
ELF: Add ELF program property parsing support ELF program properties will be needed for detecting whether to enable optional architecture or ABI features for a new ELF process. For now, there are no generic properties that we care about, so do nothing unless CONFIG_ARCH_USE_GNU_PROPERTY=y. Otherwise, the presence of properties using the PT_PROGRAM_PROPERTY phdrs entry (if any), and notify each property to the arch code. For now, the added code is not used. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
1fbede6e |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: coredump: allow process with empty address space to coredump Unmapping whole address space at once with munmap(0, (1ULL<<47) - 4096) or equivalent will create empty coredump. It is silly way to exit, however registers content may still be useful. The right to coredump is fundamental right of a process! Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
28f46656 |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: coredump: delete duplicated overflow check array_size() macro will do overflow check anyway. Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
225a3f53 |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: coredump: allocate core ELF header on stack Comment says ELF header is "too large to be on stack". 64 bytes on 64-bit is not large by any means. Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
18676ffc |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: make BAD_ADDR() unlikely If some mapping goes past TASK_SIZE it will be rejected by kernel which means no such userspace binaries exist. Mark every such check as unlikely. Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
03c6d723 |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: better codegen around current->mm "current->mm" pointer is stable in general except few cases one of which execve(2). Compiler can't treat is as stable but it _is_ stable most of the time. During ELF loading process ->mm becomes stable right after flush_old_exec(). Help compiler by caching current->mm, otherwise it continues to refetch it. add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141) Function old new delta elf_core_dump 5062 5039 -23 load_elf_binary 5426 5308 -118 Note: other cases are left as is because it is either pessimisation or no change in binary size. Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a62c5b1b |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: don't copy ELF header around ELF header is read into bprm->buf[] by generic execve code. Save a memcpy and allocate just one header for the interpreter instead of two headers (64 bytes instead of 128 on 64-bit). Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f67ef446 |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: fix ->start_code calculation Only executable segments should be accounted to ->start_code just like they do to ->end_code (correctly). Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1f83d806 |
|
30-Jan-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: smaller code generation around auxv vector fill Filling auxv vector as array with index (auxv[i++] = ...) generates terrible code. "saved_auxv" should be reworked because it is the worst member of mm_struct by size/usefullness ratio but do it later. Meanwhile help gcc a little with *auxv++ idiom. Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127) Function old new delta load_elf_binary 5470 5343 -127 Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
658c0335 |
|
04-Dec-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: extract elf_read() function ELF reads done by the kernel have very complicated error detection code which better live in one place. Link: http://lkml.kernel.org/r/20191005165215.GB26927@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
81696d5d |
|
04-Dec-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: delete unused "interp_map_addr" argument Link: http://lkml.kernel.org/r/20191005165049.GA26927@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e2bb80d5 |
|
23-Nov-2017 |
Arnd Bergmann <arnd@arndb.de> |
y2038: elfcore: Use __kernel_old_timeval for process times We store elapsed time for a crashed process in struct elf_prstatus using 'timeval' structures. Once glibc starts using 64-bit time_t, this becomes incompatible with the kernel's idea of timeval since the structure layout no longer matches on 32-bit architectures. This changes the definition of the elf_prstatus structure to use __kernel_old_timeval instead, which is hardcoded to the currently used binary layout. There is no risk of overflow in y2038 though, because the time values are all relative times, and can store up to 68 years of process elapsed time. There is a risk of applications breaking at build time when they use the new kernel headers and expect the type to be exactly 'timeval' rather than a structure that has the same fields as before. Those applications have to be modified to deal with 64-bit time_t anyway. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
#
b212921b |
|
06-Oct-2019 |
Linus Torvalds <torvalds@linux-foundation.org> |
elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings In commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") we changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the executable mappings. Then, people reported that it broke some binaries that had overlapping segments from the same file, and commit ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some overlaying elf segment cases. But only some - despite the summary line of that commit, it only did it when it also does a temporary brk vma for one obvious overlapping case. Now Russell King reports another overlapping case with old 32-bit x86 binaries, which doesn't trigger that limited case. End result: we had better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED. Yes, it's a sign of old binaries generated with old tool-chains, but we do pride ourselves on not breaking existing setups. This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp() and the old load_elf_library() use-cases, because nobody has reported breakage for those. Yet. Note that in all the cases seen so far, the overlapping elf sections seem to be just re-mapping of the same executable with different section attributes. We could possibly introduce a new MAP_FIXED_NOFILECHANGE flag or similar, which acts like NOREPLACE, but allows just remapping the same executable file using different protection flags. It's not clear that would make a huge difference to anything, but if people really hate that "elf remaps over previous maps" behavior, maybe at least a more limited form of remapping would alleviate some concerns. Alternatively, we should take a look at our elf_map() logic to see if we end up not mapping things properly the first time. In the meantime, this is the minimal "don't do that then" patch while people hopefully think about it more. Reported-by: Russell King <linux@armlinux.org.uk> Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") Fixes: ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments") Cc: Michal Hocko <mhocko@suse.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7be3cb01 |
|
26-Sep-2019 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: Do not move brk for INTERP-less ET_EXEC When brk was moved for binaries without an interpreter, it should have been limited to ET_DYN only. In other words, the special case was an ET_DYN that lacks an INTERP, not just an executable that lacks INTERP. The bug manifested for giant static executables, where the brk would end up in the middle of the text area on 32-bit architectures. Reported-and-tested-by: Richard Kojedzinszky <richard@kojedz.in> Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
649775be |
|
23-Sep-2019 |
Alexandre Ghiti <alex@ghiti.fr> |
mm, fs: move randomize_stack_top from fs to mm Patch series "Provide generic top-down mmap layout functions", v6. This series introduces generic functions to make top-down mmap layout easily accessible to architectures, in particular riscv which was the initial goal of this series. The generic implementation was taken from arm64 and used successively by arm, mips and finally riscv. Note that in addition the series fixes 2 issues: - stack randomization was taken into account even if not necessary. - [1] fixed an issue with mmap base which did not take into account randomization but did not report it to arm and mips, so by moving arm64 into a generic library, this problem is now fixed for both architectures. This work is an effort to factorize architecture functions to avoid code duplication and oversights as in [1]. [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html This patch (of 14): This preparatory commit moves this function so that further introduction of generic topdown mmap layout is contained only in mm/util.c. Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
aa94b1dc |
|
16-Jul-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: delete stale comment "passed_fileno" variable was deleted 11 years ago in 2.6.25. Link: http://lkml.kernel.org/r/20190529201747.GA23248@avx2 Fixes: d20894a23708 ("Remove a.out interpreter support in ELF loader") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
09c434b8 |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for more missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
bbdc6076 |
|
14-May-2019 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: move brk out of mmap when doing direct loader exec Commmit eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"), made changes in the rare case when the ELF loader was directly invoked (e.g to set a non-inheritable LD_LIBRARY_PATH, testing new versions of the loader), by moving into the mmap region to avoid both ET_EXEC and PIE binaries. This had the effect of also moving the brk region into mmap, which could lead to the stack and brk being arbitrarily close to each other. An unlucky process wouldn't get its requested stack size and stack allocations could end up scribbling on the heap. This is illustrated here. In the case of using the loader directly, brk (so helpfully identified as "[heap]") is allocated with the _loader_ not the binary. For example, with ASLR entirely disabled, you can see this more clearly: $ /bin/cat /proc/self/maps 555555554000-55555555c000 r-xp 00000000 ... /bin/cat 55555575b000-55555575c000 r--p 00007000 ... /bin/cat 55555575c000-55555575d000 rw-p 00008000 ... /bin/cat 55555575d000-55555577e000 rw-p 00000000 ... [heap] ... 7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar] 7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso] 7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffe000-7ffff7fff000 rw-p 00000000 ... 7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack] $ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps ... 7ffff7bcc000-7ffff7bd4000 r-xp 00000000 ... /bin/cat 7ffff7bd4000-7ffff7dd3000 ---p 00008000 ... /bin/cat 7ffff7dd3000-7ffff7dd4000 r--p 00007000 ... /bin/cat 7ffff7dd4000-7ffff7dd5000 rw-p 00008000 ... /bin/cat 7ffff7dd5000-7ffff7dfc000 r-xp 00000000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7fb2000-7ffff7fd6000 rw-p 00000000 ... 7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar] 7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso] 7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffe000-7ffff8020000 rw-p 00000000 ... [heap] 7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack] The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since nothing is there in the direct loader case (and ET_EXEC is still far away at 0x400000). Anything that ran before should still work (i.e. the ultimately-launched binary already had the brk very far from its text, so this should be no different from a COMPAT_BRK standpoint). The only risk I see here is that if someone started to suddenly depend on the entire memory space lower than the mmap region being available when launching binaries via a direct loader execs which seems highly unlikely, I'd hope: this would mean a binary would _not_ work when exec()ed normally. (Note that this is only done under CONFIG_ARCH_HAS_ELF_RANDOMIZATION when randomization is turned on.) Link: http://lkml.kernel.org/r/20190422225727.GA21011@beast Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=Omfhx_p0nCKPSjA@mail.gmail.com Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Ali Saidi <alisaidi@amazon.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
249b08e4 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
elf: init pt_regs pointer later Get "current_pt_regs" pointer right before usage. Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-180 (-180) Function old new delta load_elf_binary 5806 5626 -180 !!! Looks like the compiler doesn't know that "current_pt_regs" is stable pointer (because it doesn't know ->stack isn't) even though it knows that "current" is stable pointer. So it saves it in the very beginning and then tries to carry it through a lot of code. Here is what happens here: load_elf_binary() ... mov rax,QWORD PTR gs:0x14c00 mov r13,QWORD PTR [rax+0x18] r13 = current->stack call kmem_cache_alloc # first kmalloc [980 bytes later!] # let's spill that sucker because we need a register # for "load_bias" calculations at # # if (interpreter) { # load_bias = ELF_ET_DYN_BASE; # if (current->flags & PF_RANDOMIZE) # load_bias += arch_mmap_rnd(); # elf_flags |= elf_fixed; # } mov QWORD PTR [rsp+0x68],r13 If this is not _the_ root cause it is still eeeeh. After the patch things become much simpler: mov rax, QWORD PTR gs:0x14c00 # current mov rdx, QWORD PTR [rax+0x18] # current->stack movq [rdx+0x3fb8], 0 # fill pt_regs ... call finalize_exec Link: http://lkml.kernel.org/r/20190419200343.GA19788@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d8e7cb39 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: extract PROT_* calculations There are two places where mapping protections are calculated: one for executable, another one for interpreter -- take them out. ELF read and execute permissions are interchanged with Linux PROT_READ and PROT_EXEC, microoptimizations are welcome! Link: http://lkml.kernel.org/r/20190417213413.GB26474@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
85264316 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs//binfmt_elf.c: move variables initialization closer to their usage Link: http://lkml.kernel.org/r/20190416202002.GB24304@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
be0deb58 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: save 1 indent level Rewrite for (...) { if (->p_type == PT_INTERP) { ... break; } } loop into for (...) { if (->p_type != PT_INTERP) continue; ... break; } Link: http://lkml.kernel.org/r/20190416201906.GA24304@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ba0f6b88 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: delete trailing "return;" in functions returning "void" Link: http://lkml.kernel.org/r/20190314205042.GE18143@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cc338010 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: free PT_INTERP filename ASAP There is no reason for PT_INTERP filename to linger till the end of the whole loading process. Link: http://lkml.kernel.org/r/20190314204953.GD18143@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Nikitas Angelinas <nikitas.angelinas@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mukesh Ojha <mojha@codeaurora.org> [nikitas.angelinas@gmail.com: fix GPF when dereferencing invalid interpreter] Link: http://lkml.kernel.org/r/20190330140032.GA1527@vostro Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5cf4a363 |
|
14-May-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: make scope of "pos" variable smaller Link: http://lkml.kernel.org/r/20190314204707.GC18143@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
22f084db |
|
14-May-2019 |
Andrew Morton <akpm@linux-foundation.org> |
fs/binfmt_elf.c: remove unneeded initialization of mm->start_stack As pointed out by zoujc@lenovo.com, setup_arg_pages() already initialized current->mm->start_stack. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202881 Reported-by: <zoujc@lenovo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
49ac9819 |
|
07-Mar-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: spread const a little Link: http://lkml.kernel.org/r/20190204202830.GC27482@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
93f044e2 |
|
07-Mar-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: use list_for_each_entry() [adobriyan@gmail.com: fixup compilation] Link: http://lkml.kernel.org/r/20190205064334.GA2152@avx2 Link: http://lkml.kernel.org/r/20190204202800.GB27482@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
faf1c315 |
|
07-Mar-2019 |
Alexey Dobriyan <adobriyan@gmail.com> |
fs/binfmt_elf.c: don't be afraid of overflow Number of ELF program headers is 16-bit by spec, so total size comfortably fits into "unsigned int". Space savings: 7 bytes! add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-7 (-7) Function old new delta load_elf_phdrs 137 130 -7 Link: http://lkml.kernel.org/r/20190204202715.GA27482@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ae7795bc |
|
25-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Distinguish between kernel_siginfo and siginfo Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
2f819db5 |
|
15-May-2018 |
Maciej W. Rozycki <macro@mips.com> |
binfmt_elf: Respect error return from `regset->active' The regset API documented in <linux/regset.h> defines -ENODEV as the result of the `->active' handler to be used where the feature requested is not available on the hardware found. However code handling core file note generation in `fill_thread_core_info' interpretes any non-zero result from the `->active' handler as the regset requested being active. Consequently processing continues (and hopefully gracefully fails later on) rather than being abandoned right away for the regset requested. Fix the problem then by making the code proceed only if a positive result is returned from the `->active' handler. Signed-off-by: Maciej W. Rozycki <macro@mips.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 4206d3aa1978 ("elf core dump: notes user_regset") Patchwork: https://patchwork.linux-mips.org/patch/19332/ Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
|
#
24962af7 |
|
13-Jul-2018 |
Oscar Salvador <osalvador@suse.de> |
fs, elf: make sure to page align bss in load_elf_library The current code does not make sure to page align bss before calling vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to the requested lenght not being correctly aligned. Let us make sure to align it properly. Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for libc5. Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
86a2bb5a |
|
14-Jun-2018 |
Alexey Dobriyan <adobriyan@gmail.com> |
coredump: fix spam with zero VMA process Nobody ever tried to self destruct by unmapping whole address space at once: munmap((void *)0, (1ULL << 47) - 4096); Doing this produces 2 warnings for zero-length vmalloc allocations: a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14 a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null) ... a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null) ... Fix is to switch to kvmalloc(). Steps to reproduce: // vsyscall=none #include <sys/mman.h> #include <sys/resource.h> int main(void) { setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY}); munmap((void *)0, (1ULL << 47) - 4096); return 0; } Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
42bc47b3 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: Use array_size() in vmalloc() The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
6da2ec56 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
d23a61ee |
|
20-Apr-2018 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error Commit 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") is printing spurious messages under memory pressure due to map_addr == -ENOMEM. 9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already 14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already 16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already Complain only if -EEXIST, and use %px for printing the address. Link: http://lkml.kernel.org/r/201804182307.FAC17665.SFMOFJVFtHOLOQ@I-love.SAKURA.ne.jp Fixes: 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ad55eac7 |
|
10-Apr-2018 |
Michal Hocko <mhocko@suse.com> |
elf: enforce MAP_FIXED on overlaying elf segments Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from elf_map" applied, some ELF binaries in his environment fail to start with [ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already [ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon The reason is that the above binary has overlapping elf segments: LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000 0x0000000000013a8c 0x0000000000013a8c R E 10000 LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40 0x00000000000002c0 0x00000000000005e8 RW 10000 LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328 0x0000000000000384 0x00000000000094a0 RW 10000 That binary has two RW LOAD segments, the first crosses a page border into the second 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr) Handle this situation by enforcing MAP_FIXED when we establish a temporary brk VMA to handle overlapping segments. All other mappings will still use MAP_FIXED_NOREPLACE. Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4ed28639 |
|
10-Apr-2018 |
Michal Hocko <mhocko@suse.com> |
fs, elf: drop MAP_FIXED usage from elf_map Both load_elf_interp and load_elf_binary rely on elf_map to map segments on a controlled address and they use MAP_FIXED to enforce that. This is however dangerous thing prone to silent data corruption which can be even exploitable. Let's take CVE-2017-1000253 as an example. At the time (before commit eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE") ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from the stack top on 32b (legacy) memory layout (only 1GB away). Therefore we could end up mapping over the existing stack with some luck. The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much further from the stack (eab09532d400 and later by c715b72c1ba4: "mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive stack consumption early during execve fully stopped by da029c11e6b1 ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be safe and any attack should be impractical. On the other hand this is just too subtle assumption so it can break quite easily and hard to spot. I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still fundamentally dangerous. Moreover it shouldn't be even needed. We are at the early process stage and so there shouldn't be unrelated mappings (except for stack and loader) existing so mmap for a given address should succeed even without MAP_FIXED. Something is terribly wrong if this is not the case and we should rather fail than silently corrupt the underlying mapping. Address this issue by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one without clobbering it. [mhocko@suse.com: fix build] [akpm@linux-foundation.org: coding-style fixes] [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC] Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b8383831 |
|
10-Apr-2018 |
Kees Cook <keescook@chromium.org> |
exec: introduce finalize_exec() before start_thread() Provide a final callback into fs/exec.c before start_thread() takes over, to handle any last-minute changes, like the coming restoration of the stack limit. Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Brad Spengler <spender@grsecurity.net> Cc: Greg KH <greg@kroah.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
60c9d92f |
|
06-Feb-2018 |
Alexey Dobriyan <adobriyan@gmail.com> |
elf: fix NT_FILE integer overflow If vm.max_map_count bumped above 2^26 (67+ mil) and system has enough RAM to allocate all the VMAs (~12.8 GB on Fedora 27 with 200-byte VMAs), then it should be possible to overflow 32-bit "size", pass paranoia check, allocate very little vmalloc space and oops while writing into vmalloc guard page... But I didn't test this, only coredump of regular process. Link: http://lkml.kernel.org/r/20180112203427.GA9109@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
27e64b4b |
|
31-Oct-2017 |
Dave Martin <Dave.Martin@arm.com> |
regset: Add support for dynamically sized regsets Currently the regset API doesn't allow for the possibility that regsets (or at least, the amount of meaningful data in a regset) may change in size. In particular, this results in useless padding being added to coredumps if a regset's current size is smaller than its theoretical maximum size. This patch adds a get_size() function to struct user_regset. Individual regset implementations can implement this function to return the current size of the regset data. A regset_size() function is added to provide callers with an abstract interface for determining the size of a regset without needing to know whether the regset is dynamically sized or not. The only affected user of this interface is the ELF coredump code: This patch ports ELF coredump to dump regsets with their actual size in the coredump. This has no effect except for new regsets that are dynamically sized and provide a get_size() implementation. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
4755200b |
|
16-Aug-2017 |
Nicolas Pitre <nico@fluxnic.net> |
binfmt_elf: don't attempt to load FDPIC binaries On platforms where both ELF and ELF-FDPIC variants are available, the regular ELF loader will happily identify FDPIC binaries as proper ELF and load them without the necessary FDPIC fixups, resulting in an immediate user space crash. Let's prevent binflt_elf from loading those binaries so binfmt_elf_fdpic has a chance to pick them up. For those architectures that don't define elf_check_fdpic(), a default version returning false is provided. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Mickael GUENE <mickael.guene@st.com> Tested-by: Vincent Abriou <vincent.abriou@st.com> Tested-by: Andras Szemzo <szemzo.andras@gmail.com>
|
#
bdd1d2d3 |
|
01-Sep-2017 |
Christoph Hellwig <hch@lst.de> |
fs: fix kernel_read prototype Use proper ssize_t and size_t types for the return value and count argument, move the offset last and make it an in/out argument like all other read/write helpers, and make the buf argument a void pointer to get rid of lots of casts in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
01578e36 |
|
15-Aug-2017 |
Oleg Nesterov <oleg@redhat.com> |
x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and randomize_stack_top() are not required. PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not set, no need to re-check after that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com
|
#
c425e189 |
|
18-Jul-2017 |
Kees Cook <keescook@chromium.org> |
binfmt: Introduce secureexec flag The bprm_secureexec hook can be moved earlier. Right now, it is called during create_elf_tables(), via load_binary(), via search_binary_handler(), via exec_binprm(). Nearly all (see exception below) state used by bprm_secureexec is created during the bprm_set_creds hook, called from prepare_binprm(). For all LSMs (except commoncaps described next), only the first execution of bprm_set_creds takes any effect (they all check bprm->called_set_creds which prepare_binprm() sets after the first call to the bprm_set_creds hook). However, all these LSMs also only do anything with bprm_secureexec when they detected a secure state during their first run of bprm_set_creds. Therefore, it is functionally identical to move the detection into bprm_set_creds, since the results from secureexec here only need to be based on the first call to the LSM's bprm_set_creds hook. The single exception is that the commoncaps secureexec hook also examines euid/uid and egid/gid differences which are controlled by bprm_fill_uid(), via prepare_binprm(), which can be called multiple times (e.g. binfmt_script, binfmt_misc), and may clear the euid/egid for the final load (i.e. the script interpreter). However, while commoncaps specifically ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time prepare_binprm() may get called, it needs to base the secureexec decision on the final call to bprm_set_creds. As a result, it will need special handling. To begin this refactoring, this adds the secureexec flag to the bprm struct, and calls the secureexec hook during setup_new_exec(). This is safe since all the cred work is finished (and past the point of no return). This explicit call will be removed in later patches once the hook has been removed. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com>
|
#
67c6777a |
|
10-Jul-2017 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: safely increment argv pointers When building the argv/envp pointers, the envp is needlessly pre-incremented instead of just continuing after the argv pointers are finished. In some (likely impossible) race where the strings could be changed from userspace between copy_strings() and here, it might be possible to confuse the envp position. Instead, just use sp like everything else. Link: http://lkml.kernel.org/r/20170622173838.GA43308@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Rik van Riel <riel@redhat.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
eab09532 |
|
10-Jul-2017 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: use ELF_ET_DYN_BASE only for PIE The ELF_ET_DYN_BASE position was originally intended to keep loaders away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2 /bin/cat" might cause the subsequent load of /bin/cat into where the loader had been loaded.) With the advent of PIE (ET_DYN binaries with an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial portion of the address space is unused. For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are loaded above the mmap region. This means they can be made to collide (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap region in all cases, and will now additionally avoid programs falling back to the mmap region by enforcing MAP_FIXED for program loads (i.e. if it would have collided with the stack, now it will fail to load instead of falling back to the mmap region). To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP) are loaded into the mmap region, leaving space available for either an ET_EXEC binary with a fixed location or PIE being loaded into mmap by the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which means architectures can now safely lower their values without risk of loaders colliding with their subsequently loaded programs. For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and suggestions on how to implement this solution. Fixes: d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
32ef5517 |
|
05-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> Introduce a trivial, mostly empty <linux/sched/cputime.h> header to prepare for the moving of cputime functionality out of sched.h. Update all code that relies on these facilities. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
68db0cf1 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5b825c3a |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f7ccbae4 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/coredump.h> We are going to split <linux/sched/coredump.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/coredump.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
16e72e9b |
|
22-Feb-2017 |
Denys Vlasenko <dvlasenk@redhat.com> |
powerpc: do not make the entire heap executable On 32-bit powerpc the ELF PLT sections of binaries (built with --bss-plt, or with a toolchain which defaults to it) look like this: [17] .sbss NOBITS 0002aff8 01aff8 000014 00 WA 0 0 4 [18] .plt NOBITS 0002b00c 01aff8 000084 00 WAX 0 0 4 [19] .bss NOBITS 0002b090 01aff8 0000a4 00 WA 0 0 4 Which results in an ELF load header: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000 This is all correct, the load region containing the PLT is marked as executable. Note that the PLT starts at 0002b00c but the file mapping ends at 0002aff8, so the PLT falls in the 0 fill section described by the load header, and after a page boundary. Unfortunately the generic ELF loader ignores the X bit in the load headers when it creates the 0 filled non-file backed mappings. It assumes all of these mappings are RW BSS sections, which is not the case for PPC. gcc/ld has an option (--secure-plt) to not do this, this is said to incur a small performance penalty. Currently, to support 32-bit binaries with PLT in BSS kernel maps *entire brk area* with executable rights for all binaries, even --secure-plt ones. Stop doing that. Teach the ELF loader to check the X bit in the relevant load header and create 0 filled anonymous mappings that are executable if the load header requests that. Test program showing the difference in /proc/$PID/maps: int main() { char buf[16*1024]; char *p = malloc(123); /* make "[heap]" mapping appear */ int fd = open("/proc/self/maps", O_RDONLY); int len = read(fd, buf, sizeof(buf)); write(1, buf, len); printf("%p\n", p); return 0; } Compiled using: gcc -mbss-plt -m32 -Os test.c -otest Unpatched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10690000-106c0000 rwxp 00000000 00:00 0 [heap] f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ffa90000-ffac0000 rw-p 00000000 00:00 0 [stack] 0x10690008 Patched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10180000-101b0000 rw-p 00000000 00:00 0 [heap] ^^^^ this has changed f7c60000-f7c90000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7c90000-f7ca0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ff860000-ff890000 rw-p 00000000 00:00 0 [stack] 0x10180008 The patch was originally posted in 2012 by Jason Gunthorpe and apparently ignored: https://lkml.org/lkml/2012/9/30/138 Lightly run-tested. Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cd19c364 |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
fs/binfmt: Convert obsolete cputime type to nsecs Use the new nsec based cputime accessors as part of the whole cputime conversion from cputime_t to nsecs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-12-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5613fda9 |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Convert task/group cputime to nsecs Now that most cputime readers use the transition API which return the task cputime in old style cputime_t, we can safely store the cputime in nsecs. This will eventually make cputime statistics less opaque and more granular. Back and forth convertions between cputime_t and nsecs in order to deal with cputime_t random granularity won't be needed anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a1cecf2b |
|
30-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime: Introduce special task_cputime_t() API to return old-typed cputime This API returns a task's cputime in cputime_t in order to ease the conversion of cputime internals to use nsecs units instead. Blindly converting all cputime readers to use this API now will later let us convert more smoothly and step by step all these places to use the new nsec based cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4d22c75d |
|
11-Jan-2017 |
Dave Kleikamp <dave.kleikamp@oracle.com> |
coredump: Ensure proper size of sparse core files If the last section of a core file ends with an unmapped or zero page, the size of the file does not correspond with the last dump_skip() call. gdb complains that the file is truncated and can be confusing to users. After all of the vma sections are written, make sure that the file size is no smaller than the current file position. This problem can be demonstrated with gdb's bigcore testcase on the sparc architecture. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
30f74aa0 |
|
12-Dec-2016 |
Jason Baron <jbaron@akamai.com> |
binfmt_elf: use vmalloc() for allocation of vma_filesz We have observed page allocations failures of order 4 during core dump while trying to allocate vma_filesz. This results in a useless core file of size 0. To improve reliability use vmalloc(). Note that the vmalloc() allocation is bounded by sysctl_max_map_count, which is 65,530 by default. So with a 4k page size, and 8 bytes per seg, this is a max of 128 pages or an order 7 allocation. Other parts of the core dump path, such as fill_files_note() are already using vmalloc() for presumably similar reasons. Link: http://lkml.kernel.org/r/1479745791-17611-1-git-send-email-jbaron@akamai.com Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
90954e7b |
|
05-Sep-2016 |
Dmitry Safonov <0x7f454c46@gmail.com> |
x86/coredump: Use pr_reg size, rather that TIF_IA32 flag Killed PR_REG_SIZE and PR_REG_PTR macro as we can get regset size from regset view. I wish I could also kill PRSTATUS_SIZE nicely. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-5-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9f834ec1 |
|
22-Aug-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
binfmt_elf: switch to new creds when switching to new mm We used to delay switching to the new credentials until after we had mapped the executable (and possible elf interpreter). That was kind of odd to begin with, since the new executable will actually then _run_ with the new creds, but whatever. The bigger problem was that we also want to make sure that we turn off prof events and tracing before we start mapping the new executable state. So while this is a cleanup, it's also a fix for a possible information leak. Reported-by: Robert Święcki <robert@swiecki.net> Tested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0036d1f7 |
|
02-Aug-2016 |
Kees Cook <keescook@chromium.org> |
binfmt_elf: fix calculations for bss padding A double-bug exists in the bss calculation code, where an overflow can happen in the "last_bss - elf_bss" calculation, but vm_brk internally aligns the argument, underflowing it, wrapping back around safe. We shouldn't depend on these bugs staying in sync, so this cleans up the bss padding handling to avoid the overflow. This moves the bss padzero() before the last_bss > elf_bss case, since the zero-filling of the ELF_PAGE should have nothing to do with the relationship of last_bss and elf_bss: any trailing portion should be zeroed, and a zero size is already handled by padzero(). Then it handles the math on elf_bss vs last_bss correctly. These need to both be ELF_PAGE aligned to get the comparison correct, since that's the expected granularity of the mappings. Since elf_bss already had alignment-based padding happen in padzero(), the "start" of the new vm_brk() should be moved forward as done in the original code. However, since the "end" of the vm_brk() area will already become PAGE_ALIGNed in vm_brk() then last_bss should get aligned here to avoid hiding it as a side-effect. Additionally makes a cosmetic change to the initial last_bss calculation so it's easier to read in comparison to the load_addr calculation above it (i.e. the only difference is p_filesz vs p_memsz). Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Ismael Ripoll Ripoll <iripoll@upv.es> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1607f09c |
|
05-Jun-2016 |
Mateusz Guzik <mguzik@redhat.com> |
coredump: fix dumping through pipes The offset in the core file used to be tracked with ->written field of the coredump_params structure. The field was retired in favour of file->f_pos. However, ->f_pos is not maintained for pipes which leads to breakage. Restore explicit tracking of the offset in coredump_params. Introduce ->pos field for this purpose since ->written was already reused. Fixes: a00839395103 ("get rid of coredump_params->written"). Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5d22fc25 |
|
27-May-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
mm: remove more IS_ERR_VALUE abuses The do_brk() and vm_brk() return value was "unsigned long" and returned the starting address on success, and an error value on failure. The reasons are entirely historical, and go back to it basically behaving like the mmap() interface does. However, nobody actually wanted that interface, and it causes totally pointless IS_ERR_VALUE() confusion. What every single caller actually wants is just the simpler integer return of zero for success and negative error number on failure. So just convert to that much clearer and more common calling convention, and get rid of all the IS_ERR_VALUE() uses wrt vm_brk(). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ecc2bc8a |
|
23-May-2016 |
Michal Hocko <mhocko@suse.com> |
mm, elf: handle vm_brk error load_elf_library doesn't handle vm_brk failure although nothing really indicates it cannot do that because the function is allowed to fail due to vm_mmap failures already. This might be not a problem now but later patch will make vm_brk killable (resp. mmap_sem for write waiting will become killable) and so the failure will be more probable. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a0083939 |
|
11-May-2016 |
Omar Sandoval <osandov@fb.com> |
coredump: get rid of coredump_params->written cprm->written is redundant with cprm->file->f_pos, so use that instead. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
09cbfeaf |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5ef11c35 |
|
26-Feb-2016 |
Daniel Cashman <dcashman@android.com> |
mm: ASLR: use get_random_long() Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
eb4bc076 |
|
12-Nov-2015 |
Maciej W. Rozycki <macro@mips.com> |
ELF: Also pass any interpreter's file header to `arch_check_elf' Also pass any interpreter's file header to `arch_check_elf' so that any architecture handler can have a look at it if needed. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Fortune <Matthew.Fortune@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/11478/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
#
54d15714 |
|
26-Oct-2015 |
Maciej W. Rozycki <macro@mips.com> |
binfmt_elf: Correct `arch_check_elf's description Correct `arch_check_elf's description, mistakenly copied and pasted from `arch_elf_pt_proc'. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b582ef5c |
|
26-Oct-2015 |
Maciej W. Rozycki <macro@mips.com> |
binfmt_elf: Don't clobber passed executable's file header Do not clobber the buffer space passed from `search_binary_handler' and originally preloaded by `prepare_binprm' with the executable's file header by overwriting it with its interpreter's file header. Instead keep the buffer space intact and directly use the data structure locally allocated for the interpreter's file header, fixing a bug introduced in 2.1.14 with loadable module support (linux-mips.org commit beb11695 [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history). Adjust the amount of data read from the interpreter's file accordingly. This was not an issue before loadable module support, because back then `load_elf_binary' was executed only once for a given ELF executable, whether the function succeeded or failed. With loadable module support supported and enabled, upon a failure of `load_elf_binary' -- which may for example be caused by architecture code rejecting an executable due to a missing hardware feature requested in the file header -- a module load is attempted and then the function reexecuted by `search_binary_handler'. With the executable's file header replaced with its interpreter's file header the executable can then be erroneously accepted in this subsequent attempt. Cc: stable@vger.kernel.org # all the way back Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5037835c |
|
05-Oct-2015 |
Ross Zwisler <zwisler@kernel.org> |
coredump: add DAX filtering for ELF coredumps Add two new flags to the existing coredump mechanism for ELF files to allow us to explicitly filter DAX mappings. This is desirable because DAX mappings, like hugetlb mappings, have the potential to be very large. Update the coredump_filter documentation in Documentation/filesystems/proc.txt so that it addresses the new DAX coredump flags. Also update the documented default value of coredump_filter to be consistent with the core(5) man page. The documentation being updated talks about bit 4, Dump ELF headers, which is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the kernel config. This kernel config option defaults to "y" if both ELF binaries and coredump are enabled. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
9bf39ab2 |
|
19-Jun-2015 |
Miklos Szeredi <mszeredi@suse.cz> |
vfs: add file_path() helper Turn d_path(&file->f_path, ...); into file_path(file, ...); Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2b1d3ae9 |
|
28-May-2015 |
Andrew Morton <akpm@linux-foundation.org> |
fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings load_elf_binary() returns `retval', not `error'. Fixes: a87938b2e246b81b4fb ("fs/binfmt_elf.c: fix bug in loading of PIE binaries") Reported-by: James Hogan <james.hogan@imgtec.com> Cc: Michael Davidson <md@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
204db6ed |
|
14-Apr-2015 |
Kees Cook <keescook@chromium.org> |
mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE The arch_randomize_brk() function is used on several architectures, even those that don't support ET_DYN ASLR. To avoid bulky extern/#define tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for the architectures that support it, while still handling CONFIG_COMPAT_BRK. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d1fd836d |
|
14-Apr-2015 |
Kees Cook <keescook@chromium.org> |
mm: split ET_DYN ASLR from mmap ASLR This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips, powerpc, and x86. The problem is that if there is a leak of ASLR from the executable (ET_DYN), it means a leak of shared library offset as well (mmap), and vice versa. Further details and a PoC of this attack is available here: http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html With this patch, a PIE linked executable (ET_DYN) has its own ASLR region: $ ./show_mmaps_pie 54859ccd6000-54859ccd7000 r-xp ... /tmp/show_mmaps_pie 54859ced6000-54859ced7000 r--p ... /tmp/show_mmaps_pie 54859ced7000-54859ced8000 rw-p ... /tmp/show_mmaps_pie 7f75be764000-7f75be91f000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6 7f75be91f000-7f75beb1f000 ---p ... /lib/x86_64-linux-gnu/libc.so.6 7f75beb1f000-7f75beb23000 r--p ... /lib/x86_64-linux-gnu/libc.so.6 7f75beb23000-7f75beb25000 rw-p ... /lib/x86_64-linux-gnu/libc.so.6 7f75beb25000-7f75beb2a000 rw-p ... 7f75beb2a000-7f75beb4d000 r-xp ... /lib64/ld-linux-x86-64.so.2 7f75bed45000-7f75bed46000 rw-p ... 7f75bed46000-7f75bed47000 r-xp ... 7f75bed47000-7f75bed4c000 rw-p ... 7f75bed4c000-7f75bed4d000 r--p ... /lib64/ld-linux-x86-64.so.2 7f75bed4d000-7f75bed4e000 rw-p ... /lib64/ld-linux-x86-64.so.2 7f75bed4e000-7f75bed4f000 rw-p ... 7fffb3741000-7fffb3762000 rw-p ... [stack] 7fffb377b000-7fffb377d000 r--p ... [vvar] 7fffb377d000-7fffb377f000 r-xp ... [vdso] The change is to add a call the newly created arch_mmap_rnd() into the ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR, as was already done on s390. Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE, which is no longer needed. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a87938b2 |
|
14-Apr-2015 |
Michael Davidson <md@google.com> |
fs/binfmt_elf.c: fix bug in loading of PIE binaries With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down address allocation strategy, load_elf_binary() will attempt to map a PIE binary into an address range immediately below mm->mmap_base. Unfortunately, load_elf_ binary() does not take account of the need to allocate sufficient space for the entire binary which means that, while the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are that is supposed to be the "gap" between the stack and the binary. Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this means that binaries with large data segments > 128MB can end up mapping part of their data segment over their stack resulting in corruption of the stack (and the data segment once the binary starts to run). Any PIE binary with a data segment > 128MB is vulnerable to this although address randomization means that the actual gap between the stack and the end of the binary is normally greater than 128MB. The larger the data segment of the binary the higher the probability of failure. Fix this by calculating the total size of the binary in the same way as load_elf_interp(). Signed-off-by: Michael Davidson <md@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4e7c22d4 |
|
14-Feb-2015 |
Hector Marco-Gisbert <hecmargi@upv.es> |
x86, mm/ASLR: Fix stack randomization on 64-bit systems The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de>
|
#
52f5592e |
|
10-Dec-2014 |
Jungseung Lee <js07.lee@gmail.com> |
fs/binfmt_elf.c: fix internal inconsistency relating to vma dump size vma_dump_size() has been used several times on actual dumper and it is supposed to return the same value for the same vma. But vma_dump_size() could return different values for same vma. The known problem case is concurrent shared memory removal. If a vma is used for a shared memory and that shared memory is removed between writing program header and dumping vma memory, this will result in a dump file which is internally consistent. To fix the problem, we set baseline to get dump size and store the size into vma_filesz and always use the same vma dump size which is stored in vma_filsz. The consistnecy with reality is not actually guranteed, but it's tolerable since that is fully consistent with base line. Signed-off-by: Jungseung Lee <js07.lee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
774c105e |
|
11-Sep-2014 |
Paul Burton <paulburton@kernel.org> |
binfmt_elf: allow arch code to examine PT_LOPROC ... PT_HIPROC headers MIPS is introducing new variants of its O32 ABI which differ in their handling of floating point, in order to enable a gradual transition towards a world where mips32 binaries can take advantage of new hardware features only available when configured for certain FP modes. In order to do this ELF binaries are being augmented with a new section that indicates, amongst other things, the FP mode requirements of the binary. The presence & location of such a section is indicated by a program header in the PT_LOPROC ... PT_HIPROC range. In order to allow the MIPS architecture code to examine the program header & section in question, pass all program headers in this range to an architecture-specific arch_elf_pt_proc function. This function may return an error if the header is deemed invalid or unsuitable for the system, in which case that error will be returned from load_elf_binary and upwards through the execve syscall. A means is required for the architecture code to make a decision once it is known that all such headers have been seen, but before it is too late to return from an execve syscall. For this purpose the arch_check_elf function is added, and called once, after all PT_LOPROC to PT_HIPROC headers have been passed to arch_elf_pt_proc but before the code which invoked execve has been lost. This enables the architecture code to make a decision based upon all the headers present in an ELF binary and its interpreter, as is required to forbid conflicting FP ABI requirements between an ELF & its interpreter. In order to allow data to be stored throughout the calls to the above functions, struct arch_elf_state is introduced. Finally a variant of the SET_PERSONALITY macro is introduced which accepts a pointer to the struct arch_elf_state, allowing it to act based upon state observed from the architecture specific program headers. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7679/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
#
a9d9ef13 |
|
11-Sep-2014 |
Paul Burton <paulburton@kernel.org> |
binfmt_elf: load interpreter program headers earlier Load the program headers of an ELF interpreter early enough in load_elf_binary that they can be examined before it's too late to return an error from an exec syscall. This patch does not perform any such checking, it merely lays the groundwork for a further patch to do so. No functional change is intended. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7675/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
#
6a8d3894 |
|
11-Sep-2014 |
Paul Burton <paulburton@kernel.org> |
binfmt_elf: Hoist ELF program header loading to a function load_elf_binary & load_elf_interp both load program headers from an ELF executable in the same way, duplicating the code. This patch introduces a helper function (load_elf_phdrs) which performs this common task & calls it from both load_elf_binary & load_elf_interp. In addition to reducing code duplication, this is part of preparing to load the ELF interpreter headers earlier such that they can be examined before it's too late to return an error from an exec syscall. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7676/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
#
19d860a1 |
|
04-May-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
handle suicide on late failure exits in execve() in search_binary_handler() ... rather than doing that in the guts of ->load_binary(). [updated to fix the bug spotted by Shentino - for SIGSEGV we really need something stronger than send_sig_info(); again, better do that in one place] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b219e25f |
|
04-Jun-2014 |
Fabian Frederick <fabf@skynet.be> |
fs/binfmt_elf.c: fix bool assignements Fix coccinelle warnings. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
78d683e8 |
|
19-May-2014 |
Andy Lutomirski <luto@amacapital.net> |
mm, fs: Add vm_ops->name as an alternative to arch_vma_name arch_vma_name sucks. It's a silly hack, and it's annoying to implement correctly. In fact, AFAICS, even the straightforward x86 implementation is incorrect (I suspect that it breaks if the vdso mapping is split or gets remapped). This adds a new vm_ops->name operation that can replace it. The followup patches will remove all uses of arch_vma_name on x86, fixing a couple of annoyances in the process. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2eee21791bb36a0a408c5c2bdb382a9e6a41ca4a.1400538962.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
ab0e113f |
|
07-Apr-2014 |
Alex Thorlton <athorlton@sgi.com> |
exec: kill the unnecessary mm->def_flags setting in load_elf_binary() load_elf_binary() sets current->mm->def_flags = def_flags and def_flags is always zero. Not only this looks strange, this is unnecessary because mm_init() has already set ->def_flags = 0. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
69369a70 |
|
03-Apr-2014 |
Josh Triplett <josh@joshtriplett.org> |
fs, kernel: permit disabling the uselib syscall uselib hasn't been used since libc5; glibc does not use it. Support turning it off. When disabled, also omit the load_elf_library implementation from binfmt_elf.c, which only uselib invokes. bloat-o-meter: add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785) function old new delta padzero 39 36 -3 uselib_flags 20 - -20 sys_uselib 168 - -168 SyS_uselib 168 - -168 load_elf_library 426 - -426 The new CONFIG_USELIB defaults to `y'. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7a5f4f1c |
|
23-Jan-2014 |
Todor Minchev <todor@minchev.co.uk> |
fs: binfmt_elf: remove unused defines INTERPRETER_NONE and INTERPRETER_ELF These two defines are unused since the removal of the a.out interpreter support in the ELF loader in kernel 2.6.25 Signed-off-by: Todor Minchev <todor@minchev.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
afabada9 |
|
14-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo) we can't get to do_coredump() if that condition isn't satisfied... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ec57941e |
|
13-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
constify do_coredump() argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ce395960 |
|
13-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
constify copy_siginfo_to_user{,32}() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
22a8cb82 |
|
08-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: dump_align() dump_skip to given alignment... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
9b56d543 |
|
08-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
dump_skip(): dump_seek() replacement taking coredump_params Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1ad67015 |
|
07-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
binfmt_elf: count notes towards coredump limit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
cdc3d562 |
|
05-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
switch elf_coredump_extra_notes_write() to dump_emit() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
13046ece |
|
05-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
binfmt_elf: convert writing actual dump pages to dump_emit() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aa3e7eaf |
|
05-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
switch elf_core_write_extra_data() to dump_emit() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
506f21c5 |
|
05-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
switch elf_core_write_extra_phdrs() to dump_emit() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ecc8c772 |
|
05-Oct-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: dump_emit() dump_write() analog, takes core_dump_params instead of file, keeps track of the amount written in cprm->written and checks for cprm->limit. Start using it in binfmt_elf.c... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
72c2d531 |
|
22-Sep-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
file->f_op is never NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
72023656 |
|
30-Sep-2013 |
Dan Aloni <alonid@stratoscale.com> |
fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing A high setting of max_map_count, and a process core-dumping with a large enough vm_map_count could result in an NT_FILE note not being written, and the kernel crashing immediately later because it has assumed otherwise. Reproduction of the oops-causing bug described here: https://lkml.org/lkml/2013/8/30/50 Rge ussue originated in commit 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped file") from Oct 4, 2012. This patch make that section optional in that case. fill_files_note() should signify the error, and also let the info struct in elf_core_dump() be zero-initialized so that we can check for the optionally written note. [akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables] [akpm@linux-foundation.org: fix sparse warning] Signed-off-by: Dan Aloni <alonid@stratoscale.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <vda.linux@googlemail.com> Reported-by: Martin MOKREJS <mmokrejs@gmail.com> Tested-by: Martin MOKREJS <mmokrejs@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
98d1e64f |
|
10-Jul-2013 |
Michel Lespinasse <walken@google.com> |
mm: remove free_area_cache Since all architectures have been converted to use vm_unmapped_area(), there is no remaining use for the free_area_cache. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
079148b9 |
|
30-Apr-2013 |
Oleg Nesterov <oleg@redhat.com> |
coredump: factor out the setting of PF_DUMPCORE Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into zap_threads() called by do_coredump(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Mandeep Singh Baines <msb@chromium.org> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c1d025e2 |
|
30-Apr-2013 |
Jiri Kosina <jkosina@suse.cz> |
binfmt_elf: PIE: make PF_RANDOMIZE check comment more accurate The comment I originally added in commit a3defbe5c337 ("binfmt_elf: fix PIE execution with randomization disabled") is not really 100% accurate -- sysctl is not the only way how PF_RANDOMIZE could be forcibly unset in runtime. Another option of course is direct modification of personality flags (i.e. running through setarch wrapper). Make the comment more explicit and accurate. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2171364d |
|
17-Apr-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add HWCAP2 aux entry We are currently out of free bits in AT_HWCAP. With POWER8, we have several hardware features that we need to advertise. Tested on POWER and x86. Signed-off-by: Michael Neuling <michael@neuling.org> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
23d9e482 |
|
17-Apr-2013 |
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> |
fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size() Documentation/filesystems/proc.txt says about coredump_filter bitmask, Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only effected by bit 5-6. However current code can go into the subsequent flag checks of bit 0-4 for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work as written in the document. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c07380be |
|
09-May-2011 |
James Hogan <jhogan@kernel.org> |
Revert some of "binfmt_elf: cleanups" The commit "binfmt_elf: cleanups" (f670d0ecda73b7438eec9ed108680bc5f5362ad8) removed an ifndef elf_map but this breaks compilation for metag which does define elf_map. This adds the ifndef back in as it was before, but does not affect the other cleanups made by that patch. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Acked-by: Mikael Pettersson <mikpe@it.uu.se>
|
#
496ad9aa |
|
23-Jan-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d3330cf0 |
|
21-Feb-2013 |
Zhang Yanfei <zhangyanfei@cn.fujitsu.com> |
binfmt_elf: remove unused argument in fill_elf_header In fill_elf_header(), elf->e_ident[EI_OSABI] is always set to ELF_OSABI, so remove the unused argument 'osabi'. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6fac4829 |
|
13-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Use accessors to read task cputime stats This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
6899e92d |
|
17-Dec-2012 |
Alan Cox <alan@linux.intel.com> |
binfmt_elf: fix corner case kfree of uninitialized data If elf_core_dump() is called and fill_note_info() fails in the kmalloc() then it returns 0 but has not yet initialised all the needed fields. As a result we do a kfree(randomness) after correctly skipping the thread data. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
71613c3b |
|
20-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
get rid of pt_regs argument of ->load_binary() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
314e51b9 |
|
08-Oct-2012 |
Konstantin Khlebnikov <khlebnikov@openvz.org> |
mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0103bd16 |
|
08-Oct-2012 |
Konstantin Khlebnikov <khlebnikov@openvz.org> |
mm: prepare VM_DONTDUMP for using in drivers Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags: VM_DONTEXPAND, VM_DONTCOPY. Currently this flag used only for sys_madvise. The next patch will use it for replacing the outdated flag VM_RESERVED. Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP) Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2aa362c4 |
|
04-Oct-2012 |
Denys Vlasenko <vda.linux@googlemail.com> |
coredump: extend core dump note section to contain file names of mapped files This note has the following format: long count -- how many files are mapped long page_size -- units for file_ofs array of [COUNT] elements of long start long end long file_ofs followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL... Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
49ae4d4b |
|
04-Oct-2012 |
Denys Vlasenko <vda.linux@googlemail.com> |
coredump: add a new elf note with siginfo of the signal Existing PRSTATUS note contains only si_signo, si_code, si_errno fields from the siginfo of the signal which caused core to be dumped. There are tools which try to analyze crashes for possible security implications, and they want to use, among other data, si_addr field from the SIGSEGV. This patch adds a new elf note, NT_SIGINFO, which contains the complete siginfo_t of the signal which killed the process. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5ab1c309 |
|
04-Oct-2012 |
Denys Vlasenko <vda.linux@googlemail.com> |
coredump: pass siginfo_t* to do_coredump() and below, not merely signr This is a preparatory patch for the introduction of NT_SIGINFO elf note. With this patch we pass "siginfo_t *siginfo" instead of "int signr" to do_coredump() and put it into coredump_params. It will be used by the next patch. Most changes are simple s/signr/siginfo->si_signo/. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6eec482f |
|
04-Oct-2012 |
Alan Cox <alan@linux.intel.com> |
binfmt_elf: Uninitialized variable load_elf_interp() has interp_map_addr carefully described as "uninitialized_var" and marked so as to avoid a warning. However if you trace the code it is passed into load_elf_interp and then this value is checked against NULL. As this return value isn't used this is actually safe but it freaks various analysis tools that see un-initialized memory addresses being read before their value is ever defined. Set it to NULL as a matter of programming good taste if nothing else Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f34f9d18 |
|
25-Sep-2012 |
Denys Vlasenko <vda.linux@googlemail.com> |
coredump: prevent double-free on an error path in core dumper In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate memory for info->fields, it frees already allocated stuff and returns error to its caller, fill_note_info. Which in turn returns error to its caller, elf_core_dump. Which jumps to cleanup label and calls free_note_info, which will happily try to free all info->fields again. BOOM. This is the fix. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Venu Byravarasu <vbyravarasu@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
826eba4d |
|
02-Aug-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
the only place that needs to include asm/exec.h is linux/binfmts.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5a5e4c2e |
|
29-May-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
binfmt_elf: switch elf_map() to vm_mmap/vm_munmap No reason to hold ->mmap_sem over the sequence Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ebc887b2 |
|
07-Feb-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Convert binary formats to use kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
6be5ceb0 |
|
20-Apr-2012 |
Linus Torvalds <torvalds@linux-foundation.org> |
VM: add "vm_mmap()" helper function This continues the theme started with vm_brk() and vm_munmap(): vm_mmap() does the same thing as do_mmap(), but additionally does the required VM locking. This uninlines (and rewrites it to be clearer) do_mmap(), which sadly duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have to export our internal do_mmap_pgoff() function. Some day we hopefully don't have to export do_mmap() either, if all modular users can become the simpler vm_mmap() instead. We're actually very close to that already, with the notable exception of the (broken) use in i810, and a couple of stragglers in binfmt_elf. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e4eb1ff6 |
|
20-Apr-2012 |
Linus Torvalds <torvalds@linux-foundation.org> |
VM: add "vm_brk()" helper function It does the same thing as "do_brk()", except it handles the VM locking too. It turns out that all external callers want that anyway, so we can make do_brk() static to just mm/mmap.c while at it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
96f951ed |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Add #includes needed to permit the removal of asm/system.h asm/system.h is a cause of circular dependency problems because it contains commonly used primitive stuff like barrier definitions and uncommonly used stuff like switch_to() that might require MMU definitions. asm/system.h has been disintegrated by this point on all arches into the following common segments: (1) asm/barrier.h Moved memory barrier definitions here. (2) asm/cmpxchg.h Moved xchg() and cmpxchg() here. #included in asm/atomic.h. (3) asm/bug.h Moved die() and similar here. (4) asm/exec.h Moved arch_align_stack() here. (5) asm/elf.h Moved AT_VECTOR_SIZE_ARCH here. (6) asm/switch_to.h Moved switch_to() here. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
accb61fe |
|
23-Mar-2012 |
Jason Baron <jbaron@redhat.com> |
coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit for 'VM_NODUMP' flag. The idea is is to add a new madvise() flag: MADV_DONTDUMP, which can be set by applications to specifically request memory regions which should not dump core. The specific application I have in mind is qemu: we can add a flag there that wouldn't dump all of guest memory when qemu dumps core. This flag might also be useful for security sensitive apps that want to absolutely make sure that parts of memory are not dumped. To clear the flag use: MADV_DODUMP. [akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland] [akpm@linux-foundation.org: fix up the architectures which broke] Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
909af768 |
|
23-Mar-2012 |
Jason Baron <jbaron@redhat.com> |
coredump: remove VM_ALWAYSDUMP flag The motivation for this patchset was that I was looking at a way for a qemu-kvm process, to exclude the guest memory from its core dump, which can be quite large. There are already a number of filter flags in /proc/<pid>/coredump_filter, however, these allow one to specify 'types' of kernel memory, not specific address ranges (which is needed in this case). Since there are no more vma flags available, the first patch eliminates the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by the kernel to mark vdso and vsyscall pages. However, it is simple enough to check if a vma covers a vdso or vsyscall page without the need for this flag. The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new 'VM_NODUMP' flag, which can be set by userspace using new madvise flags: 'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters continue to work the same as before unless 'MADV_DONTDUMP' is set on the region. The qemu code which implements this features is at: http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch In my testing the qemu core dump shrunk from 383MB -> 13MB with this patch. I also believe that the 'MADV_DONTDUMP' flag might be useful for security sensitive apps, which might want to select which areas are dumped. This patch: The VM_ALWAYSDUMP flag is currently used by the coredump code to indicate that a vma is part of a vsyscall or vdso section. However, we can determine if a vma is in one these sections by checking it against the gate_vma and checking for a non-NULL return value from arch_vma_name(). Thus, freeing a valuable vma bit. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
19e5109f |
|
23-Feb-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
take removal of PF_FORKNOEXEC to flush_old_exec() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8fc3dc5a |
|
17-Mar-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
__register_binfmt() made void Just don't pass NULL to it - nobody does, anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c8e25258 |
|
02-Mar-2012 |
H. Peter Anvin <hpa@zytor.com> |
regset: Prevent null pointer reference on readonly regsets The regset common infrastructure assumed that regsets would always have .get and .set methods, but not necessarily .active methods. Unfortunately people have since written regsets without .set methods. Rather than putting in stub functions everywhere, handle regsets with null .get or .set methods explicitly. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: <stable@vger.kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0953f65d |
|
14-Feb-2012 |
H. J. Lu <hjl.tools@gmail.com> |
elf: Allow core dump-related fields to be overridden Allow some core dump-related fields to be overridden. This allows core dumps to work correctly for x32. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com>
|
#
e39f5602 |
|
10-Jan-2012 |
David Daney <ddaney.cavm@gmail.com> |
fs: binfmt_elf: create Kconfig variable for PIE randomization Randomization of PIE load address is hard coded in binfmt_elf.c for X86 and ARM. Create a new Kconfig variable (CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead. Thus architecture specific policy is pushed out of the generic binfmt_elf.c and into the architecture Kconfig files. X86 and ARM Kconfigs are modified to select the new variable so there is no change in behavior. A follow on patch will select it for MIPS too. Signed-off-by: David Daney <david.daney@cavium.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a3defbe5 |
|
02-Nov-2011 |
Jiri Kosina <jkosina@suse.cz> |
binfmt_elf: fix PIE execution with randomization disabled The case of address space randomization being disabled in runtime through randomize_va_space sysctl is not treated properly in load_elf_binary(), resulting in SIGKILL coming at exec() time for certain PIE-linked binaries in case the randomization has been disabled at runtime prior to calling exec(). Handle the randomize_va_space == 0 case the same way as if we were not supporting .text randomization at all. Based on original patch by H.J. Lu and Josh Boyer. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: H.J. Lu <hongjiu.lu@intel.com> Cc: <stable@kernel.org> Tested-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1b5d783c |
|
18-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
consolidate BINPRM_FLAGS_ENFORCE_NONDUMP handling new helper: would_dump(bprm, file). Checks if we are allowed to read the file and if we are not - sets ENFORCE_NODUMP. Exported, used in places that previously open-coded the same logics. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4471a675 |
|
14-Apr-2011 |
Jiri Kosina <jkosina@suse.cz> |
brk: COMPAT_BRK: fix detection of randomized brk 5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK") tried to get the whole logic of brk randomization for legacy (libc5-based) applications finally right. It turns out that the way to detect whether brk has actually been randomized in the end or not introduced by that patch still doesn't work for those binaries, as reported by Geert: : /sbin/init from my old m68k ramdisk exists prematurely. : : Before the patch: : : | brk(0x80005c8e) = 0x80006000 : : After the patch: : : | brk(0x80005c8e) = 0x80005c8e : : Old libc5 considers brk() to have failed if the return value is not : identical to the requested value. I don't like it, but currently see no better option than a bit flag in task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2 case. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
31db58b3 |
|
13-Mar-2011 |
Stephen Wilson <wilsons@start.ca> |
mm: arch: make get_gate_vma take an mm_struct instead of a task_struct Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1a530a6f |
|
22-Mar-2011 |
David Daney <ddaney@caviumnetworks.com> |
binfmt_elf: quiet GCC-4.6 'set but not used' warning in load_elf_binary() With GCC-4.6 we get warnings about things being 'set but not used'. In load_elf_binary() this can happen with reloc_func_desc if ELF_PLAT_INIT is defined, but doesn't use the reloc_func_desc argument. Quiet the warning/error by marking reloc_func_desc as __maybe_unused. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f670d0ec |
|
12-Jan-2011 |
Mikael Pettersson <mikpe@it.uu.se> |
binfmt_elf: cleanups This cleans up a few bits in binfmt_elf.c and binfmts.h: - the hasvdso field in struct linux_binfmt is unused, so remove it and the only initialization of it - the elf_map CPP symbol is not defined anywhere in the kernel, so remove an unnecessary #ifndef elf_map - reduce excessive indentation in elf_format's initializer - add missing spaces, remove extraneous spaces No functional changes, but tested on x86 (32 and 64 bit), powerpc (32 and 64 bit), sparc64, arm, and alpha. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e4eab08d |
|
20-Aug-2010 |
Nicolas Pitre <nico@fluxnic.net> |
ARM: 6342/1: fix ASLR of PIE executables Since commits 990cb8acf2 and cc92c28b2d, it is possible to have full address space layout randomization (ASLR) on ARM. Except that one small change was missing for ASLR of PIE executables. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
#
30736a4d |
|
05-Mar-2010 |
Masami Hiramatsu <mhiramat@redhat.com> |
coredump: pass mm->flags as a coredump parameter for consistency Pass mm->flags as a coredump parameter for consistency. --- 1787 if (mm->core_state || !get_dumpable(mm)) { <- (1) 1788 up_write(&mm->mmap_sem); 1789 put_cred(cred); 1790 goto fail; 1791 } 1792 [...] 1798 if (get_dumpable(mm) == 2) { /* Setuid core dump mode */ <-(2) 1799 flag = O_EXCL; /* Stop rewrite attacks */ 1800 cred->fsuid = 0; /* Dump root private */ 1801 } --- Since dumpable bits are not protected by lock, there is a chance to change these bits between (1) and (2). To solve this issue, this patch copies mm->flags to coredump_params.mm_flags at the beginning of do_coredump() and uses it instead of get_dumpable() while dumping core. This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses dump_filter bits in mm->flags. [akpm@linux-foundation.org: fix merge] Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8d9032bb |
|
05-Mar-2010 |
Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> |
elf coredump: add extended numbering support The current ELF dumper implementation can produce broken corefiles if program headers exceed 65535. This number is determined by the number of vmas which the process have. In particular, some extreme programs may use more than 65535 vmas. (If you google max_map_count, you can find some users facing this problem.) This kind of program never be able to generate correct coredumps. This patch implements ``extended numbering'' that uses sh_info field of the first section header instead of e_phnum field in order to represent upto 4294967295 vmas. This is supported by AMD64-ABI(http://www.x86-64.org/documentation.html) and Solaris(http://docs.sun.com/app/docs/doc/817-1984/). Of course, we are preparing patches for gdb and binutils. Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
93eb211e |
|
05-Mar-2010 |
Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> |
elf coredump: make offset calculation process and writing process explicit By the next patch, elf_core_dump() and elf_fdpic_core_dump() will support extended numbering and so will produce the corefiles with section header table in a special case. The problem is the process of writing a file header offset of the section header table into e_shoff field of the ELF header. ELF header is positioned at the beginning of the corefile, while section header at the end. So, we need to take which of the following ways: 1. Seek backward to retry writing operation for ELF header after writing process for a whole part 2. Make offset calculation process and writing process totally sequential The clause 1. is not always possible: one cannot assume that file system supports seek function. Consider the no_llseek case. Therefore, this patch adopts the clause 2. Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1fcccbac |
|
05-Mar-2010 |
Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> |
elf coredump: replace ELF_CORE_EXTRA_* macros by functions elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding macro for hiding _multiline_ logics in functions. This patch removes #ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions. For architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in order to reduce a range of modification. This cleanup is for my next patches, but I think this cleanup itself is worth doing regardless of my firnal purpose. Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
088e7af7 |
|
05-Mar-2010 |
Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> |
coredump: move dump_write() and dump_seek() into a header file My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting them into other newly created *.c files. Then, each files will contain dump_write(), where each pair of binfmt_*.c and elfcore.c should be the same. So, this patch moves them into a header file with dump_seek(). Also, the patch deletes confusing DUMP_WRITE macros in each files. Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
221af7f8 |
|
28-Jan-2010 |
Linus Torvalds <torvalds@linux-foundation.org> |
Split 'flush_old_exec' into two functions 'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f6151dfe |
|
17-Dec-2009 |
Masami Hiramatsu <mhiramat@redhat.com> |
mm: introduce coredump parameter structure Introduce coredump parameter data structure (struct coredump_params) to simplify binfmt->core_dump() arguments. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
698ba7b5 |
|
15-Dec-2009 |
Christoph Hellwig <hch@lst.de> |
elf: kill USE_ELF_CORE_DUMP Currently all architectures but microblaze unconditionally define USE_ELF_CORE_DUMP. The microblaze omission seems like an error to me, so let's kill this ifdef and make sure we are the same everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: <linux-arch@vger.kernel.org> Cc: Michal Simek <michal.simek@petalogix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
af901ca1 |
|
14-Nov-2009 |
André Goddard Rosa <andre.goddard@gmail.com> |
tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
0cf062d0 |
|
23-Sep-2009 |
Amerigo Wang <amwang@redhat.com> |
elf: clean up fill_note_info() Introduce a helper function elf_note_info_init() to help fill_note_info() to do initializations, also fix the potential memory leaks. [akpm@linux-foundation.org: remove NUM_NOTES] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f3e8fccd |
|
21-Sep-2009 |
Hugh Dickins <hugh.dickins@tiscali.co.uk> |
mm: add get_dump_page In preparation for the next patch, add a simple get_dump_page(addr) interface for the CONFIG_ELF_CORE dumpers to use, instead of calling get_user_pages() directly. They're not interested in errors: they just want to use holes as much as possible, to save space and make sure that the data is aligned where the headers said it would be. Oh, and don't use that horrid DUMP_SEEK(off) macro! Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9f0ab4a3 |
|
08-Sep-2009 |
Roland McGrath <roland@redhat.com> |
binfmt_elf: fix PT_INTERP bss handling In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
752015d1 |
|
08-Sep-2009 |
Roland McGrath <roland@redhat.com> |
binfmt_elf: fix PT_INTERP bss handling In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e2dbe125 |
|
30-Jun-2009 |
Amerigo Wang <amwang@redhat.com> |
elf: fix one check-after-use Check before use it. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
341c87bf |
|
30-Jun-2009 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
elf: limit max map count to safe value With ELF, at generating coredump, some more headers other than used vmas are added. When max_map_count == 65536, a core generated by following kinds of code can be unreadable because the number of ELF's program header is written in 16bit in Ehdr (please see elf.h) and the number overflows. == ... = mmap(); (munmap, mprotect, etc...) if (failed) abort(); == This can happen in mmap/munmap/mprotect/etc...which calls split_vma(). I think 65536 is not safe as _default_ and reduce it to 65530 is good for avoiding unexpected corrupted core. Anyway, max_map_count can be enlarged by sysctl if a user is brave.. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Jakub Jelinek <jakub@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3b34fc58 |
|
17-Jun-2009 |
Oleg Nesterov <oleg@redhat.com> |
elf_core_dump: use rcu_read_lock() to access ->real_parent In theory it is not safe to dereference ->parent/real_parent without tasklist or rcu lock, we can race with re-parenting. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d0f35dde |
|
29-Mar-2009 |
Al Viro <viro@zeniv.linux.org.uk> |
Trim includes in binfmt_elf Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e7b9b550 |
|
29-Mar-2009 |
Al Viro <viro@zeniv.linux.org.uk> |
Don't mess with descriptor table in load_elf_binary() ... since we don't tell anyone which descriptor does the file get. We used to, but only in case of ELF binary with a.out loader and that stuff has been gone for a while. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
92dc07b1 |
|
06-Feb-2009 |
Roland McGrath <roland@redhat.com> |
elf core dump: fix get_user use The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force, so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely use get_user() for a normal user-space address. Checking for VM_READ optimizes out the case where get_user() would fail anyway. The vm_file check here was already superfluous given the control flow earlier in the function, so that is a cleanup/optimization unrelated to other changes but an obvious and trivial one. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>
|
#
f06295b4 |
|
07-Jan-2009 |
Kees Cook <keescook@chromium.org> |
ELF: implement AT_RANDOM for glibc PRNG seeding While discussing[1] the need for glibc to have access to random bytes during program load, it seems that an earlier attempt to implement AT_RANDOM got stalled. This implements a random 16 byte string, available to every ELF program via a new auxv AT_RANDOM vector. [1] http://sourceware.org/ml/libc-alpha/2008-10/msg00006.html Ulrich said: glibc needs right after startup a bit of random data for internal protections (stack canary etc). What is now in upstream glibc is that we always unconditionally open /dev/urandom, read some data, and use it. For every process startup. That's slow. ... The solution is to provide a limited amount of random data to the starting process in the aux vector. I suggested 16 bytes and this is what the patch implements. If we need only 16 bytes or less we use the data directly. If we need more we'll use the 16 bytes to see a PRNG. This avoids the costly /dev/urandom use and it allows the kernel to use the most adequate source of random data for this purpose. It might not be the same pool as that for /dev/urandom. Concerns were expressed about the depletion of the randomness pool. But this patch doesn't make the situation worse, it doesn't deplete entropy more than happens now. Signed-off-by: Kees Cook <kees.cook@canonical.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fc5243d9 |
|
25-Dec-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] arch_setup_additional_pages arguments arch_setup_additional_pages currently gets two arguments, the binary format descripton and an indication if the process uses an executable stack or not. The second argument is not used by anybody, it could be removed without replacement. What actually does make sense is to pass an indication if the process uses the elf interpreter or not. The glibc code will not use anything from the vdso if the process does not use the dynamic linker, so for statically linked binaries the architecture backend can choose not to map the vdso. Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
a6f76f23 |
|
13-Nov-2008 |
David Howells <dhowells@redhat.com> |
CRED: Make execve() take advantage of copy-on-write credentials Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
c69e8d9c |
|
13-Nov-2008 |
David Howells <dhowells@redhat.com> |
CRED: Use RCU to access another task's creds and to release a task's own creds Use RCU to access another task's creds and to release a task's own creds. This means that it will be possible for the credentials of a task to be replaced without another task (a) requiring a full lock to read them, and (b) seeing deallocated memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
86a264ab |
|
13-Nov-2008 |
David Howells <dhowells@redhat.com> |
CRED: Wrap current->cred and a few other accessors Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
b6dff3ec |
|
13-Nov-2008 |
David Howells <dhowells@redhat.com> |
CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
e575f111 |
|
18-Oct-2008 |
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> |
coredump_filter: add hugepage dumping Presently hugepage's vma has a VM_RESERVED flag in order not to be swapped. But a VM_RESERVED vma isn't core dumped because this flag is often used for some kernel vmas (e.g. vmalloc, sound related). Thus hugepages are never dumped and it can't be debugged easily. Many developers want hugepages to be included into core-dump. However, We can't read generic VM_RESERVED area because this area is often IO mapping area. then these area reading may change device state. it is definitly undesiable side-effect. So adding a hugepage specific bit to the coredump filter is better. It will be able to hugepage core dumping and doesn't cause any side-effect to any i/o devices. In additional, libhugetlb use hugetlb private mapping pages as anonymous page. Then, hugepage private mapping pages should be core dumped by default. Then, /proc/[pid]/core_dump_filter has two new bits. - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes) - bit 6 mean hugetlb shared mapping pages are dumped or not. (default: no) I tested by following method. % ulimit -c unlimited % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core % % echo 0x43 > /proc/self/coredump_filter % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <string.h> #include "hugetlbfs.h" int main(int argc, char** argv){ char* p; int ch; int mmap_flags = MAP_SHARED; int fd; int nr_pages; while((ch = getopt(argc, argv, "p")) != -1) { switch (ch) { case 'p': mmap_flags &= ~MAP_SHARED; mmap_flags |= MAP_PRIVATE; break; default: /* nothing*/ break; } } argc -= optind; argv += optind; if (argc == 0){ printf("need # of pages\n"); exit(1); } nr_pages = atoi(argv[0]); if (nr_pages < 2) { printf("nr_pages must >2\n"); exit(1); } fd = hugetlbfs_unlinked_fd(); p = mmap(NULL, nr_pages * gethugepagesize(), PROT_READ|PROT_WRITE, mmap_flags, fd, 0); sleep(2); *(p + gethugepagesize()) = 1; /* COW */ sleep(2); /* crash! */ *(int*)0 = 1; return 0; } Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: William Irwin <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0b592682 |
|
16-Oct-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] remove unused ibcs2/PER_SVR4 in SET_PERSONALITY The SET_PERSONALITY macro is always called with a second argument of 0. Remove the ibcs argument and the various tests to set the PER_SVR4 personality. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
f06febc9 |
|
12-Sep-2008 |
Frank Mayhar <fmayhar@google.com> |
timers: fix itimer/many thread hang Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n*3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime *totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: Frank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6341c393 |
|
25-Jul-2008 |
Roland McGrath <roland@redhat.com> |
tracehook: exec This moves all the ptrace hooks related to exec into tracehook.h inlines. This also lifts the calls for tracing out of the binfmt load_binary hooks into search_binary_handler() after it calls into the binfmt module. This change has no effect, since all the binfmt modules' load_binary functions did the call at the end on success, and now search_binary_handler() does it immediately after return if successful. We consolidate the repeated code, and binfmt modules no longer need to import ptrace_notify(). Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
83914441 |
|
25-Jul-2008 |
Oleg Nesterov <oleg@tv-sign.ru> |
coredump: elf_core_dump: use core_state->dumper list Kill the nasty rcu_read_lock() + do_each_thread() loop, use the list encoded in mm->core_state instead, s/GFP_ATOMIC/GFP_KERNEL/. This patch allows futher cleanups in binfmt_elf.c, in particular we can kill the parallel info->threads list. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
24d5288f |
|
25-Jul-2008 |
Oleg Nesterov <oleg@tv-sign.ru> |
coredump: elf_core_dump: skip kernel threads linux_binfmt->core_dump() runs before the process does exit_aio(), this means that we can hit the kernel thread which shares the same ->mm. Afaics, nothing really bad can happen, but perhaps it makes sense to fix this minor bug. It is sad we have to iterate over all threads in system and use GFP_ATOMIC. Hopefully we can kill theses ugly do_each_thread()s, but this needs some nontrivial changes in mm_struct and do_coredump. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
483fad1c |
|
21-Jul-2008 |
Nathan Lynch <ntl@pobox.com> |
ELF loader support for auxvec base platform string Some IBM POWER-based platforms have the ability to run in a mode which mostly appears to the OS as a different processor from the actual hardware. For example, a Power6 system may appear to be a Power5+, which makes the AT_PLATFORM value "power5+". This means that programs are restricted to the ISA supported by Power5+; Power6-specific instructions are treated as illegal. However, some applications (virtual machines, optimized libraries) can benefit from knowledge of the underlying CPU model. A new aux vector entry, AT_BASE_PLATFORM, will denote the actual hardware. For example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM will be "power5+" and AT_BASE_PLATFORM will be "power6". The idea is that AT_PLATFORM indicates the instruction set supported, while AT_BASE_PLATFORM indicates the underlying microarchitecture. If the architecture has defined ELF_BASE_PLATFORM, copy that value to the user stack in the same manner as ELF_PLATFORM. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
65191087 |
|
21-Jul-2008 |
John Reiser <jreiser@BitWagon.com> |
execve filename: document and export via auxiliary vector The Linux kernel puts the filename argument of execve() into the new address space. Many developers are surprised to learn this. Those who know and could use it, object "But it's not documented." Those who want to use it dislike the expression (char *)(1+ strlen(env[-1+ n_env]) + env[-1+ n_env]) because it requires locating the last original environment variable, and assumes that the filename follows the characters. This patch documents the insertion of the filename, and makes it easier to find by adding a new tag AT_EXECFN in the ElfXX_auxv_t; see <elf.h>. In many cases readlink("/proc/self/exe",) gives the same answer. But if all the original pages get unmapped, then the kernel erases the symlink for /proc/self/exe. This can happen when a program decompressor does a good job of cleaning up after uncompressing directly to memory, so that the address space of the target program looks the same as if compression had never happened. One example is http://upx.sourceforge.net . One notable use of the underlying concept (what path containED the executable) is glibc expanding $ORIGIN in DT_RUNPATH. In practice for the near term, it may be a good idea for user-mode code to use both /proc/self/exe and AT_EXECFN as fall-back methods for each other. /proc/self/exe can fail due to unmapping, AT_EXECFN can fail because it won't be present on non-new systems. The auxvec or {AT_EXECFN}.d_val also can get overwritten, although in nearly all cases this would be the result of a bug. The runtime cost is one NEW_AUX_ENT using two words of stack space. The underlying value is maintained already as bprm->exec; setup_arg_pages() in fs/exec.c slides it for stack_shift, etc. Signed-off-by: John Reiser <jreiser@BitWagon.com> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a9e0f529 |
|
15-Jun-2008 |
David Woodhouse <dwmw2@infradead.org> |
Remove last traces of a.out support from ELF loader. In commit d20894a23708c2af75966534f8e4dedb46d48db2 ("Remove a.out interpreter support in ELF loader"), Andi removed support for a.out interpreters from the ELF loader, which was only ever needed for the transition from a.out to ELF. This removes the last traces of that support, in particular the inclusion of <linux/a.out.h>. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
23c4971e |
|
08-May-2008 |
WANG Cong <xiyou.wangcong@gmail.com> |
[Patch] fs/binfmt_elf.c: fix wrong return values create_elf_tables() returns 0 on success. But when strnlen_user() "fails", it returns 0 directly. So this is wrong. Signed-off-by: WANG Cong <wangcong@zeuux.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5f719558 |
|
05-May-2008 |
WANG Cong <xiyou.wangcong@gmail.com> |
[Patch] fs/binfmt_elf.c: fix a wrong free In kmalloc failing path, we shouldn't free pointers in 'info', because the struct 'info' is uninitilized when kmalloc is called. And when kmalloc returns NULL, it's needless to kfree it. Signed-off-by: WANG Cong <wangcong@zeuux.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> -- Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4220b7fe |
|
29-Apr-2008 |
WANG Cong <xiyou.wangcong@gmail.com> |
elf: fix shadowed variables in fs/binfmt_elf.c Fix these sparse warings: fs/binfmt_elf.c:1749:29: warning: symbol 'tmp' shadows an earlier one fs/binfmt_elf.c:1734:28: originally declared here fs/binfmt_elf.c:2009:26: warning: symbol 'vma' shadows an earlier one fs/binfmt_elf.c:1892:24: originally declared here [akpm@linux-foundation.org: chose better variable name] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6970c8ef |
|
29-Apr-2008 |
Cyrill Gorcunov <gorcunov@gmail.com> |
BINFMT: fill_elf_header cleanup - use straight memset first This patch does simplify fill_elf_header function by setting to zero the whole elf header first. So we fillup the fields we really need only. before: text data bss dec hex filename 11735 80 0 11815 2e27 fs/binfmt_elf.o after: text data bss dec hex filename 11710 80 0 11790 2e0e fs/binfmt_elf.o viola, 25 bytes of text is freed Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fd8328be |
|
22-Apr-2008 |
Al Viro <viro@zeniv.linux.org.uk> |
[PATCH] sanitize handling of shared descriptor tables in failing execve() * unshare_files() can fail; doing it after irreversible actions is wrong and de_thread() is certainly irreversible. * since we do it unconditionally anyway, we might as well do it in do_execve() and save ourselves the PITA in binfmt handlers, etc. * while we are at it, binfmt_som actually leaked files_struct on failure. As a side benefit, unshare_files(), put_files_struct() and reset_files_struct() become unexported. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d31472b6 |
|
04-Mar-2008 |
Roland McGrath <roland@redhat.com> |
core dump: user_regset writeback This makes the user_regset-based core dump code call user_regset writeback hooks when available. This is necessary groundwork to allow IA64 to set CORE_DUMP_USE_REGSET. Cc: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d20894a2 |
|
08-Feb-2008 |
Andi Kleen <andi@firstfloor.org> |
Remove a.out interpreter support in ELF loader Following the deprecation schedule the a.out ELF interpreter support is removed now with this patch. a.out ELF interpreters were an transition feature for moving a.out systems to ELF, but they're unlikely to be still needed. Pure a.out systems will still work of course. This allows to simplify the hairy ELF loader. Signed-off-by: Andi Kleen <ak@suse.de> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7fa30315 |
|
08-Feb-2008 |
David Howells <dhowells@redhat.com> |
aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set. Not all architectures support the A.OUT binfmt, so the ELF binfmt should not be permitted to go looking for A.OUT libraries to load in such a case. Not only that, but under such conditions A.OUT core dumps are not produced either. To make this work, this patch also does the following: (1) Makes the existence of the contents of linux/a.out.h contingent on CONFIG_ARCH_SUPPORTS_AOUT. (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT core dumping code. (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This is then included only where needed. This means that this bit of arch code will be stored in the appropriate A.OUT binfmt module rather than the core kernel. (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not needed) and FRV. This patch depends on the previous patch to move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. [jdike@addtoit.com: uml: re-remove accidentally restored code] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
32a93233 |
|
06-Feb-2008 |
Ingo Molnar <mingo@elte.hu> |
brk randomization: introduce CONFIG_COMPAT_BRK based on similar patch from: Pavel Machek <pavel@ucw.cz> Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free (but not obliged to) randomize the brk area. Heap randomization breaks ancient binaries, so we keep COMPAT_BRK enabled by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
09c6dd3c |
|
03-Feb-2008 |
Ohad Ben-Cohen <ohad@bencohen.org> |
fs/binfmt_elf.c: spello fix s/litle/little Signed-off-by: Ohad Ben-Cohen <ohad@bencohen.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
#
612a95b4 |
|
30-Jan-2008 |
Andi Kleen <andi@firstfloor.org> |
x86: remove iBCS support ibcs2 support has never been supported on 2.6 kernels as far as I know, and if it has it must have been an external patch. Anyways, if anybody applies an external patch they could as well readd the ibcs checking code to the ELF loader in the same patch. But there is no reason to keep this code running in all Linux kernels. This will save at least two strcmps each ELF execution. No deprecation period because it could not have been used anyway. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4206d3aa |
|
30-Jan-2008 |
Roland McGrath <roland@redhat.com> |
elf core dump: notes user_regset This modifies the ELF core dump code under #ifdef CORE_DUMP_USE_REGSET. It changes nothing when this macro is not defined. When it's #define'd by some arch header (e.g. asm/elf.h), the arch must support the user_regset (linux/regset.h) interface for reading thread state. This provides an alternate version of note segment writing that is based purely on the user_regset interfaces. When CORE_DUMP_USE_REGSET is set, the arch need not define macros such as ELF_CORE_COPY_REGS and ELF_ARCH. All that information is taken from the user_regset data structures. The core dumps come out exactly the same if arch's definitions for its user_regset details are correct. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3aba481f |
|
30-Jan-2008 |
Roland McGrath <roland@redhat.com> |
elf core dump: notes reorg This pulls out the code for writing the notes segment of an ELF core dump into separate functions. This cleanly isolates into one cluster of functions everything that deals with the note formats and the hooks into arch code to fill them. The top-level elf_core_dump function itself now deals purely with the generic ELF format and the memory segments. This only moves code around into functions that can be inlined away. It should not change any behavior at all. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
bb1ad820 |
|
30-Jan-2008 |
Andrew Morton <akpm@linux-foundation.org> |
x86: PIE executable randomization, checkpatch fixes #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229: +elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused) WARNING: no space between function name and open parenthesis '(' #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229: +elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused) WARNING: line over 80 characters #67: FILE: arch/x86/kernel/sys_x86_64.c:80: + new_begin = randomize_range(*begin, *begin + 0x02000000, 0); ERROR: use tabs not spaces #110: FILE: arch/x86/kernel/sys_x86_64.c:185: + ^I mm->cached_hole_size = 0;$ ERROR: use tabs not spaces #111: FILE: arch/x86/kernel/sys_x86_64.c:186: + ^I^Imm->free_area_cache = mm->mmap_base;$ ERROR: use tabs not spaces #112: FILE: arch/x86/kernel/sys_x86_64.c:187: + ^I}$ ERROR: use tabs not spaces #141: FILE: arch/x86/kernel/sys_x86_64.c:216: + ^I^I/* remember the largest hole we saw so far */$ ERROR: use tabs not spaces #142: FILE: arch/x86/kernel/sys_x86_64.c:217: + ^I^Iif (addr + mm->cached_hole_size < vma->vm_start)$ ERROR: use tabs not spaces #143: FILE: arch/x86/kernel/sys_x86_64.c:218: + ^I^I mm->cached_hole_size = vma->vm_start - addr;$ ERROR: use tabs not spaces #157: FILE: arch/x86/kernel/sys_x86_64.c:232: + ^Imm->free_area_cache = TASK_UNMAPPED_BASE;$ ERROR: need a space before the open parenthesis '(' #291: FILE: arch/x86/mm/mmap_64.c:101: + } else if(mmap_is_legacy()) { WARNING: braces {} are not necessary for single statement blocks #302: FILE: arch/x86/mm/mmap_64.c:112: + if (current->flags & PF_RANDOMIZE) { + mm->mmap_base += ((long)rnd) << PAGE_SHIFT; + } WARNING: line over 80 characters #314: FILE: fs/binfmt_elf.c:48: +static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long); WARNING: no space between function name and open parenthesis '(' #314: FILE: fs/binfmt_elf.c:48: +static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long); WARNING: line over 80 characters #429: FILE: fs/binfmt_elf.c:438: + eppnt, elf_prot, elf_type, total_size); ERROR: need space after that ',' (ctx:VxV) #480: FILE: fs/binfmt_elf.c:939: + elf_prot, elf_flags,0); ^ total: 9 errors, 7 warnings, 461 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
cc503c1b |
|
30-Jan-2008 |
Jiri Kosina <jkosina@suse.cz> |
x86: PIE executable randomization main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). The code has been extraced from Ingo's exec-shield patch http://people.redhat.com/mingo/exec-shield/ [akpm@linux-foundation.org: fix used-uninitialsied warning] [kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
c1d171a0 |
|
30-Jan-2008 |
Jiri Kosina <jkosina@suse.cz> |
x86: randomize brk Randomize the location of the heap (brk) for i386 and x86_64. The range is randomized in the range starting at current brk location up to 0x02000000 offset for both architectures. This, together with pie-executable-randomization.patch and pie-executable-randomization-fix.patch, should make the address space randomization on i386 and x86_64 complete. Arjan says: This is known to break older versions of some emacs variants, whose dumper code assumed that the last variable declared in the program is equal to the start of the dynamically allocated memory region. (The dumper is the code where emacs effectively dumps core at the end of it's compilation stage; this coredump is then loaded as the main program during normal use) iirc this was 5 years or so; we found this way back when I was at RH and we first did the security stuff there (including this brk randomization). It wasn't all variants of emacs, and it got fixed as a result (I vaguely remember that emacs already had code to deal with it for other archs/oses, just ifdeffed wrongly). It's a rare and wrong assumption as a general thing, just on x86 it mostly happened to be true (but to be honest, it'll break too if gcc does something fancy or if the linker does a non-standard order). Still its something we should at least document. Note 2: afaik it only broke the emacs *build*. I'm not 100% sure about that (it IS 5 years ago) though. [ akpm@linux-foundation.org: deuglification ] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
45626bb2 |
|
07-Jan-2008 |
Roland McGrath <roland@redhat.com> |
core dump: real_parent ppid The pr_ppid field reported in core dumps should match what getppid() would have returned to that process, regardless of whether a debugger is attached. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b488893a |
|
19-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
pid namespaces: changes to show virtual ids to user This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a47afb0f |
|
19-Oct-2007 |
Pavel Emelianov <xemul@openvz.org> |
pid namespaces: round up the API The set of functions process_session, task_session, process_group and task_pgrp is confusing, as the names can be mixed with each other when looking at the code for a long time. The proposals are to * equip the functions that return the integer with _nr suffix to represent that fact, * and to make all functions work with task (not process) by making the common prefix of the same name. For monotony the routines signal_session() and set_signal_session() are replaced with task_session_nr() and set_task_session(), especially since they are only used with the explicit task->signal dereference. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d68c9d6a |
|
17-Oct-2007 |
Franck Bui-Huu <fbuihuu@gmail.com> |
Break ELF_PLATFORM and stack pointer randomization dependency Currently arch_align_stack() is used by fs/binfmt_elf.c to randomize stack pointer inside a page. But this happens only if ELF_PLATFORM symbol is defined. ELF_PLATFORM is normally set if the architecture wants ld.so to load implementation specific libraries for optimization. And currently a lot of architectures just yield this symbol to NULL. This is the case for MIPS architecture where ELF_PLATFORM is NULL but arch_align_stack() has been redefined to do stack inside page randomization. So in this case no randomization is actually done. This patch breaks this dependency which seems to be useless and allows platforms such MIPS to do the randomization. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4f9a58d7 |
|
17-Oct-2007 |
Olaf Hering <olaf@aepfle.de> |
increase AT_VECTOR_SIZE to terminate saved_auxv properly include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO. fs/binfmt_elf.c has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT entries. So in the worst case, saved_auxv does not get an AT_NULL entry at the end. The saved_auxv array must be terminated with an AT_NULL entry. Make the size of mm_struct->saved_auxv arch dependend, based on the number of ARCH_DLINFO entries. Signed-off-by: Olaf Hering <olh@suse.de> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
82df3973 |
|
17-Oct-2007 |
Roland McGrath <roland@redhat.com> |
Add MMF_DUMP_ELF_HEADERS This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter. This dumps the first page (only) of a private file mapping if it appears to be a mapping of an ELF file. Including these pages in the core dump may give sufficient identifying information to associate the original DSO and executable file images and their debugging information with a core file in a generic way just from its contents (e.g. when those binaries were built with ld --build-id). I expect this to become the default behavior eventually. Existing versions of gdb can be confused by the core dumps it creates, so it won't enabled by default for some time to come. Soon many people will have systems with a gdb that handle these dumps, so they can arrange to set the bit at boot and have it inherited system-wide. This also cleans up the checking of the MMF_DUMP_* flag bits, which did not need to be using atomic macros. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8e9073ed |
|
17-Oct-2007 |
Andi Kleen <ak@novell.com> |
Deprecate a.out ELF interpreters The Linux ELF loader is quite complicated and messy code (that could probably need a rewrite, but that's a different chapter). One particular messy part in it is the support for non ELF a.out ld.sos. This was originally added to make transition from a.out to ELF easier because an a.out ELF ld.so could be still build using an older a.out toolkit. But by now that should be fully obsolete and removing it would clean up binfmt_elf.c up a bit. I propose to deprecate this support and remove for 2.6.25. Drawback is that someone still runs their system with a.out ld.so they would need to update the ld.so when updating to a new kernel. This patch just adds an entry to the deprecation file and a printk warning users. [akpm@linux-foundation.org: better warning message] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7dc0b22e |
|
17-Oct-2007 |
Neil Horman <nhorman@tuxdriver.com> |
core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe For some time /proc/sys/kernel/core_pattern has been able to set its output destination as a pipe, allowing a user space helper to receive and intellegently process a core. This infrastructure however has some shortcommings which can be enhanced. Specifically: 1) The coredump code in the kernel should ignore RLIMIT_CORE limitation when core_pattern is a pipe, since file system resources are not being consumed in this case, unless the user application wishes to save the core, at which point the app is restricted by usual file system limits and restrictions. 2) The core_pattern code should be able to parse and pass options to the user space helper as an argv array. The real core limit of the uid of the crashing proces should also be passable to the user space helper (since it is overridden to zero when called). 3) Some miscellaneous bugs need to be cleaned up (specifically the recognition of a recursive core dump, should the user mode helper itself crash. Also, the core dump code in the kernel should not wait for the user mode helper to exit, since the same context is responsible for writing to the pipe, and a read of the pipe by the user mode helper will result in a deadlock. This patch: Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that core_pattern is a pipe, the entire core will be fed to the user mode helper. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: <martin.pitt@ubuntu.com> Cc: <wwoods@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5b20cd80 |
|
17-Oct-2007 |
Mark Nelson <markn@au1.ibm.com> |
x86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #define Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which allows for more flexibility in the note type for the state of 'extended floating point' implementations in coredumps. New note types can now be added with an appropriate #define. This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all current users so there's are no change in behaviour. This will let us use different note types on powerpc for the Altivec/VMX state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE (signal processing extension) state that some embedded PowerPC cpus from Freescale have. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
557ed1fa |
|
16-Oct-2007 |
Nick Piggin <npiggin@suse.de> |
remove ZERO_PAGE The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). There are several broad ways to fix this problem: 1. add back some special casing to avoid refcounting ZERO_PAGE 2. per-node or per-cpu ZERO_PAGES 3. remove the ZERO_PAGE completely I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see. Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used). As a sanity check -- mesuring on my desktop system, there are never many mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not increase much without it. When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed). So removing ZERO_PAGE will save 1,000 page faults per second when running kbuild, while keeping it only saves less than 1 page clearing operation per second. 1 page clear is cheaper than a thousand faults, presumably, so there isn't an obvious loss. Neither the logical argument nor these basic tests give a guarantee of no regressions. However, this is a reasonable opportunity to try to remove the ZERO_PAGE from the pagefault path. If it is found to cause regressions, we can reintroduce it and just avoid refcounting it. The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except on benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity. We can look at replacing them all and maybe ripping out ZERO_PAGE completely when we are more satisfied with this solution. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
|
#
e5501492 |
|
18-Sep-2007 |
Michael Ellerman <michael@ellerman.id.au> |
[POWERPC] spufs: Cleanup ELF coredump extra notes logic To start with, arch_notes_size() etc. is a little too ambiguous a name for my liking, so change the function names to be more explicit. Calling through macros is ugly, especially with hidden parameters, so don't do that, call the routines directly. Use ARCH_HAVE_EXTRA_ELF_NOTES as the only flag, and based on it decide whether we want the extern declarations or the empty versions. Since we have empty routines, actually use them in the coredump code to save a few #ifdefs. We want to change the handling of foffset so that the write routine updates foffset as it goes, instead of using file->f_pos (so that writing to a pipe works). So pass foffset to the write routine, and for now just set it to file->f_pos at the end of writing. It should also be possible for the write routine to fail, so change it to return int and treat a non-zero return as failure. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d4e3cc38 |
|
21-Jul-2007 |
Andrew Morton <akpm@linux-foundation.org> |
revert "PIE randomization" There are reports of this causing userspace failures (http://lkml.org/lkml/2007/7/20/421). Revert. Cc: Jan Kratochvil <honza@jikos.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ulrich Kunitz <kune@deine-taler.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Bret Towe" <magnade@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a1b59e80 |
|
19-Jul-2007 |
Kawai, Hidehiro <hidehiro.kawai.ez@hitachi.com> |
coredump masking: ELF: enable core dump filtering This patch enables core dump filtering for ELF-formatted core file. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b6a2fea3 |
|
19-Jul-2007 |
Ollie Wild <aaw@google.com> |
mm: variable length argument support Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from the old mm into the new mm. We create the new mm before the binfmt code runs, and place the new stack at the very top of the address space. Once the binfmt code runs and figures out where the stack should be, we move it downwards. It is a bit peculiar in that we have one task with two mm's, one of which is inactive. [a.p.zijlstra@chello.nl: limit stack size] Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Hugh Dickins <hugh@veritas.com> [bunk@stusta.de: unexport bprm_mm_init] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4d3b573a |
|
16-Jul-2007 |
Andrew Morton <akpm@linux-foundation.org> |
binfmt_elf warning fix fs/binfmt_elf.c: In function 'load_elf_binary': fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't use the resulting value. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
60bfba7e |
|
16-Jul-2007 |
Jan Kratochvil <honza@jikos.cz> |
PIE randomization This patch is using mmap()'s randomization functionality in such a way that it maps the main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries onto a random address (in cases in which mmap() is allowed to perform a randomization). Origin of this patch is in exec-shield (http://people.redhat.com/mingo/exec-shield/) [jkosina@suse.cz: pie randomization: fix BAD_ADDR macro] Signed-off-by: Jan Kratochvil <honza@jikos.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ef7320ed |
|
06-Jul-2007 |
Michael Ellerman <michael@ellerman.id.au> |
Fix elf_core_dump() when writing arch specific notes (spu coredumps) elf_core_dump() supports dumping arch specific ELF notes, via the #define ELF_CORE_WRITE_EXTRA_NOTES. Currently the only user of this is the powerpc spu coredump code. There is a bug in the handling of foffset WRT the arch notes, which causes us to erroneously increment foffset by the size of the arch notes, leaving a block of zeroes in the file, and causing all subsequent data in the file to be at <supposed position> + <arch note size>. eg: LOAD 0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000 Tells us we should have a chunk of data at 0x50000. The truth is the data is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes). This bug prevents gdb from reading the core file correctly. The simplest fix is to simply remember the size of the arch notes, and add it to foffset after we've written the arch notes. The only drawback is that if the arch code doesn't write as many bytes as it said it would, we end up with a broken core dump again. For now I think that's a reasonable requirement. Tested on a Cell blade, gdb no longer complains about the core file being bogus. While I'm here I should point out that the spu coredump code does not work if we're dumping to a pipe - we'll have to wait for 23 to fix that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b140f251 |
|
08-May-2007 |
Alexey Kuznetsov <alexey@openvz.org> |
Invalid return value of execve() resulting in oopses When elf loader fails to map executable (due to memory shortage or because binary is malformed), it can return 0. Normally, this is invisible because process is killed with SIGKILL and it never returns to user space. But if exec() is called from kernel thread (hotplug, whatever) consequences are more interesting and vary depending on architecture. i386. Nothing especially interesting, execve() just returns with "success" :-) x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded with zeros, ergo... double fault. ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses due to return to zero PC, sometimes it sees NaT in rXX and oopses due to NaT consumption. Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7e80d0d0 |
|
08-May-2007 |
Alexey Dobriyan <adobriyan@gmail.com> |
i386: sched.h inclusion from module.h is baack linux/module.h -> linux/elf.h -> asm-i386/elf.h -> linux/utsname.h -> linux/sched.h Noticeably cut the number of files which are rebuild upon touching sched.h and cut down pulled junk from every module.h inclusion. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e63340ae |
|
08-May-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
03221702 |
|
02-Apr-2007 |
Brian Pomerantz <bapper@piratehaven.org> |
[PATCH] fix page leak during core dump When the dump cannot occur most likely because of a full file system and the page to be written is the zero page, the call to page_cache_release() is missed. Signed-off-by: Brian Pomerantz <bapper@mvista.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d1cabd63 |
|
16-Mar-2007 |
James Bottomley <James.Bottomley@SteelEye.com> |
[PATCH] fix process crash caused by randomisation and 64k pages This bug was seen on ppc64, but it could have occurred on any architecture with a page size of 64k or above. The problem is that in fs/binfmt_elf.c:randomize_stack_top() randomizes the stack to within 0x7ff pages. On 4k page machines, this is 8MB; on 64k page boxes, this is 128MB. The problem is that the new binary layout (selected in arch_pick_mmap_layout) places the mapping segment 128MB or the stack rlimit away from the top of the process memory, whichever is larger. If you chose an rlimit of less than 128MB (most defaults are in the 8Mb range) then you can end up having your entire stack randomized away. The fix is to make randomize_stack_top() only steal at most 8MB, which this patch does. However, I have to point out that even with this, your stack rlimit might not be exactly what you get if it's > 128MB, because you're still losing the random offset of up to 8MB. The true fix should be to leave an explicit gap for the randomization plus a buffer when determining mmap_base, but that would involve fixing all the architectures. Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9fbbd4dd |
|
13-Feb-2007 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] x86: Don't require the vDSO for handling a.out signals and in other strange binfmts. vDSO is not necessarily mapped there. Signed-off-by: Andi Kleen <ak@suse.de>
|
#
1fb84496 |
|
26-Jan-2007 |
Alexey Dobriyan <adobriyan@openvz.org> |
[PATCH] core-dumping unreadable binaries via PT_INTERP Proposed patch to fix #5 in http://www.isec.pl/vulnerabilities/isec-0017-binfmt_elf.txt aka http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-1073 To reproduce, do * grab poc at the end of advisory. * add line "eph.p_memsz = 4096;" after "eph.p_filesz = 4096;" where first "4096" is something equal to or greater than 4096. * ./poc /usr/bin/sudo && ls -l Here I get with 2.6.20-rc5: -rw------- 1 ad ad 102400 2007-01-15 19:17 core ---s--x--x 2 root root 101820 2007-01-15 19:15 /usr/bin/sudo Check for MAY_READ like binfmt_misc.c does. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f47aef55 |
|
26-Jan-2007 |
Roland McGrath <roland@redhat.com> |
[PATCH] i386 vDSO: use VM_ALWAYSDUMP This patch fixes core dumps to include the vDSO vma, which is left out now. It removes the special-case core writing macros, which were not doing the right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the vma; there is no need for the fixmap page to be installed. It handles the CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from get_gate_vma after real vmas in the same way the /proc/PID/maps code does. This changes core dumps so they no longer include the non-PT_LOAD phdrs from the vDSO. I made the change to add them in the first place, but in turned out that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner to leave them out, and just let the phdrs inside the vDSO image speak for themselves. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e5b97dde |
|
26-Jan-2007 |
Roland McGrath <roland@redhat.com> |
[PATCH] Add VM_ALWAYSDUMP This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct. This provides a clean explicit way to have a vma always included in core dumps, as is needed for vDSO's. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
90cb28e8 |
|
06-Jan-2007 |
Linus Torvalds <torvalds@woody.osdl.org> |
Revert "[PATCH] binfmt_elf: randomize PIE binaries (2nd try)" This reverts commit 59287c0913cc9a6c75712a775f6c1c1ef418ef3b. Hugh Dickins reports that it causes random failures on x86 with SuSE 10.2, and points out "Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE, sure to place the ET_DYN from time to time just where the comment says it's trying to avoid? I assume that somehow results in the error reported." (where the comment in question is the existing comment in the source code about mmap/brk clashes). Suggested-by: Hugh Dickins <hugh@veritas.com> Acked-by: Marcus Meissner <meissner@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
937949d9 |
|
08-Dec-2006 |
Cedric Le Goater <clg@fr.ibm.com> |
[PATCH] add process_session() helper routine Replace occurences of task->signal->session by a new process_session() helper routine. It will be useful for pid namespaces to abstract the session pid number. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
0f7fc9e4 |
|
08-Dec-2006 |
Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> |
[PATCH] VFS: change struct file to use struct path This patch changes struct file to use struct path instead of having independent pointers to struct dentry and struct vfsmount, and converts all users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}. Additionally, it adds two #define's to make the transition easier for users of the f_dentry and f_vfsmnt. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8de61e69 |
|
06-Dec-2006 |
David Rientjes <rientjes@cs.washington.edu> |
[PATCH] fs: remove unused variable Removed unused 'have_pt_gnu_stack' variable. Reported by David Binderman <dcb314@hotmail.com> Signed-off-by: David Rientjes <rientjes@cs.washington.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
386d9a7e |
|
06-Dec-2006 |
Magnus Damm <magnus@valinux.co.jp> |
[PATCH] elf: Always define elf_addr_t in linux/elf.h Define elf_addr_t in linux/elf.h. The size of the type is determined using ELF_CLASS. This allows us to remove the defines that today are spread all over .c and .h files. Signed-off-by: Magnus Damm <magnus@valinux.co.jp> Cc: Daniel Jacobowitz <drow@false.org> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
841d5fb7 |
|
06-Dec-2006 |
Heiko Carstens <hca@linux.ibm.com> |
[PATCH] binfmt: fix uaccess handling Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
59287c09 |
|
06-Dec-2006 |
Marcus Meissner <meissner@suse.de> |
[PATCH] binfmt_elf: randomize PIE binaries (2nd try) Randomizes -pie compiled binaries from 64k (0x10000) up to ELF_ET_DYN_BASE. 0 -> 64k is excluded to allow NULL ptr accesses to fail. Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
bf1ab978 |
|
22-Nov-2006 |
Dwayne Grant McConnell <decimal@us.ibm.com> |
[POWERPC] coredump: Add SPU elf notes to coredump. This patch adds SPU elf notes to the coredump. It creates a separate note for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1, /signal1_type, /signal2, /signal2_type, /event_mask, /event_status, /mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id. A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to specify they have extra elf core notes. A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the additional notes could be calculated and added to the notes phdr entry. A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes would be written after the existing notes. The SPU coredump code resides in spufs. Stub functions are provided in the kernel which are hooked into the spufs code which does the actual work via register_arch_coredump_calls(). A new set of __spufs_<file>_read/get() functions was provided to allow the coredump code to read from the spufs files without having to lock the SPU context for each file read from. Cc: <linux-arch@vger.kernel.org> Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
|
#
a7a0d86f |
|
13-Oct-2006 |
Petr Vandrovec <petr@vandrovec.name> |
[PATCH] Fix core files so they make sense to gdb... It is silly to use non-static variable for writting zeroes to the file. And more seriously, foffset in core dump file dump function was incremented too much, so some parts of core dump were shifted by size of few phdrs and notes down, so although gdb was able to load that file, it did not make lot of sense - in my test case data pages were shifted down by about 900 bytes. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
7f14daa1 |
|
12-Oct-2006 |
Petr Vandrovec <petr@vandrovec.name> |
[PATCH] Get core dump code to work... The file based core dump code was broken by pipe changes - a relative llseek returns the absolute file position on success, not the relative one, so dump_seek() always failed when invoked with non-zero current position. Only success/failure can be tested with relative lseek, we have to trust kernel that on success we've got right file offset. With this fix in place I have finally real core files instead of 1KB fragments... Signed-off-by: Petr Vandrovec <petr@vandrovec.name> [ Cleaned it up a bit while here - use SEEK_CUR instead of hardcoding 1 ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
d025c9db |
|
01-Oct-2006 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] Support piping into commands in /proc/sys/kernel/core_pattern Using the infrastructure created in previous patches implement support to pipe core dumps into programs. This is done by overloading the existing core_pattern sysctl with a new syntax: |program When the first character of the pattern is a '|' the kernel will instead threat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file. This is useful for having automatic core dump analysis without filling up disks. The program can do some simple analysis and save only a summary of the core dump. The core dump proces will run with the privileges and in the name space of the process that caused the core dump. I also increased the core pattern size to 128 bytes so that longer command lines fit. Most of the changes comes from allowing core dumps without seeks. They are fairly straight forward though. One small incompatibility is that if someone had a core pattern previously that started with '|' they will get suddenly new behaviour. I think that's unlikely to be a real problem though. Additional background: > Very nice, do you happen to have a program that can accept this kind of > input for crash dumps? I'm guessing that the embedded people will > really want this functionality. I had a cheesy demo/prototype. Basically it wrote the dump to a file again, ran gdb on it to get a backtrace and wrote the summary to a shared directory. Then there was a simple CGI script to generate a "top 10" crashes HTML listing. Unfortunately this still had the disadvantage to needing full disk space for a dump except for deleting it afterwards (in fact it was worse because over the pipe holes didn't work so if you have a holey address map it would require more space). Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as cores (at least it worked with zsh's =(cat core) syntax), so it would be likely possible to do it without temporary space with a simple wrapper that calls it in the right way. I ran out of time before doing that though. The demo prototype scripts weren't very good. If there is really interest I can dig them out (they are currently on a laptop disk on the desk with the laptop itself being in service), but I would recommend to rewrite them for any serious application of this and fix the disk space problem. Also to be really useful it should probably find a way to automatically fetch the debuginfos (I cheated and just installed them in advance). If nobody else does it I can probably do the rewrite myself again at some point. My hope at some point was that desktops would support it in their builtin crash reporters, but at least the KDE people I talked too seemed to be happy with their user space only solution. Alan sayeth: I don't believe that piping as such as neccessarily the right model, but the ability to intercept and processes core dumps from user space is asked for by many enterprise users as well. They want to know about, capture, analyse and process core dumps, often centrally and in automated form. [akpm@osdl.org: loff_t != unsigned long] Signed-off-by: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
07f3f05c |
|
30-Sep-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6] Create a new header file, fs/internal.h, for common definitions local to the sources in the fs/ directory. Move extern definitions that should be in header files from fs/*.c to fs/internal.h or other main header files where they span directories. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
486ccb05 |
|
29-Sep-2006 |
Oleg Nesterov <oleg@tv-sign.ru> |
[PATCH] elf_core_dump: don't take tasklist_lock do_each_thread() is rcu-safe, and all tasks which use this ->mm must sleep in wait_for_completion(&mm->core_done) at this point, so we can use RCU locks. Also, remove unneeded INIT_LIST_HEAD(new) before list_add(new, head). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3b9b8ab6 |
|
29-Sep-2006 |
Kirill Korotaev <dev@sw.ru> |
[PATCH] Fix unserialized task->files changing Fixed race on put_files_struct on exec with proc. Restoring files on current on error path may lead to proc having a pointer to already kfree-d files_struct. ->files changing at exit.c and khtread.c are safe as exit_files() makes all things under lock. Found during OpenVZ stress testing. [akpm@osdl.org: add export] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8d6b5eee |
|
26-Sep-2006 |
Andrew Morton <akpm@osdl.org> |
[PATCH] binfmt_elf: consistently use loff_t As David Howells <dhowells@redhat.com> points out, binfmt_elf sometimes uses off_t, sometimes uses loff_t. Use loff_t throughout. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c16b63e0 |
|
26-Sep-2006 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set Based on patch from Frank van Maarseveen <frankvm@frankvm.com>, but extended. Signed-off-by: Andi Kleen <ak@suse.de>
|
#
b4cac1a0 |
|
10-Jul-2006 |
David Howells <dhowells@redhat.com> |
[PATCH] FDPIC: Move roundup() into linux/kernel.h Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's generally useful. [akpm@osdl.org: nuke all the other implementations] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
ce51059b |
|
03-Jul-2006 |
Chuck Ebbert <76306.1226@compuserve.com> |
[PATCH] binfmt_elf: fix checks for bad address Fix check for bad address; use macro instead of open-coding two checks. Taken from RHEL4 kernel update. From: Ernie Petrides <petrides@redhat.com> For background, the BAD_ADDR() macro should return TRUE if the address is TASK_SIZE, because that's the lowest address that is *not* valid for user-space mappings. The macro was correct in binfmt_aout.c but was wrong for the "equal to" case in binfmt_elf.c. There were two in-line validations of user-space addresses in binfmt_elf.c, which have been appropriately converted to use the corrected BAD_ADDR() macro in the patch you posted yesterday. Note that the size checks against TASK_SIZE are okay as coded. The additional changes that I propose are below. These are in the error paths for bad ELF entry addresses once load_elf_binary() has already committed to exec'ing the new image (following the tearing down of the task's original address space). The 1st hunk deals with the interp-side of the outer "if". There were two problems here. The printk() should be removed because this path can be triggered at will by a bogus interpreter image created and used by a malicious user. Further, the error code should not be ENOEXEC, because that causes the loop in search_binary_handler() to continue trying other exec handlers (twice, in fact). But it's too late for this to work correctly, because the user address space has already been torn down, and an exec() failure cannot be returned to the user code because the code no longer exists. The only recovery is to force a SIGSEGV, but it's best to terminate the search loop immediately. I somewhat arbitrarily chose EINVAL as a fallback error code, but any error returned by load_elf_interp() will override that (but this value will never be seen by user-space). The 2nd hunk deals with the non-interp-side of the outer "if". There were two problems here as well. The SIGSEGV needs to be forced, because a prior sigaction() syscall might have set the associated disposition to SIG_IGN. And the ENOEXEC should be changed to EINVAL as described above. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Ernie Petrides <petrides@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
785d5570 |
|
23-Jun-2006 |
Jesper Juhl <jesper.juhl@gmail.com> |
[PATCH] binflt_elf: remove more casts Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f4e5cc2c |
|
23-Jun-2006 |
Jesper Juhl <jesper.juhl@gmail.com> |
[PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless casts Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless casts of kmalloc() return values in the same file. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c89681ed |
|
22-Jun-2006 |
Miklos Szeredi <miklos@szeredi.hu> |
[PATCH] remove steal_locks() This patch removes the steal_locks() function. steal_locks() doesn't work correctly with any filesystem that does it's own lock management, including NFS, CIFS, etc. In addition it has weird semantics on local filesystems in case tasks sharing file-descriptor tables are doing POSIX locking operations in parallel to execve(). The steal_locks() function has an effect on applications doing: clone(CLONE_FILES) /* in child */ lock execve lock POSIX locks acquired before execve (by "child", "parent" or any further task sharing files_struct) will after the execve be owned exclusively by "child". According to Chris Wright some LSB/LTP kind of suite triggers without the stealing behavior, but there's no known real-world application that would also fail. Apps using NPTL are not affected, since all other threads are killed before execve. Apps using LinuxThreads are only affected if they - have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec Both conditions are documented, but not their interaction. Apps using clone() natively are affected if they - use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec The above scenarios are unlikely, but possible. If the patch is vetoed, there's a plan B, that involves mostly keeping the weird stealing semantics, but changing the way lock ownership is handled so that network and local filesystems work consistently. That would add more complexity though, so this solution seems to be preferred by most people. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <willy@debian.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
913bd906 |
|
25-Mar-2006 |
Andi Kleen <ak@linux.intel.com> |
[PATCH] x86_64: Increase the variability of the process stack on 64bit architectures 8MB is not really very random, use 1GB (or more with larger page sizes) instead. Also use the low bits of the random generator output now instead of throwing them away. Only enabled on x86-64 right now. Other architectures need to add a suitable STACK_RND_MASK Cc: mingo@elte.hu Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
55148548 |
|
25-Mar-2006 |
Carsten Otte <cotte@de.ibm.com> |
[PATCH] remove needless check in binfmt_elf.c Local variable i is unsigned int and thus cannot be negative. (akpm: unsigneds shouldn't be called `i'. This value cannot possibly be negative anyway). Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
11b0b5ab |
|
25-Mar-2006 |
Oliver Neukum <neukum@fachschaft.cup.uni-muenchen.de> |
[PATCH] use kzalloc and kcalloc in core fs code Signed-off-by: Oliver Neukum <oliver@neukum.name> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
5342fba5 |
|
25-Feb-2006 |
Suresh Siddha <suresh.b.siddha@intel.com> |
[PATCH] x86_64: Check for bad elf entry address. Fixes a local DOS on Intel systems that lead to an endless recursive fault. AMD machines don't seem to be affected. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
858119e1 |
|
14-Jan-2006 |
Arjan van de Ven <arjan@infradead.org> |
[PATCH] Unlinline a bunch of other functions Remove the "inline" keyword from a bunch of big functions in the kernel with the goal of shrinking it by 30kb to 40kb Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
74da6cd0 |
|
10-Jan-2006 |
Jesper Juhl <juhl-lkml@dif.dk> |
missing printk loglevel and tiny tiny whitespace change in binfmt_elf() Patch adds a mising printk loglevel (I think KERN_WARNING is appropriate here) in fs/binfmt_elf.c, and while I was there I made some tiny tiny tiny adjustments to whitespacing in the neighborhood. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
792db3af |
|
09-Jan-2006 |
Jesper Juhl <jesper.juhl@gmail.com> |
[PATCH] fs/binfmt_elf: Remove unneeded kmalloc() return value casts Remove unneeded casts of kmalloc() return value in binfmt_elf. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
708e9a79 |
|
08-Jan-2006 |
Matt Mackall <mpm@selenic.com> |
[PATCH] tiny: Configure ELF core dump support configurable support for ELF core dumps text data bss dec hex filename 3330172 529036 190556 4049764 3dcb64 vmlinux-baseline 3325552 528912 190556 4045020 3db8dc vmlinux-no-elf add/remove: 0/8 grow/shrink: 0/0 up/down: 0/-4424 (-4424) function old new delta fill_note 32 - -32 maydump 58 - -58 dump_seek 67 - -67 writenote 180 - -180 elf_dump_thread_status 274 - -274 fill_psinfo 308 - -308 fill_prstatus 466 - -466 elf_core_dump 3039 - -3039 Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
dda6ebde |
|
08-Jan-2006 |
David Gibson <david@gibson.dropbear.id.au> |
[PATCH] Fix handling of ELF segments with zero filesize mmap() returns -EINVAL if given a zero length, and thus elf_map() in binfmt_elf.c does likewise if it attempts to map a (page-aligned) ELF segment with zero filesize. Such a situation never arises with the default linker scripts, but there's nothing inherently wrong with zero-filesize (but non-zero memsize) ELF segments. Custom linker scripts can generate them, and the kernel should be able to map them; this patch makes it so. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f99d49ad |
|
07-Nov-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[PATCH] kfree cleanup: fs This is the fs/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in fs/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
a9289728 |
|
30-Oct-2005 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] Don't uselessly export task_struct to userspace in core dumps task_struct is an internal structure to the kernel with a lot of good information, that is probably interesting in core dumps. However there is no way for user space to know what format that information is in making it useless. I grepped the GDB 6.3 source code and NT_TASKSTRUCT while defined is not used anywhere else. So I would be surprised if anyone notices it is missing. In addition exporting kernel pointers to all the interesting kernel data structures sounds like the very definition of an information leak. I haven't a clue what someone with evil intentions could do with that information, but in any attack against the kernel it looks like this is the perfect tool for aiming that attack. So since NT_TASKSTRUCT is useless as currently defined and is potentially dangerous, let's just not export it. (akpm: Daniel Jacobowitz <dan@debian.org> "would be amazed" if anything was using NT_TASKSTRUCT). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
404351e6 |
|
29-Oct-2005 |
Hugh Dickins <hugh@veritas.com> |
[PATCH] mm: mm_init set_mm_counters How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but that's not so good if an mm_counter_t is a special type. And how is rss initialized? By set_mm_counter, all over the place. Come on, we just need to initialize them both at once by set_mm_counter in mm_init (which follows the memcpy when forking). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6de50517 |
|
11-Oct-2005 |
akpm@osdl.org <akpm@osdl.org> |
[PATCH] binfmt_elf bss padding fix Nir Tzachar <tzachar@cs.bgu.ac.il> points out that if an ELF file specifies a zero-length bss at a whacky address, we cannot load that binary because padzero() tries to zero out the end of the page at the whacky address, and that may not be writeable. See also http://bugzilla.kernel.org/show_bug.cgi?id=5411 So teach load_elf_binary() to skip the bss settng altogether if the elf file has a zero-length bss segment. Cc: Roland McGrath <roland@redhat.com> Cc: Daniel Jacobowitz <dan@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1363c3cd |
|
21-Jun-2005 |
Wolfgang Wander <wwc@rentec.com> |
[PATCH] Avoiding mmap fragmentation Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
5db92850 |
|
15-Jun-2005 |
Daniel Jacobowitz <dan@debian.org> |
[PATCH] Fix large core dumps with a 32-bit off_t The ELF core dump code has one use of off_t when writing out segments. Some of the segments may be passed the 2GB limit of an off_t, even on a 32-bit system, so it's important to use loff_t instead. This fixes a corrupted core dump in the bigcore test in GDB's testsuite. Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
a84a5059 |
|
11-May-2005 |
Greg Kroah-Hartman <gregkh@suse.de> |
[PATCH] fix Linux kernel ELF core dump privilege elevation As reported by Paul Starzetz <ihaquer@isec.pl> Reference: CAN-2005-1263 Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
18c8baff |
|
28-Apr-2005 |
Roland McGrath <roland@redhat.com> |
[PATCH] Fix error recovery path for arch_setup_additional_pages If arch_setup_additional_pages fails, the error path will do some double-frees. This fixes it. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
547ee84c |
|
16-Apr-2005 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[PATCH] ppc64: Improve mapping of vDSO This patch reworks the way the ppc64 is mapped in user memory by the kernel to make it more robust against possible collisions with executable segments. Instead of just whacking a VMA at 1Mb, I now use get_unmapped_area() with a hint, and I moved the mapping of the vDSO to after the mapping of the various ELF segments and of the interpreter, so that conflicts get caught properly (it still has to be before create_elf_tables since the later will fill the AT_SYSINFO_EHDR with the proper address). While I was at it, I also changed the 32 and 64 bits vDSO's to link at their "natural" address of 1Mb instead of 0. This is the address where they are normally mapped in absence of conflict. By doing so, it should be possible to properly prelink one it's been verified to work on glibc. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|