#
5a00bfd6 |
|
15-Feb-2024 |
Ryan Roberts <ryan.roberts@arm.com> |
arm64/mm: new ptep layer to manage contig bit Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. The following APIs are treated this way: - ptep_get - set_pte - set_ptes - pte_clear - ptep_get_and_clear - ptep_test_and_clear_young - ptep_clear_flush_young - ptep_set_wrprotect - ptep_set_access_flags Link: https://lkml.kernel.org/r/20240215103205.2607016-11-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Barry Song <21cnbao@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
659e1930 |
|
15-Feb-2024 |
Ryan Roberts <ryan.roberts@arm.com> |
arm64/mm: convert set_pte_at() to set_ptes(..., 1) Since set_ptes() was introduced, set_pte_at() has been implemented as a generic macro around set_ptes(..., 1). So this change should continue to generate the same code. However, making this change prepares us for the transparent contpte support. It means we can reroute set_ptes() to __set_ptes(). Since set_pte_at() is a generic macro, there will be no equivalent __set_pte_at() to reroute to. Note that a couple of calls to set_pte_at() remain in the arch code. This is intentional, since those call sites are acting on behalf of core-mm and should continue to call into the public set_ptes() rather than the arch-private __set_ptes(). Link: https://lkml.kernel.org/r/20240215103205.2607016-9-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Barry Song <21cnbao@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
53273655 |
|
15-Feb-2024 |
Ryan Roberts <ryan.roberts@arm.com> |
arm64/mm: convert READ_ONCE(*ptep) to ptep_get(ptep) There are a number of places in the arch code that read a pte by using the READ_ONCE() macro. Refactor these call sites to instead use the ptep_get() helper, which itself is a READ_ONCE(). Generated code should be the same. This will benefit us when we shortly introduce the transparent contpte support. In this case, ptep_get() will become more complex so we now have all the code abstracted through it. Link: https://lkml.kernel.org/r/20240215103205.2607016-8-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Barry Song <21cnbao@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
7ac8d5b2 |
|
14-Feb-2024 |
Ard Biesheuvel <ardb@kernel.org> |
arm64: Add ESR decoding for exceptions involving translation level -1 The LPA2 feature introduces new FSC values to report abort exceptions related to translation level -1. Define these and wire them up. Reuse the new ESR FSC classification helpers that arrived via the KVM arm64 tree, and update the one for translation faults to check specifically for a translation fault at level -1. (Access flag or permission faults cannot occur at level -1 because they alway involve a descriptor at the superior level so changing those helpers is not needed). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-73-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
46e714c7 |
|
26-Dec-2023 |
Suren Baghdasaryan <surenb@google.com> |
arch/mm/fault: fix major fault accounting when retrying under per-VMA lock A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
4e00f1d9 |
|
16-Oct-2023 |
Mark Rutland <mark.rutland@arm.com> |
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN We use cpus_have_const_cap() to check for ARM64_HAS_EPAN but this is not necessary and alternative_has_cap() or cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_EPAN cpucap is used to affect two things: 1) The permision bits used for userspace executable mappings, which are chosen by adjust_protection_map(), which is an arch_initcall. This is called after the ARM64_HAS_EPAN cpucap has been detected and alternatives have been patched, and before any userspace translation tables exist. 2) The handling of faults taken from (user or kernel) accesses to userspace executable mappings in do_page_fault(). Userspace translation tables are created after adjust_protection_map() is called, and hence after the ARM64_HAS_EPAN cpucap has been detected and alternatives have been patched. Neither of these run until after ARM64_HAS_EPAN cpucap has been detected and alternatives have been patched, and hence there's no need to use cpus_have_const_cap(). Since adjust_protection_map() is only executed once at boot time it would be best for it to use cpus_have_cap(), and since do_page_fault() is executed frequently it would be best for it to use alternatives_have_cap_unlikely(). This patch replaces the uses of cpus_have_const_cap() with cpus_have_cap() and alternative_has_cap_unlikely(), which will avoid generating redundant code, and should be better for all subsequent calls at runtime. The ARM64_HAS_EPAN cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
4089eef0 |
|
30-Jun-2023 |
Suren Baghdasaryan <surenb@google.com> |
mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means mmap_lock has been released. However with per-VMA locks behavior is different and the caller should still release it. To make the rules consistent for the caller, drop the per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns VM_FAULT_COMPLETED for now. [willy@infradead.org: fix riscv] Link: https://lkml.kernel.org/r/CAJuCfpE6GWEx1rPBmNpUfoD5o-gNFz9-UFywzCE2PbEGBiVz7g@mail.gmail.com Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
284e0592 |
|
24-Jul-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm: remove CONFIG_PER_VMA_LOCK ifdefs Patch series "Handle most file-backed faults under the VMA lock", v3. This patchset adds the ability to handle page faults on parts of files which are already in the page cache without taking the mmap lock. This patch (of 10): Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to eliminate ifdefs in the users. Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
24be4d0b |
|
03-Jul-2023 |
SeongJae Park <sj@kernel.org> |
arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault() Commit ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the ifdef. As a result, building kernel without the config fails with undeclared variable error as below: arch/arm64/mm/fault.c: In function 'do_page_fault': arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'? 624 | vma = lock_mm_and_find_vma(mm, addr, regs); | ^~~ | vmap Fix it by moving the declaration out of the ifdef. Fixes: ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ae870a68 |
|
15-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
arm64/mm: Convert to using lock_mm_and_find_vma() This converts arm64 to use the new page fault helper. It was very straightforward, but still needed a fix for the "obvious" conversion I initially did. Thanks to Suren for the fix and testing. Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com> Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
52924726 |
|
08-Jun-2023 |
Hugh Dickins <hughd@google.com> |
arm64: allow pte_offset_map() to fail In rare transient cases, not yet made possible, pte_offset_map() and pte_offset_map_lock() may not find a page table: handle appropriately. Link: https://lkml.kernel.org/r/35e46485-8499-4337-c51f-b8fa495a1a93@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
bb6e04a1 |
|
09-May-2023 |
Arnd Bergmann <arnd@arndb.de> |
kasan: use internal prototypes matching gcc-13 builtins gcc-13 warns about function definitions for builtin interfaces that have a different prototype, e.g.: In file included from kasan_test.c:31: kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch] 574 | void __asan_register_globals(struct kasan_global *globals, size_t size); kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch] 577 | void __asan_alloca_poison(unsigned long addr, size_t size); kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch] 580 | void __asan_load1(unsigned long addr); kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch] 581 | void __asan_store1(unsigned long addr); kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch] 643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size); The two problems are: - Addresses are passes as 'unsigned long' in the kernel, but gcc-13 expects a 'void *'. - sizes meant to use a signed ssize_t rather than size_t. Change all the prototypes to match these. Using 'void *' consistently for addresses gets rid of a couple of type casts, so push that down to the leaf functions where possible. This now passes all randconfig builds on arm, arm64 and x86, but I have not tested it on the other architectures that support kasan, since they tend to fail randconfig builds in other ways. This might fail if any of the 32-bit architectures expect a 'long' instead of 'int' for the size argument. The __asan_allocas_unpoison() function prototype is somewhat weird, since it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'. This looks like it is meant to be 'addr' and 'size' like the others, but the implementation clearly treats them as 'top' and 'bottom'. Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
0e2aba69 |
|
24-May-2023 |
Jisheng Zhang <jszhang@kernel.org> |
arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block When reading the arm64's PER_VMA_LOCK support code, I found a bit difference between arm64 and other arch when calling handle_mm_fault() during VMA lock-based page fault handling: the fault address is masked before passing to handle_mm_fault(). This is also different from the usage in mmap_lock-based handling. I think we need to pass the original fault address to handle_mm_fault() as we did in commit 84c5e23edecd ("arm64: mm: Pass original fault address to handle_mm_fault()"). If we go through the code path further, we can find that the "masked" fault address can cause mismatched fault address between perf sw major/minor page fault sw event and perf page fault sw event: do_page_fault perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, ..., addr) // orig addr handle_mm_fault mm_account_fault perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, ...) // masked addr Fixes: cd7f176aea5f ("arm64/mm: try VMA lock-based page fault handling first") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230524131305.2808-1-jszhang@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
1f9d4ba6 |
|
11-May-2023 |
Mark Brown <broonie@kernel.org> |
arm64/esr: Add decode of ISS2 to data abort reporting The architecture has added more information about faults to ISS2 within ESR. Add decode of this to our data abort fault decode to aid diagnostics. Features that are not currently enabled are included here for completeness. Since the architecture specifies the values of bits within ISS2 in terms of ISS2 rather than in terms of the register as a whole we do so for our definitions as well, this makes it easier to review bitfield definitions. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230417-arm64-iss2-dabt-decode-v3-2-c1fa503e503a@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
e13d32e9 |
|
16-May-2023 |
Arnd Bergmann <arnd@arndb.de> |
arm64: move early_brk64 prototype to header The prototype used for calling early_brk64() is in the file that calls it, which is the wrong place, as it is not included for the definition: arch/arm64/kernel/traps.c:1100:12: error: no previous prototype for 'early_brk64' [-Werror=missing-prototypes] Move it to an appropriate header instead. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20230516160642.523862-15-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
d91d5808 |
|
02-May-2023 |
Min-Hua Chen <minhuadotchen@gmail.com> |
arm64/mm: mark private VM_FAULT_X defines as vm_fault_t This patch fixes several sparse warnings for fault.c: arch/arm64/mm/fault.c:493:24: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:493:24: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:493:24: sparse: got int arch/arm64/mm/fault.c:501:32: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:501:32: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:501:32: sparse: got int arch/arm64/mm/fault.c:503:32: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:503:32: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:503:32: sparse: got int arch/arm64/mm/fault.c:511:24: sparse: warning: incorrect type in return expression (different base types) arch/arm64/mm/fault.c:511:24: sparse: expected restricted vm_fault_t arch/arm64/mm/fault.c:511:24: sparse: got int arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer arch/arm64/mm/fault.c:713:39: sparse: warning: restricted vm_fault_t degrades to integer Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Link: https://lore.kernel.org/r/20230502151909.128810-1-minhuadotchen@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
cd7f176a |
|
27-Feb-2023 |
Suren Baghdasaryan <surenb@google.com> |
arm64/mm: try VMA lock-based page fault handling first Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. Link: https://lkml.kernel.org/r/20230227173632.3292573-31-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
6bc56a4d |
|
16-Jan-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm: add vma_alloc_zeroed_movable_folio() Replace alloc_zeroed_user_highpage_movable(). The main difference is returning a folio containing a single page instead of returning the page, but take the opportunity to rename the function to match other allocation functions a little better and rewrite the documentation to place more emphasis on the zeroing rather than the highmem aspect. Link: https://lkml.kernel.org/r/20230116191813.2145215-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
d77e59a8 |
|
03-Nov-2022 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: mte: Lock a page for MTE tag initialisation Initialising the tags and setting PG_mte_tagged flag for a page can race between multiple set_pte_at() on shared pages or setting the stage 2 pte via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and set it before attempting page initialisation. Given that PG_mte_tagged is never cleared for a page, consider setting this flag to mean page unlocked and wait on this bit with acquire semantics if the page is locked: - try_page_mte_tagging() - lock the page for tagging, return true if it can be tagged, false if already tagged. No acquire semantics if it returns true (PG_mte_tagged not set) as there is no serialisation with a previous set_page_mte_tagged(). - set_page_mte_tagged() - set PG_mte_tagged with release semantics. The two-bit locking is based on Peter Collingbourne's idea. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-6-pcc@google.com
|
#
e059853d |
|
03-Nov-2022 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: mte: Fix/clarify the PG_mte_tagged semantics Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored. However, in mte_sync_tags() it is set before setting the tags to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on write in another thread can cause the new page to have stale tags. Similarly, tag reading via ptrace() can read stale tags if the PG_mte_tagged flag is set before actually clearing/restoring the tags. Fix the PG_mte_tagged semantics so that it is only set after the tags have been cleared or restored. This is safe for swap restoring into a MAP_SHARED or CoW page since the core code takes the page lock. Add two functions to test and set the PG_mte_tagged flag with acquire and release semantics. The downside is that concurrent mprotect(PROT_MTE) on a MAP_SHARED page may cause tag loss. This is already the case for KVM guests if a VMM changes the page protection while the guest triggers a user_mem_abort(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [pcc@google.com: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com
|
#
e8dfdf31 |
|
28-Oct-2022 |
Ard Biesheuvel <ardb@kernel.org> |
arm64: efi: Recover from synchronous exceptions occurring in firmware Unlike x86, which has machinery to deal with page faults that occur during the execution of EFI runtime services, arm64 has nothing like that, and a synchronous exception raised by firmware code brings down the whole system. With more EFI based systems appearing that were not built to run Linux (such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as the introduction of PRM (platform specific firmware routines that are callable just like EFI runtime services), we are more likely to run into issues of this sort, and it is much more likely that we can identify and work around such issues if they don't bring down the system entirely. Since we already use a EFI runtime services call wrapper in assembler, we can quite easily add some code that captures the execution state at the point where the call is made, allowing us to revert to this state and proceed execution if the call triggered a synchronous exception. Given that the kernel and the firmware don't share any data structures that could end up in an indeterminate state, we can happily continue running, as long as we mark the EFI runtime services as unavailable from that point on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
0bb1fbff |
|
14-Nov-2022 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: kfence: only handle translation faults Alexander noted that KFENCE only expects to handle faults from invalid page table entries (i.e. translation faults), but arm64's fault handling logic will call kfence_handle_page_fault() for other types of faults, including alignment faults caused by unaligned atomics. This has the unfortunate property of causing those other faults to be reported as "KFENCE: use-after-free", which is misleading and hinders debugging. Fix this by only forwarding unhandled translation faults to the KFENCE code, similar to what x86 does already. Alexander has verified that this passes all the tests in the KFENCE test suite and avoids bogus reports on misaligned atomics. Link: https://lore.kernel.org/all/20221102081620.1465154-1-zhongbaisong@huawei.com/ Fixes: 840b23986344 ("arm64, kfence: enable KFENCE for ARM64") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221114104411.2853040-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
7572ac3c |
|
30-Nov-2022 |
Ard Biesheuvel <ardb@kernel.org> |
arm64: efi: Revert "Recover from synchronous exceptions ..." This reverts commit 23715a26c8d81291, which introduced some code in assembler that manipulates both the ordinary and the shadow call stack pointer in a way that could potentially be taken advantage of. So let's revert it, and do a better job the next time around. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
23715a26 |
|
28-Oct-2022 |
Ard Biesheuvel <ardb@kernel.org> |
arm64: efi: Recover from synchronous exceptions occurring in firmware Unlike x86, which has machinery to deal with page faults that occur during the execution of EFI runtime services, arm64 has nothing like that, and a synchronous exception raised by firmware code brings down the whole system. With more EFI based systems appearing that were not built to run Linux (such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as the introduction of PRM (platform specific firmware routines that are callable just like EFI runtime services), we are more likely to run into issues of this sort, and it is much more likely that we can identify and work around such issues if they don't bring down the system entirely. Since we already use a EFI runtime services call wrapper in assembler, we can quite easily add some code that captures the execution state at the point where the call is made, allowing us to revert to this state and proceed execution if the call triggered a synchronous exception. Given that the kernel and the firmware don't share any data structures that could end up in an indeterminate state, we can happily continue running, as long as we mark the EFI runtime services as unavailable from that point on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
3fc24ef3 |
|
01-Jul-2022 |
Ard Biesheuvel <ardb@kernel.org> |
arm64: compat: Implement misalignment fixups for multiword loads The 32-bit ARM kernel implements fixups on behalf of user space when using LDM/STM or LDRD/STRD instructions on addresses that are not 32-bit aligned. This is not something that is supported by the architecture, but was done anyway to increase compatibility with user space software, which mostly targeted x86 at the time and did not care about aligned accesses. This feature is one of the remaining impediments to being able to switch to 64-bit kernels on 64-bit capable hardware running 32-bit user space, so let's implement it for the arm64 compat layer as well. Note that the intent is to implement the exact same handling of misaligned multi-word loads and stores as the 32-bit kernel does, including what appears to be missing support for user space programs that rely on SETEND to switch to a different byte order and back. Also, like the 32-bit ARM version, we rely on the faulting address reported by the CPU to infer the memory address, instead of decoding the instruction fully to obtain this information. This implementation is taken from the 32-bit ARM tree, with all pieces removed that deal with instructions other than LDRD/STRD and LDM/STM, or that deal with alignment exceptions taken in kernel mode. Cc: debian-arm@lists.debian.org Cc: Vagrant Cascadian <vagrant@debian.org> Cc: Riku Voipio <riku.voipio@iki.fi> Cc: Steve McIntyre <steve@einval.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20220701135322.3025321-1-ardb@kernel.org [catalin.marinas@arm.com: change the option to 'default n'] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
d9272525 |
|
30-May-2022 |
Peter Xu <peterx@redhat.com> |
mm: avoid unnecessary page fault retires on shared memory types I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Brian Cain <bcain@quicinc.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Weinberger <richard@nod.at> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Will Deacon <will@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Simek <monstr@monstr.eu> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
70c248ac |
|
10-Jun-2022 |
Catalin Marinas <catalin.marinas@arm.com> |
mm: kasan: Skip unpoisoning of user pages Commit c275c5c6d50a ("kasan: disable freed user page poisoning with HW tags") added __GFP_SKIP_KASAN_POISON to GFP_HIGHUSER_MOVABLE. A similar argument can be made about unpoisoning, so also add __GFP_SKIP_KASAN_UNPOISON to user pages. To ensure the user page is still accessible via page_address() without a kasan fault, reset the page->flags tag. With the above changes, there is no need for the arm64 tag_clear_highpage() to reset the page->flags tag. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20220610152141.2148929-3-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
bc249e37 |
|
03-May-2022 |
Mark Brown <broonie@kernel.org> |
arm64/mte: Make TCF field values and naming more standard In preparation for automatic generation of the defines for system registers make the values used for the enumeration in SCTLR_ELx.TCF suitable for use with the newly defined SYS_FIELD_PREP_ENUM helper, removing the shift from the define and using the helper to generate it on use instead. Since we only ever interact with this field in EL1 and in preparation for generation of the defines also rename from SCTLR_ELx to SCTLR_EL1. SCTLR_EL2 is not quite the same as SCTLR_EL1 so the conversion does not share the field definitions. There should be no functional change from this patch. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-4-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
8d56e5c5 |
|
24-Apr-2022 |
Alexandru Elisei <alexandru.elisei@arm.com> |
arm64: Treat ESR_ELx as a 64-bit register In the initial release of the ARM Architecture Reference Manual for ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture, when they became 64-bit registers, with bits [63:32] defined as RES0. In version G.a, a new field was added to ESR_ELx, ISS2, which covers bits [36:32]. This field is used when the Armv8.7 extension FEAT_LS64 is implemented. As a result of the evolution of the register width, Linux stores it as both a 64-bit value and a 32-bit value, which hasn't affected correctness so far as Linux only uses the lower 32 bits of the register. Make the register type consistent and always treat it as 64-bit wide. The register is redefined as an "unsigned long", which is an unsigned double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1, page 14). The type was chosen because "unsigned int" is the most frequent type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx in exception handling, is also declared as "unsigned long". The 64-bit type also makes adding support for architectural features that use fields above bit 31 easier in the future. The KVM hypervisor will receive a similar update in a subsequent patch. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
0e25498f |
|
28-Jun-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
exit: Add and use make_task_dead. There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
36ef159f |
|
14-Jan-2022 |
Qi Zheng <zhengqi.arch@bytedance.com> |
mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") allowed VM_FAULT_RETRY for multiple times, the FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page fault path, so the following check is no longer needed: flags & FAULT_FLAG_ALLOW_RETRY So just remove it. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
07b742a4 |
|
07-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: log potential KASAN shadow alias When the kernel is built with KASAN_GENERIC or KASAN_SW_TAGS, shadow memory is allocated and mapped for all legitimate kernel addresses, and prior to a regular memory access instrumentation will read from the corresponding shadow address. Due to the way memory addresses are converted to shadow addresses, bogus pointers (e.g. NULL) can generate shadow addresses out of the bounds of allocated shadow memory. For example, with KASAN_GENERIC and 48-bit VAs, NULL would have a shadow address of dfff800000000000, which falls between the TTBR ranges. To make such cases easier to debug, this patch makes die_kernel_fault() dump the real memory address range for any potential KASAN shadow access using kasan_non_canonical_hook(), which results in fault information as below when KASAN is enabled: | Unable to handle kernel paging request at virtual address dfff800000000017 | KASAN: null-ptr-deref in range [0x00000000000000b8-0x00000000000000bf] | Mem abort info: | ESR = 0x96000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | Data abort info: | ISV = 0, ISS = 0x00000004 | CM = 0, WnR = 0 | [dfff800000000017] address between user and kernel address ranges Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will@kernel.org> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211207183226.834557-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
6f6cfa58 |
|
07-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: use die_kernel_fault() in do_mem_abort() If we take an unhandled fault from EL1, either: a) The xFSC handler calls die_kernel_fault() directly. In this case, die_kernel_fault() calls: pr_alert(..., msg, addr); mem_abort_decode(esr); show_pte(addr); die(); bust_spinlocks(0); do_exit(SIGKILL); b) The xFSC handler returns to do_mem_abort(), indicating failure. In this case, do_mem_abort() calls: pr_alert(..., addr); mem_abort_decode(esr); show_pte(addr); arm64_notify_die() { die(); } This inconstency is unfortunatem, and in theory in case (b) registered notifiers can prevent us from terminating the faulting thread by returning NOTIFY_STOP, whereupon we'll end up returning from the fault, replaying, and almost certainly get stuck in a livelock spewing errors into dmesg. We don't expect notifers to fix things up, since we dump state to dmesg before invoking them, so it would be more sensible to consistently terminate the thread in this case. This patch has do_mem_abort() call die_kernel_fault() for unhandled faults taken from EL1. Where we would previously have logged a messafe of the form: | Unhandled fault at ${ADDR} ... we will now log a message of the form: | Unable to handle kernel ${FAULT_NAME} at virtual address ${ADDR} ... and we will consistently terminate the thread from which the fault was taken. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211207183226.834557-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
76721503 |
|
14-Jul-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: kasan: mte: remove redundant mte_report_once logic We have special logic to suppress MTE tag check fault reporting, based on a global `mte_report_once` and `reported` variables. These can be used to suppress calling kasan_report() when taking a tag check fault, but do not prevent taking the fault in the first place, nor does they affect the way we disable tag checks upon taking a fault. The core KASAN code already defaults to reporting a single fault, and has a `multi_shot` control to permit reporting multiple faults. The only place we transiently alter `mte_report_once` is in lib/test_kasan.c, where we also the `multi_shot` state as the same time. Thus `mte_report_once` and `reported` are redundant, and can be removed. When a tag check fault is taken, tag checking will be disabled by `do_tag_recovery` and must be explicitly re-enabled if desired. The test code does this by calling kasan_enable_tagging_sync(). This patch removes the redundant mte_report_once() logic and associated variables. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20210714143843.56537-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
84c5e23e |
|
14-Jun-2021 |
Gavin Shan <gshan@redhat.com> |
arm64: mm: Pass original fault address to handle_mm_fault() Currently, the lower bits of fault address is cleared before it's passed to handle_mm_fault(). It's unnecessary since generic code does same thing since the commit 1a29d85eb0f19 ("mm: use vmf->address instead of of vmf->virtual_address"). This passes the original fault address to handle_mm_fault() in case the generic code needs to know the exact fault address. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210614122701.100515-1-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
e0e3903f |
|
08-Jun-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: decode xFSC in mem_abort_decode() It would be helpful if mem_abort_decode() could decode the DFSC/IFSC, as this can make it easier to identify common bugs (e.g. accesses which trigger alignment faults) without having to manually decode the xFSC value. Decode the xFSC in mem_abort_decode(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210608123742.11921-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
064dbfb4 |
|
07-Jun-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: entry: convert IRQ+FIQ handlers to C For various reasons we'd like to convert the bulk of arm64's exception triage logic to C. As a step towards that, this patch converts the EL1 and EL0 IRQ+FIQ triage logic to C. Separate C functions are added for the native and compat cases so that in subsequent patches we can handle native/compat differences in C. Since the triage functions can now call arm64_apply_bp_hardening() directly, the do_el0_irq_bp_hardening() wrapper function is removed. Since the user_exit_irqoff macro is now unused, it is removed. The user_enter_irqoff macro is still used by the ret_to_user code, and cannot be removed at this time. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210607094624.34689-8-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
013bb59d |
|
02-Jun-2021 |
Peter Collingbourne <pcc@google.com> |
arm64: mte: handle tags zeroing at page allocation time Currently, on an anonymous page fault, the kernel allocates a zeroed page and maps it in user space. If the mapping is tagged (PROT_MTE), set_pte_at() additionally clears the tags. It is, however, more efficient to clear the tags at the same time as zeroing the data on allocation. To avoid clearing the tags on any page (which may not be mapped as tagged), only do this if the vma flags contain VM_MTE. This requires introducing a new GFP flag that is used to determine whether to clear the tags. The DC GZVA instruction with a 0 top byte (and 0 tag) requires top-byte-ignore. Set the TCR_EL1.{TBI1,TBID1} bits irrespective of whether KASAN_HW is enabled. Signed-off-by: Peter Collingbourne <pcc@google.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://linux-review.googlesource.com/id/Id46dc94e30fe11474f7e54f5d65e7658dbdddb26 Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
fcf9dc02 |
|
03-Jun-2021 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
arm64: mm: Add is_el1_data_abort() helper We alread have is_el1_instruction_abort(), add is_el1_data_abort() helper and use it. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210603120239.169018-1-wangkefeng.wang@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
18107f8a |
|
12-Mar-2021 |
Vladimir Murzin <vladimir.murzin@arm.com> |
arm64: Support execute-only permissions with Enhanced PAN Enhanced Privileged Access Never (EPAN) allows Privileged Access Never to be used with Execute-only mappings. Absence of such support was a reason for 24cecc377463 ("arm64: Revert support for execute-only user mappings"). Thus now it can be revisited and re-enabled. Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210312173811.58284-2-vladimir.murzin@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
bc8fbc5f |
|
25-Feb-2021 |
Marco Elver <elver@google.com> |
kfence: add test suite Add KFENCE test suite, testing various error detection scenarios. Makes use of KUnit for test organization. Since KFENCE's interface to obtain error reports is via the console, the test verifies that KFENCE outputs expected reports to the console. [elver@google.com: fix typo in test] Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com [elver@google.com: show access type in report] Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d438fabc |
|
25-Feb-2021 |
Marco Elver <elver@google.com> |
kfence: use pt_regs to generate stack trace on faults Instead of removing the fault handling portion of the stack trace based on the fault handler's name, just use struct pt_regs directly. Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it through to kfence_report_error() for out-of-bounds, use-after-free, or invalid access errors, where pt_regs is used to generate the stack trace. If the kernel is a DEBUG_KERNEL, also show registers for more information. Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
840b2398 |
|
25-Feb-2021 |
Marco Elver <elver@google.com> |
arm64, kfence: enable KFENCE for ARM64 Add architecture specific implementation details for KFENCE and enable KFENCE for the arm64 architecture. In particular, this implements the required interface in <asm/kfence.h>. KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the entire linear map to be mapped at page granularity. Doing so may result in extra memory allocated for page tables in case rodata=full is not set; however, currently CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case is therefore not affected by this change. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f05842cf |
|
24-Feb-2021 |
Andrey Konovalov <andreyknvl@google.com> |
kasan, arm64: allow using KUnit tests with HW_TAGS mode On a high level, this patch allows running KUnit KASAN tests with the hardware tag-based KASAN mode. Internally, this change reenables tag checking at the end of each KASAN test that triggers a tag fault and leads to tag checking being disabled. Also simplify is_write calculation in report_tag_fault. With this patch KASAN tests are still failing for the hardware tag-based mode; fixes come in the next few patches. [andreyknvl@google.com: export HW_TAGS symbols for KUnit tests] Link: https://lkml.kernel.org/r/e7eeb252da408b08f0c81b950a55fb852f92000b.1613155970.git.andreyknvl@google.com Link: https://linux-review.googlesource.com/id/Id94dc9eccd33b23cda4950be408c27f879e474c8 Link: https://lkml.kernel.org/r/51b23112cf3fd62b8f8e9df81026fa2b15870501.1610733117.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6459b846 |
|
01-Feb-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround The workaround for Cortex-A76 erratum 1463225 is split across the syscall and debug handlers in separate files. This structure currently forces us to do some redundant work for debug exceptions from EL0, is a little difficult to follow, and gets in the way of some future rework of the exception entry code as it requires exceptions to be unmasked late in the syscall handling path. To simplify things, and as a preparatory step for future rework of exception entry, this patch moves all the workaround logic into entry-common.c. As the debug handler only needs to run for EL1 debug exceptions, we no longer call it for EL0 debug exceptions, and no longer need to check user_mode(regs) as this is always false. For clarity cortex_a76_erratum_1463225_debug_handler() is changed to return bool. In the SVC path, the workaround is applied earlier, but this should have no functional impact as exceptions are still masked. In the debug path we run the fixup before explicitly disabling preemption, but we will not attempt to preempt before returning from the exception. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210202120341.28858-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
abd4737f |
|
05-Feb-2021 |
Miaohe Lin <linmiaohe@huawei.com> |
mm/arm64: Correct obsolete comment in do_page_fault() commit d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites") has convertd down_read_trylock() to mmap_read_trylock(). But it forgot to update the relevant comment. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20210205090919.63382-1-linmiaohe@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
3ed86b9a |
|
15-Jan-2021 |
Andrey Konovalov <andreyknvl@google.com> |
kasan, arm64: fix pointer tags in KASAN reports As of the "arm64: expose FAR_EL1 tag bits in siginfo" patch, the address that is passed to report_tag_fault has pointer tags in the format of 0x0X, while KASAN uses 0xFX format (note the difference in the top 4 bits). Fix up the pointer tag for kernel pointers in do_tag_check_fault by setting them to the same value as bit 55. Explicitly use __untagged_addr() instead of untagged_addr(), as the latter doesn't affect TTBR1 addresses. Fixes: dceec3ff7807 ("arm64: expose FAR_EL1 tag bits in siginfo") Fixes: 4291e9ee6189 ("kasan, arm64: print report from tag fault handler") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://linux-review.googlesource.com/id/I9ced973866036d8679e8f4ae325de547eb969649 Link: https://lore.kernel.org/r/ff30b0afe6005fd046f9ac72bfb71822aedccd89.1610731872.git.andreyknvl@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
4291e9ee |
|
22-Dec-2020 |
Andrey Konovalov <andreyknvl@google.com> |
kasan, arm64: print report from tag fault handler Add error reporting for hardware tag-based KASAN. When CONFIG_KASAN_HW_TAGS is enabled, print KASAN report from the arm64 tag fault handler. SAS bits aren't set in ESR for all faults reported in EL1, so it's impossible to find out the size of the access the caused the fault. Adapt KASAN reporting code to handle this case. Link: https://lkml.kernel.org/r/b559c82b6a969afedf53b4694b475f0234067a1a.1606161801.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
98c970da |
|
22-Dec-2020 |
Vincenzo Frascino <vincenzo.frascino@arm.com> |
arm64: mte: add in-kernel tag fault handler Add the implementation of the in-kernel fault handler. When a tag fault happens on a kernel address: * MTE is disabled on the current CPU, * the execution continues. When a tag fault happens on a user address: * the kernel executes do_bad_area() and panics. The tag fault handler for kernel addresses is currently empty and will be filled in by a future commit. Link: https://lkml.kernel.org/r/20201203102628.GB2224@gaia Link: https://lkml.kernel.org/r/ad31529b073e22840b7a2246172c2b67747ed7c4.1606161801.git.andreyknvl@google.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> [catalin.marinas@arm.com: ensure CONFIG_ARM64_PAN is enabled with MTE] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3d2403fd |
|
02-Dec-2020 |
Mark Rutland <mark.rutland@arm.com> |
arm64: uaccess: remove set_fs() Now that the uaccess primitives dont take addr_limit into account, we have no need to manipulate this via set_fs() and get_fs(). Remove support for these, along with some infrastructure this renders redundant. We no longer need to flip UAO to access kernel memory under KERNEL_DS, and head.S unconditionally clears UAO for all kernel configurations via an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO, nor do we need to context-switch it. However, we still need to adjust PAN during SDEI entry. Masking of __user pointers no longer needs to use the dynamic value of addr_limit, and can use a constant derived from the maximum possible userspace task size. A new TASK_SIZE_MAX constant is introduced for this, which is also used by core code. In configurations supporting 52-bit VAs, this may include a region of unusable VA space above a 48-bit TTBR0 limit, but never includes any portion of TTBR1. Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and KERNEL_DS were inclusive limits, and is converted to a mask by subtracting one. As the SDEI entry code repurposes the otherwise unnecessary pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted context, for now we rename that to pt_regs::sdei_ttbr1. In future we can consider factoring that out. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: James Morse <james.morse@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
2a9b3e6a |
|
30-Nov-2020 |
Mark Rutland <mark.rutland@arm.com> |
arm64: entry: fix EL1 debug transitions In debug_exception_enter() and debug_exception_exit() we trace hardirqs on/off while RCU isn't guaranteed to be watching, and we don't save and restore the hardirq state, and so may return with this having changed. Handle this appropriately with new entry/exit helpers which do the bare minimum to ensure this is appropriately maintained, without marking debug exceptions as NMIs. These are placed in entry-common.c with the other entry/exit helpers. In future we'll want to reconsider whether some debug exceptions should be NMIs, but this will require a significant refactoring, and for now this should prevent issues with lockdep and RCU. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marins <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
23529049 |
|
30-Nov-2020 |
Mark Rutland <mark.rutland@arm.com> |
arm64: entry: fix non-NMI user<->kernel transitions When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE will WARN() at boot time that interrupts are enabled when we call context_tracking_user_enter(), despite the DAIF flags indicating that IRQs are masked. The problem is that we're not tracking IRQ flag changes accurately, and so lockdep believes interrupts are enabled when they are not (and vice-versa). We can shuffle things so to make this more accurate. For kernel->user transitions there are a number of constraints we need to consider: 1) When we call __context_tracking_user_enter() HW IRQs must be disabled and lockdep must be up-to-date with this. 2) Userspace should be treated as having IRQs enabled from the PoV of both lockdep and tracing. 3) As context_tracking_user_enter() stops RCU from watching, we cannot use RCU after calling it. 4) IRQ flag tracing and lockdep have state that must be manipulated before RCU is disabled. ... with similar constraints applying for user->kernel transitions, with the ordering reversed. The generic entry code has enter_from_user_mode() and exit_to_user_mode() helpers to handle this. We can't use those directly, so we add arm64 copies for now (without the instrumentation markers which aren't used on arm64). These replace the existing user_exit() and user_exit_irqoff() calls spread throughout handlers, and the exception unmasking is left as-is. Note that: * The accounting for debug exceptions from userspace now happens in el0_dbg() and ret_to_user(), so this is removed from debug_exception_enter() and debug_exception_exit(). As user_exit_irqoff() wakes RCU, the userspace-specific check is removed. * The accounting for syscalls now happens in el0_svc(), el0_svc_compat(), and ret_to_user(), so this is removed from el0_svc_common(). This does not adversely affect the workaround for erratum 1463225, as this does not depend on any of the state tracking. * In ret_to_user() we mask interrupts with local_daif_mask(), and so we need to inform lockdep and tracing. Here a trace_hardirqs_off() is sufficient and safe as we have not yet exited kernel context and RCU is usable. * As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only needs to check for the latter. * EL0 SError handling will be dealt with in a subsequent patch, as this needs to be treated as an NMI. Prior to this patch, booting an appropriately-configured kernel would result in spats as below: | DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) | WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0 | Modules linked in: | CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3 | Hardware name: linux,dummy-virt (DT) | pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--) | pc : check_flags.part.54+0x1dc/0x1f0 | lr : check_flags.part.54+0x1dc/0x1f0 | sp : ffff80001003bd80 | x29: ffff80001003bd80 x28: ffff66ce801e0000 | x27: 00000000ffffffff x26: 00000000000003c0 | x25: 0000000000000000 x24: ffffc31842527258 | x23: ffffc31842491368 x22: ffffc3184282d000 | x21: 0000000000000000 x20: 0000000000000001 | x19: ffffc318432ce000 x18: 0080000000000000 | x17: 0000000000000000 x16: ffffc31840f18a78 | x15: 0000000000000001 x14: ffffc3184285c810 | x13: 0000000000000001 x12: 0000000000000000 | x11: ffffc318415857a0 x10: ffffc318406614c0 | x9 : ffffc318415857a0 x8 : ffffc31841f1d000 | x7 : 647261685f706564 x6 : ffffc3183ff7c66c | x5 : ffff66ce801e0000 x4 : 0000000000000000 | x3 : ffffc3183fe00000 x2 : ffffc31841500000 | x1 : e956dc24146b3500 x0 : 0000000000000000 | Call trace: | check_flags.part.54+0x1dc/0x1f0 | lock_is_held_type+0x10c/0x188 | rcu_read_lock_sched_held+0x70/0x98 | __context_tracking_enter+0x310/0x350 | context_tracking_enter.part.3+0x5c/0xc8 | context_tracking_user_enter+0x6c/0x80 | finish_ret_to_user+0x2c/0x13cr Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
dceec3ff |
|
20-Nov-2020 |
Peter Collingbourne <pcc@google.com> |
arm64: expose FAR_EL1 tag bits in siginfo The kernel currently clears the tag bits (i.e. bits 56-63) in the fault address exposed via siginfo.si_addr and sigcontext.fault_address. However, the tag bits may be needed by tools in order to accurately diagnose memory errors, such as HWASan [1] or future tools based on the Memory Tagging Extension (MTE). Expose these bits via the arch_untagged_si_addr mechanism, so that they are only exposed to signal handlers with the SA_EXPOSE_TAGBITS flag set. [1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://linux-review.googlesource.com/id/Ia8876bad8c798e0a32df7c2ce1256c4771c81446 Link: https://lore.kernel.org/r/0010296597784267472fa13b39f8238d87a72cf8.1605904350.git.pcc@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
96d389ca |
|
28-Oct-2020 |
Rob Herring <robh@kernel.org> |
arm64: Add workaround for Arm Cortex-A77 erratum 1508412 On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected systems. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
6a1bdb17 |
|
30-Sep-2020 |
Will Deacon <will@kernel.org> |
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op Our use of broadcast TLB maintenance means that spurious page-faults that have been handled already by another CPU do not require additional TLB maintenance. Make flush_tlb_fix_spurious_fault() a no-op and rely on the existing TLB invalidation instead. Add an explicit flush_tlb_page() when making a page dirty, as the TLB is permitted to cache the old read-only entry. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200728092220.GA21800@willie-the-truck Signed-off-by: Will Deacon <will@kernel.org>
|
#
637ec831 |
|
16-Sep-2019 |
Vincenzo Frascino <vincenzo.frascino@arm.com> |
arm64: mte: Handle synchronous and asynchronous tag check faults The Memory Tagging Extension has two modes of notifying a tag check fault at EL0, configurable through the SCTLR_EL1.TCF0 field: 1. Synchronous raising of a Data Abort exception with DFSC 17. 2. Asynchronous setting of a cumulative bit in TFSRE0_EL1. Add the exception handler for the synchronous exception and handling of the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in do_notify_resume(). On a tag check failure in user-space, whether synchronous or asynchronous, a SIGSEGV will be raised on the faulting thread. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
|
#
6a1bb025 |
|
11-Aug-2020 |
Peter Xu <peterx@redhat.com> |
mm/arm64: use general page fault accounting Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened. To do this, we pass pt_regs pointer into __do_page_fault(). Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/r/20200707225021.200906-6-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bce617ed |
|
11-Aug-2020 |
Peter Xu <peterx@redhat.com> |
mm: do page fault accounting in handle_mm_fault Patch series "mm: Page fault accounting cleanups", v5. This is v5 of the pf accounting cleanup series. It originates from Gerald Schaefer's report on an issue a week ago regarding to incorrect page fault accountings for retried page fault after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times"): https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/ What this series did: - Correct page fault accounting: we do accounting for a page fault (no matter whether it's from #PF handling, or gup, or anything else) only with the one that completed the fault. For example, page fault retries should not be counted in page fault counters. Same to the perf events. - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf event is used in an adhoc way across different archs. Case (1): for many archs it's done at the entry of a page fault handler, so that it will also cover e.g. errornous faults. Case (2): for some other archs, it is only accounted when the page fault is resolved successfully. Case (3): there're still quite some archs that have not enabled this perf event. Since this series will touch merely all the archs, we unify this perf event to always follow case (1), which is the one that makes most sense. And since we moved the accounting into handle_mm_fault, the other two MAJ/MIN perf events are well taken care of naturally. - Unify definition of "major faults": the definition of "major fault" is slightly changed when used in accounting (not VM_FAULT_MAJOR). More information in patch 1. - Always account the page fault onto the one that triggered the page fault. This does not matter much for #PF handlings, but mostly for gup. More information on this in patch 25. Patchset layout: Patch 1: Introduced the accounting in handle_mm_fault(), not enabled. Patch 2-23: Enable the new accounting for arch #PF handlers one by one. Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.) Patch 25: Cleanup GUP task_struct pointer since it's not needed any more This patch (of 25): This is a preparation patch to move page fault accountings into the general code in handle_mm_fault(). This includes both the per task flt_maj/flt_min counters, and the major/minor page fault perf events. To do this, the pt_regs pointer is passed into handle_mm_fault(). PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers. So far, all the pt_regs pointer that passed into handle_mm_fault() is NULL, which means this patch should have no intented functional change. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d8ed45c5 |
|
08-Jun-2020 |
Michel Lespinasse <walken@google.com> |
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e31cf2f4 |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: don't include asm/pgtable.h if linux/mm.h is already included Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e9f63768 |
|
04-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
arm64: add support for folded p4d page tables Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate, replace 5level-fixup.h with pgtable-nop4d.h and remove __ARCH_USE_5LEVEL_HACK. [arnd@arndb.de: fix gcc-10 shift warning] Link: http://lkml.kernel.org/r/20200429185657.4085975-1-arnd@arndb.de Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: James Morse <james.morse@arm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200414153455.21744-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8fcc4ae6 |
|
01-May-2020 |
James Morse <james.morse@arm.com> |
arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work APEI is unable to do all of its error handling work in nmi-context, so it defers non-fatal work onto the irq_work queue. arch_irq_work_raise() sends an IPI to the calling cpu, but this is not guaranteed to be taken before returning to user-space. Unless the exception interrupted a context with irqs-masked, irq_work_run() can run immediately. Otherwise return -EINPROGRESS to indicate ghes_notify_sea() found some work to do, but it hasn't finished yet. With this apei_claim_sea() returning '0' means this external-abort was also notification of a firmware-first RAS error, and that APEI has processed the CPER records. Signed-off-by: James Morse <james.morse@arm.com> Tested-by: Tyler Baicar <baicar@os.amperecomputing.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
6cb4d9a2 |
|
10-Apr-2020 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm/vma: introduce VM_ACCESS_FLAGS There are many places where all basic VMA access flags (read, write, exec) are initialized or checked against as a group. One such example is during page fault. Existing vma_is_accessible() wrapper already creates the notion of VMA accessibility as a group access permissions. Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which will not only reduce code duplication but also extend the VMA accessibility concept in general. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Salter <msalter@redhat.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Springer <rspringer@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4064b982 |
|
01-Apr-2020 |
Peter Xu <peterx@redhat.com> |
mm: allow VM_FAULT_RETRY for multiple times The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was majorly used to avoid unexpected starvation of the system by looping over forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler now can retry the page fault for multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page fault handler can still identify whether a page fault is the first attempt or not. Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows to retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows to retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow to retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that has taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now even the 2nd try will have the ALLOW_RETRY set, then use that helper in all existing special paths. One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained. This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection. GUP code is not touched yet and will be covered in follow up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dde16072 |
|
01-Apr-2020 |
Peter Xu <peterx@redhat.com> |
mm: introduce FAULT_FLAG_DEFAULT Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b502f038 |
|
01-Apr-2020 |
Peter Xu <peterx@redhat.com> |
arm64/mm: use helper fault_signal_pending() Let the arm64 fault handling to use the new fault_signal_pending() helper, by moving the signal handling out of the retry logic. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155927.9264-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
24cecc37 |
|
06-Jan-2020 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Revert support for execute-only user mappings The ARMv8 64-bit architecture supports execute-only user permissions by clearing the PTE_USER and PTE_UXN bits, practically making it a mostly privileged mapping but from which user running at EL0 can still execute. The downside, however, is that the kernel at EL1 inadvertently reading such mapping would not trip over the PAN (privileged access never) protection. Revert the relevant bits from commit cab15ce604e5 ("arm64: Introduce execute-only page access permissions") so that PROT_EXEC implies PROT_READ (and therefore PTE_USER) until the architecture gains proper support for execute-only user mappings. Fixes: cab15ce604e5 ("arm64: Introduce execute-only page access permissions") Cc: <stable@vger.kernel.org> # 4.9.x- Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e44ec4a3 |
|
29-Oct-2019 |
Xiang Zheng <zhengxiang9@huawei.com> |
arm64: print additional fault message when executing non-exec memory When attempting to executing non-executable memory, the fault message shows: Unable to handle kernel read from unreadable memory at virtual address ffff802dac469000 This may confuse someone, so add a new fault message for instruction abort. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
bfe29874 |
|
25-Oct-2019 |
James Morse <james.morse@arm.com> |
arm64: entry-common: don't touch daif before bp-hardening The previous patches mechanically transformed the assembly version of entry.S to entry-common.c for synchronous exceptions. The C version of local_daif_restore() doesn't quite do the same thing as the assembly versions if pseudo-NMI is in use. In particular, | local_daif_restore(DAIF_PROCCTX_NOIRQ) will still allow pNMI to be delivered. This is not the behaviour do_el0_ia_bp_hardening() and do_sp_pc_abort() want as it should not be possible for the PMU handler to run as an NMI until the bp-hardening sequence has run. The bp-hardening calls were placed where they are because this was the first C code to run after the relevant exceptions. As we've now moved that point earlier, move the checks and calls earlier too. This makes it clearer that this stuff runs before any kind of exception, and saves modifying PSTATE twice. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
afa7c0e5 |
|
25-Oct-2019 |
James Morse <james.morse@arm.com> |
arm64: Remove asmlinkage from updated functions Now that the callers of these functions have moved into C, they no longer need the asmlinkage annotation. Remove it. Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
b6e43c0e |
|
25-Oct-2019 |
James Morse <james.morse@arm.com> |
arm64: remove __exception annotations Since commit 732674980139 ("arm64: unwind: reference pt_regs via embedded stack frame") arm64 has not used the __exception annotation to dump the pt_regs during stack tracing. in_exception_text() has no callers. This annotation is only used to blacklist kprobes, it means the same as __kprobes. Section annotations like this require the functions to be grouped together between the start/end markers, and placed according to the linker script. For kprobes we also have NOKPROBE_SYMBOL() which logs the symbol address in a section that kprobes parses and blacklists at boot. Using NOKPROBE_SYMBOL() instead lets kprobes publish the list of blacklisted symbols, and saves us from having an arm64 specific spelling of __kprobes. do_debug_exception() already has a NOKPROBE_SYMBOL() annotation. Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
38137335 |
|
15-Oct-2019 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: fix inverted PAR_EL1.F check When detecting a spurious EL1 translation fault, we have the CPU retry the translation using an AT S1E1R instruction, and inspect PAR_EL1 to determine if the fault was spurious. When PAR_EL1.F == 0, the AT instruction successfully translated the address without a fault, which implies the original fault was spurious. However, in this case we return false and treat the original fault as if it was not spurious. Invert the return value so that we treat such a case as spurious. Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 42f91093b043 ("arm64: mm: Ignore spurious translation faults taken from the kernel") Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
308c5156 |
|
04-Oct-2019 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: fix spurious fault detection When detecting a spurious EL1 translation fault, we attempt to compare ESR_EL1.DFSC with PAR_EL1.FST. We erroneously use FIELD_PREP() to extract PAR_EL1.FST, when we should be using FIELD_GET(). In the wise words of Robin Murphy: | FIELD_GET() is a UBFX, FIELD_PREP() is a BFI Using FIELD_PREP() means that that dfsc & ESR_ELx_FSC_TYPE is always zero, and hence not equal to ESR_ELx_FSC_FAULT. Thus we detect any unhandled translation fault as spurious. ... so let's use FIELD_GET() to ensure we don't decide all translation faults are spurious. ESR_EL1.DFSC occupies bits [5:0], and requires no shifting. Fixes: 42f91093b043332a ("arm64: mm: Ignore spurious translation faults taken from the kernel") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Robin Murphy <robin.murphy@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
|
#
e4365f96 |
|
03-Oct-2019 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: avoid virt_to_phys(init_mm.pgd) If we take an unhandled fault in the kernel, we call show_pte() to dump the {PGDP,PGD,PUD,PMD,PTE} values for the corresponding page table walk, where the PGDP value is virt_to_phys(mm->pgd). The boot-time and runtime kernel page tables, init_pg_dir and swapper_pg_dir respectively, are kernel symbols. Thus, it is not valid to call virt_to_phys() on either of these, though we'll do so if we take a fault on a TTBR1 address. When CONFIG_DEBUG_VIRTUAL is not selected, virt_to_phys() will silently fix this up. However, when CONFIG_DEBUG_VIRTUAL is selected, this results in splats as below. Depending on when these occur, they can happen to suppress information needed to debug the original unhandled fault, such as the backtrace: | Unable to handle kernel paging request at virtual address ffff7fffec73cf0f | Mem abort info: | ESR = 0x96000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | Data abort info: | ISV = 0, ISS = 0x00000004 | CM = 0, WnR = 0 | ------------[ cut here ]------------ | virt_to_phys used for non-linear address: 00000000102c9dbe (swapper_pg_dir+0x0/0x1000) | WARNING: CPU: 1 PID: 7558 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0xe0/0x170 arch/arm64/mm/physaddr.c:12 | Kernel panic - not syncing: panic_on_warn set ... | SMP: stopping secondary CPUs | Dumping ftrace buffer: | (ftrace buffer empty) | Kernel Offset: disabled | CPU features: 0x0002,23000438 | Memory Limit: none | Rebooting in 1 seconds.. We can avoid this by ensuring that we call __pa_symbol() for init_mm.pgd, as this will always be a kernel symbol. As the dumped {PGD,PUD,PMD,PTE} values are the raw values from the relevant entries we don't need to handle these specially. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
42f91093 |
|
22-Aug-2019 |
Will Deacon <will@kernel.org> |
arm64: mm: Ignore spurious translation faults taken from the kernel Thanks to address translation being performed out of order with respect to loads and stores, it is possible for a CPU to take a translation fault when accessing a page that was mapped by a different CPU. For example, in the case that one CPU maps a page and then sets a flag to tell another CPU: CPU 0 ----- MOV X0, <valid pte> STR X0, [Xptep] // Store new PTE to page table DSB ISHST ISB MOV X1, #1 STR X1, [Xflag] // Set the flag CPU 1 ----- loop: LDAR X0, [Xflag] // Poll flag with Acquire semantics CBZ X0, loop LDR X1, [X2] // Translates using the new PTE then the final load on CPU 1 can raise a translation fault because the translation can be performed speculatively before the read of the flag and marked as "faulting" by the CPU. This isn't quite as bad as it sounds since, in reality, code such as: CPU 0 CPU 1 ----- ----- spin_lock(&lock); spin_lock(&lock); *ptr = vmalloc(size); if (*ptr) spin_unlock(&lock); foo = **ptr; spin_unlock(&lock); will not trigger the fault because there is an address dependency on CPU 1 which prevents the speculative translation. However, more exotic code where the virtual address is known ahead of time, such as: CPU 0 CPU 1 ----- ----- spin_lock(&lock); spin_lock(&lock); set_fixmap(0, paddr, prot); if (mapped) mapped = true; foo = *fix_to_virt(0); spin_unlock(&lock); spin_unlock(&lock); could fault. This can be avoided by any of: * Introducing broadcast TLB maintenance on the map path * Adding a DSB;ISB sequence after checking a flag which indicates that a virtual address is now mapped * Handling the spurious fault Given that we have never observed a problem due to this under Linux and future revisions of the architecture are being tightened so that translation table walks are effectively ordered in the same way as explicit memory accesses, we no longer treat spurious kernel faults as fatal if an AT instruction indicates that the access does not trigger a translation fault. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
233947ef |
|
14-Aug-2019 |
Mark Rutland <mark.rutland@arm.com> |
arm64: memory: fix flipped VA space fallout VA_START used to be the start of the TTBR1 address space, but now it's a point midway though. In a couple of places we still use VA_START to get the start of the TTBR1 address space, so let's fix these up to use PAGE_OFFSET instead. Fixes: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
2c624fe6 |
|
07-Aug-2019 |
Steve Capper <steve.capper@arm.com> |
arm64: mm: Remove vabits_user Previous patches have enabled 52-bit kernel + user VAs and there is no longer any scenario where user VA != kernel VA size. This patch removes the, now redundant, vabits_user variable and replaces usage with vabits_actual where appropriate. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
5383cc6e |
|
07-Aug-2019 |
Steve Capper <steve.capper@arm.com> |
arm64: mm: Introduce vabits_actual In order to support 52-bit kernel addresses detectable at boot time, one needs to know the actual VA_BITS detected. A new variable vabits_actual is introduced in this commit and employed for the KVM hypervisor layout, KASAN, fault handling and phys-to/from-virt translation where there would normally be compile time constants. In order to maintain performance in phys_to_virt, another variable physvirt_offset is introduced. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
2951d5ef |
|
06-Aug-2019 |
Miles Chen <miles.chen@mediatek.com> |
arm64: mm: print hexadecimal EC value in mem_abort_decode() This change prints the hexadecimal EC value in mem_abort_decode(), which makes it easier to lookup the corresponding EC in the ARM Architecture Reference Manual. The commit 1f9b8936f36f ("arm64: Decode information from ESR upon mem faults") prints useful information when memory abort occurs. It would be easier to lookup "0x25" instead of "DABT" in the document. Then we can check the corresponding ISS. For example: Current info Document EC Exception class "CP15 MCR/MRC" 0x3 "MCR or MRC access to CP15a..." "ASIMD" 0x7 "Access to SIMD or floating-point..." "DABT (current EL)" 0x25 "Data Abort taken without..." ... Before: Unable to handle kernel paging request at virtual address 000000000000c000 Mem abort info: ESR = 0x96000046 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000046 CM = 0, WnR = 1 After: Unable to handle kernel paging request at virtual address 000000000000c000 Mem abort info: ESR = 0x96000046 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000046 CM = 0, WnR = 1 Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: James Morse <james.morse@arm.com> Acked-by: Mark Rutland <Mark.rutland@arm.com> Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
d8bb6718 |
|
01-Aug-2019 |
Masami Hiramatsu <mhiramat@kernel.org> |
arm64: Make debug exception handlers visible from RCU Make debug exceptions visible from RCU so that synchronize_rcu() correctly track the debug exception handler. This also introduces sanity checks for user-mode exceptions as same as x86's ist_enter()/ist_exit(). The debug exception can interrupt in idle task. For example, it warns if we put a kprobe on a function called from idle task as below. The warning message showed that the rcu_read_lock() caused this problem. But actually, this means the RCU is lost the context which is already in NMI/IRQ. /sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events /sys/kernel/debug/tracing # echo 1 > events/kprobes/enable /sys/kernel/debug/tracing # [ 135.122237] [ 135.125035] ============================= [ 135.125310] WARNING: suspicious RCU usage [ 135.125581] 5.2.0-08445-g9187c508bdc7 #20 Not tainted [ 135.125904] ----------------------------- [ 135.126205] include/linux/rcupdate.h:594 rcu_read_lock() used illegally while idle! [ 135.126839] [ 135.126839] other info that might help us debug this: [ 135.126839] [ 135.127410] [ 135.127410] RCU used illegally from idle CPU! [ 135.127410] rcu_scheduler_active = 2, debug_locks = 1 [ 135.128114] RCU used illegally from extended quiescent state! [ 135.128555] 1 lock held by swapper/0/0: [ 135.128944] #0: (____ptrval____) (rcu_read_lock){....}, at: call_break_hook+0x0/0x178 [ 135.130499] [ 135.130499] stack backtrace: [ 135.131192] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-08445-g9187c508bdc7 #20 [ 135.131841] Hardware name: linux,dummy-virt (DT) [ 135.132224] Call trace: [ 135.132491] dump_backtrace+0x0/0x140 [ 135.132806] show_stack+0x24/0x30 [ 135.133133] dump_stack+0xc4/0x10c [ 135.133726] lockdep_rcu_suspicious+0xf8/0x108 [ 135.134171] call_break_hook+0x170/0x178 [ 135.134486] brk_handler+0x28/0x68 [ 135.134792] do_debug_exception+0x90/0x150 [ 135.135051] el1_dbg+0x18/0x8c [ 135.135260] default_idle_call+0x0/0x44 [ 135.135516] cpu_startup_entry+0x2c/0x30 [ 135.135815] rest_init+0x1b0/0x280 [ 135.136044] arch_call_rest_init+0x14/0x1c [ 135.136305] start_kernel+0x4d4/0x500 [ 135.136597] So make debug exception visible to RCU can fix this warning. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
|
#
b98cca44 |
|
16-Jul-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault() Architectures which support kprobes have very similar boilerplate around calling kprobe_fault_handler(). Use a helper function in kprobes.h to unify them, based on the x86 code. This changes the behaviour for other architectures when preemption is enabled. Previously, they would have disabled preemption while calling the kprobe handler. However, preemption would be disabled if this fault was due to a kprobe, so we know the fault was not due to a kprobe handler and can simply return failure. This behaviour was introduced in commit a980c0ef9f6d ("x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()") [anshuman.khandual@arm.com: export kprobe_fault_handler()] Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: James Hogan <jhogan@kernel.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
caab277b |
|
02-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
4745224b |
|
07-Jun-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Refactor __do_page_fault() __do_page_fault() is over complicated with multiple goto statements. This cleans up the code flow and while there drops local variable vm_fault_t. Reviewed-by: Mark Rutland <Mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
c49bd02f |
|
07-Jun-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Document write abort detection from ESR This patch adds an is_write_abort() wrapper and documents the detection of the abort type on cache maintenance operations. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> [catalin.marinas@arm.com: only keep the is_write_abort() wrapper] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
61681036 |
|
02-Jun-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Drop task_struct argument from __do_page_fault() The task_struct argument is not getting used in __do_page_fault(). Hence just drop it and use current or cuurent->mm instead where ever required. This does not change any functionality. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
a0509313 |
|
02-Jun-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Drop mmap_sem before calling __do_kernel_fault() There is an inconsistency between down_read_trylock() success and failure paths while dealing with kernel access for non exception table areas where it calls __do_kernel_fault(). In case of failure it just bails out without holding mmap_sem but when it succeeds it does so while holding mmap_sem. Fix this inconsistency by just dropping mmap_sem in success path as well. __do_kernel_fault() calls die_kernel_fault() which then calls show_pte(). show_pte() in this path might become bit more unreliable without holding mmap_sem. But there are already instances [1] in do_page_fault() where die_kernel_fault() gets called without holding mmap_sem. show_pte() can be made more robust independently but in a later patch. [1] Conditional block for (is_ttbr0_addr && is_el1_permission_fault) Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
01de1776 |
|
04-May-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Identify user instruction aborts We don't currently set the FAULT_FLAG_INSTRUCTION mm flag for EL0 instruction aborts. This has no functional impact, as we don't override arch_vma_access_permitted(), and the default implementation always returns true. However, it would be helpful to provide the flag so that it can be consumed by tracepoints such as dax_pmd_fault. This patch sets the FAULT_FLAG_INSTRUCTION flag for EL0 instruction aborts. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
969f5ea6 |
|
29-Apr-2019 |
Will Deacon <will@kernel.org> |
arm64: errata: Add workaround for Cortex-A76 erratum #1463225 Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum that can prevent interrupts from being taken when single-stepping. This patch implements a software workaround to prevent userspace from effectively being able to disable interrupts. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
48caebf7 |
|
13-May-2019 |
Will Deacon <will@kernel.org> |
arm64: Print physical address of page table base in show_pte() When dumping the page table in response to an unexpected kernel page fault, we print the virtual (hashed) address of the page table base, but display physical addresses for everything else. Make the page table dumping code in show_pte() consistent, by printing the page table base pointer as a physical address. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
52c6d145 |
|
24-Feb-2019 |
Will Deacon <will@kernel.org> |
arm64: debug: Remove unused return value from do_debug_exception() do_debug_exception() goes out of its way to return a value that isn't ever used, so just make the thing void. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
7048a597 |
|
03-Apr-2019 |
Will Deacon <will@kernel.org> |
arm64: mm: Make show_pte() a static function show_pte() doesn't have any external callers, so make it static. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
b9a4b9d0 |
|
01-Mar-2019 |
Will Deacon <will@kernel.org> |
arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals FAR_EL1 is UNKNOWN for all debug exceptions other than those caused by taking a hardware watchpoint. Unfortunately, if a debug handler returns a non-zero value, then we will propagate the UNKNOWN FAR value to userspace via the si_addr field of the SIGTRAP siginfo_t. Instead, let's set si_addr to take on the PC of the faulting instruction, which we have available in the current pt_regs. Cc: <stable@vger.kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
d44f1b8d |
|
29-Jan-2019 |
James Morse <james.morse@arm.com> |
arm64: KVM/mm: Move SEA handling behind a single 'claim' interface To split up APEIs in_nmi() path, the caller needs to always be in_nmi(). Add a helper to do the work and claim the notification. When KVM or the arch code takes an exception that might be a RAS notification, it asks the APEI firmware-first code whether it wants to claim the exception. A future kernel-first mechanism may be queried afterwards, and claim the notification, otherwise we fall through to the existing default behaviour. The NOTIFY_SEA code was merged before considering multiple, possibly interacting, NMI-like notifications and the need to consider kernel first in the future. Make the 'claiming' behaviour explicit. Restructuring the APEI code to allow multiple NMI-like notifications means any notification that might interrupt interrupts-masked code must always be wrapped in nmi_enter()/nmi_exit(). This will allow APEI to use in_nmi() to use the right fixmap entries. Mask SError over this window to prevent an asynchronous RAS error arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()). Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
0db5e022 |
|
29-Jan-2019 |
James Morse <james.morse@arm.com> |
KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing To split up APEIs in_nmi() path, the caller needs to always be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing out into a header file. Currently guest synchronous external aborts are claimed as RAS notifications by handle_guest_sea(), which is hidden in the arch codes mm/fault.c. 32bit gets a dummy declaration in system_misc.h. There is going to be more of this in the future if/when the kernel supports the SError-based firmware-first notification mechanism and/or kernel-first notifications for both synchronous external abort and SError. Each of these will come with some Kconfig symbols and a handful of header files. Create a header file for all this. This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the declarations to kvm_ras.h as preparation for a future patch that moves the ACPI-specific RAS code out of mm/fault.c. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
356607f2 |
|
28-Dec-2018 |
Andrey Konovalov <andreyknvl@google.com> |
kasan, arm64: fix up fault handling logic Right now arm64 fault handling code removes pointer tags from addresses covered by TTBR0 in faults taken from both EL0 and EL1, but doesn't do that for pointers covered by TTBR1. This patch adds two helper functions is_ttbr0_addr() and is_ttbr1_addr(), where the latter one accounts for the fact that TTBR1 pointers might be tagged when tag-based KASAN is in use, and uses these helper functions to perform pointer checks in arch/arm64/mm/fault.c. Link: http://lkml.kernel.org/r/3f349b0e9e48b5df3298a6b4ae0634332274494a.1544099024.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
67e7fdfc |
|
06-Dec-2018 |
Steve Capper <steve.capper@arm.com> |
arm64: mm: introduce 52-bit userspace support On arm64 there is optional support for a 52-bit virtual address space. To exploit this one has to be running with a 64KB page size and be running on hardware that supports this. For an arm64 kernel supporting a 48 bit VA with a 64KB page size, some changes are needed to support a 52-bit userspace: * TCR_EL1.T0SZ needs to be 12 instead of 16, * TASK_SIZE needs to reflect the new size. This patch implements the above when the support for 52-bit VAs is detected at early boot time. On arm64 userspace addresses translation is controlled by TTBR0_EL1. As well as userspace, TTBR0_EL1 controls: * The identity mapping, * EFI runtime code. It is possible to run a kernel with an identity mapping that has a larger VA size than userspace (and for this case __cpu_set_tcr_t0sz() would set TCR_EL1.T0SZ as appropriate). However, when the conditions for 52-bit userspace are met; it is possible to keep TCR_EL1.T0SZ fixed at 12. Thus in this patch, the TCR_EL1.T0SZ size changing logic is disabled. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
9a0c0328 |
|
28-Aug-2018 |
Julien Thierry <julien.thierry.kdev@gmail.com> |
arm64: Use daifflag_restore after bp_hardening For EL0 entries requiring bp_hardening, daif status is kept at DAIF_PROCCTX_NOIRQ until after hardening has been done. Then interrupts are enabled through local_irq_enable(). Before using local_irq_* functions, daifflags should be properly restored to a state where IRQs are enabled. Enable IRQs by restoring DAIF_PROCCTX state after bp hardening. Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
359048f9 |
|
22-Sep-2018 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Define esr_to_debug_fault_info() fault_info[] and debug_fault_info[] are static arrays defining memory abort exception handling functions looking into ESR fault status code encodings. As esr_to_fault_info() is already available providing fault_info[] array lookup, it really makes sense to have a corresponding debug_fault_info[] array lookup function as well. This just adds an equivalent helper function esr_to_debug_fault_info(). Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
dbfe3828 |
|
22-Sep-2018 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Reorganize arguments for is_el1_permission_fault() Most memory abort exception handling related functions have the arguments in the order (addr, esr, regs) except is_el1_permission_fault(). This changes the argument order in this function as (addr, esr, regs) like others. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
00bbd5d9 |
|
22-Sep-2018 |
Anshuman Khandual <anshuman.khandual@arm.com> |
arm64/mm: Use ESR_ELx_FSC macro while decoding fault exception Just replace hard code value of 63 (0x111111) with an existing macro ESR_ELx_FSC when parsing for the status code during fault exception. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
b4d5557c |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Add and use arm64_force_sig_mceerr as appropriate Add arm64_force_sig_mceerr for consistency with arm64_force_sig_fault, and use it in the one location that can take advantage of it. This removes the fiddly filling out of siginfo before sending a signal reporting an memory error to userspace. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
feca355b |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Add and use arm64_force_sig_fault where appropriate Wrap force_sig_fault with a helper that calls arm64_show_signal and call arm64_force_sig_fault where appropraite. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
559d8d91 |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Only call set_thread_esr once in do_page_fault This code is truly common between the signal sending cases so share it. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
2d2837fa |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Only perform one esr_to_fault_info call in do_page_fault As this work is truly common between all of the signal sending cases there is no need to repeat it between the different cases. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
effb093a |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Expand __do_user_fault and remove it Not all of the signals passed to __do_user_fault can be handled the same way so expand the now tiny __do_user_fault in it's callers and remove it. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
aefab2b4 |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: For clarity separate the 3 signal sending cases in do_page_fault It gets easy to confuse what is going on when some code is shared and some not so stop sharing the trivial bits of signal generation to make future updates easier to understand. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
9ea3a974 |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Consolidate the two hwpoison cases in do_page_fault These two cases are practically the same and use siginfo differently from the other signals sent from do_page_fault. So consolidate them to make future changes easier. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
f29ad209 |
|
22-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Factor set_thread_esr out of __do_user_fault This pepares for sending signals with something other than arm64_force_sig_info. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
24b8f79d |
|
21-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Remove unneeded tsk parameter from arm64_force_sig_info Every caller passes in current for tsk so there is no need to pass tsk. Instead make tsk a local variable initialized to current. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
6fa998e8 |
|
21-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Push siginfo generation into arm64_notify_die Instead of generating a struct siginfo before calling arm64_notify_die pass the signal number, tne sicode and the fault address into arm64_notify_die and have it call force_sig_fault instead of force_sig_info to let the generic code generate the struct siginfo. This keeps code passing just the needed information into siginfo generating code, making it easier to see what is happening and harder to get wrong. Further by letting the generic code handle the generation of struct siginfo it reduces the number of sites generating struct siginfo making it possible to review them and verify that all of the fiddly details for a structure passed to userspace are handled properly. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
b8925ee2 |
|
07-Aug-2018 |
Will Deacon <will@kernel.org> |
arm64: cpu: Move errata and feature enable callbacks closer to callers The cpu errata and feature enable callbacks are only called via their respective arm64_cpu_capabilities structure and therefore shouldn't exist in the global namespace. Move the PAN, RAS and cache maintenance emulation enable callbacks into the same files as their corresponding arm64_cpu_capabilities structures, making them static in the process. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
50a7ca3c |
|
17-Aug-2018 |
Souptick Joarder <jrdr.linux@gmail.com> |
mm: convert return type of handle_mm_fault() caller to vm_fault_t Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t") In this patch all the caller of handle_mm_fault() are changed to return vm_fault_t type. Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Tony Luck <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: James Hogan <jhogan@kernel.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: David S. Miller <davem@davemloft.net> Cc: Richard Weinberger <richard@nod.at> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1035a078 |
|
06-Aug-2018 |
Dongjiu Geng <gengdongjiu@huawei.com> |
arm64 / ACPI: clean the additional checks before calling ghes_notify_sea() In order to remove the additional check before calling the ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA. After this cleanup, we can simply call the ghes_notify_sea() to let APEI driver handle the SEA notification. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
25be597a |
|
11-Jul-2018 |
Mark Rutland <mark.rutland@arm.com> |
arm64: kill config_sctlr_el1() Now that we have sysreg_clear_set(), we can consistently use this instead of config_sctlr_el1(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dave Martin <dave.martin@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
c870f14e |
|
21-May-2018 |
Mark Rutland <mark.rutland@arm.com> |
arm64: Unify kernel fault reporting In do_page_fault(), we handle some kernel faults early, and simply die() with a message. For faults handled later, we dump the faulting address, decode the ESR, walk the page tables, and perform a number of steps to ensure that this data is reported. Let's unify the handling of fatal kernel faults with a new die_kernel_fault() helper, handling all of these details. This is largely the same as the existing logic in __do_kernel_fault(), except that addresses are consistently padded to 16 hex characters, as would be expected for a 64-bit address. The messages currently logged in do_page_fault are adjusted to fit into the die_kernel_fault() message template. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
969e61ba |
|
21-May-2018 |
Mark Rutland <mark.rutland@arm.com> |
arm64: make is_permission_fault() name clearer The naming of is_permission_fault() makes it sound like it should return true for permission faults from EL0, but by design, it only does so for faults from EL1. Let's make this clear by dropping el1 in the name, as we do for is_el1_instruction_abort(). Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
cc198460 |
|
22-May-2018 |
Peter Maydell <peter.maydell@linaro.org> |
arm64: fault: Don't leak data in ESR context for user fault on kernel VA If userspace faults on a kernel address, handing them the raw ESR value on the sigframe as part of the delivered signal can leak data useful to attackers who are using information about the underlying hardware fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. However there are also legitimate uses for the information provided in the ESR -- notably the GCC and LLVM sanitizers use this to report whether wild pointer accesses by the application are reads or writes (since a wild write is a more serious bug than a wild read), so we don't want to drop the ESR information entirely. For faulting addresses in the kernel, sanitize the ESR. We choose to present userspace with the illusion that there is nothing mapped in the kernel's part of the address space at all, by reporting all faults as level 0 translation faults taken to EL1. These fields are safe to pass through to userspace as they depend only on the instruction that userspace used to provoke the fault: EC IL (always) ISV CM WNR (for all data aborts) All the other fields in ESR except DFSC are architecturally RES0 for an L0 translation fault taken to EL1, so can be zeroed out without confusing userspace. The illusion is not entirely perfect, as there is a tiny wrinkle where we will report an alignment fault that was not due to the memory type (for instance a LDREX to an unaligned address) as a translation fault, whereas if you do this on real unmapped memory the alignment fault takes precedence. This is not likely to trip anybody up in practice, as the only users we know of for the ESR information who care about the behaviour for kernel addresses only really want to know about the WnR bit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
3eb0f519 |
|
17-Apr-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Ensure every siginfo we send has all bits initialized Call clear_siginfo to ensure every stack allocated siginfo is properly initialized before being passed to the signal sending functions. Note: It is not safe to depend on C initializers to initialize struct siginfo on the stack because C is allowed to skip holes when initializing a structure. The initialization of struct siginfo in tracehook_report_syscall_exit was moved from the helper user_single_step_siginfo into tracehook_report_syscall_exit itself, to make it clear that the local variable siginfo gets fully initialized. In a few cases the scope of struct siginfo has been reduced to make it clear that siginfo siginfo is not used on other paths in the function in which it is declared. Instances of using memset to initialize siginfo have been replaced with calls clear_siginfo for clarity. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
c0cda3b8 |
|
26-Mar-2018 |
Dave Martin <dave.martin@arm.com> |
arm64: capabilities: Update prototype for enable call back We issue the enable() call back for all CPU hwcaps capabilities available on the system, on all the CPUs. So far we have ignored the argument passed to the call back, which had a prototype to accept a "void *" for use with on_each_cpu() and later with stop_machine(). However, with commit 0a0d111d40fd1 ("arm64: cpufeature: Pass capability structure to ->enable callback"), there are some users of the argument who wants the matching capability struct pointer where there are multiple matching criteria for a single capability. Clean up the declaration of the call back to make it clear. 1) Renamed to cpu_enable(), to imply taking necessary actions on the called CPU for the entry. 2) Pass const pointer to the capability, to allow the call back to check the entry. (e.,g to check if any action is needed on the CPU) 3) We don't care about the result of the call back, turning this to a void. Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: James Morse <james.morse@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Dave Martin <dave.martin@arm.com> [suzuki: convert more users, rename call back and drop results] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
af40ff68 |
|
08-Mar-2018 |
Dave Martin <Dave.Martin@arm.com> |
arm64: signal: Ensure si_code is valid for all fault signals Currently, as reported by Eric, an invalid si_code value 0 is passed in many signals delivered to userspace in response to faults and other kernel errors. Typically 0 is passed when the fault is insufficiently diagnosable or when there does not appear to be any sensible alternative value to choose. This appears to violate POSIX, and is intuitively wrong for at least two reasons arising from the fact that 0 == SI_USER: 1) si_code is a union selector, and SI_USER (and si_code <= 0 in general) implies the existence of a different set of fields (siginfo._kill) from that which exists for a fault signal (siginfo._sigfault). However, the code raising the signal typically writes only the _sigfault fields, and the _kill fields make no sense in this case. Thus when userspace sees si_code == 0 (SI_USER) it may legitimately inspect fields in the inactive union member _kill and obtain garbage as a result. There appears to be software in the wild relying on this, albeit generally only for printing diagnostic messages. 2) Software that wants to be robust against spurious signals may discard signals where si_code == SI_USER (or <= 0), or may filter such signals based on the si_uid and si_pid fields of siginfo._sigkill. In the case of fault signals, this means that important (and usually fatal) error conditions may be silently ignored. In practice, many of the faults for which arm64 passes si_code == 0 are undiagnosable conditions such as exceptions with syndrome values in ESR_ELx to which the architecture does not yet assign any meaning, or conditions indicative of a bug or error in the kernel or system and thus that are unrecoverable and should never occur in normal operation. The approach taken in this patch is to translate all such undiagnosable or "impossible" synchronous fault conditions to SIGKILL, since these are at least probably localisable to a single process. Some of these conditions should really result in a kernel panic, but due to the lack of diagnostic information it is difficult to be certain: this patch does not add any calls to panic(), but this could change later if justified. Although si_code will not reach userspace in the case of SIGKILL, it is still desirable to pass a nonzero value so that the common siginfo handling code can detect incorrect use of si_code == 0 without false positives. In this case the si_code dependent siginfo fields will not be correctly initialised, but since they are not passed to userspace I deem this not to matter. A few faults can reasonably occur in realistic userspace scenarios, and _should_ raise a regular, handleable (but perhaps not ignorable/blockable) signal: for these, this patch attempts to choose a suitable standard si_code value for the raised signal in each case instead of 0. arm64 was the only arch to define a BUS_FIXME code, so after this patch nobody defines it. This patch therefore also removes the relevant code from siginfo_layout(). Cc: James Morse <james.morse@arm.com> Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
92ff0674 |
|
20-Feb-2018 |
Will Deacon <will@kernel.org> |
arm64: mm: Rework unhandled user pagefaults to call arm64_force_sig_info Reporting unhandled user pagefaults via arm64_force_sig_info means that __do_user_fault can be drastically simplified, since it no longer has to worry about printing the fault information and can consequently just take the siginfo as a parameter. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
1049c308 |
|
20-Feb-2018 |
Will Deacon <will@kernel.org> |
arm64: Pass user fault info to arm64_notify_die instead of printing it There's no need for callers of arm64_notify_die to print information about user faults. Instead, they can pass a string to arm64_notify_die which will be printed subject to show_unhandled_signals. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
20a004e7 |
|
15-Feb-2018 |
Will Deacon <will@kernel.org> |
arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables In many cases, page tables can be accessed concurrently by either another CPU (due to things like fast gup) or by the hardware page table walker itself, which may set access/dirty bits. In such cases, it is important to use READ_ONCE/WRITE_ONCE when accessing page table entries so that entries cannot be torn, merged or subject to apparent loss of coherence due to compiler transformations. Whilst there are some scenarios where this cannot happen (e.g. pinned kernel mappings for the linear region), the overhead of using READ_ONCE /WRITE_ONCE everywhere is minimal and makes the code an awful lot easier to reason about. This patch consistently uses these macros in the arch code, as well as explicitly namespacing pointers to page table entries from the entries themselves by using adopting a 'p' suffix for the former (as is sometimes used elsewhere in the kernel source). Tested-by: Yury Norov <ynorov@caviumnetworks.com> Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
30d88c0e |
|
02-Feb-2018 |
Will Deacon <will@kernel.org> |
arm64: entry: Apply BP hardening for suspicious interrupts from EL0 It is possible to take an IRQ from EL0 following a branch to a kernel address in such a way that the IRQ is prioritised over the instruction abort. Whilst an attacker would need to get the stars to align here, it might be sufficient with enough calibration so perform BP hardening in the rare case that we see a kernel address in the ELR when handling an IRQ from EL0. Reported-by: Dan Hettena <dhettena@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
5dfc6ed2 |
|
02-Feb-2018 |
Will Deacon <will@kernel.org> |
arm64: entry: Apply BP hardening for high-priority synchronous exceptions Software-step and PC alignment fault exceptions have higher priority than instruction abort exceptions, so apply the BP hardening hooks there too if the user PC appears to reside in kernel space. Reported-by: Dan Hettena <dhettena@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
51369e39 |
|
05-Feb-2018 |
Robin Murphy <robin.murphy@arm.com> |
arm64: Make USER_DS an inclusive limit Currently, USER_DS represents an exclusive limit while KERNEL_DS is inclusive. In order to do some clever trickery for speculation-safe masking, we need them both to behave equivalently - there aren't enough bits to make KERNEL_DS exclusive, so we have precisely one option. This also happens to correct a longstanding false negative for a range ending on the very top byte of kernel memory. Mark Rutland points out that we've actually got the semantics of addresses vs. segments muddled up in most of the places we need to amend, so shuffle the {USER,KERNEL}_DS definitions around such that we can correct those properly instead of just pasting "-1"s everywhere. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
526c3ddb |
|
03-Jan-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/arm64: Document conflicts with SI_USER and SIGFPE,SIGTRAP,SIGBUS Setting si_code to 0 results in a userspace seeing an si_code of 0. This is the same si_code as SI_USER. Posix and common sense requires that SI_USER not be a signal specific si_code. As such this use of 0 for the si_code is a pretty horribly broken ABI. Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a value of __SI_KILL and now sees a value of SIL_KILL with the result that uid and pid fields are copied and which might copying the si_addr field by accident but certainly not by design. Making this a very flakey implementation. Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return SIL_FAULT and the appropriate fields will be reliably copied. But folks this is a new and unique kind of bad. This is massively untested code bad. This is inventing new and unique was to get siginfo wrong bad. This is don't even think about Posix or what siginfo means bad. This is lots of eyeballs all missing the fact that the code does the wrong thing bad. This is getting stuck and keep making the same mistake bad. I really hope we can find a non userspace breaking fix for this on a port as new as arm64. Possible ABI fixes include: - Send the signal without siginfo - Don't generate a signal - Possibly assign and use an appropriate si_code - Don't handle cases which can't happen Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tyler Baicar <tbaicar@codeaurora.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Olof Johansson <olof@lixom.net> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-arm-kernel@lists.infradead.org Ref: 53631b54c870 ("arm64: Floating point and SIMD") Ref: 32015c235603 ("arm64: exception: handle Synchronous External Abort") Ref: 1d18c47c735e ("arm64: MMU fault handling and page table management") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
0f15adbb |
|
03-Jan-2018 |
Will Deacon <will@kernel.org> |
arm64: Add skeleton to harden the branch predictor against aliasing attacks Aliasing attacks against CPU branch predictors can allow an attacker to redirect speculative control flow on some CPUs and potentially divulge information from one context to another. This patch adds initial skeleton code behind a new Kconfig option to enable implementation-specific mitigations against these attacks for CPUs that are affected. Co-developed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
faa75e14 |
|
13-Dec-2017 |
Dongjiu Geng <gengdongjiu@huawei.com> |
arm64: fault: avoid send SIGBUS two times do_sea() calls arm64_notify_die() which will always signal user-space. It also returns whether APEI claimed the external abort as a RAS notification. If it returns failure do_mem_abort() will signal user-space too. do_mem_abort() wants to know if we handled the error, we always call arm64_notify_die() so can always return success. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
80b6eb04 |
|
31-Oct-2017 |
Will Deacon <will@kernel.org> |
arm64: Don't walk page table for user faults in do_mem_abort Commit 42dbf54e8890 ("arm64: consistently log ESR and page table") dumps page table entries for user faults hitting do_bad entries in the fault handler table. Whilst this shouldn't really happen in practice, it's not beyond the realms of possibility if e.g. running an old kernel on a new CPU. Generally, we want to avoid exposing physical addresses under the control of userspace (see commit bf396c09c24 ("arm64: mm: don't print out page table entries on EL0 faults")), so walk the page tables only on exceptions from EL1. Reported-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
42dbf54e |
|
19-Oct-2017 |
Mark Rutland <mark.rutland@arm.com> |
arm64: consistently log ESR and page table When we take a fault we can't handle, we try to dump some relevant information, but we're not consistent about doing so. In do_mem_abort(), we log the full ESR, but don't dump a page table walk. In __do_kernel_fault, we dump an attempted decoding of the ESR (but not the ESR itself) along with a page table walk. Let's try to make things more consistent by dumping the full ESR in mem_abort_decode(), and having do_mem_abort dump a page table walk. The existing dump of the ESR in do_mem_abort() is rendered redundant, and removed. Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Julien Thierry <julien.thierry@arm.com> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
3f7c86b2 |
|
17-Oct-2017 |
Julien Thierry <julien.thierry.kdev@gmail.com> |
arm64: Update fault_info table with new exception types Based on: ARM Architecture Reference Manual, ARMv8 (DDI 0487B.b). ARMv8.1 introduces the optional feature ARMv8.1-TTHM which can trigger a new type of memory abort. This exception is triggered when hardware update of page table flags is not atomic in regards to other memory accesses. Replace the corresponding unknown entry with a more accurate one. Cf: Section D10.2.28 ESR_ELx, Exception Syndrome Register (p D10-2381), section D4.4.11 Restriction on memory types for hardware updates on page tables (p D4-2116 - D4-2117). ARMv8.2 does not add new exception types, however it is worth mentioning that when obligatory feature RAS (optional for ARMv8.{0,1}) is implemented, exceptions related to "Synchronous parity or ECC error on memory access, not on translation table walk" become reserved and should not occur. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0a6de8b8 |
|
01-Oct-2017 |
Mark Rutland <mark.rutland@arm.com> |
arm64: fix misleading data abort decoding Currently data_abort_decode() dumps the ISS field as a decimal value with a '0x' prefix, which is somewhat misleading. Fix it to print as hexadecimal, as was intended. Fixes: 1f9b8936f36f4a8e ("arm64: Decode information from ESR upon mem faults") Reviewed-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
f67d5c4f |
|
22-Sep-2017 |
Will Deacon <will@kernel.org> |
arm64: mm: Remove useless and wrong comments from fault.c Fault.c seems to be a magnet for useless and wrong comments, largely due to its ancestry in other architectures where the code has since moved on, but the comments have remained intact. This patch removes both useless and incorrect comments, leaving only those that say something correct and relevant. Reported-by: Wenjia Zhou <zhiyuan_zhu@htc.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
760bfb47 |
|
28-Sep-2017 |
Will Deacon <will@kernel.org> |
arm64: fault: Route pte translation faults via do_translation_fault We currently route pte translation faults via do_page_fault, which elides the address check against TASK_SIZE before invoking the mm fault handling code. However, this can cause issues with the path walking code in conjunction with our word-at-a-time implementation because load_unaligned_zeropad can end up faulting in kernel space if it reads across a page boundary and runs into a page fault (e.g. by attempting to read from a guard region). In the case of such a fault, load_unaligned_zeropad has registered a fixup to shift the valid data and pad with zeroes, however the abort is reported as a level 3 translation fault and we dispatch it straight to do_page_fault, despite it being a kernel address. This results in calling a sleeping function from atomic context: BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313 in_atomic(): 0, irqs_disabled(): 0, pid: 10290 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144 [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c [<ffffff8e016977f0>] do_page_fault+0x140/0x330 [<ffffff8e01681328>] do_mem_abort+0x54/0xb0 Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0) [...] [<ffffff8e016844fc>] el1_da+0x18/0x78 [<ffffff8e017f399c>] path_parentat+0x44/0x88 [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8 [<ffffff8e017f5044>] filename_create+0x4c/0x128 [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8 [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28 Code: 36380080 d5384100 f9400800 9402566d (d4210000) ---[ end trace 2d01889f2bca9b9f ]--- Fix this by dispatching all translation faults to do_translation_faults, which avoids invoking the page fault logic for faults on kernel addresses. Cc: <stable@vger.kernel.org> Reported-by: Ankit Jain <ankijain@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
289d07a2 |
|
11-Jul-2017 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: abort uaccess retries upon fatal signal When there's a fatal signal pending, arm64's do_page_fault() implementation returns 0. The intent is that we'll return to the faulting userspace instruction, delivering the signal on the way. However, if we take a fatal signal during fixing up a uaccess, this results in a return to the faulting kernel instruction, which will be instantly retried, resulting in the same fault being taken forever. As the task never reaches userspace, the signal is not delivered, and the task is left unkillable. While the task is stuck in this state, it can inhibit the forward progress of the system. To avoid this, we must ensure that when a fatal signal is pending, we apply any necessary fixup for a faulting kernel instruction. Thus we will return to an error path, and it is up to that code to make forward progress towards delivering the fatal signal. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
af29678f |
|
06-Jul-2017 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths Since the pte handling for hardware AF/DBM works even when the hardware feature is not present, make the pte accessors implementation permanent and remove the corresponding #ifdefs. The Kconfig option is kept as it can still be used to disable the feature at the hardware level. Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
73e86cb0 |
|
04-Jul-2017 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Move PTE_RDONLY bit handling out of set_pte_at() Currently PTE_RDONLY is treated as a hardware only bit and not handled by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions. The set_pte_at() function is responsible for setting this bit based on the write permission or dirty state. This patch moves the PTE_RDONLY handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect() functions. The PAGE_* definitions to need to be updated to explicitly include PTE_RDONLY when !PTE_WRITE. The patch also removes the redundant PAGE_COPY(_EXEC) definitions as they are identical to the corresponding PAGE_READONLY(_EXEC). Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
3bbf7157 |
|
26-Jun-2017 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Convert pte handling from inline asm to using (cmp)xchg With the support for hardware updates of the access and dirty states, the following pte handling functions had to be implemented using exclusives: __ptep_test_and_clear_young(), ptep_get_and_clear(), ptep_set_wrprotect() and ptep_set_access_flags(). To take advantage of the LSE atomic instructions and also make the code cleaner, convert these pte functions to use the more generic cmpxchg()/xchg(). Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
1f9b8936 |
|
04-Aug-2017 |
Julien Thierry <julien.thierry.kdev@gmail.com> |
arm64: Decode information from ESR upon mem faults When receiving unhandled faults from the CPU, description is very sparse. Adding information about faults decoded from ESR. Added defines to esr.h corresponding ESR fields. Values are based on ARM Archtecture Reference Manual (DDI 0487B.a), section D7.2.28 ESR_ELx, Exception Syndrome Register (ELx) (pages D7-2275 to D7-2280). New output is of the form: [ 77.818059] Mem abort info: [ 77.820826] Exception class = DABT (current EL), IL = 32 bits [ 77.826706] SET = 0, FnV = 0 [ 77.829742] EA = 0, S1PTW = 0 [ 77.832849] Data abort info: [ 77.835713] ISV = 0, ISS = 0x00000070 [ 77.839522] CM = 0, WnR = 1 Signed-off-by: Julien Thierry <julien.thierry@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: fix "%lu" in a pr_alert() call] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
6d332747 |
|
25-Jul-2017 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Fix potential race with hardware DBM in ptep_set_access_flags() In a system with DBM (dirty bit management) capable agents there is a possible race between a CPU executing ptep_set_access_flags() (maybe non-DBM capable) and a hardware update of the dirty state (clearing of PTE_RDONLY). The scenario: a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old (PTE_AF clear) b) ptep_set_access_flags() is called as a result of a read access and it needs to set the pte to writable, clean and young (PTE_AF set) c) a DBM-capable agent, as a result of a different write access, is marking the entry as young (setting PTE_AF) and dirty (clearing PTE_RDONLY) The current ptep_set_access_flags() implementation would set the PTE_RDONLY bit in the resulting value overriding the DBM update and losing the dirty state. This patch fixes such race by setting PTE_RDONLY to the most permissive (lowest value) of the current entry and the new one. Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Cc: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
621f48e4 |
|
21-Jun-2017 |
Tyler Baicar <tbaicar@codeaurora.org> |
arm/arm64: KVM: add guest SEA support Currently external aborts are unsupported by the guest abort handling. Add handling for SEAs so that the host kernel reports SEAs which occur in the guest kernel. When an SEA occurs in the guest kernel, the guest exits and is routed to kvm_handle_guest_abort(). Prior to this patch, a print message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
7edda088 |
|
21-Jun-2017 |
Tyler Baicar <tbaicar@codeaurora.org> |
acpi: apei: handle SEA notification type for ARMv8 ARM APEI extension proposal added SEA (Synchronous External Abort) notification type for ARMv8. Add a new GHES error source handling function for SEA. If an error source's notification type is SEA, then this function can be registered into the SEA exception handler. That way GHES will parse and report SEA exceptions when they occur. An SEA can interrupt code that had interrupts masked and is treated as an NMI. To aid this the page of address space for mapping APEI buffers while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is changed to use the helper methods to find the prot_t to map with in the same way as ghes_ioremap_pfn_irq(). Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
32015c23 |
|
21-Jun-2017 |
Tyler Baicar <tbaicar@codeaurora.org> |
arm64: exception: handle Synchronous External Abort SEA exceptions are often caused by an uncorrected hardware error, and are handled when data abort and instruction abort exception classes have specific values for their Fault Status Code. When SEA occurs, before killing the process, report the error in the kernel logs. Update fault_info[] with specific SEA faults so that the new SEA handler is used. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [will: use NULL instead of 0 when assigning si_addr] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0e3a9026 |
|
08-Jun-2017 |
Punit Agrawal <punitagrawal@gmail.com> |
arm64: mm: Update perf accounting to handle poison faults Re-organise the perf accounting for fault handling in preparation for enabling handling of hardware poison faults in subsequent commits. The change updates perf accounting to be inline with the behaviour on x86. With this update, the perf fault accounting - * Always report PERF_COUNT_SW_PAGE_FAULTS * Doesn't report anything else for VM_FAULT_ERROR (which includes hwpoison faults) * Reports PERF_COUNT_SW_PAGE_FAULTS_MAJ if it's a major fault (indicated by VM_FAULT_MAJOR) * Otherwise, reports PERF_COUNT_SW_PAGE_FAULTS_MIN Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
e7c600f1 |
|
08-Jun-2017 |
Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> |
arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar to VM_FAULT_OOM, the only difference is that a different si_code (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is initialized. Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> (fix new __do_user_fault call-site) Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Tested-by: Manoj Iyer <manoj.iyer@canonical.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
1eb34b6e |
|
15-May-2017 |
Will Deacon <will@kernel.org> |
arm64: fault: Print info about page table structure when dumping pte Whilst debugging a remote crash, I noticed that show_pte is unhelpful when it comes to describing the structure of the page table being walked. This is easily fixed by printing out the page table (swapper vs user), page size and virtual address size when displaying the PGD address. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
83016b20 |
|
09-Jun-2017 |
Kristina Martsenko <kristina.martsenko@arm.com> |
arm64: mm: print file name of faulting vma Print out the name of the file associated with the vma that faulted. This is usually the executable or shared library name. We already print out the task name, but also printing the library name is useful for pinpointing bugs to libraries. Also print the base address and size of the vma, which together with the PC (printed by __show_regs) gives the offset into the library. Fault prints now look like: test[2361]: unhandled level 2 translation fault (11) at 0x00000012, esr 0x92000006, in libfoo.so[ffffa0145000+1000] This is already done on x86, for more details see commit 03252919b798 ("x86: print which shared library/executable faulted in segfault etc. messages v3"). Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
bf396c09 |
|
09-Jun-2017 |
Kristina Martsenko <kristina.martsenko@arm.com> |
arm64: mm: don't print out page table entries on EL0 faults When we take a fault from EL0 that can't be handled, we print out the page table entries associated with the faulting address. This allows userspace to print out any current page table entries, including kernel (TTBR1) entries. Exposing kernel mappings like this could pose a security risk, so don't print out page table information on EL0 faults. (But still print it out for EL1 faults.) This also follows the same behaviour as x86, printing out page table entries on kernel mode faults but not user mode faults. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
67ce16ec |
|
09-Jun-2017 |
Kristina Martsenko <kristina.martsenko@arm.com> |
arm64: mm: print out correct page table entries When we take a fault that can't be handled, we print out the page table entries associated with the faulting address. In some cases we currently print out the wrong entries. For a faulting TTBR1 address, we sometimes print out TTBR0 table entries instead, and for a faulting TTBR0 address we sometimes print out TTBR1 table entries. Fix this by choosing the tables based on the faulting address. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> [will: zero-extend addrs to 64-bit, don't walk swapper w/ TTBR0 addr] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
c07ab957 |
|
08-May-2017 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
arm64: Call __show_regs directly Generic code expects show_regs() to also dump the stack, but arm64's show_reg() does not do this. Some arm64 callers of show_regs() *only* want the registers dumped, without the stack. To enable generic code to work as expected, we need to make show_regs() dump the stack. Where we only want the registers dumped, we must use __show_regs(). This patch updates code to use __show_regs() where only registers are desired. A subsequent patch will modify show_regs(). Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
b824b930 |
|
05-Apr-2017 |
Stephen Boyd <stephen.boyd@linaro.org> |
arm64: print a fault message when attempting to write RO memory If a page is marked read only we should print out that fact, instead of printing out that there was a page fault. Right now we get a cryptic error message that something went wrong with an unhandled fault, but we don't evaluate the esr to figure out that it was a read/write permission fault. Instead of seeing: Unable to handle kernel paging request at virtual address ffff000008e460d8 pgd = ffff800003504000 [ffff000008e460d8] *pgd=0000000083473003, *pud=0000000083503003, *pmd=0000000000000000 Internal error: Oops: 9600004f [#1] PREEMPT SMP we'll see: Unable to handle kernel write to read-only memory at virtual address ffff000008e760d8 pgd = ffff80003d3de000 [ffff000008e760d8] *pgd=0000000083472003, *pud=0000000083435003, *pmd=0000000000000000 Internal error: Oops: 9600004f [#1] PREEMPT SMP We also add a userspace address check into is_permission_fault() so that the function doesn't return true for ttbr0 PAN faults when it shouldn't. Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Acked-by: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
09a6adf5 |
|
03-Apr-2017 |
Victor Kamensky <kamensky@cisco.com> |
arm64: mm: unaligned access by user-land should be received as SIGBUS After 52d7523 (arm64: mm: allow the kernel to handle alignment faults on user accesses) commit user-land accesses that produce unaligned exceptions like in case of aarch32 ldm/stm/ldrd/strd instructions operating on unaligned memory received by user-land as SIGSEGV. It is wrong, it should be reported as SIGBUS as it was before 52d7523 commit. Changed do_bad_area function to take signal and code parameters out of esr value using fault_info table, so in case of do_alignment_fault fault user-land will receive SIGBUS. Wrapped access to fault_info table into esr_to_fault_info function. Cc: <stable@vger.kernel.org> Fixes: 52d7523 (arm64: mm: allow the kernel to handle alignment faults on user accesses) Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
b17b0153 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3f07c014 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c8b06e3f |
|
09-Jan-2017 |
James Morse <james.morse@arm.com> |
arm64: Remove useless UAO IPI and describe how this gets enabled Since its introduction, the UAO enable call was broken, and useless. commit 2a6dcb2b5f3e ("arm64: cpufeature: Schedule enable() calls instead of calling them via IPI"), fixed the framework so that these calls are scheduled, so that they can modify PSTATE. Now it is just useless. Remove it. UAO is enabled by the code patching which causes get_user() and friends to use the 'ldtr' family of instructions. This relies on the PSTATE.UAO bit being set to match addr_limit, which we do in uao_thread_switch() called via __switch_to(). All that is needed to enable UAO is patch the code, and call schedule(). __apply_alternatives_multi_stop() calls stop_machine() when it modifies the kernel text to enable the alternatives, (including the UAO code in uao_thread_switch()). Once stop_machine() has finished __switch_to() is called to reschedule the original task, this causes PSTATE.UAO to be set appropriately. An explicit enable() call is not needed. Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
|
#
6ef4fb38 |
|
03-Jan-2017 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: fix show_pte KERN_CONT fallout Recent changes made KERN_CONT mandatory for continued lines. In the absence of KERN_CONT, a newline may be implicit inserted by the core printk code. In show_pte, we (erroneously) use printk without KERN_CONT for continued prints, resulting in output being split across a number of lines, and not matching the intended output, e.g. [ff000000000000] *pgd=00000009f511b003 , *pud=00000009f4a80003 , *pmd=0000000000000000 Fix this by using pr_cont() for all the continuations. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
78688963 |
|
01-Jul-2016 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Handle faults caused by inadvertent user access with PAN enabled When TTBR0_EL1 is set to the reserved page, an erroneous kernel access to user space would generate a translation fault. This patch adds the checks for the software-set PSR_PAN_BIT to emulate a permission fault and report it accordingly. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
a8ada146 |
|
28-Oct-2016 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Update the synchronous external abort fault description This patch updates the description of the synchronous external aborts on translation table walks. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
7209c868 |
|
18-Oct-2016 |
James Morse <james.morse@arm.com> |
arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call Commit 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never") enabled PAN by enabling the 'SPAN' feature-bit in SCTLR_EL1. This means the PSTATE.PAN bit won't be set until the next return to the kernel from userspace. On a preemptible kernel we may schedule work that accesses userspace on a CPU before it has done this. Now that cpufeature enable() calls are scheduled via stop_machine(), we can set PSTATE.PAN from the cpu_enable_pan() call. Add WARN_ON_ONCE(in_interrupt()) to check the PSTATE value we updated is not immediately discarded. Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: fixed typo in comment] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
2a6dcb2b |
|
18-Oct-2016 |
James Morse <james.morse@arm.com> |
arm64: cpufeature: Schedule enable() calls instead of calling them via IPI The enable() call for a cpufeature/errata is called using on_each_cpu(). This issues a cross-call IPI to get the work done. Implicitly, this stashes the running PSTATE in SPSR when the CPU receives the IPI, and restores it when we return. This means an enable() call can never modify PSTATE. To allow PAN to do this, change the on_each_cpu() call to use stop_machine(). This schedules the work on each CPU which allows us to modify PSTATE. This involves changing the protype of all the enable() functions. enable_cpu_capabilities() is called during boot and enables the feature on all online CPUs. This path now uses stop_machine(). CPU features for hotplug'd CPUs are enabled by verify_local_cpu_features() which only acts on the local CPU, and can already modify the running PSTATE as it is called from secondary_start_kernel(). Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0edfa839 |
|
19-Sep-2016 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
arm64: migrate exception table users off module.h and onto extable.h These files were only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile these files. Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
cab15ce6 |
|
11-Aug-2016 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Introduce execute-only page access permissions The ARMv8 architecture allows execute-only user permissions by clearing the PTE_UXN and PTE_USER bits. However, the kernel running on a CPU implementation without User Access Override (ARMv8.2 onwards) can still access such page, so execute-only page permission does not protect against read(2)/write(2) etc. accesses. Systems requiring such protection must enable features like SECCOMP. This patch changes the arm64 __P100 and __S100 protection_map[] macros to the new __PAGE_EXECONLY attributes. A side effect is that pte_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't set. To work around this, the check is done on the PTE_NG bit via the pte_ng() macro. VM_READ is also checked now for page faults. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
9adeb8e7 |
|
09-Aug-2016 |
Laura Abbott <labbott@redhat.com> |
arm64: Handle el1 synchronous instruction aborts cleanly Executing from a non-executable area gives an ugly message: lkdtm: Performing direct entry EXEC_RODATA lkdtm: attempting ok execution at ffff0000084c0e08 lkdtm: attempting bad execution at ffff000008880700 Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL) CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13 Hardware name: linux,dummy-virt (DT) task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000 PC is at lkdtm_rodata_do_nothing+0x0/0x8 LR is at execute_location+0x74/0x88 The 'IABT (current EL)' indicates the error but it's a bit cryptic without knowledge of the ARM ARM. There is also no indication of the specific address which triggered the fault. The increase in kernel page permissions makes hitting this case more likely as well. Handling the case in the vectors gives a much more familiar looking error message: lkdtm: Performing direct entry EXEC_RODATA lkdtm: attempting ok execution at ffff0000084c0840 lkdtm: attempting bad execution at ffff000008880680 Unable to handle kernel paging request at virtual address ffff000008880680 pgd = ffff8000089b2000 [ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000 Internal error: Oops: 8400000e [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24 Hardware name: linux,dummy-virt (DT) task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000 PC is at lkdtm_rodata_do_nothing+0x0/0x8 LR is at execute_location+0x74/0x88 Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
dcddffd4 |
|
26-Jul-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm: do not pass mm_struct into handle_mm_fault We always have vma->vm_mm around. Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2dd0e8d2 |
|
07-Jul-2016 |
Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com> |
arm64: Kprobes with single stepping support Add support for basic kernel probes(kprobes) and jump probes (jprobes) for ARM64. Kprobes utilizes software breakpoint and single step debug exceptions supported on ARM v8. A software breakpoint is placed at the probe address to trap the kernel execution into the kprobe handler. ARM v8 supports enabling single stepping before the break exception return (ERET), with next PC in exception return address (ELR_EL1). The kprobe handler prepares an executable memory slot for out-of-line execution with a copy of the original instruction being probed, and enables single stepping. The PC is set to the out-of-line slot address before the ERET. With this scheme, the instruction is executed with the exact same register context except for the PC (and DAIF) registers. Debug mask (PSTATE.D) is enabled only when single stepping a recursive kprobe, e.g.: during kprobes reenter so that probed instruction can be single stepped within the kprobe handler -exception- context. The recursion depth of kprobe is always 2, i.e. upon probe re-entry, any further re-entry is prevented by not calling handlers and the case counted as a missed kprobe). Single stepping from the x-o-l slot has a drawback for PC-relative accesses like branching and symbolic literals access as the offset from the new PC (slot address) may not be ensured to fit in the immediate value of the opcode. Such instructions need simulation, so reject probing them. Instructions generating exceptions or cpu mode change are rejected for probing. Exclusive load/store instructions are rejected too. Additionally, the code is checked to see if it is inside an exclusive load/store sequence (code from Pratyush). System instructions are mostly enabled for stepping, except MSR/MRS accesses to "DAIF" flags in PSTATE, which are not safe for probing. This also changes arch/arm64/include/asm/ptrace.h to use include/asm-generic/ptrace.h. Thanks to Steve Capper and Pratyush Anand for several suggested Changes. Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com> Signed-off-by: David A. Long <dave.long@linaro.org> Signed-off-by: Pratyush Anand <panand@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
e19a6ee2 |
|
20-Jun-2016 |
James Morse <james.morse@arm.com> |
arm64: kernel: Save and restore UAO and addr_limit on exception entry If we take an exception while at EL1, the exception handler inherits the original context's addr_limit and PSTATE.UAO values. To be consistent always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This prevents accidental re-use of the original context's addr_limit. Based on a similar patch for arm from Russell King. Cc: <stable@vger.kernel.org> # 4.6- Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
541ec870 |
|
30-May-2016 |
Mark Rutland <mark.rutland@arm.com> |
arm64: kill ESR_LNX_EXEC Currently we treat ESR_EL1 bit 24 as software-defined for distinguishing instruction aborts from data aborts, but this bit is architecturally RES0 for instruction aborts, and could be allocated for an arbitrary purpose in future. Additionally, we hard-code the value in entry.S without the mnemonic, making the code difficult to understand. Instead, remove ESR_LNX_EXEC, and distinguish aborts based on the esr, which we already pass to the sole use of ESR_LNX_EXEC. A new helper, is_el0_instruction_abort() is added to make the logic clear. Any instruction aborts taken from EL1 will already have been handled by bad_mode, so we need not handle that case in the helper. For consistency, the existing permission_fault helper is renamed to is_permission_fault, and the return type is changed to bool. There should be no functional changes as the return value was a boolean expression, and the result is only used in another boolean expression. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Dave P Martin <dave.martin@arm.com> Cc: Huang Shijie <shijie.huang@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
275f344b |
|
30-May-2016 |
Mark Rutland <mark.rutland@arm.com> |
arm64: add macro to extract ESR_ELx.EC Several places open-code extraction of the EC field from an ESR_ELx value, in subtly different ways. This is unfortunate duplication and variation, and the precise logic used to extract the field is a distraction. This patch adds a new macro, ESR_ELx_EC(), to extract the EC field from an ESR_ELx value in a consistent fashion. Existing open-coded extractions in core arm64 code are moved over to the new helper. KVM code is left as-is for the moment. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Huang Shijie <shijie.huang@arm.com> Cc: Dave P Martin <dave.martin@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
bbb1681e |
|
13-Jun-2016 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: mark fault_info table const Unlike the debug_fault_info table, we never intentionally alter the fault_info table at runtime, and all derived pointers are treated as const currently. Make the table const so that it can be placed in .rodata and protected from unintentional writes, as we do for the syscall tables. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0106d456 |
|
07-Jun-2016 |
Will Deacon <will@kernel.org> |
arm64: mm: always take dirty state from new pte in ptep_set_access_flags Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") ensured that pte flags are updated atomically in the face of potential concurrent, hardware-assisted updates. However, Alex reports that: | This patch breaks swapping for me. | In the broken case, you'll see either systemd cpu time spike (because | it's stuck in a page fault loop) or the system hang (because the | application owning the screen is stuck in a page fault loop). It turns out that this is because the 'dirty' argument to ptep_set_access_flags is always 0 for read faults, and so we can't use it to set PTE_RDONLY. The failing sequence is: 1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte 2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF 3. A read faults due to the missing access flag 4. ptep_set_access_flags is called with dirty = 0, due to the read fault 5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!) 6. A write faults, but pte_write is true so we get stuck The solution is to check the new page table entry (as would be done by the generic, non-atomic definition of ptep_set_access_flags that just calls set_pte_at) to establish the dirty state. Cc: <stable@vger.kernel.org> # 4.3+ Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
3a72db70 |
|
17-Apr-2016 |
Huang Shijie <shijie.huang@arm.com> |
arm64: mm: remove the redundant code We already re-enable interrupts where necessary in the entry code, so there is no need to do it again in do_page fault. This patch removes the redundant code. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Huang Shijie <shijie.huang@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
66dbd6e6 |
|
13-Apr-2016 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Implement ptep_set_access_flags() for hardware AF/DBM When hardware updates of the access and dirty states are enabled, the default ptep_set_access_flags() implementation based on calling set_pte_at() directly is potentially racy. This triggers the "racy dirty state clearing" warning in set_pte_at() because an existing writable PTE is overridden with a clean entry. There are two main scenarios for this situation: 1. The CPU getting an access fault does not support hardware updates of the access/dirty flags. However, a different agent in the system (e.g. SMMU) can do this, therefore overriding a writable entry with a clean one could potentially lose the automatically updated dirty status 2. A more complex situation is possible when all CPUs support hardware AF/DBM: a) Initial state: shareable + writable vma and pte_none(pte) b) Read fault taken by two threads of the same process on different CPUs c) CPU0 takes the mmap_sem and proceeds to handling the fault. It eventually reaches do_set_pte() which sets a writable + clean pte. CPU0 releases the mmap_sem d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The pte entry it reads is present, writable and clean and it continues to pte_mkyoung() e) CPU1 calls ptep_set_access_flags() If between (d) and (e) the hardware (another CPU) updates the dirty state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit marking the entry clean again. This patch implements an arm64-specific ptep_set_access_flags() function to perform an atomic update of the PTE flags. Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Julien Grall <julien.grall@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> # 4.3+ [will: reworded comment] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
6afedcd2 |
|
13-Apr-2016 |
James Morse <james.morse@arm.com> |
arm64: mm: Add trace_irqflags annotations to do_debug_exception() With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS enabled, lockdep will compare current->hardirqs_enabled with the flags from local_irq_save(). When a debug exception occurs, interrupts are disabled in entry.S, but lockdep isn't told, resulting in: DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) ------------[ cut here ]------------ WARNING: at ../kernel/locking/lockdep.c:3523 Modules linked in: CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204 Hardware name: ARM Juno development board (r1) (DT) task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000 PC is at check_flags.part.35+0x17c/0x184 LR is at check_flags.part.35+0x17c/0x184 pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5 [...] ---[ end trace 74631f9305ef5020 ]--- Call trace: [<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184 [<ffffff80080ffe30>] lock_acquire+0xa8/0xc4 [<ffffff8008093038>] breakpoint_handler+0x118/0x288 [<ffffff8008082434>] do_debug_exception+0x3c/0xa8 [<ffffff80080854b4>] el1_dbg+0x18/0x6c [<ffffff80081e82f4>] do_filp_open+0x64/0xdc [<ffffff80081d6e60>] do_sys_open+0x140/0x204 [<ffffff80081d6f58>] SyS_openat+0x10/0x18 [<ffffff8008085d30>] el0_svc_naked+0x24/0x28 possible reason: unannotated irqs-off. irq event stamp: 65857 hardirqs last enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4 hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4 softirqs last enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290 softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0 This patch adds the annotations to do_debug_exception(), while trying not to call trace_hardirqs_off() if el1_dbg() interrupted a task that already had irqs disabled. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
0e8fb931 |
|
17-Mar-2016 |
Jan Kara <jack@suse.cz> |
mm: remove VM_FAULT_MINOR The define has a comment from Nick Piggin from 2007: /* For backwards compat. Remove me quickly. */ I guess 9 years should not be too hurried sense of 'quickly' even for kernel measures. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
70c8abc2 |
|
19-Feb-2016 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: User die() instead of panic() in do_page_fault() The former gives better error reporting on unhandled permission faults (introduced by the UAO patches). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
52d7523d |
|
15-Feb-2016 |
EunTaik Lee <eun.taik.lee@samsung.com> |
arm64: mm: allow the kernel to handle alignment faults on user accesses Although we don't expect to take alignment faults on access to normal memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into system calls, leading to things like get_user accessing device memory. Rather than OOPS the kernel, allow any exception fixups to run and return something like -EFAULT back to userspace. This makes the behaviour more consistent with userspace, even though applications with access to device mappings can easily cause other issues if they try hard enough. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com> [will: dropped __kprobes annotation and rewrote commit mesage] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
e950631e |
|
18-Feb-2016 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Remove the get_thread_info() function This function was introduced by previous commits implementing UAO. However, it can be replaced with task_thread_info() in uao_thread_switch() or get_fs() in do_page_fault() (the latter being called only on the current context, so no need for using the saved pt_regs). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
70544196 |
|
05-Feb-2016 |
James Morse <james.morse@arm.com> |
arm64: kernel: Don't toggle PAN on systems with UAO If a CPU supports both Privileged Access Never (PAN) and User Access Override (UAO), we don't need to disable/re-enable PAN round all copy_to_user() like calls. UAO alternatives cause these calls to use the 'unprivileged' load/store instructions, which are overridden to be the privileged kind when fs==KERNEL_DS. This patch changes the copy_to_user() calls to have their PAN toggling depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO. If both features are detected, PAN will be enabled, but the copy_to_user() alternatives will not be applied. This means PAN will be enabled all the time for these functions. If only PAN is detected, the toggling will be enabled as normal. This will save the time taken to disable/re-enable PAN, and allow us to catch copy_to_user() accesses that occur with fs==KERNEL_DS. Futex and swp-emulation code continue to hang their PAN toggling code on ARM64_HAS_PAN. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
57f4959b |
|
05-Feb-2016 |
James Morse <james.morse@arm.com> |
arm64: kernel: Add support for User Access Override 'User Access Override' is a new ARMv8.2 feature which allows the unprivileged load and store instructions to be overridden to behave in the normal way. This patch converts {get,put}_user() and friends to use ldtr*/sttr* instructions - so that they can only access EL0 memory, then enables UAO when fs==KERNEL_DS so that these functions can access kernel memory. This allows user space's read/write permissions to be checked against the page tables, instead of testing addr<USER_DS, then using the kernel's read/write permissions. Signed-off-by: James Morse <james.morse@arm.com> [catalin.marinas@arm.com: move uao_thread_switch() above dsb()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
c03784ee8 |
|
23-Nov-2015 |
Mark Rutland <mark.rutland@arm.com> |
arm64: mm: fix fault_info table xFSC decoding We are missing descriptions for some valid xFSC values in the fault info table (e.g. "TLB conflict abort"), and have erroneous descriptions for reserved values (e.g. "asynchronous external abort", "debug event"). This patch adds the missing xFSC values, and removes erroneous decoding of values reserved by the architecture, as described in ARM DDI 0487A.h. At the same time, fixed the unbalanced brackets for the synchronous parity error strings in the table. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
dbb4e152 |
|
19-Oct-2015 |
Suzuki K. Poulose <suzuki.poulose@arm.com> |
arm64: Delay cpu feature capability checks At the moment we run through the arm64_features capability list for each CPU and set the capability if one of the CPU supports it. This could be problematic in a heterogeneous system with differing capabilities. Delay the CPU feature checks until all the enabled CPUs are up(i.e, smp_cpus_done(), so that we can make better decisions based on the overall system capability. Once we decide and advertise the capabilities the alternatives can be applied. From this state, we cannot roll back a feature to disabled based on the values from a new hotplugged CPU, due to the runtime patching and other reasons. So, for all new CPUs, we need to make sure that they have the established system capabilities. Failing which, we bring the CPU down, preventing it from turning online. Once the capabilities are decided, any new CPU booting up goes through verification to ensure that it has all the enabled capabilities and also invokes the respective enable() method on the CPU. The CPU errata checks are not delayed and is still executed per-CPU to detect the respective capabilities. If we ever come across a non-errata capability that needs to be checked on each-CPU, we could introduce them via a new capability table(or introduce a flag), which can be processed per CPU. The next patch will make the feature checks use the system wide safe value of a feature register. NOTE: The enable() methods associated with the capability is scheduled on all the CPUs (which is the only use case at the moment). If we need a different type of 'enable()' which only needs to be run once on any CPU, we should be able to handle that when needed. Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: Dave Martin <Dave.Martin@arm.com> [catalin.marinas@arm.com: static variable and coding style fixes] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
569ba74a |
|
21-Sep-2015 |
Mark Salyzyn <salyzyn@android.com> |
arm64: readahead: fault retry breaks mmap file read random detection This is the arm64 portion of commit 45cac65b0fcd ("readahead: fault retry breaks mmap file read random detection"), which was absent from the initial port and has since gone unnoticed. The original commit says: > .fault now can retry. The retry can break state machine of .fault. In > filemap_fault, if page is miss, ra->mmap_miss is increased. In the second > try, since the page is in page cache now, ra->mmap_miss is decreased. And > these are done in one fault, so we can't detect random mmap file access. > > Add a new flag to indicate .fault is tried once. In the second try, skip > ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. With this change, Mark reports that: > Random read improves by 250%, sequential read improves by 40%, and > random write by 400% to an eMMC device with dm crypto wrapped around it. Cc: Shaohua Li <shli@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Riley Andrews <riandrews@android.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
9fb7410f |
|
24-Jul-2015 |
Dave P Martin <Dave.Martin@arm.com> |
arm64/BUG: Use BRK instruction for generic BUG traps Currently, the minimal default BUG() implementation from asm- generic is used for arm64. This patch uses the BRK software breakpoint instruction to generate a trap instead, similarly to most other arches, with the generic BUG code generating the dmesg boilerplate. This allows bug metadata to be moved to a separate table and reduces the amount of inline code at BUG and WARN sites. This also avoids clobbering any registers before they can be dumped. To mitigate the size of the bug table further, this patch makes use of the existing infrastructure for encoding addresses within the bug table as 32-bit offsets instead of absolute pointers. (Note that this limits the kernel size to 2GB.) Traps are registered at arch_initcall time for aarch64, but BUG has minimal real dependencies and it is desirable to be able to generate bug splats as early as possible. This patch redirects all debug exceptions caused by BRK directly to bug_handler() until the full debug exception support has been initialised. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
338d4f49 |
|
22-Jul-2015 |
James Morse <james.morse@arm.com> |
arm64: kernel: Add support for Privileged Access Never 'Privileged Access Never' is a new arm8.1 feature which prevents privileged code from accessing any virtual address where read or write access is also permitted at EL0. This patch enables the PAN feature on all CPUs, and modifies {get,put}_user helpers temporarily to permit access. This will catch kernel bugs where user memory is accessed directly. 'Unprivileged loads and stores' using ldtrb et al are unaffected by PAN. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: use ALTERNATIVE in asm and tidy up pan_enable check] Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
f871d268 |
|
03-Jul-2015 |
Suzuki K. Poulose <suzuki.poulose@arm.com> |
arm64: Fix show_unhandled_signal_ratelimited usage Commit 86dca36e6ba introduced ratelimited usage for 'unhandled_signal' messages. The commit checks the ratelimit irrespective of whether the signal is handled or not, which is wrong and leads to false reports like the below in dmesg : __do_user_fault: 127 callbacks suppressed Do the ratelimit check only if the signal is unhandled. Fixes: 86dca36e6ba0 ("arm64: use private ratelimit state along with show_unhandled_signals") Cc: Vladimir Murzin <Vladimir.Murzin@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
86dca36e |
|
19-Jun-2015 |
Vladimir Murzin <vladimir.murzin@arm.com> |
arm64: use private ratelimit state along with show_unhandled_signals printk_ratelimit() shares the ratelimiting state with other callers what may lead to scenarios where at the time we want to print out debug information we already limited, so nothing appears in the dmesg - this makes exception-trace quite poor helper in debugging. Additionally, we have imbalance with some messages limited with global ratelimit state and other messages limited with their private state defined via pr_*_ratelimited(). To address this inconsistency show_unhandled_signals_ratelimited() macro is introduced and caller sites are converted to use it. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
9e793ab8 |
|
19-Jun-2015 |
Vladimir Murzin <vladimir.murzin@arm.com> |
arm64: show unhandled SP/PC alignment faults Report unhandled SP/PC alignment faults if the show_unhandled_signals variable is set (via /proc/sys/debug/exception-trace). Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
70ffdb93 |
|
11-May-2015 |
David Hildenbrand <dahi@linux.vnet.ibm.com> |
mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler Introduce faulthandler_disabled() and use it to check for irq context and disabled pagefaults (via pagefault_disable()) in the pagefault handlers. Please note that we keep the in_atomic() checks in place - to detect whether in irq context (in which case preemption is always properly disabled). In contrast, preempt_disable() should never be used to disable pagefaults. With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt counter, and therefore the result of in_atomic() differs. We validate that condition by using might_fault() checks when calling might_sleep(). Therefore, add a comment to faulthandler_disabled(), describing why this is needed. faulthandler_disabled() and pagefault_disable() are defined in linux/uaccess.h, so let's properly add that include to all relevant files. This patch is based on a patch from Thomas Gleixner. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
aed40e01 |
|
23-Nov-2014 |
Mark Rutland <mark.rutland@arm.com> |
arm64: move to ESR_ELx macros Now that we have common ESR_ELx_* macros, move the core arm64 code over to them. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Will Deacon <will.deacon@arm.com>
|
#
7f73f7ae |
|
21-Nov-2014 |
Will Deacon <will@kernel.org> |
arm64: mm: report unhandled level-0 translation faults correctly Translation faults that occur due to the input address being outside of the address range mapped by the relevant base register are reported as level 0 faults in ESR.DFSC. If the faulting access cannot be resolved by the kernel (e.g. because it is not mapped by a vma), then we report "input address range fault" on the console. This was fine until we added support for 48-bit VAs, which actually place PGDs at level 0 and can trigger faults for invalid addresses that are within the range of the page tables. This patch changes the string to report "level 0 translation fault", which is far less confusing. Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
c79b954b |
|
12-May-2014 |
Jungseok Lee <jays.lee@samsung.com> |
arm64: mm: Implement 4 levels of translation tables This patch implements 4 levels of translation tables since 3 levels of page tables with 4KB pages cannot support 40-bit physical address space described in [1] due to the following issue. It is a restriction that kernel logical memory map with 4KB + 3 levels (0xffffffc000000000-0xffffffffffffffff) cannot cover RAM region from 544GB to 1024GB in [1]. Specifically, ARM64 kernel fails to create mapping for this region in map_mem function since __phys_to_virt for this region reaches to address overflow. If SoC design follows the document, [1], over 32GB RAM would be placed from 544GB. Even 64GB system is supposed to use the region from 544GB to 576GB for only 32GB RAM. Naturally, it would reach to enable 4 levels of page tables to avoid hacking __virt_to_phys and __phys_to_virt. However, it is recommended 4 levels of page table should be only enabled if memory map is too sparse or there is about 512GB RAM. References ---------- [1]: Principles of ARM Memory Maps, White Paper, Issue C Signed-off-by: Jungseok Lee <jays.lee@samsung.com> Reviewed-by: Sungjinn Chung <sungjinn.chung@samsung.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Steve Capper <steve.capper@linaro.org> [catalin.marinas@arm.com: MEMBLOCK_INITIAL_LIMIT removed, same as PUD_SIZE] [catalin.marinas@arm.com: early_ioremap_init() updated for 4 levels] [catalin.marinas@arm.com: 48-bit VA depends on BROKEN until KVM is fixed] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Jungseok Lee <jungseoklee85@gmail.com>
|
#
5a0fdfad |
|
16-May-2014 |
Catalin Marinas <catalin.marinas@arm.com> |
Revert "arm64: Introduce execute-only page access permissions" This reverts commit bc07c2c6e9ed125d362af0214b6313dca180cb08. While the aim is increased security for --x memory maps, it does not protect against kernel level reads. Until SECCOMP is implemented for arm64, revert this patch to avoid giving a false idea of execute-only mappings. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
bc07c2c6 |
|
03-Apr-2014 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Introduce execute-only page access permissions The ARMv8 architecture allows execute-only user permissions by clearing the PTE_UXN and PTE_USER bits. The kernel, however, can still access such page, so execute-only page permission does not protect against read(2)/write(2) etc. accesses. Systems requiring such protection must implement/enable features like SECCOMP. This patch changes the arm64 __P100 and __S100 protection_map[] macros to the new __PAGE_EXECONLY attributes. A side effect is that pte_valid_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't set. To work around this, the check is done on the PTE_NG bit via the pte_valid_ng() macro. VM_READ is also checked now for page faults. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
9141300a |
|
06-Apr-2014 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Provide read/write fault information in compat signal handlers For AArch32, bit 11 (WnR) of the FSR/ESR register is set when the fault was caused by a write access and applications like Qemu rely on such information being provided in sigcontext. This patch introduces the ESR_EL1 tracking for the arm64 kernel faults and sets bit 11 accordingly in compat sigcontext. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
59f67e16 |
|
16-Sep-2013 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Make do_bad_area() function static This function is only called from arch/arm64/mm/fault.c. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
759496ba |
|
12-Sep-2013 |
Johannes Weiner <hannes@cmpxchg.org> |
arch: mm: pass userspace fault flag to generic fault handler Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
87134102 |
|
12-Sep-2013 |
Johannes Weiner <hannes@cmpxchg.org> |
arch: mm: do not invoke OOM killer on kernel fault OOM Kernel faults are expected to handle OOM conditions gracefully (gup, uaccess etc.), so they should never invoke the OOM killer. Reserve this for faults triggered in user context when it is the only option. Most architectures already do this, fix up the remaining few. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
db6f4106 |
|
19-Jul-2013 |
Will Deacon <will@kernel.org> |
arm64: mm: don't treat user cache maintenance faults as writes On arm64, cache maintenance faults appear as data aborts with the CM bit set in the ESR. The WnR bit, usually used to distinguish between faulting loads and stores, always reads as 1 and (slightly confusingly) the instructions are treated as reads by the architecture. This patch fixes our fault handling code to treat cache maintenance faults in the same way as loads. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
084bd298 |
|
10-Apr-2013 |
Steve Capper <steve.capper@linaro.org> |
ARM64: mm: HugeTLB support. Add huge page support to ARM64, different huge page sizes are supported depending on the size of normal pages: PAGE_SIZE is 4KB: 2MB - (pmds) these can be allocated at any time. 1024MB - (puds) usually allocated on bootup with the command line with something like: hugepagesz=1G hugepages=6 PAGE_SIZE is 64KB: 512MB - (pmds) usually allocated on bootup via command line. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
953dbbed |
|
20-May-2013 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Do not report user faults for handled signals Currently user faults (page, undefined instruction) are always reported even though the user may have a signal handler for them. This patch adds unhandled_signal() check together with printk_ratelimit() for these cases. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
0e7f7bcc |
|
07-May-2013 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Ignore the 'write' ESR flag on cache maintenance faults ESR.WnR bit is always set on data cache maintenance faults even though the page is not required to have write permission. If a translation fault (page not yet mapped) happens for read-only user address range, Linux incorrectly assumes a permission fault. This patch adds the check of the ESR.CM bit during the page fault handling to ignore the 'write' flag. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Tim Northover <Tim.Northover@arm.com> Cc: stable@vger.kernel.org
|
#
4339e3f3 |
|
19-Apr-2013 |
Steve Capper <steve.capper@linaro.org> |
arm64: mm: Correct show_pte behaviour show_pte makes use of the *_none_or_clear_bad style functions. If a pgd, pud or pmd is identified as being bad, it will then be cleared. As show_pte appears to be called from either the user or kernel fault handlers this side effect can lead to unpredictable behaviour; especially as TLB entries are not invalidated. This patch removes the page table sanitisation from show_pte. If a bad pgd, pud or pmd is encountered it is left unmodified. Signed-off-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
3495386b |
|
24-Oct-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: Make the user fault reporting more specific For user space faults the kernel reports "unhandled page fault" and it gives the ESR value. With this patch the error message looked up in the fault info array to give a better description. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
1d18c47c |
|
05-Mar-2012 |
Catalin Marinas <catalin.marinas@arm.com> |
arm64: MMU fault handling and page table management This patch adds support for the handling of the MMU faults (exception entry code introduced by a previous patch) and page table management. The user translation table is pointed to by TTBR0 and the kernel one (swapper_pg_dir) by TTBR1. There is no translation information shared or address space overlapping between user and kernel page tables. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
|