History log of /linux-master/arch/arm64/mm/fault.c
Revision Date Author Comments
# 5a00bfd6 15-Feb-2024 Ryan Roberts <ryan.roberts@arm.com>

arm64/mm: new ptep layer to manage contig bit

Create a new layer for the in-table PTE manipulation APIs. For now, The
existing API is prefixed with double underscore to become the arch-private
API and the public API is just a simple wrapper that calls the private
API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users use the private API directly so that the future
contig-bit manipulations in the public API do not interfere with those
existing uses.

The following APIs are treated this way:

- ptep_get
- set_pte
- set_ptes
- pte_clear
- ptep_get_and_clear
- ptep_test_and_clear_young
- ptep_clear_flush_young
- ptep_set_wrprotect
- ptep_set_access_flags

Link: https://lkml.kernel.org/r/20240215103205.2607016-11-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 659e1930 15-Feb-2024 Ryan Roberts <ryan.roberts@arm.com>

arm64/mm: convert set_pte_at() to set_ptes(..., 1)

Since set_ptes() was introduced, set_pte_at() has been implemented as a
generic macro around set_ptes(..., 1). So this change should continue to
generate the same code. However, making this change prepares us for the
transparent contpte support. It means we can reroute set_ptes() to
__set_ptes(). Since set_pte_at() is a generic macro, there will be no
equivalent __set_pte_at() to reroute to.

Note that a couple of calls to set_pte_at() remain in the arch code. This
is intentional, since those call sites are acting on behalf of core-mm and
should continue to call into the public set_ptes() rather than the
arch-private __set_ptes().

Link: https://lkml.kernel.org/r/20240215103205.2607016-9-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 53273655 15-Feb-2024 Ryan Roberts <ryan.roberts@arm.com>

arm64/mm: convert READ_ONCE(*ptep) to ptep_get(ptep)

There are a number of places in the arch code that read a pte by using the
READ_ONCE() macro. Refactor these call sites to instead use the
ptep_get() helper, which itself is a READ_ONCE(). Generated code should
be the same.

This will benefit us when we shortly introduce the transparent contpte
support. In this case, ptep_get() will become more complex so we now have
all the code abstracted through it.

Link: https://lkml.kernel.org/r/20240215103205.2607016-8-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 7ac8d5b2 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: Add ESR decoding for exceptions involving translation level -1

The LPA2 feature introduces new FSC values to report abort exceptions
related to translation level -1. Define these and wire them up.

Reuse the new ESR FSC classification helpers that arrived via the KVM
arm64 tree, and update the one for translation faults to check
specifically for a translation fault at level -1. (Access flag or
permission faults cannot occur at level -1 because they alway involve a
descriptor at the superior level so changing those helpers is not
needed).

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-73-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 46e714c7 26-Dec-2023 Suren Baghdasaryan <surenb@google.com>

arch/mm/fault: fix major fault accounting when retrying under per-VMA lock

A test [1] in Android test suite started failing after [2] was merged. It
turns out that after handling a major fault under per-VMA lock, the
process major fault counter does not register that fault as major. Before
[2] read faults would be done under mmap_lock, in which case
FAULT_FLAG_TRIED flag is set before retrying. That in turn causes
mm_account_fault() to account the fault as major once retry completes.
With per-VMA locks we often retry because a fault can't be handled without
locking the whole mm using mmap_lock. Therefore such retries do not set
FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can
now handle read major faults under per-VMA lock and upon retry the fact
there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED
after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally
we would use an additional VM_FAULT bit to indicate the reason for the
retry (could not handle under per-VMA lock vs other reason) but this
simpler solution seems to work, so keeping it simple.

[1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp
[2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/

Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com
Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 4e00f1d9 16-Oct-2023 Mark Rutland <mark.rutland@arm.com>

arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN

We use cpus_have_const_cap() to check for ARM64_HAS_EPAN but this is not
necessary and alternative_has_cap() or cpus_have_cap() would be
preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_HAS_EPAN cpucap is used to affect two things:

1) The permision bits used for userspace executable mappings, which are
chosen by adjust_protection_map(), which is an arch_initcall. This is
called after the ARM64_HAS_EPAN cpucap has been detected and
alternatives have been patched, and before any userspace translation
tables exist.

2) The handling of faults taken from (user or kernel) accesses to
userspace executable mappings in do_page_fault(). Userspace
translation tables are created after adjust_protection_map() is
called, and hence after the ARM64_HAS_EPAN cpucap has been detected
and alternatives have been patched.

Neither of these run until after ARM64_HAS_EPAN cpucap has been detected
and alternatives have been patched, and hence there's no need to use
cpus_have_const_cap(). Since adjust_protection_map() is only executed
once at boot time it would be best for it to use cpus_have_cap(), and
since do_page_fault() is executed frequently it would be best for it to
use alternatives_have_cap_unlikely().

This patch replaces the uses of cpus_have_const_cap() with
cpus_have_cap() and alternative_has_cap_unlikely(), which will avoid
generating redundant code, and should be better for all subsequent calls
at runtime. The ARM64_HAS_EPAN cpucap is added to cpucap_is_possible()
so that code can be elided entirely when this is not possible.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 4089eef0 30-Jun-2023 Suren Baghdasaryan <surenb@google.com>

mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED

handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means
mmap_lock has been released. However with per-VMA locks behavior is
different and the caller should still release it. To make the rules
consistent for the caller, drop the per-VMA lock when returning
VM_FAULT_RETRY or VM_FAULT_COMPLETED. Currently the only path returning
VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns
VM_FAULT_COMPLETED for now.

[willy@infradead.org: fix riscv]
Link: https://lkml.kernel.org/r/CAJuCfpE6GWEx1rPBmNpUfoD5o-gNFz9-UFywzCE2PbEGBiVz7g@mail.gmail.com
Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 284e0592 24-Jul-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

mm: remove CONFIG_PER_VMA_LOCK ifdefs

Patch series "Handle most file-backed faults under the VMA lock", v3.

This patchset adds the ability to handle page faults on parts of files
which are already in the page cache without taking the mmap lock.


This patch (of 10):

Provide lock_vma_under_rcu() when CONFIG_PER_VMA_LOCK is not defined to
eliminate ifdefs in the users.

Link: https://lkml.kernel.org/r/20230724185410.1124082-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230724185410.1124082-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 24be4d0b 03-Jul-2023 SeongJae Park <sj@kernel.org>

arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()

Commit ae870a68b5d1 ("arm64/mm: Convert to using
lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if
CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the
ifdef.

As a result, building kernel without the config fails with undeclared
variable error as below:

arch/arm64/mm/fault.c: In function 'do_page_fault':
arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'?
624 | vma = lock_mm_and_find_vma(mm, addr, regs);
| ^~~
| vmap

Fix it by moving the declaration out of the ifdef.

Fixes: ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ae870a68 15-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

arm64/mm: Convert to using lock_mm_and_find_vma()

This converts arm64 to use the new page fault helper. It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did. Thanks to Suren for the fix and testing.

Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com>
Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 52924726 08-Jun-2023 Hugh Dickins <hughd@google.com>

arm64: allow pte_offset_map() to fail

In rare transient cases, not yet made possible, pte_offset_map() and
pte_offset_map_lock() may not find a page table: handle appropriately.

Link: https://lkml.kernel.org/r/35e46485-8499-4337-c51f-b8fa495a1a93@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# bb6e04a1 09-May-2023 Arnd Bergmann <arnd@arndb.de>

kasan: use internal prototypes matching gcc-13 builtins

gcc-13 warns about function definitions for builtin interfaces that have a
different prototype, e.g.:

In file included from kasan_test.c:31:
kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
574 | void __asan_register_globals(struct kasan_global *globals, size_t size);
kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
577 | void __asan_alloca_poison(unsigned long addr, size_t size);
kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
580 | void __asan_load1(unsigned long addr);
kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
581 | void __asan_store1(unsigned long addr);
kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch]
643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);

The two problems are:

- Addresses are passes as 'unsigned long' in the kernel, but gcc-13
expects a 'void *'.

- sizes meant to use a signed ssize_t rather than size_t.

Change all the prototypes to match these. Using 'void *' consistently for
addresses gets rid of a couple of type casts, so push that down to the
leaf functions where possible.

This now passes all randconfig builds on arm, arm64 and x86, but I have
not tested it on the other architectures that support kasan, since they
tend to fail randconfig builds in other ways. This might fail if any of
the 32-bit architectures expect a 'long' instead of 'int' for the size
argument.

The __asan_allocas_unpoison() function prototype is somewhat weird, since
it uses a pointer for 'stack_top' and an size_t for 'stack_bottom'. This
looks like it is meant to be 'addr' and 'size' like the others, but the
implementation clearly treats them as 'top' and 'bottom'.

Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 0e2aba69 24-May-2023 Jisheng Zhang <jszhang@kernel.org>

arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block

When reading the arm64's PER_VMA_LOCK support code, I found a bit
difference between arm64 and other arch when calling handle_mm_fault()
during VMA lock-based page fault handling: the fault address is masked
before passing to handle_mm_fault(). This is also different from the
usage in mmap_lock-based handling. I think we need to pass the
original fault address to handle_mm_fault() as we did in
commit 84c5e23edecd ("arm64: mm: Pass original fault address to
handle_mm_fault()").

If we go through the code path further, we can find that the "masked"
fault address can cause mismatched fault address between perf sw
major/minor page fault sw event and perf page fault sw event:

do_page_fault
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, ..., addr) // orig addr
handle_mm_fault
mm_account_fault
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, ...) // masked addr

Fixes: cd7f176aea5f ("arm64/mm: try VMA lock-based page fault handling first")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230524131305.2808-1-jszhang@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# 1f9d4ba6 11-May-2023 Mark Brown <broonie@kernel.org>

arm64/esr: Add decode of ISS2 to data abort reporting

The architecture has added more information about faults to ISS2 within
ESR. Add decode of this to our data abort fault decode to aid diagnostics.
Features that are not currently enabled are included here for completeness.

Since the architecture specifies the values of bits within ISS2 in terms
of ISS2 rather than in terms of the register as a whole we do so for our
definitions as well, this makes it easier to review bitfield definitions.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230417-arm64-iss2-dabt-decode-v3-2-c1fa503e503a@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# e13d32e9 16-May-2023 Arnd Bergmann <arnd@arndb.de>

arm64: move early_brk64 prototype to header

The prototype used for calling early_brk64() is in the file that calls
it, which is the wrong place, as it is not included for the definition:

arch/arm64/kernel/traps.c:1100:12: error: no previous prototype for 'early_brk64' [-Werror=missing-prototypes]

Move it to an appropriate header instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230516160642.523862-15-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# d91d5808 02-May-2023 Min-Hua Chen <minhuadotchen@gmail.com>

arm64/mm: mark private VM_FAULT_X defines as vm_fault_t

This patch fixes several sparse warnings for fault.c:

arch/arm64/mm/fault.c:493:24: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:493:24: sparse: expected restricted vm_fault_t
arch/arm64/mm/fault.c:493:24: sparse: got int
arch/arm64/mm/fault.c:501:32: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:501:32: sparse: expected restricted vm_fault_t
arch/arm64/mm/fault.c:501:32: sparse: got int
arch/arm64/mm/fault.c:503:32: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:503:32: sparse: expected restricted vm_fault_t
arch/arm64/mm/fault.c:503:32: sparse: got int
arch/arm64/mm/fault.c:511:24: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:511:24: sparse: expected restricted vm_fault_t
arch/arm64/mm/fault.c:511:24: sparse: got int
arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer
arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer
arch/arm64/mm/fault.c:713:39: sparse: warning: restricted vm_fault_t degrades to integer

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Link: https://lore.kernel.org/r/20230502151909.128810-1-minhuadotchen@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>


# cd7f176a 27-Feb-2023 Suren Baghdasaryan <surenb@google.com>

arm64/mm: try VMA lock-based page fault handling first

Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

Link: https://lkml.kernel.org/r/20230227173632.3292573-31-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 6bc56a4d 16-Jan-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

mm: add vma_alloc_zeroed_movable_folio()

Replace alloc_zeroed_user_highpage_movable(). The main difference is
returning a folio containing a single page instead of returning the page,
but take the opportunity to rename the function to match other allocation
functions a little better and rewrite the documentation to place more
emphasis on the zeroing rather than the highmem aspect.

Link: https://lkml.kernel.org/r/20230116191813.2145215-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# d77e59a8 03-Nov-2022 Catalin Marinas <catalin.marinas@arm.com>

arm64: mte: Lock a page for MTE tag initialisation

Initialising the tags and setting PG_mte_tagged flag for a page can race
between multiple set_pte_at() on shared pages or setting the stage 2 pte
via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and
set it before attempting page initialisation. Given that PG_mte_tagged
is never cleared for a page, consider setting this flag to mean page
unlocked and wait on this bit with acquire semantics if the page is
locked:

- try_page_mte_tagging() - lock the page for tagging, return true if it
can be tagged, false if already tagged. No acquire semantics if it
returns true (PG_mte_tagged not set) as there is no serialisation with
a previous set_page_mte_tagged().

- set_page_mte_tagged() - set PG_mte_tagged with release semantics.

The two-bit locking is based on Peter Collingbourne's idea.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-6-pcc@google.com


# e059853d 03-Nov-2022 Catalin Marinas <catalin.marinas@arm.com>

arm64: mte: Fix/clarify the PG_mte_tagged semantics

Currently the PG_mte_tagged page flag mostly means the page contains
valid tags and it should be set after the tags have been cleared or
restored. However, in mte_sync_tags() it is set before setting the tags
to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for
shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on
write in another thread can cause the new page to have stale tags.
Similarly, tag reading via ptrace() can read stale tags if the
PG_mte_tagged flag is set before actually clearing/restoring the tags.

Fix the PG_mte_tagged semantics so that it is only set after the tags
have been cleared or restored. This is safe for swap restoring into a
MAP_SHARED or CoW page since the core code takes the page lock. Add two
functions to test and set the PG_mte_tagged flag with acquire and
release semantics. The downside is that concurrent mprotect(PROT_MTE) on
a MAP_SHARED page may cause tag loss. This is already the case for KVM
guests if a VMM changes the page protection while the guest triggers a
user_mem_abort().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[pcc@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com


# e8dfdf31 28-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: efi: Recover from synchronous exceptions occurring in firmware

Unlike x86, which has machinery to deal with page faults that occur
during the execution of EFI runtime services, arm64 has nothing like
that, and a synchronous exception raised by firmware code brings down
the whole system.

With more EFI based systems appearing that were not built to run Linux
(such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as
the introduction of PRM (platform specific firmware routines that are
callable just like EFI runtime services), we are more likely to run into
issues of this sort, and it is much more likely that we can identify and
work around such issues if they don't bring down the system entirely.

Since we already use a EFI runtime services call wrapper in assembler,
we can quite easily add some code that captures the execution state at
the point where the call is made, allowing us to revert to this state
and proceed execution if the call triggered a synchronous exception.

Given that the kernel and the firmware don't share any data structures
that could end up in an indeterminate state, we can happily continue
running, as long as we mark the EFI runtime services as unavailable from
that point on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>


# 0bb1fbff 14-Nov-2022 Mark Rutland <mark.rutland@arm.com>

arm64: mm: kfence: only handle translation faults

Alexander noted that KFENCE only expects to handle faults from invalid page
table entries (i.e. translation faults), but arm64's fault handling logic will
call kfence_handle_page_fault() for other types of faults, including alignment
faults caused by unaligned atomics. This has the unfortunate property of
causing those other faults to be reported as "KFENCE: use-after-free",
which is misleading and hinders debugging.

Fix this by only forwarding unhandled translation faults to the KFENCE
code, similar to what x86 does already.

Alexander has verified that this passes all the tests in the KFENCE test
suite and avoids bogus reports on misaligned atomics.

Link: https://lore.kernel.org/all/20221102081620.1465154-1-zhongbaisong@huawei.com/
Fixes: 840b23986344 ("arm64, kfence: enable KFENCE for ARM64")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221114104411.2853040-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# 7572ac3c 30-Nov-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: efi: Revert "Recover from synchronous exceptions ..."

This reverts commit 23715a26c8d81291, which introduced some code in
assembler that manipulates both the ordinary and the shadow call stack
pointer in a way that could potentially be taken advantage of. So let's
revert it, and do a better job the next time around.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 23715a26 28-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: efi: Recover from synchronous exceptions occurring in firmware

Unlike x86, which has machinery to deal with page faults that occur
during the execution of EFI runtime services, arm64 has nothing like
that, and a synchronous exception raised by firmware code brings down
the whole system.

With more EFI based systems appearing that were not built to run Linux
(such as the Windows-on-ARM laptops based on Qualcomm SOCs), as well as
the introduction of PRM (platform specific firmware routines that are
callable just like EFI runtime services), we are more likely to run into
issues of this sort, and it is much more likely that we can identify and
work around such issues if they don't bring down the system entirely.

Since we already use a EFI runtime services call wrapper in assembler,
we can quite easily add some code that captures the execution state at
the point where the call is made, allowing us to revert to this state
and proceed execution if the call triggered a synchronous exception.

Given that the kernel and the firmware don't share any data structures
that could end up in an indeterminate state, we can happily continue
running, as long as we mark the EFI runtime services as unavailable from
that point on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>


# 3fc24ef3 01-Jul-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: compat: Implement misalignment fixups for multiword loads

The 32-bit ARM kernel implements fixups on behalf of user space when
using LDM/STM or LDRD/STRD instructions on addresses that are not 32-bit
aligned. This is not something that is supported by the architecture,
but was done anyway to increase compatibility with user space software,
which mostly targeted x86 at the time and did not care about aligned
accesses.

This feature is one of the remaining impediments to being able to switch
to 64-bit kernels on 64-bit capable hardware running 32-bit user space,
so let's implement it for the arm64 compat layer as well.

Note that the intent is to implement the exact same handling of
misaligned multi-word loads and stores as the 32-bit kernel does,
including what appears to be missing support for user space programs
that rely on SETEND to switch to a different byte order and back. Also,
like the 32-bit ARM version, we rely on the faulting address reported by
the CPU to infer the memory address, instead of decoding the instruction
fully to obtain this information.

This implementation is taken from the 32-bit ARM tree, with all pieces
removed that deal with instructions other than LDRD/STRD and LDM/STM, or
that deal with alignment exceptions taken in kernel mode.

Cc: debian-arm@lists.debian.org
Cc: Vagrant Cascadian <vagrant@debian.org>
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Steve McIntyre <steve@einval.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20220701135322.3025321-1-ardb@kernel.org
[catalin.marinas@arm.com: change the option to 'default n']
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# d9272525 30-May-2022 Peter Xu <peterx@redhat.com>

mm: avoid unnecessary page fault retires on shared memory types

I observed that for each of the shared file-backed page faults, we're very
likely to retry one more time for the 1st write fault upon no page. It's
because we'll need to release the mmap lock for dirty rate limit purpose
with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).

Then after that throttling we return VM_FAULT_RETRY.

We did that probably because VM_FAULT_RETRY is the only way we can return
to the fault handler at that time telling it we've released the mmap lock.

However that's not ideal because it's very likely the fault does not need
to be retried at all since the pgtable was well installed before the
throttling, so the next continuous fault (including taking mmap read lock,
walk the pgtable, etc.) could be in most cases unnecessary.

It's not only slowing down page faults for shared file-backed, but also add
more mmap lock contention which is in most cases not needed at all.

To observe this, one could try to write to some shmem page and look at
"pgfault" value in /proc/vmstat, then we should expect 2 counts for each
shmem write simply because we retried, and vm event "pgfault" will capture
that.

To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
show that we've completed the whole fault and released the lock. It's also
a hint that we should very possibly not need another fault immediately on
this page because we've just completed it.

This patch provides a ~12% perf boost on my aarch64 test VM with a simple
program sequentially dirtying 400MB shmem file being mmap()ed and these are
the time it needs:

Before: 650.980 ms (+-1.94%)
After: 569.396 ms (+-1.38%)

I believe it could help more than that.

We need some special care on GUP and the s390 pgfault handler (for gmap
code before returning from pgfault), the rest changes in the page fault
handlers should be relatively straightforward.

Another thing to mention is that mm_account_fault() does take this new
fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.

I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
them as-is.

Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part]
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Will Deacon <will@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 70c248ac 10-Jun-2022 Catalin Marinas <catalin.marinas@arm.com>

mm: kasan: Skip unpoisoning of user pages

Commit c275c5c6d50a ("kasan: disable freed user page poisoning with HW
tags") added __GFP_SKIP_KASAN_POISON to GFP_HIGHUSER_MOVABLE. A similar
argument can be made about unpoisoning, so also add
__GFP_SKIP_KASAN_UNPOISON to user pages. To ensure the user page is
still accessible via page_address() without a kasan fault, reset the
page->flags tag.

With the above changes, there is no need for the arm64
tag_clear_highpage() to reset the page->flags tag.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20220610152141.2148929-3-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# bc249e37 03-May-2022 Mark Brown <broonie@kernel.org>

arm64/mte: Make TCF field values and naming more standard

In preparation for automatic generation of the defines for system registers
make the values used for the enumeration in SCTLR_ELx.TCF suitable for use
with the newly defined SYS_FIELD_PREP_ENUM helper, removing the shift from
the define and using the helper to generate it on use instead. Since we
only ever interact with this field in EL1 and in preparation for generation
of the defines also rename from SCTLR_ELx to SCTLR_EL1. SCTLR_EL2 is not
quite the same as SCTLR_EL1 so the conversion does not share the field
definitions.

There should be no functional change from this patch.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220503170233.507788-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 8d56e5c5 24-Apr-2022 Alexandru Elisei <alexandru.elisei@arm.com>

arm64: Treat ESR_ELx as a 64-bit register

In the initial release of the ARM Architecture Reference Manual for
ARMv8-A, the ESR_ELx registers were defined as 32-bit registers. This
changed in 2018 with version D.a (ARM DDI 0487D.a) of the architecture,
when they became 64-bit registers, with bits [63:32] defined as RES0. In
version G.a, a new field was added to ESR_ELx, ISS2, which covers bits
[36:32]. This field is used when the Armv8.7 extension FEAT_LS64 is
implemented.

As a result of the evolution of the register width, Linux stores it as
both a 64-bit value and a 32-bit value, which hasn't affected correctness
so far as Linux only uses the lower 32 bits of the register.

Make the register type consistent and always treat it as 64-bit wide. The
register is redefined as an "unsigned long", which is an unsigned
double-word (64-bit quantity) for the LP64 machine (aapcs64 [1], Table 1,
page 14). The type was chosen because "unsigned int" is the most frequent
type for ESR_ELx and because FAR_ELx, which is used together with ESR_ELx
in exception handling, is also declared as "unsigned long". The 64-bit type
also makes adding support for architectural features that use fields above
bit 31 easier in the future.

The KVM hypervisor will receive a similar update in a subsequent patch.

[1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425114444.368693-4-alexandru.elisei@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 0e25498f 28-Jun-2021 Eric W. Biederman <ebiederm@xmission.com>

exit: Add and use make_task_dead.

There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 36ef159f 14-Jan-2022 Qi Zheng <zhengqi.arch@bytedance.com>

mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit

Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:

flags & FAULT_FLAG_ALLOW_RETRY

So just remove it.

[akpm@linux-foundation.org: coding style fixes]

Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 07b742a4 07-Dec-2021 Mark Rutland <mark.rutland@arm.com>

arm64: mm: log potential KASAN shadow alias

When the kernel is built with KASAN_GENERIC or KASAN_SW_TAGS, shadow
memory is allocated and mapped for all legitimate kernel addresses, and
prior to a regular memory access instrumentation will read from the
corresponding shadow address.

Due to the way memory addresses are converted to shadow addresses, bogus
pointers (e.g. NULL) can generate shadow addresses out of the bounds of
allocated shadow memory. For example, with KASAN_GENERIC and 48-bit VAs,
NULL would have a shadow address of dfff800000000000, which falls
between the TTBR ranges.

To make such cases easier to debug, this patch makes die_kernel_fault()
dump the real memory address range for any potential KASAN shadow access
using kasan_non_canonical_hook(), which results in fault information as
below when KASAN is enabled:

| Unable to handle kernel paging request at virtual address dfff800000000017
| KASAN: null-ptr-deref in range [0x00000000000000b8-0x00000000000000bf]
| Mem abort info:
| ESR = 0x96000004
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x04: level 0 translation fault
| Data abort info:
| ISV = 0, ISS = 0x00000004
| CM = 0, WnR = 0
| [dfff800000000017] address between user and kernel address ranges

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211207183226.834557-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 6f6cfa58 07-Dec-2021 Mark Rutland <mark.rutland@arm.com>

arm64: mm: use die_kernel_fault() in do_mem_abort()

If we take an unhandled fault from EL1, either:

a) The xFSC handler calls die_kernel_fault() directly. In this case,
die_kernel_fault() calls:

pr_alert(..., msg, addr);
mem_abort_decode(esr);
show_pte(addr);
die();
bust_spinlocks(0);
do_exit(SIGKILL);

b) The xFSC handler returns to do_mem_abort(), indicating failure. In
this case, do_mem_abort() calls:

pr_alert(..., addr);
mem_abort_decode(esr);
show_pte(addr);
arm64_notify_die() {
die();
}

This inconstency is unfortunatem, and in theory in case (b) registered
notifiers can prevent us from terminating the faulting thread by
returning NOTIFY_STOP, whereupon we'll end up returning from the fault,
replaying, and almost certainly get stuck in a livelock spewing errors
into dmesg. We don't expect notifers to fix things up, since we dump
state to dmesg before invoking them, so it would be more sensible to
consistently terminate the thread in this case.

This patch has do_mem_abort() call die_kernel_fault() for unhandled
faults taken from EL1. Where we would previously have logged a messafe
of the form:

| Unhandled fault at ${ADDR}

... we will now log a message of the form:

| Unable to handle kernel ${FAULT_NAME} at virtual address ${ADDR}

... and we will consistently terminate the thread from which the fault
was taken.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211207183226.834557-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 76721503 14-Jul-2021 Mark Rutland <mark.rutland@arm.com>

arm64: kasan: mte: remove redundant mte_report_once logic

We have special logic to suppress MTE tag check fault reporting, based
on a global `mte_report_once` and `reported` variables. These can be
used to suppress calling kasan_report() when taking a tag check fault,
but do not prevent taking the fault in the first place, nor does they
affect the way we disable tag checks upon taking a fault.

The core KASAN code already defaults to reporting a single fault, and
has a `multi_shot` control to permit reporting multiple faults. The only
place we transiently alter `mte_report_once` is in lib/test_kasan.c,
where we also the `multi_shot` state as the same time. Thus
`mte_report_once` and `reported` are redundant, and can be removed.

When a tag check fault is taken, tag checking will be disabled by
`do_tag_recovery` and must be explicitly re-enabled if desired. The test
code does this by calling kasan_enable_tagging_sync().

This patch removes the redundant mte_report_once() logic and associated
variables.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20210714143843.56537-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 84c5e23e 14-Jun-2021 Gavin Shan <gshan@redhat.com>

arm64: mm: Pass original fault address to handle_mm_fault()

Currently, the lower bits of fault address is cleared before it's
passed to handle_mm_fault(). It's unnecessary since generic code
does same thing since the commit 1a29d85eb0f19 ("mm: use vmf->address
instead of of vmf->virtual_address").

This passes the original fault address to handle_mm_fault() in case
the generic code needs to know the exact fault address.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20210614122701.100515-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>


# e0e3903f 08-Jun-2021 Mark Rutland <mark.rutland@arm.com>

arm64: mm: decode xFSC in mem_abort_decode()

It would be helpful if mem_abort_decode() could decode the DFSC/IFSC, as
this can make it easier to identify common bugs (e.g. accesses which
trigger alignment faults) without having to manually decode the xFSC
value.

Decode the xFSC in mem_abort_decode().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210608123742.11921-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# 064dbfb4 07-Jun-2021 Mark Rutland <mark.rutland@arm.com>

arm64: entry: convert IRQ+FIQ handlers to C

For various reasons we'd like to convert the bulk of arm64's exception
triage logic to C. As a step towards that, this patch converts the EL1
and EL0 IRQ+FIQ triage logic to C.

Separate C functions are added for the native and compat cases so that
in subsequent patches we can handle native/compat differences in C.

Since the triage functions can now call arm64_apply_bp_hardening()
directly, the do_el0_irq_bp_hardening() wrapper function is removed.

Since the user_exit_irqoff macro is now unused, it is removed. The
user_enter_irqoff macro is still used by the ret_to_user code, and
cannot be removed at this time.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210607094624.34689-8-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# 013bb59d 02-Jun-2021 Peter Collingbourne <pcc@google.com>

arm64: mte: handle tags zeroing at page allocation time

Currently, on an anonymous page fault, the kernel allocates a zeroed
page and maps it in user space. If the mapping is tagged (PROT_MTE),
set_pte_at() additionally clears the tags. It is, however, more
efficient to clear the tags at the same time as zeroing the data on
allocation. To avoid clearing the tags on any page (which may not be
mapped as tagged), only do this if the vma flags contain VM_MTE. This
requires introducing a new GFP flag that is used to determine whether
to clear the tags.

The DC GZVA instruction with a 0 top byte (and 0 tag) requires
top-byte-ignore. Set the TCR_EL1.{TBI1,TBID1} bits irrespective of
whether KASAN_HW is enabled.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://linux-review.googlesource.com/id/Id46dc94e30fe11474f7e54f5d65e7658dbdddb26
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com
Signed-off-by: Will Deacon <will@kernel.org>


# fcf9dc02 03-Jun-2021 Kefeng Wang <wangkefeng.wang@huawei.com>

arm64: mm: Add is_el1_data_abort() helper

We alread have is_el1_instruction_abort(), add is_el1_data_abort()
helper and use it.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210603120239.169018-1-wangkefeng.wang@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 18107f8a 12-Mar-2021 Vladimir Murzin <vladimir.murzin@arm.com>

arm64: Support execute-only permissions with Enhanced PAN

Enhanced Privileged Access Never (EPAN) allows Privileged Access Never
to be used with Execute-only mappings.

Absence of such support was a reason for 24cecc377463 ("arm64: Revert
support for execute-only user mappings"). Thus now it can be revisited
and re-enabled.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210312173811.58284-2-vladimir.murzin@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# bc8fbc5f 25-Feb-2021 Marco Elver <elver@google.com>

kfence: add test suite

Add KFENCE test suite, testing various error detection scenarios. Makes
use of KUnit for test organization. Since KFENCE's interface to obtain
error reports is via the console, the test verifies that KFENCE outputs
expected reports to the console.

[elver@google.com: fix typo in test]
Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com
[elver@google.com: show access type in report]
Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d438fabc 25-Feb-2021 Marco Elver <elver@google.com>

kfence: use pt_regs to generate stack trace on faults

Instead of removing the fault handling portion of the stack trace based on
the fault handler's name, just use struct pt_regs directly.

Change kfence_handle_page_fault() to take a struct pt_regs, and plumb it
through to kfence_report_error() for out-of-bounds, use-after-free, or
invalid access errors, where pt_regs is used to generate the stack trace.

If the kernel is a DEBUG_KERNEL, also show registers for more information.

Link: https://lkml.kernel.org/r/20201105092133.2075331-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 840b2398 25-Feb-2021 Marco Elver <elver@google.com>

arm64, kfence: enable KFENCE for ARM64

Add architecture specific implementation details for KFENCE and enable
KFENCE for the arm64 architecture. In particular, this implements the
required interface in <asm/kfence.h>.

KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the entire linear map to be mapped
at page granularity. Doing so may result in extra memory allocated for
page tables in case rodata=full is not set; however, currently
CONFIG_RODATA_FULL_DEFAULT_ENABLED=y is the default, and the common case
is therefore not affected by this change.

[elver@google.com: add missing copyright and description header]
Link: https://lkml.kernel.org/r/20210118092159.145934-3-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-4-elver@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Co-developed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joern Engel <joern@purestorage.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f05842cf 24-Feb-2021 Andrey Konovalov <andreyknvl@google.com>

kasan, arm64: allow using KUnit tests with HW_TAGS mode

On a high level, this patch allows running KUnit KASAN tests with the
hardware tag-based KASAN mode.

Internally, this change reenables tag checking at the end of each KASAN
test that triggers a tag fault and leads to tag checking being disabled.

Also simplify is_write calculation in report_tag_fault.

With this patch KASAN tests are still failing for the hardware tag-based
mode; fixes come in the next few patches.

[andreyknvl@google.com: export HW_TAGS symbols for KUnit tests]
Link: https://lkml.kernel.org/r/e7eeb252da408b08f0c81b950a55fb852f92000b.1613155970.git.andreyknvl@google.com

Link: https://linux-review.googlesource.com/id/Id94dc9eccd33b23cda4950be408c27f879e474c8
Link: https://lkml.kernel.org/r/51b23112cf3fd62b8f8e9df81026fa2b15870501.1610733117.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6459b846 01-Feb-2021 Mark Rutland <mark.rutland@arm.com>

arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround

The workaround for Cortex-A76 erratum 1463225 is split across the
syscall and debug handlers in separate files. This structure currently
forces us to do some redundant work for debug exceptions from EL0, is a
little difficult to follow, and gets in the way of some future rework of
the exception entry code as it requires exceptions to be unmasked late
in the syscall handling path.

To simplify things, and as a preparatory step for future rework of
exception entry, this patch moves all the workaround logic into
entry-common.c. As the debug handler only needs to run for EL1 debug
exceptions, we no longer call it for EL0 debug exceptions, and no longer
need to check user_mode(regs) as this is always false. For clarity
cortex_a76_erratum_1463225_debug_handler() is changed to return bool.

In the SVC path, the workaround is applied earlier, but this should have
no functional impact as exceptions are still masked. In the debug path
we run the fixup before explicitly disabling preemption, but we will not
attempt to preempt before returning from the exception.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210202120341.28858-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# abd4737f 05-Feb-2021 Miaohe Lin <linmiaohe@huawei.com>

mm/arm64: Correct obsolete comment in do_page_fault()

commit d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem
rwsem call sites") has convertd down_read_trylock() to mmap_read_trylock().
But it forgot to update the relevant comment.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/20210205090919.63382-1-linmiaohe@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>


# 3ed86b9a 15-Jan-2021 Andrey Konovalov <andreyknvl@google.com>

kasan, arm64: fix pointer tags in KASAN reports

As of the "arm64: expose FAR_EL1 tag bits in siginfo" patch, the address
that is passed to report_tag_fault has pointer tags in the format of 0x0X,
while KASAN uses 0xFX format (note the difference in the top 4 bits).

Fix up the pointer tag for kernel pointers in do_tag_check_fault by
setting them to the same value as bit 55. Explicitly use __untagged_addr()
instead of untagged_addr(), as the latter doesn't affect TTBR1 addresses.

Fixes: dceec3ff7807 ("arm64: expose FAR_EL1 tag bits in siginfo")
Fixes: 4291e9ee6189 ("kasan, arm64: print report from tag fault handler")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://linux-review.googlesource.com/id/I9ced973866036d8679e8f4ae325de547eb969649
Link: https://lore.kernel.org/r/ff30b0afe6005fd046f9ac72bfb71822aedccd89.1610731872.git.andreyknvl@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 4291e9ee 22-Dec-2020 Andrey Konovalov <andreyknvl@google.com>

kasan, arm64: print report from tag fault handler

Add error reporting for hardware tag-based KASAN. When
CONFIG_KASAN_HW_TAGS is enabled, print KASAN report from the arm64 tag
fault handler.

SAS bits aren't set in ESR for all faults reported in EL1, so it's
impossible to find out the size of the access the caused the fault. Adapt
KASAN reporting code to handle this case.

Link: https://lkml.kernel.org/r/b559c82b6a969afedf53b4694b475f0234067a1a.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 98c970da 22-Dec-2020 Vincenzo Frascino <vincenzo.frascino@arm.com>

arm64: mte: add in-kernel tag fault handler

Add the implementation of the in-kernel fault handler.

When a tag fault happens on a kernel address:
* MTE is disabled on the current CPU,
* the execution continues.

When a tag fault happens on a user address:
* the kernel executes do_bad_area() and panics.

The tag fault handler for kernel addresses is currently empty and will be
filled in by a future commit.

Link: https://lkml.kernel.org/r/20201203102628.GB2224@gaia

Link: https://lkml.kernel.org/r/ad31529b073e22840b7a2246172c2b67747ed7c4.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
[catalin.marinas@arm.com: ensure CONFIG_ARM64_PAN is enabled with MTE]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3d2403fd 02-Dec-2020 Mark Rutland <mark.rutland@arm.com>

arm64: uaccess: remove set_fs()

Now that the uaccess primitives dont take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.

We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.

Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.

Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.

As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 2a9b3e6a 30-Nov-2020 Mark Rutland <mark.rutland@arm.com>

arm64: entry: fix EL1 debug transitions

In debug_exception_enter() and debug_exception_exit() we trace hardirqs
on/off while RCU isn't guaranteed to be watching, and we don't save and
restore the hardirq state, and so may return with this having changed.

Handle this appropriately with new entry/exit helpers which do the bare
minimum to ensure this is appropriately maintained, without marking
debug exceptions as NMIs. These are placed in entry-common.c with the
other entry/exit helpers.

In future we'll want to reconsider whether some debug exceptions should
be NMIs, but this will require a significant refactoring, and for now
this should prevent issues with lockdep and RCU.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marins <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# 23529049 30-Nov-2020 Mark Rutland <mark.rutland@arm.com>

arm64: entry: fix non-NMI user<->kernel transitions

When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE
will WARN() at boot time that interrupts are enabled when we call
context_tracking_user_enter(), despite the DAIF flags indicating that
IRQs are masked.

The problem is that we're not tracking IRQ flag changes accurately, and
so lockdep believes interrupts are enabled when they are not (and
vice-versa). We can shuffle things so to make this more accurate. For
kernel->user transitions there are a number of constraints we need to
consider:

1) When we call __context_tracking_user_enter() HW IRQs must be disabled
and lockdep must be up-to-date with this.

2) Userspace should be treated as having IRQs enabled from the PoV of
both lockdep and tracing.

3) As context_tracking_user_enter() stops RCU from watching, we cannot
use RCU after calling it.

4) IRQ flag tracing and lockdep have state that must be manipulated
before RCU is disabled.

... with similar constraints applying for user->kernel transitions, with
the ordering reversed.

The generic entry code has enter_from_user_mode() and
exit_to_user_mode() helpers to handle this. We can't use those directly,
so we add arm64 copies for now (without the instrumentation markers
which aren't used on arm64). These replace the existing user_exit() and
user_exit_irqoff() calls spread throughout handlers, and the exception
unmasking is left as-is.

Note that:

* The accounting for debug exceptions from userspace now happens in
el0_dbg() and ret_to_user(), so this is removed from
debug_exception_enter() and debug_exception_exit(). As
user_exit_irqoff() wakes RCU, the userspace-specific check is removed.

* The accounting for syscalls now happens in el0_svc(),
el0_svc_compat(), and ret_to_user(), so this is removed from
el0_svc_common(). This does not adversely affect the workaround for
erratum 1463225, as this does not depend on any of the state tracking.

* In ret_to_user() we mask interrupts with local_daif_mask(), and so we
need to inform lockdep and tracing. Here a trace_hardirqs_off() is
sufficient and safe as we have not yet exited kernel context and RCU
is usable.

* As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
needs to check for the latter.

* EL0 SError handling will be dealt with in a subsequent patch, as this
needs to be treated as an NMI.

Prior to this patch, booting an appropriately-configured kernel would
result in spats as below:

| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
| Modules linked in:
| CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : check_flags.part.54+0x1dc/0x1f0
| lr : check_flags.part.54+0x1dc/0x1f0
| sp : ffff80001003bd80
| x29: ffff80001003bd80 x28: ffff66ce801e0000
| x27: 00000000ffffffff x26: 00000000000003c0
| x25: 0000000000000000 x24: ffffc31842527258
| x23: ffffc31842491368 x22: ffffc3184282d000
| x21: 0000000000000000 x20: 0000000000000001
| x19: ffffc318432ce000 x18: 0080000000000000
| x17: 0000000000000000 x16: ffffc31840f18a78
| x15: 0000000000000001 x14: ffffc3184285c810
| x13: 0000000000000001 x12: 0000000000000000
| x11: ffffc318415857a0 x10: ffffc318406614c0
| x9 : ffffc318415857a0 x8 : ffffc31841f1d000
| x7 : 647261685f706564 x6 : ffffc3183ff7c66c
| x5 : ffff66ce801e0000 x4 : 0000000000000000
| x3 : ffffc3183fe00000 x2 : ffffc31841500000
| x1 : e956dc24146b3500 x0 : 0000000000000000
| Call trace:
| check_flags.part.54+0x1dc/0x1f0
| lock_is_held_type+0x10c/0x188
| rcu_read_lock_sched_held+0x70/0x98
| __context_tracking_enter+0x310/0x350
| context_tracking_enter.part.3+0x5c/0xc8
| context_tracking_user_enter+0x6c/0x80
| finish_ret_to_user+0x2c/0x13cr

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# dceec3ff 20-Nov-2020 Peter Collingbourne <pcc@google.com>

arm64: expose FAR_EL1 tag bits in siginfo

The kernel currently clears the tag bits (i.e. bits 56-63) in the fault
address exposed via siginfo.si_addr and sigcontext.fault_address. However,
the tag bits may be needed by tools in order to accurately diagnose
memory errors, such as HWASan [1] or future tools based on the Memory
Tagging Extension (MTE).

Expose these bits via the arch_untagged_si_addr mechanism, so that
they are only exposed to signal handlers with the SA_EXPOSE_TAGBITS
flag set.

[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://linux-review.googlesource.com/id/Ia8876bad8c798e0a32df7c2ce1256c4771c81446
Link: https://lore.kernel.org/r/0010296597784267472fa13b39f8238d87a72cf8.1605904350.git.pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 96d389ca 28-Oct-2020 Rob Herring <robh@kernel.org>

arm64: Add workaround for Arm Cortex-A77 erratum 1508412

On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
and a store exclusive or PAR_EL1 read can cause a deadlock.

The workaround requires a DMB SY before and after a PAR_EL1 register
read. In addition, it's possible an interrupt (doing a device read) or
KVM guest exit could be taken between the DMB and PAR read, so we
also need a DMB before returning from interrupt and before returning to
a guest.

A deadlock is still possible with the workaround as KVM guests must also
have the workaround. IOW, a malicious guest can deadlock an affected
systems.

This workaround also depends on a firmware counterpart to enable the h/w
to insert DMB SY after load and store exclusive instructions. See the
errata document SDEN-1152370 v10 [1] for more information.

[1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# 6a1bdb17 30-Sep-2020 Will Deacon <will@kernel.org>

arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op

Our use of broadcast TLB maintenance means that spurious page-faults
that have been handled already by another CPU do not require additional
TLB maintenance.

Make flush_tlb_fix_spurious_fault() a no-op and rely on the existing TLB
invalidation instead. Add an explicit flush_tlb_page() when making a page
dirty, as the TLB is permitted to cache the old read-only entry.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200728092220.GA21800@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>


# 637ec831 16-Sep-2019 Vincenzo Frascino <vincenzo.frascino@arm.com>

arm64: mte: Handle synchronous and asynchronous tag check faults

The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:

1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.

Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().

On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>


# 6a1bb025 11-Aug-2020 Peter Xu <peterx@redhat.com>

mm/arm64: use general page fault accounting

Use the general page fault accounting by passing regs into
handle_mm_fault(). It naturally solve the issue of multiple page fault
accounting when page fault retry happened. To do this, we pass pt_regs
pointer into __do_page_fault().

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-6-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bce617ed 11-Aug-2020 Peter Xu <peterx@redhat.com>

mm: do page fault accounting in handle_mm_fault

Patch series "mm: Page fault accounting cleanups", v5.

This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b9827063 ("mm: allow
VM_FAULT_RETRY for multiple times"):

https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

What this series did:

- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.

- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.

Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.

Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.

Case (3): there're still quite some archs that have not enabled
this perf event.

Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.

- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.

- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.

Patchset layout:

Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

This patch (of 25):

This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.

So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d8ed45c5 08-Jun-2020 Michel Lespinasse <walken@google.com>

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e31cf2f4 08-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: don't include asm/pgtable.h if linux/mm.h is already included

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

for f in $(git grep -l "include <linux/mm.h>") ; do
sed -i -e '/include <asm\/pgtable.h>/ d' $f
done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e9f63768 04-Jun-2020 Mike Rapoport <rppt@kernel.org>

arm64: add support for folded p4d page tables

Implement primitives necessary for the 4th level folding, add walks of p4d
level where appropriate, replace 5level-fixup.h with pgtable-nop4d.h and
remove __ARCH_USE_5LEVEL_HACK.

[arnd@arndb.de: fix gcc-10 shift warning]
Link: http://lkml.kernel.org/r/20200429185657.4085975-1-arnd@arndb.de
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: James Morse <james.morse@arm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200414153455.21744-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8fcc4ae6 01-May-2020 James Morse <james.morse@arm.com>

arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but this is not guaranteed to be taken
before returning to user-space.

Unless the exception interrupted a context with irqs-masked,
irq_work_run() can run immediately. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.

With this apei_claim_sea() returning '0' means this external-abort was
also notification of a firmware-first RAS error, and that APEI has
processed the CPER records.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6cb4d9a2 10-Apr-2020 Anshuman Khandual <anshuman.khandual@arm.com>

mm/vma: introduce VM_ACCESS_FLAGS

There are many places where all basic VMA access flags (read, write,
exec) are initialized or checked against as a group. One such example
is during page fault. Existing vma_is_accessible() wrapper already
creates the notion of VMA accessibility as a group access permissions.

Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
will not only reduce code duplication but also extend the VMA
accessibility concept in general.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Springer <rspringer@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4064b982 01-Apr-2020 Peter Xu <peterx@redhat.com>

mm: allow VM_FAULT_RETRY for multiple times

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once. We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page. However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY. It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event. Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try

- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try

- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all

- !ALLOW_RETRY and TRIED: this is forbidden and should never be used

In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths. One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault. It might also benefit
other potential users who will have similar requirement like userfault
write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dde16072 01-Apr-2020 Peter Xu <peterx@redhat.com>

mm: introduce FAULT_FLAG_DEFAULT

Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags. Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over. With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b502f038 01-Apr-2020 Peter Xu <peterx@redhat.com>

arm64/mm: use helper fault_signal_pending()

Let the arm64 fault handling to use the new fault_signal_pending() helper,
by moving the signal handling out of the retry logic.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155927.9264-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 24cecc37 06-Jan-2020 Catalin Marinas <catalin.marinas@arm.com>

arm64: Revert support for execute-only user mappings

The ARMv8 64-bit architecture supports execute-only user permissions by
clearing the PTE_USER and PTE_UXN bits, practically making it a mostly
privileged mapping but from which user running at EL0 can still execute.

The downside, however, is that the kernel at EL1 inadvertently reading
such mapping would not trip over the PAN (privileged access never)
protection.

Revert the relevant bits from commit cab15ce604e5 ("arm64: Introduce
execute-only page access permissions") so that PROT_EXEC implies
PROT_READ (and therefore PTE_USER) until the architecture gains proper
support for execute-only user mappings.

Fixes: cab15ce604e5 ("arm64: Introduce execute-only page access permissions")
Cc: <stable@vger.kernel.org> # 4.9.x-
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e44ec4a3 29-Oct-2019 Xiang Zheng <zhengxiang9@huawei.com>

arm64: print additional fault message when executing non-exec memory

When attempting to executing non-executable memory, the fault message
shows:

Unable to handle kernel read from unreadable memory at virtual address
ffff802dac469000

This may confuse someone, so add a new fault message for instruction
abort.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# bfe29874 25-Oct-2019 James Morse <james.morse@arm.com>

arm64: entry-common: don't touch daif before bp-hardening

The previous patches mechanically transformed the assembly version of
entry.S to entry-common.c for synchronous exceptions.

The C version of local_daif_restore() doesn't quite do the same thing
as the assembly versions if pseudo-NMI is in use. In particular,
| local_daif_restore(DAIF_PROCCTX_NOIRQ)
will still allow pNMI to be delivered. This is not the behaviour
do_el0_ia_bp_hardening() and do_sp_pc_abort() want as it should not
be possible for the PMU handler to run as an NMI until the bp-hardening
sequence has run.

The bp-hardening calls were placed where they are because this was the
first C code to run after the relevant exceptions. As we've now moved
that point earlier, move the checks and calls earlier too.

This makes it clearer that this stuff runs before any kind of exception,
and saves modifying PSTATE twice.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# afa7c0e5 25-Oct-2019 James Morse <james.morse@arm.com>

arm64: Remove asmlinkage from updated functions

Now that the callers of these functions have moved into C, they no longer
need the asmlinkage annotation. Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# b6e43c0e 25-Oct-2019 James Morse <james.morse@arm.com>

arm64: remove __exception annotations

Since commit 732674980139 ("arm64: unwind: reference pt_regs via embedded
stack frame") arm64 has not used the __exception annotation to dump
the pt_regs during stack tracing. in_exception_text() has no callers.

This annotation is only used to blacklist kprobes, it means the same as
__kprobes.

Section annotations like this require the functions to be grouped
together between the start/end markers, and placed according to
the linker script. For kprobes we also have NOKPROBE_SYMBOL() which
logs the symbol address in a section that kprobes parses and
blacklists at boot.

Using NOKPROBE_SYMBOL() instead lets kprobes publish the list of
blacklisted symbols, and saves us from having an arm64 specific
spelling of __kprobes.

do_debug_exception() already has a NOKPROBE_SYMBOL() annotation.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 38137335 15-Oct-2019 Mark Rutland <mark.rutland@arm.com>

arm64: mm: fix inverted PAR_EL1.F check

When detecting a spurious EL1 translation fault, we have the CPU retry
the translation using an AT S1E1R instruction, and inspect PAR_EL1 to
determine if the fault was spurious.

When PAR_EL1.F == 0, the AT instruction successfully translated the
address without a fault, which implies the original fault was spurious.
However, in this case we return false and treat the original fault as if
it was not spurious.

Invert the return value so that we treat such a case as spurious.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 42f91093b043 ("arm64: mm: Ignore spurious translation faults taken from the kernel")
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 308c5156 04-Oct-2019 Mark Rutland <mark.rutland@arm.com>

arm64: mm: fix spurious fault detection

When detecting a spurious EL1 translation fault, we attempt to compare
ESR_EL1.DFSC with PAR_EL1.FST. We erroneously use FIELD_PREP() to
extract PAR_EL1.FST, when we should be using FIELD_GET().

In the wise words of Robin Murphy:

| FIELD_GET() is a UBFX, FIELD_PREP() is a BFI

Using FIELD_PREP() means that that dfsc & ESR_ELx_FSC_TYPE is always
zero, and hence not equal to ESR_ELx_FSC_FAULT. Thus we detect any
unhandled translation fault as spurious.

... so let's use FIELD_GET() to ensure we don't decide all translation
faults are spurious. ESR_EL1.DFSC occupies bits [5:0], and requires no
shifting.

Fixes: 42f91093b043332a ("arm64: mm: Ignore spurious translation faults taken from the kernel")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Robin Murphy <robin.murphy@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>


# e4365f96 03-Oct-2019 Mark Rutland <mark.rutland@arm.com>

arm64: mm: avoid virt_to_phys(init_mm.pgd)

If we take an unhandled fault in the kernel, we call show_pte() to dump
the {PGDP,PGD,PUD,PMD,PTE} values for the corresponding page table walk,
where the PGDP value is virt_to_phys(mm->pgd).

The boot-time and runtime kernel page tables, init_pg_dir and
swapper_pg_dir respectively, are kernel symbols. Thus, it is not valid
to call virt_to_phys() on either of these, though we'll do so if we take
a fault on a TTBR1 address.

When CONFIG_DEBUG_VIRTUAL is not selected, virt_to_phys() will silently
fix this up. However, when CONFIG_DEBUG_VIRTUAL is selected, this
results in splats as below. Depending on when these occur, they can
happen to suppress information needed to debug the original unhandled
fault, such as the backtrace:

| Unable to handle kernel paging request at virtual address ffff7fffec73cf0f
| Mem abort info:
| ESR = 0x96000004
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| Data abort info:
| ISV = 0, ISS = 0x00000004
| CM = 0, WnR = 0
| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: 00000000102c9dbe (swapper_pg_dir+0x0/0x1000)
| WARNING: CPU: 1 PID: 7558 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0xe0/0x170 arch/arm64/mm/physaddr.c:12
| Kernel panic - not syncing: panic_on_warn set ...
| SMP: stopping secondary CPUs
| Dumping ftrace buffer:
| (ftrace buffer empty)
| Kernel Offset: disabled
| CPU features: 0x0002,23000438
| Memory Limit: none
| Rebooting in 1 seconds..

We can avoid this by ensuring that we call __pa_symbol() for
init_mm.pgd, as this will always be a kernel symbol. As the dumped
{PGD,PUD,PMD,PTE} values are the raw values from the relevant entries we
don't need to handle these specially.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 42f91093 22-Aug-2019 Will Deacon <will@kernel.org>

arm64: mm: Ignore spurious translation faults taken from the kernel

Thanks to address translation being performed out of order with respect to
loads and stores, it is possible for a CPU to take a translation fault when
accessing a page that was mapped by a different CPU.

For example, in the case that one CPU maps a page and then sets a flag to
tell another CPU:

CPU 0
-----

MOV X0, <valid pte>
STR X0, [Xptep] // Store new PTE to page table
DSB ISHST
ISB
MOV X1, #1
STR X1, [Xflag] // Set the flag

CPU 1
-----

loop: LDAR X0, [Xflag] // Poll flag with Acquire semantics
CBZ X0, loop
LDR X1, [X2] // Translates using the new PTE

then the final load on CPU 1 can raise a translation fault because the
translation can be performed speculatively before the read of the flag and
marked as "faulting" by the CPU. This isn't quite as bad as it sounds
since, in reality, code such as:

CPU 0 CPU 1
----- -----
spin_lock(&lock); spin_lock(&lock);
*ptr = vmalloc(size); if (*ptr)
spin_unlock(&lock); foo = **ptr;
spin_unlock(&lock);

will not trigger the fault because there is an address dependency on CPU 1
which prevents the speculative translation. However, more exotic code where
the virtual address is known ahead of time, such as:

CPU 0 CPU 1
----- -----
spin_lock(&lock); spin_lock(&lock);
set_fixmap(0, paddr, prot); if (mapped)
mapped = true; foo = *fix_to_virt(0);
spin_unlock(&lock); spin_unlock(&lock);

could fault. This can be avoided by any of:

* Introducing broadcast TLB maintenance on the map path
* Adding a DSB;ISB sequence after checking a flag which indicates
that a virtual address is now mapped
* Handling the spurious fault

Given that we have never observed a problem due to this under Linux and
future revisions of the architecture are being tightened so that
translation table walks are effectively ordered in the same way as explicit
memory accesses, we no longer treat spurious kernel faults as fatal if an
AT instruction indicates that the access does not trigger a translation
fault.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 233947ef 14-Aug-2019 Mark Rutland <mark.rutland@arm.com>

arm64: memory: fix flipped VA space fallout

VA_START used to be the start of the TTBR1 address space, but now it's a
point midway though. In a couple of places we still use VA_START to get
the start of the TTBR1 address space, so let's fix these up to use
PAGE_OFFSET instead.

Fixes: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 2c624fe6 07-Aug-2019 Steve Capper <steve.capper@arm.com>

arm64: mm: Remove vabits_user

Previous patches have enabled 52-bit kernel + user VAs and there is no
longer any scenario where user VA != kernel VA size.

This patch removes the, now redundant, vabits_user variable and replaces
usage with vabits_actual where appropriate.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 5383cc6e 07-Aug-2019 Steve Capper <steve.capper@arm.com>

arm64: mm: Introduce vabits_actual

In order to support 52-bit kernel addresses detectable at boot time, one
needs to know the actual VA_BITS detected. A new variable vabits_actual
is introduced in this commit and employed for the KVM hypervisor layout,
KASAN, fault handling and phys-to/from-virt translation where there
would normally be compile time constants.

In order to maintain performance in phys_to_virt, another variable
physvirt_offset is introduced.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 2951d5ef 06-Aug-2019 Miles Chen <miles.chen@mediatek.com>

arm64: mm: print hexadecimal EC value in mem_abort_decode()

This change prints the hexadecimal EC value in mem_abort_decode(),
which makes it easier to lookup the corresponding EC in
the ARM Architecture Reference Manual.

The commit 1f9b8936f36f ("arm64: Decode information from ESR upon mem
faults") prints useful information when memory abort occurs. It would
be easier to lookup "0x25" instead of "DABT" in the document. Then we
can check the corresponding ISS.

For example:
Current info Document
EC Exception class
"CP15 MCR/MRC" 0x3 "MCR or MRC access to CP15a..."
"ASIMD" 0x7 "Access to SIMD or floating-point..."
"DABT (current EL)" 0x25 "Data Abort taken without..."
...

Before:
Unable to handle kernel paging request at virtual address 000000000000c000
Mem abort info:
ESR = 0x96000046
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000046
CM = 0, WnR = 1

After:
Unable to handle kernel paging request at virtual address 000000000000c000
Mem abort info:
ESR = 0x96000046
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000046
CM = 0, WnR = 1

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <Mark.rutland@arm.com>
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Will Deacon <will@kernel.org>


# d8bb6718 01-Aug-2019 Masami Hiramatsu <mhiramat@kernel.org>

arm64: Make debug exception handlers visible from RCU

Make debug exceptions visible from RCU so that synchronize_rcu()
correctly track the debug exception handler.

This also introduces sanity checks for user-mode exceptions as same
as x86's ist_enter()/ist_exit().

The debug exception can interrupt in idle task. For example, it warns
if we put a kprobe on a function called from idle task as below.
The warning message showed that the rcu_read_lock() caused this
problem. But actually, this means the RCU is lost the context which
is already in NMI/IRQ.

/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
/sys/kernel/debug/tracing # [ 135.122237]
[ 135.125035] =============================
[ 135.125310] WARNING: suspicious RCU usage
[ 135.125581] 5.2.0-08445-g9187c508bdc7 #20 Not tainted
[ 135.125904] -----------------------------
[ 135.126205] include/linux/rcupdate.h:594 rcu_read_lock() used illegally while idle!
[ 135.126839]
[ 135.126839] other info that might help us debug this:
[ 135.126839]
[ 135.127410]
[ 135.127410] RCU used illegally from idle CPU!
[ 135.127410] rcu_scheduler_active = 2, debug_locks = 1
[ 135.128114] RCU used illegally from extended quiescent state!
[ 135.128555] 1 lock held by swapper/0/0:
[ 135.128944] #0: (____ptrval____) (rcu_read_lock){....}, at: call_break_hook+0x0/0x178
[ 135.130499]
[ 135.130499] stack backtrace:
[ 135.131192] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-08445-g9187c508bdc7 #20
[ 135.131841] Hardware name: linux,dummy-virt (DT)
[ 135.132224] Call trace:
[ 135.132491] dump_backtrace+0x0/0x140
[ 135.132806] show_stack+0x24/0x30
[ 135.133133] dump_stack+0xc4/0x10c
[ 135.133726] lockdep_rcu_suspicious+0xf8/0x108
[ 135.134171] call_break_hook+0x170/0x178
[ 135.134486] brk_handler+0x28/0x68
[ 135.134792] do_debug_exception+0x90/0x150
[ 135.135051] el1_dbg+0x18/0x8c
[ 135.135260] default_idle_call+0x0/0x44
[ 135.135516] cpu_startup_entry+0x2c/0x30
[ 135.135815] rest_init+0x1b0/0x280
[ 135.136044] arch_call_rest_init+0x14/0x1c
[ 135.136305] start_kernel+0x4d4/0x500
[ 135.136597]

So make debug exception visible to RCU can fix this warning.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>


# b98cca44 16-Jul-2019 Anshuman Khandual <anshuman.khandual@arm.com>

mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()

Architectures which support kprobes have very similar boilerplate around
calling kprobe_fault_handler(). Use a helper function in kprobes.h to
unify them, based on the x86 code.

This changes the behaviour for other architectures when preemption is
enabled. Previously, they would have disabled preemption while calling
the kprobe handler. However, preemption would be disabled if this fault
was due to a kprobe, so we know the fault was not due to a kprobe
handler and can simply return failure.

This behaviour was introduced in commit a980c0ef9f6d ("x86/kprobes:
Refactor kprobes_fault() like kprobe_exceptions_notify()")

[anshuman.khandual@arm.com: export kprobe_fault_handler()]
Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# caab277b 02-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 4745224b 07-Jun-2019 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Refactor __do_page_fault()

__do_page_fault() is over complicated with multiple goto statements. This
cleans up the code flow and while there drops local variable vm_fault_t.

Reviewed-by: Mark Rutland <Mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# c49bd02f 07-Jun-2019 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Document write abort detection from ESR

This patch adds an is_write_abort() wrapper and documents the detection
of the abort type on cache maintenance operations.

Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
[catalin.marinas@arm.com: only keep the is_write_abort() wrapper]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 61681036 02-Jun-2019 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Drop task_struct argument from __do_page_fault()

The task_struct argument is not getting used in __do_page_fault(). Hence
just drop it and use current or cuurent->mm instead where ever required.
This does not change any functionality.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# a0509313 02-Jun-2019 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Drop mmap_sem before calling __do_kernel_fault()

There is an inconsistency between down_read_trylock() success and failure
paths while dealing with kernel access for non exception table areas where
it calls __do_kernel_fault(). In case of failure it just bails out without
holding mmap_sem but when it succeeds it does so while holding mmap_sem.
Fix this inconsistency by just dropping mmap_sem in success path as well.

__do_kernel_fault() calls die_kernel_fault() which then calls show_pte().
show_pte() in this path might become bit more unreliable without holding
mmap_sem. But there are already instances [1] in do_page_fault() where
die_kernel_fault() gets called without holding mmap_sem. show_pte() can
be made more robust independently but in a later patch.

[1] Conditional block for (is_ttbr0_addr && is_el1_permission_fault)

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 01de1776 04-May-2019 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Identify user instruction aborts

We don't currently set the FAULT_FLAG_INSTRUCTION mm flag for EL0
instruction aborts. This has no functional impact, as we don't override
arch_vma_access_permitted(), and the default implementation always returns
true. However, it would be helpful to provide the flag so that it can be
consumed by tracepoints such as dax_pmd_fault.

This patch sets the FAULT_FLAG_INSTRUCTION flag for EL0 instruction aborts.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 969f5ea6 29-Apr-2019 Will Deacon <will@kernel.org>

arm64: errata: Add workaround for Cortex-A76 erratum #1463225

Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum
that can prevent interrupts from being taken when single-stepping.

This patch implements a software workaround to prevent userspace from
effectively being able to disable interrupts.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 48caebf7 13-May-2019 Will Deacon <will@kernel.org>

arm64: Print physical address of page table base in show_pte()

When dumping the page table in response to an unexpected kernel page
fault, we print the virtual (hashed) address of the page table base, but
display physical addresses for everything else.

Make the page table dumping code in show_pte() consistent, by printing
the page table base pointer as a physical address.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 52c6d145 24-Feb-2019 Will Deacon <will@kernel.org>

arm64: debug: Remove unused return value from do_debug_exception()

do_debug_exception() goes out of its way to return a value that isn't
ever used, so just make the thing void.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 7048a597 03-Apr-2019 Will Deacon <will@kernel.org>

arm64: mm: Make show_pte() a static function

show_pte() doesn't have any external callers, so make it static.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# b9a4b9d0 01-Mar-2019 Will Deacon <will@kernel.org>

arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals

FAR_EL1 is UNKNOWN for all debug exceptions other than those caused by
taking a hardware watchpoint. Unfortunately, if a debug handler returns
a non-zero value, then we will propagate the UNKNOWN FAR value to
userspace via the si_addr field of the SIGTRAP siginfo_t.

Instead, let's set si_addr to take on the PC of the faulting instruction,
which we have available in the current pt_regs.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# d44f1b8d 29-Jan-2019 James Morse <james.morse@arm.com>

arm64: KVM/mm: Move SEA handling behind a single 'claim' interface

To split up APEIs in_nmi() path, the caller needs to always be
in_nmi(). Add a helper to do the work and claim the notification.

When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception. A future kernel-first mechanism may be queried
afterwards, and claim the notification, otherwise we fall through
to the existing default behaviour.

The NOTIFY_SEA code was merged before considering multiple, possibly
interacting, NMI-like notifications and the need to consider kernel
first in the future. Make the 'claiming' behaviour explicit.

Restructuring the APEI code to allow multiple NMI-like notifications
means any notification that might interrupt interrupts-masked
code must always be wrapped in nmi_enter()/nmi_exit(). This will
allow APEI to use in_nmi() to use the right fixmap entries.

Mask SError over this window to prevent an asynchronous RAS error
arriving and tripping 'nmi_enter()'s BUG_ON(in_nmi()).

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0db5e022 29-Jan-2019 James Morse <james.morse@arm.com>

KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing

To split up APEIs in_nmi() path, the caller needs to always be
in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
out into a header file.

Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch codes
mm/fault.c. 32bit gets a dummy declaration in system_misc.h.

There is going to be more of this in the future if/when the kernel
supports the SError-based firmware-first notification mechanism and/or
kernel-first notifications for both synchronous external abort and
SError. Each of these will come with some Kconfig symbols and a
handful of header files.

Create a header file for all this.

This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the
declarations to kvm_ras.h as preparation for a future patch that moves
the ACPI-specific RAS code out of mm/fault.c.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 356607f2 28-Dec-2018 Andrey Konovalov <andreyknvl@google.com>

kasan, arm64: fix up fault handling logic

Right now arm64 fault handling code removes pointer tags from addresses
covered by TTBR0 in faults taken from both EL0 and EL1, but doesn't do
that for pointers covered by TTBR1.

This patch adds two helper functions is_ttbr0_addr() and is_ttbr1_addr(),
where the latter one accounts for the fact that TTBR1 pointers might be
tagged when tag-based KASAN is in use, and uses these helper functions to
perform pointer checks in arch/arm64/mm/fault.c.

Link: http://lkml.kernel.org/r/3f349b0e9e48b5df3298a6b4ae0634332274494a.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 67e7fdfc 06-Dec-2018 Steve Capper <steve.capper@arm.com>

arm64: mm: introduce 52-bit userspace support

On arm64 there is optional support for a 52-bit virtual address space.
To exploit this one has to be running with a 64KB page size and be
running on hardware that supports this.

For an arm64 kernel supporting a 48 bit VA with a 64KB page size,
some changes are needed to support a 52-bit userspace:
* TCR_EL1.T0SZ needs to be 12 instead of 16,
* TASK_SIZE needs to reflect the new size.

This patch implements the above when the support for 52-bit VAs is
detected at early boot time.

On arm64 userspace addresses translation is controlled by TTBR0_EL1. As
well as userspace, TTBR0_EL1 controls:
* The identity mapping,
* EFI runtime code.

It is possible to run a kernel with an identity mapping that has a
larger VA size than userspace (and for this case __cpu_set_tcr_t0sz()
would set TCR_EL1.T0SZ as appropriate). However, when the conditions for
52-bit userspace are met; it is possible to keep TCR_EL1.T0SZ fixed at
12. Thus in this patch, the TCR_EL1.T0SZ size changing logic is
disabled.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 9a0c0328 28-Aug-2018 Julien Thierry <julien.thierry.kdev@gmail.com>

arm64: Use daifflag_restore after bp_hardening

For EL0 entries requiring bp_hardening, daif status is kept at
DAIF_PROCCTX_NOIRQ until after hardening has been done. Then interrupts
are enabled through local_irq_enable().

Before using local_irq_* functions, daifflags should be properly restored
to a state where IRQs are enabled.

Enable IRQs by restoring DAIF_PROCCTX state after bp hardening.

Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 359048f9 22-Sep-2018 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Define esr_to_debug_fault_info()

fault_info[] and debug_fault_info[] are static arrays defining memory abort
exception handling functions looking into ESR fault status code encodings.
As esr_to_fault_info() is already available providing fault_info[] array
lookup, it really makes sense to have a corresponding debug_fault_info[]
array lookup function as well. This just adds an equivalent helper function
esr_to_debug_fault_info().

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# dbfe3828 22-Sep-2018 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Reorganize arguments for is_el1_permission_fault()

Most memory abort exception handling related functions have the arguments
in the order (addr, esr, regs) except is_el1_permission_fault(). This
changes the argument order in this function as (addr, esr, regs) like
others.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 00bbd5d9 22-Sep-2018 Anshuman Khandual <anshuman.khandual@arm.com>

arm64/mm: Use ESR_ELx_FSC macro while decoding fault exception

Just replace hard code value of 63 (0x111111) with an existing macro
ESR_ELx_FSC when parsing for the status code during fault exception.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# b4d5557c 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Add and use arm64_force_sig_mceerr as appropriate

Add arm64_force_sig_mceerr for consistency with arm64_force_sig_fault,
and use it in the one location that can take advantage of it.

This removes the fiddly filling out of siginfo before sending a signal
reporting an memory error to userspace.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# feca355b 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Add and use arm64_force_sig_fault where appropriate

Wrap force_sig_fault with a helper that calls arm64_show_signal
and call arm64_force_sig_fault where appropraite.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# 559d8d91 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Only call set_thread_esr once in do_page_fault

This code is truly common between the signal sending cases so share it.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# 2d2837fa 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Only perform one esr_to_fault_info call in do_page_fault

As this work is truly common between all of the signal sending cases
there is no need to repeat it between the different cases.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# effb093a 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Expand __do_user_fault and remove it

Not all of the signals passed to __do_user_fault can be handled
the same way so expand the now tiny __do_user_fault in it's callers
and remove it.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# aefab2b4 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: For clarity separate the 3 signal sending cases in do_page_fault

It gets easy to confuse what is going on when some code is shared and some not
so stop sharing the trivial bits of signal generation to make future updates
easier to understand.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 9ea3a974 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Consolidate the two hwpoison cases in do_page_fault

These two cases are practically the same and use siginfo differently
from the other signals sent from do_page_fault. So consolidate them
to make future changes easier.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# f29ad209 22-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Factor set_thread_esr out of __do_user_fault

This pepares for sending signals with something other than
arm64_force_sig_info.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 24b8f79d 21-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Remove unneeded tsk parameter from arm64_force_sig_info

Every caller passes in current for tsk so there is no need to pass
tsk. Instead make tsk a local variable initialized to current.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 6fa998e8 21-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Push siginfo generation into arm64_notify_die

Instead of generating a struct siginfo before calling arm64_notify_die
pass the signal number, tne sicode and the fault address into
arm64_notify_die and have it call force_sig_fault instead of
force_sig_info to let the generic code generate the struct siginfo.

This keeps code passing just the needed information into
siginfo generating code, making it easier to see what
is happening and harder to get wrong. Further by letting
the generic code handle the generation of struct siginfo
it reduces the number of sites generating struct siginfo
making it possible to review them and verify that all
of the fiddly details for a structure passed to userspace
are handled properly.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# b8925ee2 07-Aug-2018 Will Deacon <will@kernel.org>

arm64: cpu: Move errata and feature enable callbacks closer to callers

The cpu errata and feature enable callbacks are only called via their
respective arm64_cpu_capabilities structure and therefore shouldn't
exist in the global namespace.

Move the PAN, RAS and cache maintenance emulation enable callbacks into
the same files as their corresponding arm64_cpu_capabilities structures,
making them static in the process.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 50a7ca3c 17-Aug-2018 Souptick Joarder <jrdr.linux@gmail.com>

mm: convert return type of handle_mm_fault() caller to vm_fault_t

Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

In this patch all the caller of handle_mm_fault() are changed to return
vm_fault_t type.

Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1035a078 06-Aug-2018 Dongjiu Geng <gengdongjiu@huawei.com>

arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()

In order to remove the additional check before calling the
ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.

After this cleanup, we can simply call the ghes_notify_sea() to let
APEI driver handle the SEA notification.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 25be597a 11-Jul-2018 Mark Rutland <mark.rutland@arm.com>

arm64: kill config_sctlr_el1()

Now that we have sysreg_clear_set(), we can consistently use this
instead of config_sctlr_el1().

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c870f14e 21-May-2018 Mark Rutland <mark.rutland@arm.com>

arm64: Unify kernel fault reporting

In do_page_fault(), we handle some kernel faults early, and simply
die() with a message. For faults handled later, we dump the faulting
address, decode the ESR, walk the page tables, and perform a number of
steps to ensure that this data is reported.

Let's unify the handling of fatal kernel faults with a new
die_kernel_fault() helper, handling all of these details. This is
largely the same as the existing logic in __do_kernel_fault(), except
that addresses are consistently padded to 16 hex characters, as would be
expected for a 64-bit address.

The messages currently logged in do_page_fault are adjusted to fit into
the die_kernel_fault() message template.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 969e61ba 21-May-2018 Mark Rutland <mark.rutland@arm.com>

arm64: make is_permission_fault() name clearer

The naming of is_permission_fault() makes it sound like it should return
true for permission faults from EL0, but by design, it only does so for
faults from EL1.

Let's make this clear by dropping el1 in the name, as we do for
is_el1_instruction_abort().

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# cc198460 22-May-2018 Peter Maydell <peter.maydell@linaro.org>

arm64: fault: Don't leak data in ESR context for user fault on kernel VA

If userspace faults on a kernel address, handing them the raw ESR
value on the sigframe as part of the delivered signal can leak data
useful to attackers who are using information about the underlying hardware
fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.

However there are also legitimate uses for the information provided
in the ESR -- notably the GCC and LLVM sanitizers use this to report
whether wild pointer accesses by the application are reads or writes
(since a wild write is a more serious bug than a wild read), so we
don't want to drop the ESR information entirely.

For faulting addresses in the kernel, sanitize the ESR. We choose
to present userspace with the illusion that there is nothing mapped
in the kernel's part of the address space at all, by reporting all
faults as level 0 translation faults taken to EL1.

These fields are safe to pass through to userspace as they depend
only on the instruction that userspace used to provoke the fault:
EC IL (always)
ISV CM WNR (for all data aborts)
All the other fields in ESR except DFSC are architecturally RES0
for an L0 translation fault taken to EL1, so can be zeroed out
without confusing userspace.

The illusion is not entirely perfect, as there is a tiny wrinkle
where we will report an alignment fault that was not due to the memory
type (for instance a LDREX to an unaligned address) as a translation
fault, whereas if you do this on real unmapped memory the alignment
fault takes precedence. This is not likely to trip anybody up in
practice, as the only users we know of for the ESR information who
care about the behaviour for kernel addresses only really want to
know about the WnR bit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 3eb0f519 17-Apr-2018 Eric W. Biederman <ebiederm@xmission.com>

signal: Ensure every siginfo we send has all bits initialized

Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.

Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.

The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.

In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.

Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# c0cda3b8 26-Mar-2018 Dave Martin <dave.martin@arm.com>

arm64: capabilities: Update prototype for enable call back

We issue the enable() call back for all CPU hwcaps capabilities
available on the system, on all the CPUs. So far we have ignored
the argument passed to the call back, which had a prototype to
accept a "void *" for use with on_each_cpu() and later with
stop_machine(). However, with commit 0a0d111d40fd1
("arm64: cpufeature: Pass capability structure to ->enable callback"),
there are some users of the argument who wants the matching capability
struct pointer where there are multiple matching criteria for a single
capability. Clean up the declaration of the call back to make it clear.

1) Renamed to cpu_enable(), to imply taking necessary actions on the
called CPU for the entry.
2) Pass const pointer to the capability, to allow the call back to
check the entry. (e.,g to check if any action is needed on the CPU)
3) We don't care about the result of the call back, turning this to
a void.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: James Morse <james.morse@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
[suzuki: convert more users, rename call back and drop results]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# af40ff68 08-Mar-2018 Dave Martin <Dave.Martin@arm.com>

arm64: signal: Ensure si_code is valid for all fault signals

Currently, as reported by Eric, an invalid si_code value 0 is
passed in many signals delivered to userspace in response to faults
and other kernel errors. Typically 0 is passed when the fault is
insufficiently diagnosable or when there does not appear to be any
sensible alternative value to choose.

This appears to violate POSIX, and is intuitively wrong for at
least two reasons arising from the fact that 0 == SI_USER:

1) si_code is a union selector, and SI_USER (and si_code <= 0 in
general) implies the existence of a different set of fields
(siginfo._kill) from that which exists for a fault signal
(siginfo._sigfault). However, the code raising the signal
typically writes only the _sigfault fields, and the _kill
fields make no sense in this case.

Thus when userspace sees si_code == 0 (SI_USER) it may
legitimately inspect fields in the inactive union member _kill
and obtain garbage as a result.

There appears to be software in the wild relying on this,
albeit generally only for printing diagnostic messages.

2) Software that wants to be robust against spurious signals may
discard signals where si_code == SI_USER (or <= 0), or may
filter such signals based on the si_uid and si_pid fields of
siginfo._sigkill. In the case of fault signals, this means
that important (and usually fatal) error conditions may be
silently ignored.

In practice, many of the faults for which arm64 passes si_code == 0
are undiagnosable conditions such as exceptions with syndrome
values in ESR_ELx to which the architecture does not yet assign any
meaning, or conditions indicative of a bug or error in the kernel
or system and thus that are unrecoverable and should never occur in
normal operation.

The approach taken in this patch is to translate all such
undiagnosable or "impossible" synchronous fault conditions to
SIGKILL, since these are at least probably localisable to a single
process. Some of these conditions should really result in a kernel
panic, but due to the lack of diagnostic information it is
difficult to be certain: this patch does not add any calls to
panic(), but this could change later if justified.

Although si_code will not reach userspace in the case of SIGKILL,
it is still desirable to pass a nonzero value so that the common
siginfo handling code can detect incorrect use of si_code == 0
without false positives. In this case the si_code dependent
siginfo fields will not be correctly initialised, but since they
are not passed to userspace I deem this not to matter.

A few faults can reasonably occur in realistic userspace scenarios,
and _should_ raise a regular, handleable (but perhaps not
ignorable/blockable) signal: for these, this patch attempts to
choose a suitable standard si_code value for the raised signal in
each case instead of 0.

arm64 was the only arch to define a BUS_FIXME code, so after this
patch nobody defines it. This patch therefore also removes the
relevant code from siginfo_layout().

Cc: James Morse <james.morse@arm.com>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 92ff0674 20-Feb-2018 Will Deacon <will@kernel.org>

arm64: mm: Rework unhandled user pagefaults to call arm64_force_sig_info

Reporting unhandled user pagefaults via arm64_force_sig_info means
that __do_user_fault can be drastically simplified, since it no longer
has to worry about printing the fault information and can consequently
just take the siginfo as a parameter.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 1049c308 20-Feb-2018 Will Deacon <will@kernel.org>

arm64: Pass user fault info to arm64_notify_die instead of printing it

There's no need for callers of arm64_notify_die to print information
about user faults. Instead, they can pass a string to arm64_notify_die
which will be printed subject to show_unhandled_signals.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 20a004e7 15-Feb-2018 Will Deacon <will@kernel.org>

arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables

In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.

Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).

Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 30d88c0e 02-Feb-2018 Will Deacon <will@kernel.org>

arm64: entry: Apply BP hardening for suspicious interrupts from EL0

It is possible to take an IRQ from EL0 following a branch to a kernel
address in such a way that the IRQ is prioritised over the instruction
abort. Whilst an attacker would need to get the stars to align here,
it might be sufficient with enough calibration so perform BP hardening
in the rare case that we see a kernel address in the ELR when handling
an IRQ from EL0.

Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 5dfc6ed2 02-Feb-2018 Will Deacon <will@kernel.org>

arm64: entry: Apply BP hardening for high-priority synchronous exceptions

Software-step and PC alignment fault exceptions have higher priority than
instruction abort exceptions, so apply the BP hardening hooks there too
if the user PC appears to reside in kernel space.

Reported-by: Dan Hettena <dhettena@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 51369e39 05-Feb-2018 Robin Murphy <robin.murphy@arm.com>

arm64: Make USER_DS an inclusive limit

Currently, USER_DS represents an exclusive limit while KERNEL_DS is
inclusive. In order to do some clever trickery for speculation-safe
masking, we need them both to behave equivalently - there aren't enough
bits to make KERNEL_DS exclusive, so we have precisely one option. This
also happens to correct a longstanding false negative for a range
ending on the very top byte of kernel memory.

Mark Rutland points out that we've actually got the semantics of
addresses vs. segments muddled up in most of the places we need to
amend, so shuffle the {USER,KERNEL}_DS definitions around such that we
can correct those properly instead of just pasting "-1"s everywhere.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 526c3ddb 03-Jan-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/arm64: Document conflicts with SI_USER and SIGFPE,SIGTRAP,SIGBUS

Setting si_code to 0 results in a userspace seeing an si_code of 0.
This is the same si_code as SI_USER. Posix and common sense requires
that SI_USER not be a signal specific si_code. As such this use of 0
for the si_code is a pretty horribly broken ABI.

Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
value of __SI_KILL and now sees a value of SIL_KILL with the result
that uid and pid fields are copied and which might copying the si_addr
field by accident but certainly not by design. Making this a very
flakey implementation.

Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return
SIL_FAULT and the appropriate fields will be reliably copied.

But folks this is a new and unique kind of bad. This is massively
untested code bad. This is inventing new and unique was to get
siginfo wrong bad. This is don't even think about Posix or what
siginfo means bad. This is lots of eyeballs all missing the fact
that the code does the wrong thing bad. This is getting stuck
and keep making the same mistake bad.

I really hope we can find a non userspace breaking fix for this on a
port as new as arm64.

Possible ABI fixes include:
- Send the signal without siginfo
- Don't generate a signal
- Possibly assign and use an appropriate si_code
- Don't handle cases which can't happen

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
Ref: 53631b54c870 ("arm64: Floating point and SIMD")
Ref: 32015c235603 ("arm64: exception: handle Synchronous External Abort")
Ref: 1d18c47c735e ("arm64: MMU fault handling and page table management")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 0f15adbb 03-Jan-2018 Will Deacon <will@kernel.org>

arm64: Add skeleton to harden the branch predictor against aliasing attacks

Aliasing attacks against CPU branch predictors can allow an attacker to
redirect speculative control flow on some CPUs and potentially divulge
information from one context to another.

This patch adds initial skeleton code behind a new Kconfig option to
enable implementation-specific mitigations against these attacks for
CPUs that are affected.

Co-developed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# faa75e14 13-Dec-2017 Dongjiu Geng <gengdongjiu@huawei.com>

arm64: fault: avoid send SIGBUS two times

do_sea() calls arm64_notify_die() which will always signal
user-space. It also returns whether APEI claimed the external
abort as a RAS notification. If it returns failure do_mem_abort()
will signal user-space too.

do_mem_abort() wants to know if we handled the error, we always
call arm64_notify_die() so can always return success.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 80b6eb04 31-Oct-2017 Will Deacon <will@kernel.org>

arm64: Don't walk page table for user faults in do_mem_abort

Commit 42dbf54e8890 ("arm64: consistently log ESR and page table")
dumps page table entries for user faults hitting do_bad entries in the
fault handler table. Whilst this shouldn't really happen in practice,
it's not beyond the realms of possibility if e.g. running an old kernel
on a new CPU.

Generally, we want to avoid exposing physical addresses under the control
of userspace (see commit bf396c09c24 ("arm64: mm: don't print out page
table entries on EL0 faults")), so walk the page tables only on exceptions
from EL1.

Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 42dbf54e 19-Oct-2017 Mark Rutland <mark.rutland@arm.com>

arm64: consistently log ESR and page table

When we take a fault we can't handle, we try to dump some relevant
information, but we're not consistent about doing so.

In do_mem_abort(), we log the full ESR, but don't dump a page table
walk. In __do_kernel_fault, we dump an attempted decoding of the ESR
(but not the ESR itself) along with a page table walk.

Let's try to make things more consistent by dumping the full ESR in
mem_abort_decode(), and having do_mem_abort dump a page table walk. The
existing dump of the ESR in do_mem_abort() is rendered redundant, and
removed.

Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 3f7c86b2 17-Oct-2017 Julien Thierry <julien.thierry.kdev@gmail.com>

arm64: Update fault_info table with new exception types

Based on: ARM Architecture Reference Manual, ARMv8 (DDI 0487B.b).

ARMv8.1 introduces the optional feature ARMv8.1-TTHM which can trigger a
new type of memory abort. This exception is triggered when hardware update
of page table flags is not atomic in regards to other memory accesses.
Replace the corresponding unknown entry with a more accurate one.

Cf: Section D10.2.28 ESR_ELx, Exception Syndrome Register (p D10-2381),
section D4.4.11 Restriction on memory types for hardware updates on page
tables (p D4-2116 - D4-2117).

ARMv8.2 does not add new exception types, however it is worth mentioning
that when obligatory feature RAS (optional for ARMv8.{0,1}) is implemented,
exceptions related to "Synchronous parity or ECC error on memory access,
not on translation table walk" become reserved and should not occur.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0a6de8b8 01-Oct-2017 Mark Rutland <mark.rutland@arm.com>

arm64: fix misleading data abort decoding

Currently data_abort_decode() dumps the ISS field as a decimal value
with a '0x' prefix, which is somewhat misleading.

Fix it to print as hexadecimal, as was intended.

Fixes: 1f9b8936f36f4a8e ("arm64: Decode information from ESR upon mem faults")
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# f67d5c4f 22-Sep-2017 Will Deacon <will@kernel.org>

arm64: mm: Remove useless and wrong comments from fault.c

Fault.c seems to be a magnet for useless and wrong comments, largely
due to its ancestry in other architectures where the code has since
moved on, but the comments have remained intact.

This patch removes both useless and incorrect comments, leaving only
those that say something correct and relevant.

Reported-by: Wenjia Zhou <zhiyuan_zhu@htc.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 760bfb47 28-Sep-2017 Will Deacon <will@kernel.org>

arm64: fault: Route pte translation faults via do_translation_fault

We currently route pte translation faults via do_page_fault, which elides
the address check against TASK_SIZE before invoking the mm fault handling
code. However, this can cause issues with the path walking code in
conjunction with our word-at-a-time implementation because
load_unaligned_zeropad can end up faulting in kernel space if it reads
across a page boundary and runs into a page fault (e.g. by attempting to
read from a guard region).

In the case of such a fault, load_unaligned_zeropad has registered a
fixup to shift the valid data and pad with zeroes, however the abort is
reported as a level 3 translation fault and we dispatch it straight to
do_page_fault, despite it being a kernel address. This results in calling
a sleeping function from atomic context:

BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
in_atomic(): 0, irqs_disabled(): 0, pid: 10290
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[...]
[<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
[<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
[<ffffff8e016977f0>] do_page_fault+0x140/0x330
[<ffffff8e01681328>] do_mem_abort+0x54/0xb0
Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
[...]
[<ffffff8e016844fc>] el1_da+0x18/0x78
[<ffffff8e017f399c>] path_parentat+0x44/0x88
[<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
[<ffffff8e017f5044>] filename_create+0x4c/0x128
[<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
[<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
Code: 36380080 d5384100 f9400800 9402566d (d4210000)
---[ end trace 2d01889f2bca9b9f ]---

Fix this by dispatching all translation faults to do_translation_faults,
which avoids invoking the page fault logic for faults on kernel addresses.

Cc: <stable@vger.kernel.org>
Reported-by: Ankit Jain <ankijain@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 289d07a2 11-Jul-2017 Mark Rutland <mark.rutland@arm.com>

arm64: mm: abort uaccess retries upon fatal signal

When there's a fatal signal pending, arm64's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.

However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.

To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Steve Capper <steve.capper@arm.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# af29678f 06-Jul-2017 Catalin Marinas <catalin.marinas@arm.com>

arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths

Since the pte handling for hardware AF/DBM works even when the hardware
feature is not present, make the pte accessors implementation permanent
and remove the corresponding #ifdefs. The Kconfig option is kept as it
can still be used to disable the feature at the hardware level.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 73e86cb0 04-Jul-2017 Catalin Marinas <catalin.marinas@arm.com>

arm64: Move PTE_RDONLY bit handling out of set_pte_at()

Currently PTE_RDONLY is treated as a hardware only bit and not handled
by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions.
The set_pte_at() function is responsible for setting this bit based on
the write permission or dirty state. This patch moves the PTE_RDONLY
handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect()
functions. The PAGE_* definitions to need to be updated to explicitly
include PTE_RDONLY when !PTE_WRITE.

The patch also removes the redundant PAGE_COPY(_EXEC) definitions as
they are identical to the corresponding PAGE_READONLY(_EXEC).

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 3bbf7157 26-Jun-2017 Catalin Marinas <catalin.marinas@arm.com>

arm64: Convert pte handling from inline asm to using (cmp)xchg

With the support for hardware updates of the access and dirty states,
the following pte handling functions had to be implemented using
exclusives: __ptep_test_and_clear_young(), ptep_get_and_clear(),
ptep_set_wrprotect() and ptep_set_access_flags(). To take advantage of
the LSE atomic instructions and also make the code cleaner, convert
these pte functions to use the more generic cmpxchg()/xchg().

Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 1f9b8936 04-Aug-2017 Julien Thierry <julien.thierry.kdev@gmail.com>

arm64: Decode information from ESR upon mem faults

When receiving unhandled faults from the CPU, description is very sparse.
Adding information about faults decoded from ESR.

Added defines to esr.h corresponding ESR fields. Values are based on ARM
Archtecture Reference Manual (DDI 0487B.a), section D7.2.28 ESR_ELx, Exception
Syndrome Register (ELx) (pages D7-2275 to D7-2280).

New output is of the form:
[ 77.818059] Mem abort info:
[ 77.820826] Exception class = DABT (current EL), IL = 32 bits
[ 77.826706] SET = 0, FnV = 0
[ 77.829742] EA = 0, S1PTW = 0
[ 77.832849] Data abort info:
[ 77.835713] ISV = 0, ISS = 0x00000070
[ 77.839522] CM = 0, WnR = 1

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: fix "%lu" in a pr_alert() call]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 6d332747 25-Jul-2017 Catalin Marinas <catalin.marinas@arm.com>

arm64: Fix potential race with hardware DBM in ptep_set_access_flags()

In a system with DBM (dirty bit management) capable agents there is a
possible race between a CPU executing ptep_set_access_flags() (maybe
non-DBM capable) and a hardware update of the dirty state (clearing of
PTE_RDONLY). The scenario:

a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old
(PTE_AF clear)
b) ptep_set_access_flags() is called as a result of a read access and it
needs to set the pte to writable, clean and young (PTE_AF set)
c) a DBM-capable agent, as a result of a different write access, is
marking the entry as young (setting PTE_AF) and dirty (clearing
PTE_RDONLY)

The current ptep_set_access_flags() implementation would set the
PTE_RDONLY bit in the resulting value overriding the DBM update and
losing the dirty state.

This patch fixes such race by setting PTE_RDONLY to the most permissive
(lowest value) of the current entry and the new one.

Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 621f48e4 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

arm/arm64: KVM: add guest SEA support

Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

When an SEA occurs in the guest kernel, the guest exits and is
routed to kvm_handle_guest_abort(). Prior to this patch, a print
message of an unsupported FSC would be printed and nothing else
would happen. With this patch, the code gets routed to the APEI
handling of SEAs in the host kernel to report the SEA information.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 7edda088 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

acpi: apei: handle SEA notification type for ARMv8

ARM APEI extension proposal added SEA (Synchronous External Abort)
notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.
An SEA can interrupt code that had interrupts masked and is treated as
an NMI. To aid this the page of address space for mapping APEI buffers
while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is
changed to use the helper methods to find the prot_t to map with in
the same way as ghes_ioremap_pfn_irq().

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 32015c23 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

arm64: exception: handle Synchronous External Abort

SEA exceptions are often caused by an uncorrected hardware
error, and are handled when data abort and instruction abort
exception classes have specific values for their Fault Status
Code.
When SEA occurs, before killing the process, report the error
in the kernel logs.
Update fault_info[] with specific SEA faults so that the
new SEA handler is used.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
[will: use NULL instead of 0 when assigning si_addr]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0e3a9026 08-Jun-2017 Punit Agrawal <punitagrawal@gmail.com>

arm64: mm: Update perf accounting to handle poison faults

Re-organise the perf accounting for fault handling in preparation for
enabling handling of hardware poison faults in subsequent commits. The
change updates perf accounting to be inline with the behaviour on
x86.

With this update, the perf fault accounting -

* Always report PERF_COUNT_SW_PAGE_FAULTS

* Doesn't report anything else for VM_FAULT_ERROR (which includes
hwpoison faults)

* Reports PERF_COUNT_SW_PAGE_FAULTS_MAJ if it's a major
fault (indicated by VM_FAULT_MAJOR)

* Otherwise, reports PERF_COUNT_SW_PAGE_FAULTS_MIN

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e7c600f1 08-Jun-2017 Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>

arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
(fix new __do_user_fault call-site)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 1eb34b6e 15-May-2017 Will Deacon <will@kernel.org>

arm64: fault: Print info about page table structure when dumping pte

Whilst debugging a remote crash, I noticed that show_pte is unhelpful
when it comes to describing the structure of the page table being walked.
This is easily fixed by printing out the page table (swapper vs user),
page size and virtual address size when displaying the PGD address.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 83016b20 09-Jun-2017 Kristina Martsenko <kristina.martsenko@arm.com>

arm64: mm: print file name of faulting vma

Print out the name of the file associated with the vma that faulted.
This is usually the executable or shared library name. We already print
out the task name, but also printing the library name is useful for
pinpointing bugs to libraries.

Also print the base address and size of the vma, which together with the
PC (printed by __show_regs) gives the offset into the library.

Fault prints now look like:
test[2361]: unhandled level 2 translation fault (11) at 0x00000012, esr 0x92000006, in libfoo.so[ffffa0145000+1000]

This is already done on x86, for more details see commit 03252919b798
("x86: print which shared library/executable faulted in segfault etc.
messages v3").

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# bf396c09 09-Jun-2017 Kristina Martsenko <kristina.martsenko@arm.com>

arm64: mm: don't print out page table entries on EL0 faults

When we take a fault from EL0 that can't be handled, we print out the
page table entries associated with the faulting address. This allows
userspace to print out any current page table entries, including kernel
(TTBR1) entries. Exposing kernel mappings like this could pose a
security risk, so don't print out page table information on EL0 faults.
(But still print it out for EL1 faults.) This also follows the same
behaviour as x86, printing out page table entries on kernel mode faults
but not user mode faults.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 67ce16ec 09-Jun-2017 Kristina Martsenko <kristina.martsenko@arm.com>

arm64: mm: print out correct page table entries

When we take a fault that can't be handled, we print out the page table
entries associated with the faulting address. In some cases we currently
print out the wrong entries. For a faulting TTBR1 address, we sometimes
print out TTBR0 table entries instead, and for a faulting TTBR0 address
we sometimes print out TTBR1 table entries. Fix this by choosing the
tables based on the faulting address.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[will: zero-extend addrs to 64-bit, don't walk swapper w/ TTBR0 addr]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c07ab957 08-May-2017 Kefeng Wang <wangkefeng.wang@huawei.com>

arm64: Call __show_regs directly

Generic code expects show_regs() to also dump the stack, but arm64's
show_reg() does not do this. Some arm64 callers of show_regs() *only*
want the registers dumped, without the stack.

To enable generic code to work as expected, we need to make
show_regs() dump the stack. Where we only want the registers dumped,
we must use __show_regs().

This patch updates code to use __show_regs() where only registers are
desired. A subsequent patch will modify show_regs().

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# b824b930 05-Apr-2017 Stephen Boyd <stephen.boyd@linaro.org>

arm64: print a fault message when attempting to write RO memory

If a page is marked read only we should print out that fact,
instead of printing out that there was a page fault. Right now we
get a cryptic error message that something went wrong with an
unhandled fault, but we don't evaluate the esr to figure out that
it was a read/write permission fault.

Instead of seeing:

Unable to handle kernel paging request at virtual address ffff000008e460d8
pgd = ffff800003504000
[ffff000008e460d8] *pgd=0000000083473003, *pud=0000000083503003, *pmd=0000000000000000
Internal error: Oops: 9600004f [#1] PREEMPT SMP

we'll see:

Unable to handle kernel write to read-only memory at virtual address ffff000008e760d8
pgd = ffff80003d3de000
[ffff000008e760d8] *pgd=0000000083472003, *pud=0000000083435003, *pmd=0000000000000000
Internal error: Oops: 9600004f [#1] PREEMPT SMP

We also add a userspace address check into is_permission_fault()
so that the function doesn't return true for ttbr0 PAN faults
when it shouldn't.

Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 09a6adf5 03-Apr-2017 Victor Kamensky <kamensky@cisco.com>

arm64: mm: unaligned access by user-land should be received as SIGBUS

After 52d7523 (arm64: mm: allow the kernel to handle alignment faults on
user accesses) commit user-land accesses that produce unaligned exceptions
like in case of aarch32 ldm/stm/ldrd/strd instructions operating on
unaligned memory received by user-land as SIGSEGV. It is wrong, it should
be reported as SIGBUS as it was before 52d7523 commit.

Changed do_bad_area function to take signal and code parameters out of esr
value using fault_info table, so in case of do_alignment_fault fault
user-land will receive SIGBUS. Wrapped access to fault_info table into
esr_to_fault_info function.

Cc: <stable@vger.kernel.org>
Fixes: 52d7523 (arm64: mm: allow the kernel to handle alignment faults on user accesses)
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# b17b0153 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>

We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3f07c014 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>

We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c8b06e3f 09-Jan-2017 James Morse <james.morse@arm.com>

arm64: Remove useless UAO IPI and describe how this gets enabled

Since its introduction, the UAO enable call was broken, and useless.
commit 2a6dcb2b5f3e ("arm64: cpufeature: Schedule enable() calls instead
of calling them via IPI"), fixed the framework so that these calls
are scheduled, so that they can modify PSTATE.

Now it is just useless. Remove it. UAO is enabled by the code patching
which causes get_user() and friends to use the 'ldtr' family of
instructions. This relies on the PSTATE.UAO bit being set to match
addr_limit, which we do in uao_thread_switch() called via __switch_to().

All that is needed to enable UAO is patch the code, and call schedule().
__apply_alternatives_multi_stop() calls stop_machine() when it modifies
the kernel text to enable the alternatives, (including the UAO code in
uao_thread_switch()). Once stop_machine() has finished __switch_to() is
called to reschedule the original task, this causes PSTATE.UAO to be set
appropriately. An explicit enable() call is not needed.

Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>


# 6ef4fb38 03-Jan-2017 Mark Rutland <mark.rutland@arm.com>

arm64: mm: fix show_pte KERN_CONT fallout

Recent changes made KERN_CONT mandatory for continued lines. In the
absence of KERN_CONT, a newline may be implicit inserted by the core
printk code.

In show_pte, we (erroneously) use printk without KERN_CONT for continued
prints, resulting in output being split across a number of lines, and
not matching the intended output, e.g.

[ff000000000000] *pgd=00000009f511b003
, *pud=00000009f4a80003
, *pmd=0000000000000000

Fix this by using pr_cont() for all the continuations.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 78688963 01-Jul-2016 Catalin Marinas <catalin.marinas@arm.com>

arm64: Handle faults caused by inadvertent user access with PAN enabled

When TTBR0_EL1 is set to the reserved page, an erroneous kernel access
to user space would generate a translation fault. This patch adds the
checks for the software-set PSR_PAN_BIT to emulate a permission fault
and report it accordingly.

Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# a8ada146 28-Oct-2016 Catalin Marinas <catalin.marinas@arm.com>

arm64: Update the synchronous external abort fault description

This patch updates the description of the synchronous external aborts on
translation table walks.

Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 7209c868 18-Oct-2016 James Morse <james.morse@arm.com>

arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call

Commit 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access
Never") enabled PAN by enabling the 'SPAN' feature-bit in SCTLR_EL1.
This means the PSTATE.PAN bit won't be set until the next return to the
kernel from userspace. On a preemptible kernel we may schedule work that
accesses userspace on a CPU before it has done this.

Now that cpufeature enable() calls are scheduled via stop_machine(), we
can set PSTATE.PAN from the cpu_enable_pan() call.

Add WARN_ON_ONCE(in_interrupt()) to check the PSTATE value we updated
is not immediately discarded.

Reported-by: Tony Thompson <anthony.thompson@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: fixed typo in comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 2a6dcb2b 18-Oct-2016 James Morse <james.morse@arm.com>

arm64: cpufeature: Schedule enable() calls instead of calling them via IPI

The enable() call for a cpufeature/errata is called using on_each_cpu().
This issues a cross-call IPI to get the work done. Implicitly, this
stashes the running PSTATE in SPSR when the CPU receives the IPI, and
restores it when we return. This means an enable() call can never modify
PSTATE.

To allow PAN to do this, change the on_each_cpu() call to use
stop_machine(). This schedules the work on each CPU which allows
us to modify PSTATE.

This involves changing the protype of all the enable() functions.

enable_cpu_capabilities() is called during boot and enables the feature
on all online CPUs. This path now uses stop_machine(). CPU features for
hotplug'd CPUs are enabled by verify_local_cpu_features() which only
acts on the local CPU, and can already modify the running PSTATE as it
is called from secondary_start_kernel().

Reported-by: Tony Thompson <anthony.thompson@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0edfa839 19-Sep-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

arm64: migrate exception table users off module.h and onto extable.h

These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# cab15ce6 11-Aug-2016 Catalin Marinas <catalin.marinas@arm.com>

arm64: Introduce execute-only page access permissions

The ARMv8 architecture allows execute-only user permissions by clearing
the PTE_UXN and PTE_USER bits. However, the kernel running on a CPU
implementation without User Access Override (ARMv8.2 onwards) can still
access such page, so execute-only page permission does not protect
against read(2)/write(2) etc. accesses. Systems requiring such
protection must enable features like SECCOMP.

This patch changes the arm64 __P100 and __S100 protection_map[] macros
to the new __PAGE_EXECONLY attributes. A side effect is that
pte_user() no longer triggers for __PAGE_EXECONLY since PTE_USER isn't
set. To work around this, the check is done on the PTE_NG bit via the
pte_ng() macro. VM_READ is also checked now for page faults.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 9adeb8e7 09-Aug-2016 Laura Abbott <labbott@redhat.com>

arm64: Handle el1 synchronous instruction aborts cleanly

Executing from a non-executable area gives an ugly message:

lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0e08
lkdtm: attempting bad execution at ffff000008880700
Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL)
CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13
Hardware name: linux,dummy-virt (DT)
task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88

The 'IABT (current EL)' indicates the error but it's a bit cryptic
without knowledge of the ARM ARM. There is also no indication of the
specific address which triggered the fault. The increase in kernel
page permissions makes hitting this case more likely as well.
Handling the case in the vectors gives a much more familiar looking
error message:

lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0840
lkdtm: attempting bad execution at ffff000008880680
Unable to handle kernel paging request at virtual address ffff000008880680
pgd = ffff8000089b2000
[ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000
Internal error: Oops: 8400000e [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24
Hardware name: linux,dummy-virt (DT)
task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# dcddffd4 26-Jul-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm: do not pass mm_struct into handle_mm_fault

We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2dd0e8d2 07-Jul-2016 Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>

arm64: Kprobes with single stepping support

Add support for basic kernel probes(kprobes) and jump probes
(jprobes) for ARM64.

Kprobes utilizes software breakpoint and single step debug
exceptions supported on ARM v8.

A software breakpoint is placed at the probe address to trap the
kernel execution into the kprobe handler.

ARM v8 supports enabling single stepping before the break exception
return (ERET), with next PC in exception return address (ELR_EL1). The
kprobe handler prepares an executable memory slot for out-of-line
execution with a copy of the original instruction being probed, and
enables single stepping. The PC is set to the out-of-line slot address
before the ERET. With this scheme, the instruction is executed with the
exact same register context except for the PC (and DAIF) registers.

Debug mask (PSTATE.D) is enabled only when single stepping a recursive
kprobe, e.g.: during kprobes reenter so that probed instruction can be
single stepped within the kprobe handler -exception- context.
The recursion depth of kprobe is always 2, i.e. upon probe re-entry,
any further re-entry is prevented by not calling handlers and the case
counted as a missed kprobe).

Single stepping from the x-o-l slot has a drawback for PC-relative accesses
like branching and symbolic literals access as the offset from the new PC
(slot address) may not be ensured to fit in the immediate value of
the opcode. Such instructions need simulation, so reject
probing them.

Instructions generating exceptions or cpu mode change are rejected
for probing.

Exclusive load/store instructions are rejected too. Additionally, the
code is checked to see if it is inside an exclusive load/store sequence
(code from Pratyush).

System instructions are mostly enabled for stepping, except MSR/MRS
accesses to "DAIF" flags in PSTATE, which are not safe for
probing.

This also changes arch/arm64/include/asm/ptrace.h to use
include/asm-generic/ptrace.h.

Thanks to Steve Capper and Pratyush Anand for several suggested
Changes.

Signed-off-by: Sandeepa Prabhu <sandeepa.s.prabhu@gmail.com>
Signed-off-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# e19a6ee2 20-Jun-2016 James Morse <james.morse@arm.com>

arm64: kernel: Save and restore UAO and addr_limit on exception entry

If we take an exception while at EL1, the exception handler inherits
the original context's addr_limit and PSTATE.UAO values. To be consistent
always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This
prevents accidental re-use of the original context's addr_limit.

Based on a similar patch for arm from Russell King.

Cc: <stable@vger.kernel.org> # 4.6-
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 541ec870 30-May-2016 Mark Rutland <mark.rutland@arm.com>

arm64: kill ESR_LNX_EXEC

Currently we treat ESR_EL1 bit 24 as software-defined for distinguishing
instruction aborts from data aborts, but this bit is architecturally
RES0 for instruction aborts, and could be allocated for an arbitrary
purpose in future. Additionally, we hard-code the value in entry.S
without the mnemonic, making the code difficult to understand.

Instead, remove ESR_LNX_EXEC, and distinguish aborts based on the esr,
which we already pass to the sole use of ESR_LNX_EXEC. A new helper,
is_el0_instruction_abort() is added to make the logic clear. Any
instruction aborts taken from EL1 will already have been handled by
bad_mode, so we need not handle that case in the helper.

For consistency, the existing permission_fault helper is renamed to
is_permission_fault, and the return type is changed to bool. There
should be no functional changes as the return value was a boolean
expression, and the result is only used in another boolean expression.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Cc: Huang Shijie <shijie.huang@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 275f344b 30-May-2016 Mark Rutland <mark.rutland@arm.com>

arm64: add macro to extract ESR_ELx.EC

Several places open-code extraction of the EC field from an ESR_ELx
value, in subtly different ways. This is unfortunate duplication and
variation, and the precise logic used to extract the field is a
distraction.

This patch adds a new macro, ESR_ELx_EC(), to extract the EC field from
an ESR_ELx value in a consistent fashion.

Existing open-coded extractions in core arm64 code are moved over to the
new helper. KVM code is left as-is for the moment.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Huang Shijie <shijie.huang@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# bbb1681e 13-Jun-2016 Mark Rutland <mark.rutland@arm.com>

arm64: mm: mark fault_info table const

Unlike the debug_fault_info table, we never intentionally alter the
fault_info table at runtime, and all derived pointers are treated as
const currently.

Make the table const so that it can be placed in .rodata and protected
from unintentional writes, as we do for the syscall tables.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0106d456 07-Jun-2016 Will Deacon <will@kernel.org>

arm64: mm: always take dirty state from new pte in ptep_set_access_flags

Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for
hardware AF/DBM") ensured that pte flags are updated atomically in the
face of potential concurrent, hardware-assisted updates. However, Alex
reports that:

| This patch breaks swapping for me.
| In the broken case, you'll see either systemd cpu time spike (because
| it's stuck in a page fault loop) or the system hang (because the
| application owning the screen is stuck in a page fault loop).

It turns out that this is because the 'dirty' argument to
ptep_set_access_flags is always 0 for read faults, and so we can't use
it to set PTE_RDONLY. The failing sequence is:

1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
3. A read faults due to the missing access flag
4. ptep_set_access_flags is called with dirty = 0, due to the read fault
5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
6. A write faults, but pte_write is true so we get stuck

The solution is to check the new page table entry (as would be done by
the generic, non-atomic definition of ptep_set_access_flags that just
calls set_pte_at) to establish the dirty state.

Cc: <stable@vger.kernel.org> # 4.3+
Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 3a72db70 17-Apr-2016 Huang Shijie <shijie.huang@arm.com>

arm64: mm: remove the redundant code

We already re-enable interrupts where necessary in the entry code, so
there is no need to do it again in do_page fault. This patch removes
the redundant code.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 66dbd6e6 13-Apr-2016 Catalin Marinas <catalin.marinas@arm.com>

arm64: Implement ptep_set_access_flags() for hardware AF/DBM

When hardware updates of the access and dirty states are enabled, the
default ptep_set_access_flags() implementation based on calling
set_pte_at() directly is potentially racy. This triggers the "racy dirty
state clearing" warning in set_pte_at() because an existing writable PTE
is overridden with a clean entry.

There are two main scenarios for this situation:

1. The CPU getting an access fault does not support hardware updates of
the access/dirty flags. However, a different agent in the system
(e.g. SMMU) can do this, therefore overriding a writable entry with a
clean one could potentially lose the automatically updated dirty
status

2. A more complex situation is possible when all CPUs support hardware
AF/DBM:

a) Initial state: shareable + writable vma and pte_none(pte)
b) Read fault taken by two threads of the same process on different
CPUs
c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
eventually reaches do_set_pte() which sets a writable + clean pte.
CPU0 releases the mmap_sem
d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
pte entry it reads is present, writable and clean and it continues
to pte_mkyoung()
e) CPU1 calls ptep_set_access_flags()

If between (d) and (e) the hardware (another CPU) updates the dirty
state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
marking the entry clean again.

This patch implements an arm64-specific ptep_set_access_flags() function
to perform an atomic update of the PTE flags.

Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # 4.3+
[will: reworded comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 6afedcd2 13-Apr-2016 James Morse <james.morse@arm.com>

arm64: mm: Add trace_irqflags annotations to do_debug_exception()

With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS
enabled, lockdep will compare current->hardirqs_enabled with the flags from
local_irq_save().

When a debug exception occurs, interrupts are disabled in entry.S, but
lockdep isn't told, resulting in:
DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
------------[ cut here ]------------
WARNING: at ../kernel/locking/lockdep.c:3523
Modules linked in:
CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204
Hardware name: ARM Juno development board (r1) (DT)
task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000
PC is at check_flags.part.35+0x17c/0x184
LR is at check_flags.part.35+0x17c/0x184
pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5
[...]
---[ end trace 74631f9305ef5020 ]---
Call trace:
[<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184
[<ffffff80080ffe30>] lock_acquire+0xa8/0xc4
[<ffffff8008093038>] breakpoint_handler+0x118/0x288
[<ffffff8008082434>] do_debug_exception+0x3c/0xa8
[<ffffff80080854b4>] el1_dbg+0x18/0x6c
[<ffffff80081e82f4>] do_filp_open+0x64/0xdc
[<ffffff80081d6e60>] do_sys_open+0x140/0x204
[<ffffff80081d6f58>] SyS_openat+0x10/0x18
[<ffffff8008085d30>] el0_svc_naked+0x24/0x28
possible reason: unannotated irqs-off.
irq event stamp: 65857
hardirqs last enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4
hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4
softirqs last enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290
softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0

This patch adds the annotations to do_debug_exception(), while trying not
to call trace_hardirqs_off() if el1_dbg() interrupted a task that already
had irqs disabled.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0e8fb931 17-Mar-2016 Jan Kara <jack@suse.cz>

mm: remove VM_FAULT_MINOR

The define has a comment from Nick Piggin from 2007:

/* For backwards compat. Remove me quickly. */

I guess 9 years should not be too hurried sense of 'quickly' even for
kernel measures.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 70c8abc2 19-Feb-2016 Catalin Marinas <catalin.marinas@arm.com>

arm64: User die() instead of panic() in do_page_fault()

The former gives better error reporting on unhandled permission faults
(introduced by the UAO patches).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 52d7523d 15-Feb-2016 EunTaik Lee <eun.taik.lee@samsung.com>

arm64: mm: allow the kernel to handle alignment faults on user accesses

Although we don't expect to take alignment faults on access to normal
memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into
system calls, leading to things like get_user accessing device memory.

Rather than OOPS the kernel, allow any exception fixups to run and
return something like -EFAULT back to userspace. This makes the
behaviour more consistent with userspace, even though applications with
access to device mappings can easily cause other issues if they try
hard enough.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com>
[will: dropped __kprobes annotation and rewrote commit mesage]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e950631e 18-Feb-2016 Catalin Marinas <catalin.marinas@arm.com>

arm64: Remove the get_thread_info() function

This function was introduced by previous commits implementing UAO.
However, it can be replaced with task_thread_info() in
uao_thread_switch() or get_fs() in do_page_fault() (the latter being
called only on the current context, so no need for using the saved
pt_regs).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 70544196 05-Feb-2016 James Morse <james.morse@arm.com>

arm64: kernel: Don't toggle PAN on systems with UAO

If a CPU supports both Privileged Access Never (PAN) and User Access
Override (UAO), we don't need to disable/re-enable PAN round all
copy_to_user() like calls.

UAO alternatives cause these calls to use the 'unprivileged' load/store
instructions, which are overridden to be the privileged kind when
fs==KERNEL_DS.

This patch changes the copy_to_user() calls to have their PAN toggling
depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO.

If both features are detected, PAN will be enabled, but the copy_to_user()
alternatives will not be applied. This means PAN will be enabled all the
time for these functions. If only PAN is detected, the toggling will be
enabled as normal.

This will save the time taken to disable/re-enable PAN, and allow us to
catch copy_to_user() accesses that occur with fs==KERNEL_DS.

Futex and swp-emulation code continue to hang their PAN toggling code on
ARM64_HAS_PAN.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 57f4959b 05-Feb-2016 James Morse <james.morse@arm.com>

arm64: kernel: Add support for User Access Override

'User Access Override' is a new ARMv8.2 feature which allows the
unprivileged load and store instructions to be overridden to behave in
the normal way.

This patch converts {get,put}_user() and friends to use ldtr*/sttr*
instructions - so that they can only access EL0 memory, then enables
UAO when fs==KERNEL_DS so that these functions can access kernel memory.

This allows user space's read/write permissions to be checked against the
page tables, instead of testing addr<USER_DS, then using the kernel's
read/write permissions.

Signed-off-by: James Morse <james.morse@arm.com>
[catalin.marinas@arm.com: move uao_thread_switch() above dsb()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# c03784ee8 23-Nov-2015 Mark Rutland <mark.rutland@arm.com>

arm64: mm: fix fault_info table xFSC decoding

We are missing descriptions for some valid xFSC values in the fault info
table (e.g. "TLB conflict abort"), and have erroneous descriptions for
reserved values (e.g. "asynchronous external abort", "debug event").

This patch adds the missing xFSC values, and removes erroneous decoding
of values reserved by the architecture, as described in ARM DDI 0487A.h.

At the same time, fixed the unbalanced brackets for the synchronous
parity error strings in the table.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# dbb4e152 19-Oct-2015 Suzuki K. Poulose <suzuki.poulose@arm.com>

arm64: Delay cpu feature capability checks

At the moment we run through the arm64_features capability list for
each CPU and set the capability if one of the CPU supports it. This
could be problematic in a heterogeneous system with differing capabilities.
Delay the CPU feature checks until all the enabled CPUs are up(i.e,
smp_cpus_done(), so that we can make better decisions based on the
overall system capability. Once we decide and advertise the capabilities
the alternatives can be applied. From this state, we cannot roll back
a feature to disabled based on the values from a new hotplugged CPU,
due to the runtime patching and other reasons. So, for all new CPUs,
we need to make sure that they have the established system capabilities.
Failing which, we bring the CPU down, preventing it from turning online.
Once the capabilities are decided, any new CPU booting up goes through
verification to ensure that it has all the enabled capabilities and also
invokes the respective enable() method on the CPU.

The CPU errata checks are not delayed and is still executed per-CPU
to detect the respective capabilities. If we ever come across a non-errata
capability that needs to be checked on each-CPU, we could introduce them via
a new capability table(or introduce a flag), which can be processed per CPU.

The next patch will make the feature checks use the system wide
safe value of a feature register.

NOTE: The enable() methods associated with the capability is scheduled
on all the CPUs (which is the only use case at the moment). If we need
a different type of 'enable()' which only needs to be run once on any CPU,
we should be able to handle that when needed.

Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Tested-by: Dave Martin <Dave.Martin@arm.com>
[catalin.marinas@arm.com: static variable and coding style fixes]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 569ba74a 21-Sep-2015 Mark Salyzyn <salyzyn@android.com>

arm64: readahead: fault retry breaks mmap file read random detection

This is the arm64 portion of commit 45cac65b0fcd ("readahead: fault
retry breaks mmap file read random detection"), which was absent from
the initial port and has since gone unnoticed. The original commit says:

> .fault now can retry. The retry can break state machine of .fault. In
> filemap_fault, if page is miss, ra->mmap_miss is increased. In the second
> try, since the page is in page cache now, ra->mmap_miss is decreased. And
> these are done in one fault, so we can't detect random mmap file access.
>
> Add a new flag to indicate .fault is tried once. In the second try, skip
> ra->mmap_miss decreasing. The filemap_fault state machine is ok with it.

With this change, Mark reports that:

> Random read improves by 250%, sequential read improves by 40%, and
> random write by 400% to an eMMC device with dm crypto wrapped around it.

Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Riley Andrews <riandrews@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 9fb7410f 24-Jul-2015 Dave P Martin <Dave.Martin@arm.com>

arm64/BUG: Use BRK instruction for generic BUG traps

Currently, the minimal default BUG() implementation from asm-
generic is used for arm64.

This patch uses the BRK software breakpoint instruction to generate
a trap instead, similarly to most other arches, with the generic
BUG code generating the dmesg boilerplate.

This allows bug metadata to be moved to a separate table and
reduces the amount of inline code at BUG and WARN sites. This also
avoids clobbering any registers before they can be dumped.

To mitigate the size of the bug table further, this patch makes
use of the existing infrastructure for encoding addresses within
the bug table as 32-bit offsets instead of absolute pointers.
(Note that this limits the kernel size to 2GB.)

Traps are registered at arch_initcall time for aarch64, but BUG
has minimal real dependencies and it is desirable to be able to
generate bug splats as early as possible. This patch redirects
all debug exceptions caused by BRK directly to bug_handler() until
the full debug exception support has been initialised.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 338d4f49 22-Jul-2015 James Morse <james.morse@arm.com>

arm64: kernel: Add support for Privileged Access Never

'Privileged Access Never' is a new arm8.1 feature which prevents
privileged code from accessing any virtual address where read or write
access is also permitted at EL0.

This patch enables the PAN feature on all CPUs, and modifies {get,put}_user
helpers temporarily to permit access.

This will catch kernel bugs where user memory is accessed directly.
'Unprivileged loads and stores' using ldtrb et al are unaffected by PAN.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[will: use ALTERNATIVE in asm and tidy up pan_enable check]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# f871d268 03-Jul-2015 Suzuki K. Poulose <suzuki.poulose@arm.com>

arm64: Fix show_unhandled_signal_ratelimited usage

Commit 86dca36e6ba introduced ratelimited usage for
'unhandled_signal' messages.
The commit checks the ratelimit irrespective of whether
the signal is handled or not, which is wrong and leads
to false reports like the below in dmesg :

__do_user_fault: 127 callbacks suppressed

Do the ratelimit check only if the signal is unhandled.

Fixes: 86dca36e6ba0 ("arm64: use private ratelimit state along with show_unhandled_signals")
Cc: Vladimir Murzin <Vladimir.Murzin@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 86dca36e 19-Jun-2015 Vladimir Murzin <vladimir.murzin@arm.com>

arm64: use private ratelimit state along with show_unhandled_signals

printk_ratelimit() shares the ratelimiting state with other callers what
may lead to scenarios where at the time we want to print out debug
information we already limited, so nothing appears in the dmesg - this
makes exception-trace quite poor helper in debugging.

Additionally, we have imbalance with some messages limited with global
ratelimit state and other messages limited with their private state
defined via pr_*_ratelimited().

To address this inconsistency show_unhandled_signals_ratelimited()
macro is introduced and caller sites are converted to use it.

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 9e793ab8 19-Jun-2015 Vladimir Murzin <vladimir.murzin@arm.com>

arm64: show unhandled SP/PC alignment faults

Report unhandled SP/PC alignment faults if the show_unhandled_signals
variable is set (via /proc/sys/debug/exception-trace).

Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 70ffdb93 11-May-2015 David Hildenbrand <dahi@linux.vnet.ibm.com>

mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler

Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).

In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().

Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.

faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# aed40e01 23-Nov-2014 Mark Rutland <mark.rutland@arm.com>

arm64: move to ESR_ELx macros

Now that we have common ESR_ELx_* macros, move the core arm64 code over
to them.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>


# 7f73f7ae 21-Nov-2014 Will Deacon <will@kernel.org>

arm64: mm: report unhandled level-0 translation faults correctly

Translation faults that occur due to the input address being outside
of the address range mapped by the relevant base register are reported
as level 0 faults in ESR.DFSC.

If the faulting access cannot be resolved by the kernel (e.g. because
it is not mapped by a vma), then we report "input address range fault"
on the console. This was fine until we added support for 48-bit VAs,
which actually place PGDs at level 0 and can trigger faults for invalid
addresses that are within the range of the page tables.

This patch changes the string to report "level 0 translation fault",
which is far less confusing.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# c79b954b 12-May-2014 Jungseok Lee <jays.lee@samsung.com>

arm64: mm: Implement 4 levels of translation tables

This patch implements 4 levels of translation tables since 3 levels
of page tables with 4KB pages cannot support 40-bit physical address
space described in [1] due to the following issue.

It is a restriction that kernel logical memory map with 4KB + 3 levels
(0xffffffc000000000-0xffffffffffffffff) cannot cover RAM region from
544GB to 1024GB in [1]. Specifically, ARM64 kernel fails to create
mapping for this region in map_mem function since __phys_to_virt for
this region reaches to address overflow.

If SoC design follows the document, [1], over 32GB RAM would be placed
from 544GB. Even 64GB system is supposed to use the region from 544GB
to 576GB for only 32GB RAM. Naturally, it would reach to enable 4 levels
of page tables to avoid hacking __virt_to_phys and __phys_to_virt.

However, it is recommended 4 levels of page table should be only enabled
if memory map is too sparse or there is about 512GB RAM.

References
----------
[1]: Principles of ARM Memory Maps, White Paper, Issue C

Signed-off-by: Jungseok Lee <jays.lee@samsung.com>
Reviewed-by: Sungjinn Chung <sungjinn.chung@samsung.com>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
[catalin.marinas@arm.com: MEMBLOCK_INITIAL_LIMIT removed, same as PUD_SIZE]
[catalin.marinas@arm.com: early_ioremap_init() updated for 4 levels]
[catalin.marinas@arm.com: 48-bit VA depends on BROKEN until KVM is fixed]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Jungseok Lee <jungseoklee85@gmail.com>


# 5a0fdfad 16-May-2014 Catalin Marinas <catalin.marinas@arm.com>

Revert "arm64: Introduce execute-only page access permissions"

This reverts commit bc07c2c6e9ed125d362af0214b6313dca180cb08.

While the aim is increased security for --x memory maps, it does not
protect against kernel level reads. Until SECCOMP is implemented for
arm64, revert this patch to avoid giving a false idea of execute-only
mappings.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# bc07c2c6 03-Apr-2014 Catalin Marinas <catalin.marinas@arm.com>

arm64: Introduce execute-only page access permissions

The ARMv8 architecture allows execute-only user permissions by clearing
the PTE_UXN and PTE_USER bits. The kernel, however, can still access
such page, so execute-only page permission does not protect against
read(2)/write(2) etc. accesses. Systems requiring such protection must
implement/enable features like SECCOMP.

This patch changes the arm64 __P100 and __S100 protection_map[] macros
to the new __PAGE_EXECONLY attributes. A side effect is that
pte_valid_user() no longer triggers for __PAGE_EXECONLY since PTE_USER
isn't set. To work around this, the check is done on the PTE_NG bit via
the pte_valid_ng() macro. VM_READ is also checked now for page faults.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 9141300a 06-Apr-2014 Catalin Marinas <catalin.marinas@arm.com>

arm64: Provide read/write fault information in compat signal handlers

For AArch32, bit 11 (WnR) of the FSR/ESR register is set when the fault
was caused by a write access and applications like Qemu rely on such
information being provided in sigcontext. This patch introduces the
ESR_EL1 tracking for the arm64 kernel faults and sets bit 11 accordingly
in compat sigcontext.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 59f67e16 16-Sep-2013 Catalin Marinas <catalin.marinas@arm.com>

arm64: Make do_bad_area() function static

This function is only called from arch/arm64/mm/fault.c.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 759496ba 12-Sep-2013 Johannes Weiner <hannes@cmpxchg.org>

arch: mm: pass userspace fault flag to generic fault handler

Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 87134102 12-Sep-2013 Johannes Weiner <hannes@cmpxchg.org>

arch: mm: do not invoke OOM killer on kernel fault OOM

Kernel faults are expected to handle OOM conditions gracefully (gup,
uaccess etc.), so they should never invoke the OOM killer. Reserve this
for faults triggered in user context when it is the only option.

Most architectures already do this, fix up the remaining few.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# db6f4106 19-Jul-2013 Will Deacon <will@kernel.org>

arm64: mm: don't treat user cache maintenance faults as writes

On arm64, cache maintenance faults appear as data aborts with the CM
bit set in the ESR. The WnR bit, usually used to distinguish between
faulting loads and stores, always reads as 1 and (slightly confusingly)
the instructions are treated as reads by the architecture.

This patch fixes our fault handling code to treat cache maintenance
faults in the same way as loads.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 084bd298 10-Apr-2013 Steve Capper <steve.capper@linaro.org>

ARM64: mm: HugeTLB support.

Add huge page support to ARM64, different huge page sizes are
supported depending on the size of normal pages:

PAGE_SIZE is 4KB:
2MB - (pmds) these can be allocated at any time.
1024MB - (puds) usually allocated on bootup with the command line
with something like: hugepagesz=1G hugepages=6

PAGE_SIZE is 64KB:
512MB - (pmds) usually allocated on bootup via command line.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>


# 953dbbed 20-May-2013 Catalin Marinas <catalin.marinas@arm.com>

arm64: Do not report user faults for handled signals

Currently user faults (page, undefined instruction) are always reported
even though the user may have a signal handler for them. This patch adds
unhandled_signal() check together with printk_ratelimit() for these
cases.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 0e7f7bcc 07-May-2013 Catalin Marinas <catalin.marinas@arm.com>

arm64: Ignore the 'write' ESR flag on cache maintenance faults

ESR.WnR bit is always set on data cache maintenance faults even though
the page is not required to have write permission. If a translation
fault (page not yet mapped) happens for read-only user address range,
Linux incorrectly assumes a permission fault. This patch adds the check
of the ESR.CM bit during the page fault handling to ignore the 'write'
flag.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Tim Northover <Tim.Northover@arm.com>
Cc: stable@vger.kernel.org


# 4339e3f3 19-Apr-2013 Steve Capper <steve.capper@linaro.org>

arm64: mm: Correct show_pte behaviour

show_pte makes use of the *_none_or_clear_bad style functions. If a
pgd, pud or pmd is identified as being bad, it will then be cleared.

As show_pte appears to be called from either the user or kernel
fault handlers this side effect can lead to unpredictable behaviour;
especially as TLB entries are not invalidated.

This patch removes the page table sanitisation from show_pte. If a
bad pgd, pud or pmd is encountered it is left unmodified.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 3495386b 24-Oct-2012 Catalin Marinas <catalin.marinas@arm.com>

arm64: Make the user fault reporting more specific

For user space faults the kernel reports "unhandled page fault" and it
gives the ESR value. With this patch the error message looked up in the
fault info array to give a better description.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 1d18c47c 05-Mar-2012 Catalin Marinas <catalin.marinas@arm.com>

arm64: MMU fault handling and page table management

This patch adds support for the handling of the MMU faults (exception
entry code introduced by a previous patch) and page table management.

The user translation table is pointed to by TTBR0 and the kernel one
(swapper_pg_dir) by TTBR1. There is no translation information shared or
address space overlapping between user and kernel page tables.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>