History log of /linux-master/arch/x86/mm/dump_pagetables.c
# 6cdc82db 30-Jan-2024 Christophe Leroy <christophe.leroy@csgroup.eu>

mm: ptdump: have ptdump_check_wx() return bool

Have ptdump_check_wx() return true when the check is successful or false
otherwise.
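A minimal sketch of what a bool-returning check can look like. It assumes a simplified model in which the walk has already produced a list of ranges with their effective protection bits. The names struct mapping and sketch_check_wx() are hypothetical. The bit positions are those of x86's _PAGE_RW (bit 1) and _PAGE_NX (bit 63).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_RW (1ULL << 1)   /* x86 _PAGE_RW bit */
#define SKETCH_PAGE_NX (1ULL << 63)  /* x86 _PAGE_NX bit */

/* Hypothetical, simplified stand-in for one contiguous mapped range. */
struct mapping {
	uint64_t prot;          /* effective protection bits of the range */
	unsigned long npages;   /* number of pages in the range */
};

/*
 * Returns true when the check passes (no W+X pages were found) and
 * false otherwise, so callers can act on the result instead of only
 * reading the log.
 */
static bool sketch_check_wx(const struct mapping *maps, size_t n)
{
	unsigned long wx_pages = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t p = maps[i].prot;

		/* Writable and not marked non-executable: W+X. */
		if ((p & SKETCH_PAGE_RW) && !(p & SKETCH_PAGE_NX))
			wx_pages += maps[i].npages;
	}

	if (wx_pages)
		printf("W+X check failed: %lu W+X pages found\n", wx_pages);
	else
		printf("W+X check passed, no W+X pages found\n");

	return wx_pages == 0;
}
```

Returning a value rather than only printing lets a caller act on the outcome, for example a self-test.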

[akpm@linux-foundation.org: fix a couple of build issues (x86_64 allmodconfig)]
Link: https://lkml.kernel.org/r/7943149fe955458cb7b57cd483bf41a3aad94684.1706610398.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Phong Tran <tranmanphong@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Price <steven.price@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# a5e8131a 30-Jan-2024 Christophe Leroy <christophe.leroy@csgroup.eu>

arm64, powerpc, riscv, s390, x86: ptdump: refactor CONFIG_DEBUG_WX

All architectures using the core ptdump functionality also implement
CONFIG_DEBUG_WX, and they all do it more or less the same way, with a
function called debug_checkwx() that is called by mark_rodata_ro(), which
stands in for ptdump_check_wx() when CONFIG_DEBUG_WX is set and is a
no-op otherwise.

Refactor by defining debug_checkwx() centrally in linux/ptdump.h and
calling debug_checkwx() immediately after mark_rodata_ro() instead of
calling it at the end of every mark_rodata_ro().
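The centralised wrapper can be sketched as follows. This is an illustration under assumptions, not the code in linux/ptdump.h. SKETCH_CONFIG_DEBUG_WX stands in for the Kconfig option. arch_check_wx(), sketch_mark_rodata_ro() and sketch_mark_readonly() are hypothetical names for the architecture hook, the per-architecture mark_rodata_ro() and the generic caller.

```c
#include <stdbool.h>

/* Stand-in for the Kconfig option; set to 0 to make the check a no-op. */
#define SKETCH_CONFIG_DEBUG_WX 1

static int checks_run; /* counts how often the architecture hook ran */

/* Hypothetical architecture hook, standing in for ptdump_check_wx(). */
static bool arch_check_wx(void)
{
	checks_run++;
	return true;
}

/*
 * One central definition: a real call when the option is set and a
 * no-op otherwise, so no architecture needs its own copy.
 */
static inline void sketch_debug_checkwx(void)
{
	if (SKETCH_CONFIG_DEBUG_WX)
		arch_check_wx();
}

/* Hypothetical architecture mark_rodata_ro(): no W+X call at its end. */
static void sketch_mark_rodata_ro(void)
{
	/* ... make kernel text and rodata read-only ... */
}

/* Generic caller: runs the check right after mark_rodata_ro(). */
static void sketch_mark_readonly(void)
{
	sketch_mark_rodata_ro();
	sketch_debug_checkwx();
}
```

Because the call site moves into generic code, each architecture only has to provide the check itself.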

On x86_32, mark_rodata_ro() first checks that __supported_pte_mask has
_PAGE_NX set before calling debug_checkwx(). Now the check is done inside
the callee, ptdump_walk_pgd_level_checkwx().

On powerpc_64, mark_rodata_ro() bails out early before calling
ptdump_check_wx() when the MMU doesn't have the KERNEL_RO feature. The
check is now also done in ptdump_check_wx(), as it is called outside
mark_rodata_ro().

Link: https://lkml.kernel.org/r/a59b102d7964261d31ead0316a9f18628e4e7a8e.1706610398.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Phong Tran <tranmanphong@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Price <steven.price@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# ea4654e0 21-Nov-2023 Breno Leitao <leitao@debian.org>

x86/bugs: Rename CONFIG_PAGE_TABLE_ISOLATION => CONFIG_MITIGATION_PAGE_TABLE_ISOLATION

Step 4/10 of the namespace unification of CPU-mitigation-related Kconfig options.

[ mingo: Converted new uses that got added since the series was posted. ]

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231121160740.1249350-5-leitao@debian.org


# e31cf2f4 08-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: don't include asm/pgtable.h if linux/mm.h is already included

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definitions of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

	static inline unsigned long pmd_index(unsigned long address)
	{
		return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
	}

	static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
	{
		return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
	}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
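As a worked example of the generic pmd_index()/pmd_offset() pair above, the following sketch plugs in the x86-64 values PMD_SHIFT = 21 (each PMD entry maps 2 MiB) and PTRS_PER_PMD = 512. pmd_offset_sketch() is a hypothetical simplification that takes the PMD table pointer directly instead of deriving it with pud_page_vaddr().

```c
#include <stdint.h>

/* x86-64 values: a PMD entry maps 2 MiB, a PMD table has 512 entries. */
#define PMD_SHIFT    21
#define PTRS_PER_PMD 512

typedef struct { uint64_t pmd; } pmd_t;

/* Select bits 21..29 of the address: the index within the PMD table. */
static inline unsigned long pmd_index(unsigned long address)
{
	return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

/*
 * Hypothetical simplification: the PMD table's virtual address is
 * passed in directly, standing in for pud_page_vaddr(*pud).
 */
static inline pmd_t *pmd_offset_sketch(pmd_t *pmd_table, unsigned long address)
{
	return pmd_table + pmd_index(address);
}
```

For address 0x40200000 (1 GiB + 2 MiB), 0x40200000 >> 21 is 513, and 513 & 511 is 1, so the entry is index 1 of its PMD table.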

For architectures that really need a custom version, it is always
possible to override the generic version with the usual ifdef magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point in explicitly including <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are removed with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99395ee3 01-Jun-2020 Steven Price <steven.price@arm.com>

mm: ptdump: expand type of 'val' in note_page()

The page table entry is passed in the 'val' argument to note_page();
however, this was previously an "unsigned long", which is fine on 64-bit
platforms. But on 32-bit x86 it is not always big enough to contain a
page table entry, which may be 64 bits.

Change the type to u64 to ensure that it is always big enough.
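To make the truncation concrete, here is a small sketch in which uint32_t models a 32-bit "unsigned long" and the entry carries the NX bit (bit 63 of a PAE entry). note_val_old() and note_val_new() are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

#define SKETCH_PAGE_NX (1ULL << 63)  /* NX lives in bit 63 of a PAE entry */

/* Models an "unsigned long" on 32-bit x86, which is 32 bits wide. */
typedef uint32_t ulong32;

/* Old style: the 64-bit PAE entry is squeezed into 32 bits. */
static ulong32 note_val_old(uint64_t pte)
{
	return (ulong32)pte;
}

/* New style: a u64 keeps every bit of the entry. */
static uint64_t note_val_new(uint64_t pte)
{
	return pte;
}
```

With the old type, an entry such as 0x8000000000000063 arrives as 0x63, so the NX bit (and any physical-address bits above 4 GiB) is silently lost.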

[akpm@linux-foundation.org: fix riscv]
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200521152308.33096-3-steven.price@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1494e0c3 01-Jun-2020 Steven Price <steven.price@arm.com>

x86: mm: ptdump: calculate effective permissions correctly

Patch series "Fix W+X debug feature on x86"

Jan alerted me[1] that the W+X detection debug feature was broken in x86
by my change[2] to switch x86 to use the generic ptdump infrastructure.

Fundamentally the approach of trying to move the calculation of
effective permissions into note_page() was broken because note_page() is
only called for 'leaf' entries and the effective permissions are passed
down via the internal nodes of the page tree. The solution I've taken
here is to create a new (optional) callback which is called for all
nodes of the page tree and therefore can calculate the effective
permissions.

Secondly, on some configurations (32-bit with PAE) "unsigned long" is not
large enough to store the table entries. The fix here is simple: let's
just use a u64.

[1] https://lore.kernel.org/lkml/d573dc7e-e742-84de-473d-f971142fa319@suse.com/
[2] 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")

This patch (of 2):

By switching the x86 page table dump code to use the generic code the
effective permissions are no longer calculated correctly because the
note_page() function is only called for *leaf* entries. To calculate
the actual effective permissions it is necessary to observe the full
hierarchy of the page tree.

Introduce a new callback for ptdump which is called for every entry and
can therefore update the prot_levels array correctly. note_page() can
then simply access the appropriate element in the array.
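A sketch of how such a callback can fold permissions down the hierarchy. It assumes the x86 rules that RW and USER are only effective if every level grants them, while NX takes effect if any level sets it. struct sketch_state and sketch_effective_prot() are hypothetical simplifications of the per-walk state and callback; levels run from 0 (pgd) to 4 (pte).

```c
#include <stdint.h>

#define SKETCH_PAGE_RW   (1ULL << 1)   /* x86 _PAGE_RW */
#define SKETCH_PAGE_USER (1ULL << 2)   /* x86 _PAGE_USER */
#define SKETCH_PAGE_NX   (1ULL << 63)  /* x86 _PAGE_NX */

#define SKETCH_LEVELS 5  /* pgd, p4d, pud, pmd, pte */

struct sketch_state {
	uint64_t prot_levels[SKETCH_LEVELS];  /* effective prot per level */
};

/*
 * Called for every entry, internal or leaf. RW and USER survive only if
 * every level grants them (AND); NX applies if any level sets it (OR).
 */
static void sketch_effective_prot(struct sketch_state *st, int level,
				  uint64_t prot)
{
	uint64_t eff = prot;

	if (level > 0) {
		uint64_t higher = st->prot_levels[level - 1];

		eff = (higher & prot & (SKETCH_PAGE_USER | SKETCH_PAGE_RW)) |
		      ((higher | prot) & SKETCH_PAGE_NX);
	}
	st->prot_levels[level] = eff;
}
```

For example, if a pgd entry grants RW|USER, a p4d entry below it grants USER|NX and a pud entry below that grants RW|USER, the effective protection stored for level 2 is USER|NX: the write permission was revoked one level up.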

[steven.price@arm.com: make the assignment conditional on val != 0]
Link: http://lkml.kernel.org/r/430c8ab4-e7cd-6933-dde6-087fac6db872@arm.com
Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200521152308.33096-1-steven.price@arm.com
Link: http://lkml.kernel.org/r/20200521152308.33096-2-steven.price@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bba42aff 21-Feb-2020 Juergen Gross <jgross@suse.com>

x86/mm: Fix dump_pagetables with Xen PV

Commit 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use
walk_page_range") broke Xen PV guests as the hypervisor reserved hole in
the memory map was not taken into account.

Fix that by starting the kernel range only at GUARD_HOLE_END_ADDR.

Fixes: 2ae27137b2db89 ("x86: mm: convert dump_pagetables to use walk_page_range")
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Julien Grall <julien@xen.org>
Link: https://lkml.kernel.org/r/20200221103851.7855-1-jgross@suse.com


# e47690d7 03-Feb-2020 Steven Price <steven.price@arm.com>

x86: mm: avoid allocating struct mm_struct on the stack

struct mm_struct is quite large (~1664 bytes) and so allocating on the
stack may cause problems as the kernel stack size is small.

Since ptdump_walk_pgd_level_core() was only allocating the structure so
that it could modify the pgd argument we can instead introduce a pgd
override in struct mm_walk and pass this down the call stack to where it
is needed.
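The override can be pictured as follows. This is a minimal sketch with hypothetical, trimmed-down structures (struct sketch_mm, struct sketch_walk). When the walk carries its own pgd, that table is used as the root. Otherwise the walk falls back to the mm's pgd. In both cases the real mm is still reachable, for example for locking.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t pgd; } pgd_t;

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct sketch_mm {
	pgd_t *pgd;              /* the mm's own top-level table */
};

struct sketch_walk {
	struct sketch_mm *mm;    /* the real mm being walked */
	pgd_t *pgd;              /* optional override; NULL means use mm->pgd */
};

/* Pick the top-level table the walk should start from. */
static pgd_t *sketch_walk_root(const struct sketch_walk *walk)
{
	return walk->pgd ? walk->pgd : walk->mm->pgd;
}
```

This avoids copying a whole mm_struct onto the stack just to swap its pgd.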

Since the correct mm_struct is now being passed down, it is now also
unnecessary to take the mmap_sem semaphore because ptdump_walk_pgd() will
now take the semaphore on the real mm.

[steven.price@arm.com: restore missed arm64 changes]
Link: http://lkml.kernel.org/r/20200108145710.34314-1-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f8f0d0b6 03-Feb-2020 Steven Price <steven.price@arm.com>

mm: ptdump: reduce level numbers by 1 in note_page()

Rather than having to increment the 'depth' number by 1 in ptdump_hole(),
let's change the meaning of 'level' in note_page(), since that makes the
code simpler.

Note that for x86, the level numbers were previously increased by 1 in
commit 45dcd2091363 ("x86/mm/dump_pagetables: Fix printout of p4d level")
and the comment "Bit 7 has a different meaning" was not updated, so this
change also makes the code match the comment again.

Link: http://lkml.kernel.org/r/20191218162402.45610-24-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2ae27137 03-Feb-2020 Steven Price <steven.price@arm.com>

x86: mm: convert dump_pagetables to use walk_page_range

Make use of the new functionality in walk_page_range to remove the arch
page walking code and use the generic code to walk the page tables.

The effective permissions are passed down the chain using new fields in
struct pg_state.

The KASAN optimisation is implemented by setting action=CONTINUE in the
callbacks to skip an entire tree of entries.
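A sketch of the skip mechanism. It assumes a toy tree in which the callback flags an entry that points at the shared KASAN shadow table. The enum values mirror the idea of the kernel's ACTION_SUBTREE and ACTION_CONTINUE. Every name here (struct sketch_node, sketch_walk_tree() and so on) is hypothetical.

```c
#include <stddef.h>

/* Mirrors the idea of the kernel's ACTION_SUBTREE / ACTION_CONTINUE. */
enum sketch_action { SKETCH_SUBTREE, SKETCH_CONTINUE };

struct sketch_node {
	int shared_shadow;             /* entry points at the shared shadow table */
	struct sketch_node *child[4];  /* NULL for empty slots */
};

struct sketch_tree_walk {
	enum sketch_action action;     /* set by the callback for the current entry */
	int visited;                   /* entries reported by the callback */
};

/* Per-entry callback: report the entry, and ask the walker not to
 * descend if it is the shared shadow table. */
static void sketch_entry(struct sketch_tree_walk *w, const struct sketch_node *n)
{
	w->visited++;
	if (n->shared_shadow)
		w->action = SKETCH_CONTINUE;
}

static void sketch_walk_tree(struct sketch_tree_walk *w, const struct sketch_node *n)
{
	for (int i = 0; i < 4; i++) {
		const struct sketch_node *c = n->child[i];

		if (!c)
			continue;
		w->action = SKETCH_SUBTREE;    /* default: descend into children */
		sketch_entry(w, c);
		if (w->action == SKETCH_CONTINUE)
			continue;              /* skip the whole subtree */
		sketch_walk_tree(w, c);
	}
}
```

Every entry below the shared shadow table is identical, so the walker reports the table once and does not descend into it.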

Link: http://lkml.kernel.org/r/20191218162402.45610-21-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c5cfae12 03-Feb-2020 Steven Price <steven.price@arm.com>

x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct

To enable x86 to use the generic walk_page_range() function, the callers
of ptdump_walk_pgd_level_debugfs() need to pass in the mm_struct.

This means that ptdump_walk_pgd_level_core() is now always passed a valid
pgd, so drop the support for pgd==NULL.

Link: http://lkml.kernel.org/r/20191218162402.45610-19-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e455248d 03-Feb-2020 Steven Price <steven.price@arm.com>

x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct

To enable x86 to use the generic walk_page_range() function, the callers
of ptdump_walk_pgd_level() need to pass an mm_struct rather than the raw
pgd_t pointer. Luckily since commit 7e904a91bf60 ("efi: Use efi_mm in x86
as well as ARM") we now have an mm_struct for EFI on x86.

Link: http://lkml.kernel.org/r/20191218162402.45610-18-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 74d2aaa1 03-Feb-2020 Steven Price <steven.price@arm.com>

x86: mm: point to struct seq_file from struct pg_state

mm/dump_pagetables.c passes both struct seq_file and struct pg_state down
the chain of walk_*_level() functions to be passed to note_page().
Instead place the struct seq_file in struct pg_state and access it from
struct pg_state (which is private to this file) in note_page().

Link: http://lkml.kernel.org/r/20191218162402.45610-17-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b886d83c 01-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 of the license

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 315 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 510bb96f 15-Apr-2019 Thomas Gleixner <tglx@linutronix.de>

x86/mm: Prevent bogus warnings with "noexec=off"

Xose Vazquez Perez reported boot warnings when NX is disabled on the kernel command line.

__early_set_fixmap() triggers this warning:

attempted to set unsupported pgprot: 8000000000000163
bits: 8000000000000000
supported: 7fffffffffffffff

WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgtable.h:537
__early_set_fixmap+0xa2/0xff

because it uses __default_kernel_pte_mask to mask out unsupported bits.

Use __supported_pte_mask instead.

Disabling NX on the command line also triggers the NX warning in the page
table mapping check:

WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:262 note_page+0x2ae/0x650
....

Make the warning depend on NX set in __supported_pte_mask.
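A sketch of the resulting condition, assuming the x86 bit positions for _PAGE_RW (bit 1) and _PAGE_NX (bit 63). sketch_should_warn_wx() is a hypothetical helper, and supported_mask plays the role of __supported_pte_mask. With NX disabled, the mask is 7fffffffffffffff, as in the warning above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_RW (1ULL << 1)   /* x86 _PAGE_RW */
#define SKETCH_PAGE_NX (1ULL << 63)  /* x86 _PAGE_NX */

/*
 * Should a range with effective protection 'prot' be reported as W+X?
 * If NX is not in the supported mask (e.g. "noexec=off"), no page can
 * be non-executable, so warning about writable pages is only noise.
 */
static bool sketch_should_warn_wx(uint64_t prot, uint64_t supported_mask)
{
	if (!(supported_mask & SKETCH_PAGE_NX))
		return false;
	return (prot & SKETCH_PAGE_RW) && !(prot & SKETCH_PAGE_NX);
}
```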

Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Tested-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1904151037530.1729@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 28e3ace7 29-Mar-2019 Borislav Petkov <bp@suse.de>

x86/mm: Convert some slow-path static_cpu_has() callers to boot_cpu_has()

Using static_cpu_has() is pointless on those paths, convert them to the
boot_cpu_has() variant.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190330112022.28888-5-bp@alien8.de


# 8e8a3cea 14-Feb-2019 Shaokun Zhang <zhangshaokun@hisilicon.com>

x86/mm/dump_pagetables: Remove the unused prev_pud variable

The 'prev_pud' local variable became unused after commit

04b67022fb6d ("x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y").

Remove it.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1550136829-49088-1-git-send-email-zhangshaokun@hisilicon.com


# 9577dd74 28-Dec-2018 Andrey Konovalov <andreyknvl@google.com>

kasan: rename kasan_zero_page to kasan_early_shadow_page

With tag-based KASAN mode the early shadow value is 0xff and not 0x00, so
this patch renames kasan_zero_(page|pte|pmd|pud|p4d) to
kasan_early_shadow_(page|pte|pmd|pud|p4d) to avoid confusion.

Link: http://lkml.kernel.org/r/3fed313280ebf4f88645f5b89ccbc066d320e177.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 254eb550 30-Nov-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/dump_pagetables: Fix LDT remap address marker

The LDT remap placement has been changed. It's now placed before the direct
mapping in the kernel virtual address space for both paging modes.

Change address markers order accordingly.

Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: bhe@redhat.com
Cc: hans.van.kranenburg@mendix.com
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.com


# 16877a55 30-Nov-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Fix guard hole handling

There is a guard hole at the beginning of the kernel address space, also
used by hypervisors. It occupies 16 PGD entries.

This reserved range is not defined explicitly; it is calculated relative
to other entities: the direct mapping and user space ranges.

The calculation got broken by recent changes to the kernel memory layout:
the LDT remap range is now mapped before the direct mapping, which makes
the calculation invalid.

The breakage leads to a crash on Xen dom0 boot[1].

Define the reserved range explicitly. It's part of the kernel ABI
(hypervisors expect it to be stable) and must not depend on changes in
the rest of the kernel memory layout.
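The explicit definition reduces to simple arithmetic. The sketch below assumes 4-level paging, where each PGD entry covers 2^39 bytes (512 GiB). It also assumes the hole starts at the first kernel-half PGD entry, written as entry -256. The SKETCH_ names are illustrative and not copied from the kernel headers.

```c
#include <stdint.h>

/* 4-level paging: each PGD entry maps 2^39 bytes (512 GiB). */
#define SKETCH_PGDIR_SHIFT 39

/*
 * The hole starts at PGD entry -256 (index 256 of 512, the first
 * kernel-half entry) and spans 16 entries, independent of where any
 * other region is placed.
 */
#define SKETCH_GUARD_HOLE_PGD_ENTRY ((uint64_t)-256)
#define SKETCH_GUARD_HOLE_SIZE      ((uint64_t)16 << SKETCH_PGDIR_SHIFT)
#define SKETCH_GUARD_HOLE_BASE_ADDR (SKETCH_GUARD_HOLE_PGD_ENTRY << SKETCH_PGDIR_SHIFT)
#define SKETCH_GUARD_HOLE_END_ADDR  (SKETCH_GUARD_HOLE_BASE_ADDR + SKETCH_GUARD_HOLE_SIZE)
```

Sixteen 512 GiB entries make an 8 TiB hole from 0xffff800000000000 up to 0xffff880000000000, regardless of where the LDT remap or the direct mapping is placed.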

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html

Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging")
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: peterz@infradead.org
Cc: boris.ostrovsky@oracle.com
Cc: bhe@redhat.com
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com


# c200dac7 08-Oct-2018 Thomas Gleixner <tglx@linutronix.de>

x86/mm: Do not warn about PCI BIOS W+X mappings

PCI BIOS requires the BIOS area 0x0A0000-0x0FFFFF to be mapped W+X for
various legacy reasons. When CONFIG_DEBUG_WX is enabled, this triggers the
W+X warning, but the warning is misleading because the mapping is required
and is not the result of an accidental oversight.

Prevent the full warning when PCI BIOS is enabled and the detected W+X
mapping is in the BIOS area, and just emit a pr_warn() noting the fact.
This partially duplicates the info that the PCI BIOS code emits when it
maps the area as executable, but that info is not in the context of the
W+X checking output.
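The exemption can be sketched as a range test. This sketch makes three assumptions: addresses are offsets from the start of the direct mapping rather than kernel virtual addresses, pcibios_enabled models whether PCI BIOS is in use, and sketch_classify_wx() and its enum are hypothetical names.

```c
#include <stdbool.h>

/* Legacy BIOS area: 0xA0000 up to (not including) 1 MiB. */
#define SKETCH_BIOS_BEGIN 0x000a0000UL
#define SKETCH_BIOS_END   0x00100000UL

enum sketch_wx_verdict { WX_BIOS_NOTE, WX_WARN };

/*
 * Classify an already-detected W+X range [start, end). Only a range
 * lying entirely inside the BIOS area, with PCI BIOS in use, is
 * downgraded to a note.
 */
static enum sketch_wx_verdict sketch_classify_wx(unsigned long start,
						 unsigned long end,
						 bool pcibios_enabled)
{
	if (pcibios_enabled && start >= SKETCH_BIOS_BEGIN && end <= SKETCH_BIOS_END)
		return WX_BIOS_NOTE;  /* required mapping: pr_warn() only */
	return WX_WARN;               /* full WARN_ONCE() */
}
```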

Remove the extra %p printout in the WARN_ONCE() while at it. %pS is enough.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1810082151160.2455@nanos.tec.linutronix.de


# d878efce 08-Aug-2018 Joerg Roedel <jroedel@suse.de>

x86/mm/pti: Move user W+X check into pti_finalize()

The user page-table gets the updated kernel mappings in pti_finalize(),
which runs after the RO+X permissions got applied to the kernel page-table
in mark_readonly().

But with CONFIG_DEBUG_WX enabled, the user page-table is already checked for
insecure mappings in mark_readonly(). This causes false-positive warnings,
because the user page-table has not received the updated mappings yet.

Move the W+X check for the user page-table into pti_finalize() after it
updated all required mappings.

[ tglx: Folded !NX supported fix ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org


# f3e48e54 18-Jul-2018 Joerg Roedel <jroedel@suse.de>

x86/ldt: Reserve address-space range on 32 bit for the LDT

Reserve 2MB/4MB of address-space for mapping the LDT to user-space on 32
bit PTI kernels.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-34-git-send-email-joro@8bytes.org


# 4e8537e4 18-Jul-2018 Joerg Roedel <jroedel@suse.de>

x86/mm/dump_pagetables: Define INIT_PGD

Define INIT_PGD to point to the correct initial page-table for 32 and 64
bit and use it where needed. This fixes the build on 32 bit with
CONFIG_PAGE_TABLE_ISOLATION enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-32-git-send-email-joro@8bytes.org


# ed7588d5 18-May-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Stop pretending pgtable_l5_enabled is a variable

pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer
to it as a variable. This is misleading.

Make pgtable_l5_enabled() a function.

We cannot literally define it as a function due to circular dependencies
between header files. A function-like macro is close enough.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d6ef1f19 17-Apr-2018 Joerg Roedel <jroedel@suse.de>

x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y

The walk_pte_level() function just uses __va to get the virtual address of
the PTE page, but that breaks when the PTE page is not in the direct
mapping with HIGHPTE=y.

The result is an unhandled kernel paging request at some random address
when accessing the current_kernel or current_user file.

Use the correct API to access PTE pages.

Fixes: fe770bf0310d ('x86: clean up the page table dumper and add 32-bit support')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: jgross@suse.com
Cc: JBeulich@suse.com
Cc: hpa@zytor.com
Cc: aryabinin@virtuozzo.com
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/1523971636-4137-1-git-send-email-joro@8bytes.org


# 672c0ae0 23-Feb-2018 Jan Beulich <JBeulich@suse.com>

x86/mm: Consider effective protection attributes in W+X check

Using only the leaf page table entry flags would cause a false warning
if _PAGE_RW is clear or _PAGE_NX is set in a higher-level entry.
Pass through both the current entry's flags and the accumulated
effective value (the latter as pgprotval_t instead of pgprot_t, as it is
not an actual entry's value).

This in particular eliminates the false W+X warning when running under
Xen, as commit:

2cc42bac1c ("x86-64/Xen: eliminate W+X mappings")

had to make the necessary adjustment in L2 rather than L1 (the reason is
explained there). I.e. _PAGE_RW is clear there in L1, but _PAGE_NX is
set in L2.
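A sketch of how the effective value can be accumulated while descending the levels: write and user permission must be granted at every level, while NX set at any level wins. The bit positions are the architectural x86 ones (RW = bit 1, U/S = bit 2, NX = bit 63); the `PTE_*` macro names and the `is_wx()` helper are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pgprotval_t;

/* Architectural x86 page-table entry bits (names are hypothetical). */
#define PTE_RW   (1ULL << 1)
#define PTE_USER (1ULL << 2)
#define PTE_NX   (1ULL << 63)

/*
 * Combine the protection accumulated from the upper levels (prot1) with the
 * current entry's flags (prot2): RW/USER survive only if both allow them,
 * NX is set if either sets it.
 */
static pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
{
	return (prot1 & prot2 & (PTE_USER | PTE_RW)) |
	       ((prot1 | prot2) & PTE_NX);
}

/* A mapping is W+X if it is writable and not marked non-executable. */
static bool is_wx(pgprotval_t eff)
{
	return (eff & PTE_RW) && !(eff & PTE_NX);
}
```

Checking `is_wx()` on the effective value rather than on the leaf flags is what removes the false positive when NX is only set at a higher level.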

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5A8FDE8902000078001AABBB@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 91f606a8 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Replace compile-time checks for 5-level paging with runtime-time checks

This patch converts the compile-time CONFIG_X86_5LEVEL checks to runtime
checks for p4d folding.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 09e61a77 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Make __VIRTUAL_MASK_SHIFT dynamic

For boot-time switching between paging modes, we need to be able to
adjust virtual mask shifts.

The change doesn't affect the kernel image size much:

   text    data     bss      dec     hex filename
8628892 4734340 1368064 14731296  e0c820 vmlinux.before
8628966 4734340 1368064 14731370  e0c86a vmlinux.after

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c65e774f 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable

For boot-time switching between 4- and 5-level paging we need to be able
to fold the p4d page table level at runtime. This requires PGDIR_SHIFT
and PTRS_PER_P4D to be variables.

The change doesn't affect the kernel image size much:

   text    data     bss      dec     hex filename
8628091 4734304 1368064 14730459  e0c4db vmlinux.before
8628393 4734340 1368064 14730797  e0c62d vmlinux.after
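A hypothetical sketch of choosing these values once at boot instead of at compile time, assuming the usual x86-64 geometry: one PGD entry covers 2^39 bytes with 4-level paging and 2^48 bytes with 5-level paging, and the p4d level has a single entry when folded. The struct and function names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical container for values that become boot-time variables. */
struct paging_params {
	unsigned int pgdir_shift;   /* log2 of the bytes one PGD entry maps */
	unsigned int ptrs_per_p4d;  /* 1 when the p4d level is folded */
};

static struct paging_params select_paging(bool la57)
{
	struct paging_params p;

	if (la57) {
		p.pgdir_shift = 48;     /* 5-level: PGD sits above a real p4d */
		p.ptrs_per_p4d = 512;
	} else {
		p.pgdir_shift = 39;     /* 4-level: p4d folded into the PGD */
		p.ptrs_per_p4d = 1;
	}
	return p;
}

/* Bytes of virtual address space covered by one PGD entry. */
static unsigned long long pgd_entry_span(struct paging_params p)
{
	return 1ULL << p.pgdir_shift;
}
```

Every user of the former constants now reads the selected values, which is where the small text-size growth in the table above would plausibly come from.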

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5c7919bb 14-Feb-2018 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm: Make LDT_BASE_ADDR dynamic

LDT_BASE_ADDR has a different value in 4- and 5-level paging
configurations.

We need to make it dynamic in preparation for boot-time switching
between paging modes.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-6-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f2078904 04-Jan-2018 Thomas Gleixner <tglx@linutronix.de>

x86/mm: Map cpu_entry_area at the same place on 4/5 level

There is no reason for 4 and 5 level pagetables to have a different
layout. It just makes determining vaddr_end for KASLR harder than
necessary.

Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos


# a4b51ef6 04-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

x86/mm/dump_pagetables: Allow dumping current pagetables

Add two debugfs files which allow dumping the page tables of the current
task.

current_kernel dumps the regular page table. This is the page table which
is normally shared between kernel and user space. If kernel page table
isolation is enabled, this is the kernel space mapping.

If kernel page table isolation is enabled, the second file, current_user,
dumps the user space page table.

These files allow verifying the resulting page tables for page table
isolation, but even in the normal case it's useful to be able to inspect
the user space page tables of current for debugging purposes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b4bf4f92 04-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

x86/mm/dump_pagetables: Check user space page table for WX pages

ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages,
but does not check the PAGE_TABLE_ISOLATION user space page table.

Restructure the code so that dmesg output is selected by an explicit
argument rather than implicitly by checking the pgd argument for !NULL.

Add the check for the user space page table.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f55f0501 12-Dec-2017 Andy Lutomirski <luto@kernel.org>

x86/pti: Put the LDT in its own PGD if PTI is on

With PTI enabled, the LDT must be mapped in the usermode tables somewhere.
The LDT is per process, i.e. per mm.

An earlier approach mapped the LDT on context switch into a fixmap area,
but that had a big overhead and exhausted the fixmap space when NR_CPUS got
big.

Take advantage of the fact that there is an address space hole which
provides a completely unused pgd. Use this pgd to manage per-mm LDT
mappings.

This has a downside: the LDT isn't (currently) randomized, and an attack
that can write the LDT is instant root due to call gates (thanks, AMD, for
leaving call gates in AMD64 but designing them wrong so they're only useful
for exploits). This can be mitigated by making the LDT read-only or
randomizing the mapping, either of which is straightforward on top of this
patch.

This will significantly slow down LDT users, but that shouldn't matter for
important workloads -- the LDT is only used by DOSEMU(2), Wine, and very
old libc implementations.

[ tglx: Cleaned it up. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 92a0f81d 20-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

x86/cpu_entry_area: Move it out of the fixmap

Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
and 0-day already hit a case where the fixmap PTEs were cleared by
cleanup_highmap().

Aside from that, the fixmap API is a pain as it's all backwards.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 146122e2 20-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

x86/mm/dump_pagetables: Make the address hints correct and readable

The address hints are a trainwreck. The array entry numbers have to be kept
magically in sync with the actual hints, which is doomed as some of the
array members are initialized at runtime via the entry numbers.

Designated initializers have been around before this code was
implemented....

Use the entry numbers to populate the address hints array and add the
missing bits and pieces. Split 32 and 64 bit for readability's sake.
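A reduced sketch of the designated-initializer pattern the commit describes. The enum members, strings and `init_address_markers()` below are a hypothetical subset, not the file's full marker list; entries whose addresses are only known at runtime are filled in through the same enum names.

```c
#include <stddef.h>

struct addr_marker {
	unsigned long long start_address;
	const char *name;
};

/* Hypothetical subset of marker indices. */
enum address_markers_idx {
	USER_SPACE_NR = 0,
	KERNEL_SPACE_NR,
	LOW_KERNEL_NR,
	VMALLOC_START_NR,
	END_OF_SPACE_NR,
};

/* Each entry is bound to its index by name, so inserting or removing a
 * marker cannot silently shift the others. */
static struct addr_marker address_markers[] = {
	[USER_SPACE_NR]    = { 0,                      "User Space" },
	[KERNEL_SPACE_NR]  = { 1ULL << 63,             "Kernel Space" },
	[LOW_KERNEL_NR]    = { 0 /* set at runtime */, "Low Kernel Mapping" },
	[VMALLOC_START_NR] = { 0 /* set at runtime */, "vmalloc() Area" },
	[END_OF_SPACE_NR]  = { ~0ULL,                  NULL },
};

/* Runtime-dependent addresses are filled in by index name, not by number. */
static void init_address_markers(unsigned long long page_offset,
				 unsigned long long vmalloc_start)
{
	address_markers[LOW_KERNEL_NR].start_address = page_offset;
	address_markers[VMALLOC_START_NR].start_address = vmalloc_start;
}
```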

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c0534494 15-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

x86/mm/dump_pagetables: Check PAGE_PRESENT for real

The check for a present page in printk_prot():

if (!pgprot_val(prot)) {
/* Not present */

is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and
the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when
analyzing mapping correctness. Check for the present bit to make an
informed decision.
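The difference can be shown with a PAGE_NONE-style value. This sketch assumes the x86 encoding in which PROT_NONE pages are marked with bit 8 (the same bit position as GLOBAL, hence the bogus "GLB") plus the accessed bit, with the present bit clear; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 PTE bit positions. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_PROTNONE (1ULL << 8)   /* same bit position as GLOBAL */

/* A PAGE_NONE-style value: nonzero, but the present bit is clear. */
#define PTE_NONE_PROT (PTE_PROTNONE | PTE_ACCESSED)

/* Old test: any nonzero value was treated as a present entry. */
static bool present_by_value(uint64_t prot)
{
	return prot != 0;
}

/* Fixed test: look at the present bit itself. */
static bool present_by_bit(uint64_t prot)
{
	return (prot & PTE_PRESENT) != 0;
}
```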

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 04b67022 24-Jul-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y

KASAN fills kernel page tables with repeated values to map several
TBs of the virtual memory to the single kasan_zero_page:
kasan_zero_p4d ->
kasan_zero_pud ->
kasan_zero_pmd->
kasan_zero_pte->
kasan_zero_page

Walking the whole KASAN shadow range takes a lot of time, especially
with 5-level page tables. Since we already know that all KASAN page tables
eventually point to the kasan_zero_page, we can call note_page() right
away and avoid walking the lower levels of the page tables.
This does not affect the output of the kernel_page_tables file,
but lets us avoid spending time in page table walkers:

Before:

$ time cat /sys/kernel/debug/kernel_page_tables > /dev/null

real 0m55.855s
user 0m0.000s
sys 0m55.840s

After:

$ time cat /sys/kernel/debug/kernel_page_tables > /dev/null

real 0m0.054s
user 0m0.000s
sys 0m0.054s
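A toy model of the shortcut, assuming a two-level table with four entries per level. Entries that point at the shared table (standing in for the kasan_zero_p?d tables) are recorded as one span without descending; the counters exist only to make the saved work visible, and all names are hypothetical.

```c
#include <stddef.h>

#define ENTRIES 4   /* toy table size; real x86 tables have 512 entries */

struct lower { int leaf[ENTRIES]; };
struct upper { const struct lower *next[ENTRIES]; };

/* One table shared by many upper entries, like the KASAN zero tables. */
static struct lower shared_zero_table;

static unsigned long lower_visits;  /* lower-level entries actually read */
static unsigned long shared_spans;  /* spans noted without descending */

static void walk(const struct upper *u)
{
	for (size_t i = 0; i < ENTRIES; i++) {
		const struct lower *l = u->next[i];

		if (l == NULL)
			continue;
		if (l == &shared_zero_table) {
			/* Contents are known: note the span, skip the subtree. */
			shared_spans++;
			continue;
		}
		for (size_t j = 0; j < ENTRIES; j++) {
			(void)l->leaf[j];   /* a real walker inspects the entry here */
			lower_visits++;
		}
	}
}
```

With three of four entries pointing at the shared table, only one lower table is read instead of four.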

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170724152558.24689-1-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 45dcd209 16-Jul-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm/dump_pagetables: Fix printout of p4d level

Modify printk_prot() and callers to print out additional page table
level correctly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3a366f79 16-Jul-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/mm/dump_pagetables: Generalize address normalization

Modify normalize_addr to handle different sizes of virtual address
space.

It's preparation for enabling 5-level paging.
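A portable sketch of the normalization, parameterized by the index of the top implemented address bit (47 with 4-level paging, 56 with 5-level paging). It masks and sign-extends explicitly rather than relying on an arithmetic right shift of a signed value; the signature is hypothetical and `top_bit` must be at most 62.

```c
#include <stdint.h>

/*
 * Sign-extend a virtual address from bit top_bit so that it becomes
 * canonical: bits above top_bit become copies of bit top_bit.
 */
static uint64_t normalize_addr(uint64_t u, unsigned int top_bit)
{
	uint64_t sign = 1ULL << top_bit;
	uint64_t low_mask = (sign << 1) - 1;   /* bits 0..top_bit */

	u &= low_mask;
	if (u & sign)
		u |= ~low_mask;                    /* replicate the sign bit */
	return u;
}
```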

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 65ade2f8 06-Jun-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/boot/64: Rename init_level4_pgt and early_level4_pgt

With CONFIG_X86_5LEVEL=y, level 4 is no longer the top level of the page tables.

Let's give these variables more generic names: init_top_pgt and
early_top_pgt.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 84bbabc3 12-Apr-2017 Juergen Gross <jgross@suse.com>

x86/mm: Fix dump pagetables for 4 levels of page tables

Commit fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level
paging") introduced an error for dumping with only 4 levels by setting
PGD_LEVEL_MULT to a wrong value.

This leads to, e.g., addresses being printed as "(null)" for ranges:

x86/mm: Found insecure W+X mapping at address (null)/(null)

Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD.
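The arithmetic behind the fix, sketched with assumed x86-64 4-level values: 4 KiB pages, 512 entries per table, and the p4d level folded to a single entry. The *_LEVEL_MULT chain is modeled on the names used here; the `SK_*` constants and `pgd_entry_addr()` are hypothetical. With the wrong factor each PGD entry appears to span 2^48 bytes, so every computed address has its low 48 bits clear, which would be consistent with addresses collapsing to zero, and hence "(null)", once truncated to the implemented bits.

```c
#include <stdint.h>

/* Assumed x86-64 4-level geometry; the p4d level is folded. */
#define SK_PAGE_SIZE     4096ULL
#define SK_PTRS_PER_PTE  512ULL
#define SK_PTRS_PER_PMD  512ULL
#define SK_PTRS_PER_PUD  512ULL
#define SK_PTRS_PER_P4D  1ULL

/* Bytes of virtual address space covered by one entry at each level. */
#define PTE_LEVEL_MULT (SK_PAGE_SIZE)
#define PMD_LEVEL_MULT (SK_PTRS_PER_PTE * PTE_LEVEL_MULT)
#define PUD_LEVEL_MULT (SK_PTRS_PER_PMD * PMD_LEVEL_MULT)
#define P4D_LEVEL_MULT (SK_PTRS_PER_PUD * PUD_LEVEL_MULT)
#define PGD_LEVEL_MULT (SK_PTRS_PER_P4D * P4D_LEVEL_MULT)        /* fixed */
#define PGD_LEVEL_MULT_BUGGY (SK_PTRS_PER_PUD * P4D_LEVEL_MULT)  /* wrong factor */

/* Start address (before sign extension) of the i-th top-level entry. */
static uint64_t pgd_entry_addr(uint64_t i, uint64_t mult)
{
	return i * mult;
}
```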

Fixes: fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# fdd3d8ce 28-Mar-2017 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

x86/dump_pagetables: Add support for 5-level paging

Simple extension to support one more page table level.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170328104806.41711-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f991376e 17-Mar-2017 Thomas Garnier <thgarnie@google.com>

x86/mm: Correct fixmap header usage on adaptable MODULES_END

This patch removes fixmap header usage in non-x86 code that was
introduced by the adaptable MODULES_END change.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170317175034.4701-1-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f06bdd40 14-Mar-2017 Thomas Garnier <thgarnie@google.com>

x86/mm: Adapt MODULES_END based on fixmap section size

This patch aligns MODULES_END to the beginning of the fixmap section.
It optimizes the space available for both sections. The address is
pre-computed based on the number of pages required by the fixmap
section.

It will allow GDT remapping in the fixmap section. The current
MODULES_END static address does not provide enough space for the kernel
to support a large number of processors.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Luis R . Rodriguez <mcgrof@kernel.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: kernel-hardening@lists.openwall.com
Cc: kvm@vger.kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-pm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com
[ Small build fix. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 025205f8 14-Feb-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/ptdump: Add address marker for KASAN shadow region

Annotate the KASAN shadow with address markers in page table
dump output:

$ cat /sys/kernel/debug/kernel_page_tables
...

---[ Vmemmap ]---
0xffffea0000000000-0xffffea0003000000 48M RW PSE GLB NX pmd
0xffffea0003000000-0xffffea0004000000 16M pmd
0xffffea0004000000-0xffffea0005000000 16M RW PSE GLB NX pmd
0xffffea0005000000-0xffffea0040000000 944M pmd
0xffffea0040000000-0xffffea8000000000 511G pud
0xffffea8000000000-0xffffec0000000000 1536G pgd
---[ KASAN shadow ]---
0xffffec0000000000-0xffffed0000000000 1T ro GLB NX pte
0xffffed0000000000-0xffffed0018000000 384M RW PSE GLB NX pmd
0xffffed0018000000-0xffffed0020000000 128M pmd
0xffffed0020000000-0xffffed0028200000 130M RW PSE GLB NX pmd
0xffffed0028200000-0xffffed0040000000 382M pmd
0xffffed0040000000-0xffffed8000000000 511G pud
0xffffed8000000000-0xfffff50000000000 7680G pgd
0xfffff50000000000-0xfffffbfff0000000 7339776M ro GLB NX pte
0xfffffbfff0000000-0xfffffbfff0200000 2M pmd
0xfffffbfff0200000-0xfffffbfff0a00000 8M RW PSE GLB NX pmd
0xfffffbfff0a00000-0xfffffbffffe00000 244M pmd
0xfffffbffffe00000-0xfffffc0000000000 2M ro GLB NX pte
---[ KASAN shadow end ]---
0xfffffc0000000000-0xffffff0000000000 3T pgd
---[ ESPfix Area ]---
...

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kasan-dev@googlegroups.com
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170214100839.17186-2-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 243b72aa 14-Feb-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=y

Enabling both DEBUG_WX=y and KASAN=y options significantly increases
boot time (dozens of seconds at least).
KASAN fills kernel page tables with repeated values to map several
TBs of the virtual memory to the single kasan_zero_page:

kasan_zero_pud ->
kasan_zero_pmd->
kasan_zero_pte->
kasan_zero_page

So the page table walker used to find W+X mappings checks the same
kasan_zero_p?d page table entries many times over.
With this patch, the pud walker skips a pud if it has the same value as
the previous one. Skipping is done only when searching for W+X mappings,
so this optimization won't affect the page table dump via debugfs.

This dropped the time spent in the W+X check from ~30 sec to a reasonable 0.1 sec:

Before:
[ 4.579991] Freeing unused kernel memory: 1000K
[ 35.257523] x86/mm: Checked W+X mappings: passed, no W+X pages found.

After:
[ 5.138756] Freeing unused kernel memory: 1000K
[ 5.266496] x86/mm: Checked W+X mappings: passed, no W+X pages found.
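A toy sketch of the skip, modeling a PUD-level table as an array of entry values. An entry equal to its predecessor references the same lower-level tables, so re-checking it cannot find a new W+X mapping; this shortcut is only valid for the W+X check, since a dump must still print every range. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned long subtree_checks;   /* how many subtrees were examined */

/* Placeholder for descending into the lower levels of one entry. */
static void check_subtree(uint64_t entry)
{
	(void)entry;
	subtree_checks++;
}

static void walk_pud_checkwx(const uint64_t *pud, size_t n)
{
	uint64_t prev = 0;
	int have_prev = 0;

	for (size_t i = 0; i < n; i++) {
		if (have_prev && pud[i] == prev)
			continue;   /* same as the previous entry: already checked */
		check_subtree(pud[i]);
		prev = pud[i];
		have_prev = 1;
	}
}
```

Only runs of identical neighbours are collapsed, so a value that reappears after a different one is checked again.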

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kasan-dev@googlegroups.com
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/20170214100839.17186-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 146fbb76 09-Feb-2017 Andrey Ryabinin <ryabinin.a.a@gmail.com>

x86/mm/ptdump: Fix soft lockup in page table walker

CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow.
In that case ptdump_walk_pgd_level_core() takes a lot of time to
walk across all page tables, and doing this without
rescheduling causes soft lockups:

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1]
...
Call Trace:
ptdump_walk_pgd_level_core+0x40c/0x550
ptdump_walk_pgd_level_checkwx+0x17/0x20
mark_rodata_ro+0x13b/0x150
kernel_init+0x2f/0x120
ret_from_fork+0x2c/0x40

I guess that this issue might arise even without KASAN on huge machines
with several terabytes of RAM.

Stick cond_resched() in the pgd loop to fix this.
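A userspace sketch of where the yield point sits. cond_resched() exists only in the kernel, so this assumes sched_yield() as a stand-in and wraps it in a hypothetical stub that also counts calls; the 512-entry PGD assumes 4-level x86-64 paging. It is not the kernel's walker, only the loop shape.

```c
#include <sched.h>
#include <stddef.h>

#define PTRS_PER_PGD 512	/* assumed: 4-level x86-64 paging */

static size_t resched_calls;	/* number of yield points reached */

/* Hypothetical userspace stand-in for the kernel's cond_resched(). */
static void cond_resched_stub(void)
{
	resched_calls++;
	sched_yield();
}

/*
 * Top-level loop of a page table walker. Each PGD entry can cover
 * 512 GiB, so the CPU is offered up once per entry instead of only
 * after the whole walk. Returns the number of non-empty entries.
 */
static size_t walk_pgd_entries(const unsigned long *pgd)
{
	size_t present = 0;

	for (size_t i = 0; i < PTRS_PER_PGD; i++) {
		if (pgd[i] != 0)
			present++;	/* a real walker descends to lower levels here */
		cond_resched_stub();
	}
	return present;
}
```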

Reported-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: kasan-dev@googlegroups.com
Cc: Alexander Potapenko <glider@google.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 84e629b6 13-Jul-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

x86: Don't use module.h just for AUTHOR / LICENSE tags

The Kconfig options controlling compilation of these files are:

arch/x86/Kconfig.debug:config DEBUG_RODATA_TEST
arch/x86/Kconfig.debug: bool "Testcase for the marking rodata read-only"

arch/x86/Kconfig.debug:config X86_PTDUMP_CORE
arch/x86/Kconfig.debug: def_bool n

...meaning that it currently is not being built as a module by anyone.

Let's remove the couple of traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.

We delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-2-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0483e1fa 21-Jun-2016 Thomas Garnier <thgarnie@google.com>

x86/mm: Implement ASLR for kernel memory regions

Randomizes the virtual address space of kernel memory regions for
x86_64. This first patch adds the infrastructure and does not randomize
any region. The following patches will randomize the physical memory
mapping, vmalloc and vmemmap regions.

This security feature mitigates exploits relying on predictable kernel
addresses. These addresses can be used to disclose the kernel module
base addresses or corrupt specific structures to elevate privileges
bypassing the current implementation of KASLR. This feature can be
enabled with the CONFIG_RANDOMIZE_MEMORY option.

The order of the memory regions is not changed. The feature looks at the
available space for the regions based on different configuration options
and randomizes the base and space between each. The size of the physical
memory mapping is the available physical memory. No performance impact
was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on PGD & PUD page table levels to increase possible addresses. The
physical memory mapping code was adapted to support PUD level virtual
addresses. In the best configuration, this implementation provides 30,000
possible virtual addresses on average for each memory region. An
additional low memory page is used to ensure each CPU can start with a
PGD aligned virtual address (for realmode).

x86/dump_pagetable was updated to correctly display each region.

Updated documentation on x86_64 memory layout accordingly.

Performance data, after all patches in the series:

Kernbench shows almost no difference (+/- less than 1%):

Before:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115)
System Time 87.056 (0.456416)
Percent CPU 1092.9 (13.892)
Context Switches 199805 (3455.33)
Sleeps 97907.8 (900.636)

After:

Average Optimal load -j 12 Run (std deviation):
Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053)
System Time 87.764 (0.49345)
Percent CPU 1095 (12.7715)
Context Switches 199036 (4298.1)
Sleeps 97681.6 (1031.11)

Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):

attempt,before,after
1,0.076,0.069
2,0.072,0.069
3,0.066,0.066
4,0.066,0.068
5,0.066,0.067
6,0.066,0.069
7,0.067,0.066
8,0.063,0.067
9,0.067,0.065
10,0.068,0.071
average,0.0677,0.0677
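As a quick arithmetic check of the reported averages, the ten hackbench timings per column can be summed and divided by ten. Both columns sum to 0.677 s, giving 0.0677 s each. This helper only re-derives the numbers above; it is not part of the patch.

```c
#include <stddef.h>

/* Hackbench timings in seconds, copied from the ten runs listed above. */
static const double hb_before[] = { 0.076, 0.072, 0.066, 0.066, 0.066,
				    0.066, 0.067, 0.063, 0.067, 0.068 };
static const double hb_after[]  = { 0.069, 0.069, 0.066, 0.068, 0.067,
				    0.069, 0.066, 0.067, 0.065, 0.071 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}
```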

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b176862f 18-Feb-2016 Borislav Petkov <bp@suse.de>

x86/mm/ptdump: Remove paravirt_enabled()

is_hypervisor_range() can simply check if the PGD index is
within ffff800000000000 - ffff87ffffffffff which is the range
reserved for a hypervisor. That range is practically an ABI, see
Documentation/x86/x86_64/mm.txt.
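The index arithmetic can be sketched in userspace C. This mirrors the name is_hypervisor_range() from the commit, but it is a re-creation under assumed constants (4-level paging, PGDIR_SHIFT of 39, 512 PGD entries), not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4-level x86-64 paging: each PGD entry maps 2^39 bytes (512 GiB). */
#define PGDIR_SHIFT	39
#define PTRS_PER_PGD	512

static unsigned int pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* True for the PGD slots covering ffff800000000000 - ffff87ffffffffff. */
static bool is_hypervisor_range(unsigned int idx)
{
	return idx >= pgd_index(0xffff800000000000ULL) &&
	       idx <= pgd_index(0xffff87ffffffffffULL);
}
```

Under these assumptions the reserved range is PGD indices 256 through 271: 16 slots of 512 GiB each, 8 TiB in total. A walker can test the index before dereferencing the entry, so it never follows pointers the hypervisor owns.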

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Under Xen, as PV guest
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455825641-19585-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 173ae9ba 15-Dec-2015 Linus Torvalds <torvalds@linux-foundation.org>

Fix user-visible spelling error

Pavel Machek reports a warning about W+X pages found in the "Persisent"
kmap area. After grepping for it (using the correct spelling), and not
finding it, I noticed how the debug printk was just misspelled. Fix it.

The actual mapping bug that Pavel reported is still open. It's
apparently a separate issue from the known EFI page tables, looks like
it's related to the HIGHMEM mappings.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8609d1b5 19-Nov-2015 Kees Cook <keescook@chromium.org>

x86/mm: Turn CONFIG_X86_PTDUMP into a module

Being able to examine page tables is handy, so make this a
module that can be loaded as needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f4e342c8 05-Nov-2015 Boris Ostrovsky <boris.ostrovsky@oracle.com>

x86/mm: Skip the hypervisor range when walking PGD

The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved
for the hypervisor, so we should not try to follow the PGD indices
corresponding to those addresses.

While this has always been a problem, with the new W+X warning
mechanism ptdump_walk_pgd_level_core() can now be called during boot,
causing a PV Xen guest to crash.

[ tglx: Replaced the macro with a readable inline ]

Fixes: e1a58320a38d "x86/mm: Warn on W^X mappings"
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xen.org
Link: http://lkml.kernel.org/r/1446749795-27764-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# e1a58320 04-Oct-2015 Stephen Smalley <sds@tycho.nsa.gov>

x86/mm: Warn on W^X mappings

Warn on any residual W+X mappings after setting NX
if DEBUG_WX is enabled. Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface. Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success it prints this to the kernel log:

x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure it prints a warning and a count of the failed pages:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
x86/mm: Found insecure W+X mapping at address ffffffff81755000/__stop___ex_table+0xfa8/0xabfa8
[...]
Call Trace:
[<ffffffff81380a5f>] dump_stack+0x44/0x55
[<ffffffff8109d3f2>] warn_slowpath_common+0x82/0xc0
[<ffffffff8109d48c>] warn_slowpath_fmt+0x5c/0x80
[<ffffffff8106cfc9>] ? note_page+0x5c9/0x7b0
[<ffffffff8106d010>] note_page+0x610/0x7b0
[<ffffffff8106d409>] ptdump_walk_pgd_level_core+0x259/0x3c0
[<ffffffff8106d5a7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
[<ffffffff81063905>] mark_rodata_ro+0xf5/0x100
[<ffffffff817415a0>] ? rest_init+0x80/0x80
[<ffffffff817415bd>] kernel_init+0x1d/0xe0
[<ffffffff8174cd1f>] ret_from_fork+0x3f/0x70
[<ffffffff817415a0>] ? rest_init+0x80/0x80
---[ end trace a1f23a1e42a2ac76 ]---
x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.
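The core test is simple: a page is W+X when it is writable and not marked no-execute. The sketch below is a simplified userspace model, not the kernel code. It uses the architectural x86-64 bit positions for present, RW and NX, and looks only at leaf entries, whereas the kernel combines permissions across all table levels and emits a WARN. The function names are hypothetical; the summary strings follow the log format shown above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86-64 PTE bits. */
#define _PAGE_PRESENT	(1ULL << 0)
#define _PAGE_RW	(1ULL << 1)
#define _PAGE_NX	(1ULL << 63)

/* A present page that is writable and executable is W+X. */
static bool pte_is_wx(uint64_t pte)
{
	return (pte & _PAGE_PRESENT) && (pte & _PAGE_RW) && !(pte & _PAGE_NX);
}

/*
 * Scan leaf entries, print a one-line summary, and return true if no
 * W+X pages were found.
 */
static bool check_wx_pages(const uint64_t *ptes, size_t n)
{
	size_t wx = 0;

	for (size_t i = 0; i < n; i++)
		if (pte_is_wx(ptes[i]))
			wx++;

	if (wx)
		printf("x86/mm: Checked W+X mappings: FAILED, %zu W+X pages found.\n", wx);
	else
		printf("x86/mm: Checked W+X mappings: passed, no W+X pages found.\n");
	return wx == 0;
}
```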

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-sds@tycho.nsa.gov
[ Improved the Kconfig help text and made the new option default-y
if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings,
so we really want people to have this on by default. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# da25e628 17-Sep-2015 Toshi Kani <toshi.kani@hpe.com>

x86/mm: Fix page table dump to show PAT bit

/sys/kernel/debug/kernel_page_tables does not show the PAT bit for
PUD/PMD mappings. This is because walk_pud_level(), walk_pmd_level()
and note_page() mask the flags with PTE_FLAGS_MASK, which does not
cover their PAT bit, _PAGE_PAT_LARGE.

Fix it by replacing the use of PTE_FLAGS_MASK with p?d_flags(),
which masks the flags properly.

Also change to show the PAT bit as "PAT" to be consistent with
other bits.
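Why PTE_FLAGS_MASK loses the large-page PAT bit can be shown with a few masks. In a 4 KiB PTE, bits 12 and up hold the frame number, so a 4 KiB flags mask strips bit 12. In a 2 MiB PMD mapping the frame number starts at bit 21, and bit 12 is _PAGE_PAT_LARGE. The sketch below assumes 52-bit physical addresses and redefines the masks locally for userspace. It is not the kernel's header code, and the two helper functions are hypothetical.

```c
#include <stdint.h>

/* Assumed: 52-bit physical addresses, 4 KiB base pages, 2 MiB PMD pages. */
#define PHYS_MASK	((1ULL << 52) - 1)
#define PAGE_SHIFT	12
#define PMD_SHIFT	21

#define _PAGE_PSE	(1ULL << 7)	/* large page (bit 7 is PAT in a 4 KiB PTE) */
#define _PAGE_PAT_LARGE	(1ULL << 12)	/* PAT bit in a 2 MiB / 1 GiB entry */

/* 4 KiB PTE: frame number in bits 12..51, everything else is flags. */
#define PTE_PFN_MASK	(PHYS_MASK & ~((1ULL << PAGE_SHIFT) - 1))
#define PTE_FLAGS_MASK	(~PTE_PFN_MASK)

/* 2 MiB PMD mapping: frame number in bits 21..51, so bit 12 is a flag. */
#define PMD_PFN_MASK	(PHYS_MASK & ~((1ULL << PMD_SHIFT) - 1))
#define PMD_FLAGS_MASK	(~PMD_PFN_MASK)

/* Old behaviour: masking a large-page entry with the 4 KiB flags mask. */
static uint64_t flags_with_pte_mask(uint64_t entry)
{
	return entry & PTE_FLAGS_MASK;
}

/* Level-aware masking, in the spirit of pmd_flags(). */
static uint64_t flags_with_pmd_mask(uint64_t entry)
{
	return entry & PMD_FLAGS_MASK;
}
```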

Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Robert Elliott <elliott@hpe.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1442514264-12475-7-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# f439c429c3 03-Nov-2014 Juergen Gross <jgross@suse.com>

x86: Support PAT bit in pagetable dump for lower levels

Dumping page table protection bits is not correct for entries on levels
2 and 3 with regard to the PAT bit, which is at a different position than on
level 4.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-16-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 8266e31e 21-Sep-2014 Mathias Krause <minipli@googlemail.com>

x86, ptdump: Add section for EFI runtime services

In commit 3891a04aafd6 ("x86-64, espfix: Don't leak bits 31:16 of %esp
returning..") the "ESPFix Area" was added to the page table dump special
sections. That area, though, only has a limited number of entries printed.

The EFI runtime services are, unfortunately, located in-between the
espfix area and the high kernel memory mapping. Due to the enforced
limitation for the espfix area, the EFI mappings won't be printed in the
page table dump.

To make the EFI runtime service mappings visible again, give them a
dedicated entry.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# 8a5a5d15 07-Sep-2014 Mathias Krause <minipli@googlemail.com>

x86-64, ptdump: Mark espfix area only if existent

We should classify the espfix area as such only if we actually have
enabled the corresponding option. Otherwise the page table dump might
look confusing.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/1410114629-24523-1-git-send-email-minipli@googlemail.com
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 3891a04a 29-Apr-2014 H. Peter Anvin <hpa@linux.intel.com>

x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack

The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. This
causes some 16-bit software to break, but it also leaks kernel state
to user space. We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart. When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace. The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)
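The aliasing trick comes down to address arithmetic. This is a simplified model, not the kernel's espfix code: the base address and helper names are hypothetical. It assumes each alias is selected by bits 31:16 of the user's ESP, which is what lets 2^16 aliases spaced 64K apart cover every possible value of those bits. The second helper models what an IRET to a 16-bit stack segment leaves in ESP.

```c
#include <stdint.h>

#define ESPFIX_ALIAS_STRIDE	0x10000ULL	/* aliases are 64 KiB apart */

/*
 * Hypothetical model: alias k of a ministack lives at
 * ministack_base + k * 64 KiB, for k = 0 .. 2^16 - 1, where
 * ministack_base has bits 31:16 clear. Picking k from bits 31:16 of
 * the user's ESP makes the kernel stack pointer carry the user's bits.
 */
static uint64_t espfix_alias(uint64_t ministack_base, uint32_t user_esp)
{
	uint64_t k = user_esp >> 16;

	return ministack_base + k * ESPFIX_ALIAS_STRIDE;
}

/*
 * IRET to a 16-bit stack segment reloads only SP (bits 15:0); bits
 * 31:16 of ESP keep whatever the kernel stack pointer held.
 */
static uint32_t esp_after_16bit_iret(uint64_t kernel_rsp, uint16_t user_sp)
{
	return ((uint32_t)kernel_rsp & 0xffff0000u) | user_sp;
}
```

With an ordinary kernel stack pointer, the leftover bits 31:16 come from a kernel address, which is the leak. On the matching alias they are the user's own bits, so nothing is disclosed and 16-bit code sees the ESP it expects.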

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
and copying (as opposed to mapping) the IRET frame there, and for the
suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomirski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge


# ef6bea6d 17-Jan-2014 Borislav Petkov <bp@suse.de>

x86, ptdump: Add the functionality to dump an arbitrary pagetable

Now that the ->trampoline_pgd page table is reused for mapping EFI regions
so that they can be used after switching to EFI virtual mode, it is very
useful to be able to dump that page table in dmesg. This adds
that functionality through the walk_pgd_level() interface, which can be
called from elsewhere.

The original functionality of dumping to debugfs remains untouched.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# 92851e2f 20-Jul-2010 Andres Salomon <dilinger@queued.net>

x86, mm: Create symbolic index into address_markers array

Without this, adding entries into the address_markers array means adding
more and more of an #ifdef maze in pt_dump_init(). By using indices, we
can keep it a bit saner.
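The indexing pattern can be sketched with an enum and C99 designated initializers. This is a minimal userspace illustration, not the kernel's table. The marker set and addresses are hypothetical, and pt_dump_init() here is a stand-in that only patches two entries. It also checks that the markers stay in ascending order, which a walker that consumes them in sequence relies on.

```c
#include <stdbool.h>
#include <stddef.h>

struct addr_marker {
	unsigned long start_address;
	const char *name;
};

/* Symbolic indices: an entry can be patched without counting its
 * position under every #ifdef combination. */
enum address_markers_idx {
	USER_SPACE_NR = 0,
	KERNEL_SPACE_NR,
	VMALLOC_START_NR,
	VMALLOC_END_NR,
	END_OF_SPACE_NR,
};

/* Illustrative 32-bit-style layout; the values are hypothetical. */
static struct addr_marker address_markers[] = {
	[USER_SPACE_NR]    = { 0UL,          "User Space" },
	[KERNEL_SPACE_NR]  = { 0xc0000000UL, "Kernel Mapping" },
	[VMALLOC_START_NR] = { 0UL,          "vmalloc() Area" },
	[VMALLOC_END_NR]   = { 0UL,          "vmalloc() End" },
	[END_OF_SPACE_NR]  = { ~0UL,         NULL },
};

/* Boundaries known only at boot are filled in by name, not position. */
static void pt_dump_init(unsigned long vmalloc_start, unsigned long vmalloc_end)
{
	address_markers[VMALLOC_START_NR].start_address = vmalloc_start;
	address_markers[VMALLOC_END_NR].start_address = vmalloc_end;
}

/* The walker consumes markers in order, so they must be ascending. */
static bool markers_sorted(void)
{
	for (int i = USER_SPACE_NR; i < END_OF_SPACE_NR; i++)
		if (address_markers[i].start_address >
		    address_markers[i + 1].start_address)
			return false;
	return true;
}
```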

Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <201007202219.o6KMJkUs021052@imap1.linux-foundation.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 6424fb38 14-Apr-2009 Yinghai Lu <yinghai@kernel.org>

x86: remove (null) in /sys kernel_page_tables

Impact: cleanup

%p prints a zero address (0x0000000000000000) as (null),
so use %lx instead.
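The same idea in userspace C: zero-padded %lx with an explicit field width always prints the digits, including for address 0. This is a minimal sketch, not the kernel's seq_file code, and format_range() is a hypothetical helper. The width is derived from sizeof(unsigned long), so it prints 16 hex digits on 64-bit and 8 on 32-bit.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Print an address range with fixed-width hex so that address 0 comes
 * out as all zeros instead of a special token.
 */
static int format_range(char *buf, size_t len,
			unsigned long start, unsigned long end)
{
	int width = (int)(sizeof(unsigned long) * 2);	/* hex digits */

	return snprintf(buf, len, "0x%0*lx-0x%0*lx", width, start, width, end);
}
```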

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49E43282.1090607@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 27990eac 19-Aug-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: another user of PTE_FLAGS_MASK

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 77be1fab 21-Jul-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: add PTE_FLAGS_MASK

PTE_PFN_MASK was getting lonely, so I made it a friend.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 59438c9f 21-Jul-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: rename PTE_MASK to PTE_PFN_MASK

Rusty, in his peevish way, complained that macros defining constants
should have a name which somewhat accurately reflects the actual
purpose of the constant.

Aside from the fact that PTE_MASK gives no clue as to what's actually
being masked, and is misleadingly similar to the functionally entirely
different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the
problem is.

But if this patch silences the incessant noise, then it will have
achieved its goal (TODO: write test-case).

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 684eb016 12-May-2008 Jiri Slaby <jirislaby@kernel.org>

x86_64: use PAGE_OFFSET in dump_pagetables

Use PAGE_OFFSET macro instead of using 0xffff810000000000UL directly.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# a4928cff 23-Apr-2008 Ingo Molnar <mingo@elte.hu>

"make namespacecheck" fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9a79cf9c 07-Mar-2008 Yinghai Lu <yhlu.kernel@gmail.com>

x86: sort address_markers for dump_pagetables

Otherwise the Vmemmap and High Kernel Mapping strings do not show up.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fe770bf0 17-Apr-2008 H. Peter Anvin <hpa@zytor.com>

x86: clean up the page table dumper and add 32-bit support

Clean up the page table dumper (fix boundary conditions, table driven
address ranges, some formatting changes since it is no longer using
the kernel log but a separate virtual file), and generalize to 32
bits.

[ mingo@elte.hu: x86: fix the pagetable dumper ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 926e5392 17-Apr-2008 Arjan van de Ven <arjan@linux.intel.com>

x86: add code to dump the (kernel) page tables for visual inspection by kernel developers

This patch adds code to the kernel to have an (optional)
/proc/kernel_page_tables debug file that basically dumps the kernel
pagetables; this allows us kernel developers to verify that nothing fishy is
going on and that the various mappings are set up correctly. This was quite
useful in finding various change_page_attr() bugs, and is very likely to be
useful in the future as well.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: mingo@elte.hu
Cc: tglx@tglx.de
Cc: hpa@zytor.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>