History log of /linux-master/arch/s390/mm/pageattr.c
Revision Date Author Comments
# 0a845e0f 04-Mar-2024 Peter Xu <peterx@redhat.com>

mm/treewide: replace pud_large() with pud_leaf()

pud_large() is always defined as pud_leaf(). Merge their usages. Chose
pud_leaf() because pud_leaf() is a global API, while pud_large() is not.

Link: https://lkml.kernel.org/r/20240305043750.93762-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 2f709f7b 04-Mar-2024 Peter Xu <peterx@redhat.com>

mm/treewide: replace pmd_large() with pmd_leaf()

pmd_large() is always defined as pmd_leaf(). Merge their usages. Chose
pmd_leaf() because pmd_leaf() is a global API, while pmd_large() is not.

Link: https://lkml.kernel.org/r/20240305043750.93762-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 527618ab 11-Sep-2023 Heiko Carstens <hca@linux.ibm.com>

s390/ctlreg: add struct ctlreg

Add struct ctlreg to enforce strict type checking / usage for control
register functions.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 850612c8 25-Aug-2023 Heiko Carstens <hca@linux.ibm.com>

s390/set_memory: add __set_memory() variant

Add a __set_memory_yy() variant for all set_memory_yy()
implementations. The new variant takes start and end void pointers,
which allows them to be used without the usual unsigned long cast.

However more important: the new variant can be used for areas larger
than 8TB. The old variant comes with an "int numpages" parameter, which
overflows with more than 8TB. Given that for debug_pagealloc
set_memory_4k() is used on the whole kernel mapping this is not only a
theoretical problem, but must be fixed.

Changing all set_memory_yy() variants only on s390 to take an "unsigned
long numpages" parameter is not possible, since the common module code
requires an int parameter from all architectures on these functions.
See module_set_memory().

Therefore change/fix this on s390 only with a new interface, and address
common code later.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 6ecc21bb 12-Jun-2023 Rick Edgecombe <rick.p.edgecombe@intel.com>

mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()

The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.

One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.

But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.

So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.

Earlier work did the first step, so next move the callers that don't have
a VMA to pte_mkwrite_novma(). Also do the same for pmd_mkwrite(). This
will be ok for the shadow stack feature, as these callers are on kernel
memory which will not need to be made shadow stack, and the other
architectures only currently support one type of memory in pte_mkwrite()

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/all/20230613001108.3040476-3-rick.p.edgecombe%40intel.com


# ef104443 16-May-2023 Arnd Bergmann <arnd@arndb.de>

procfs: consolidate arch_report_meminfo declaration

The arch_report_meminfo() function is provided by four architectures,
with a __weak fallback in procfs itself. On architectures that don't
have a custom version, the __weak version causes a warning because
of the missing prototype.

Remove the architecture specific prototypes and instead add one
in linux/proc_fs.h.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for arch/x86
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Message-Id: <20230516195834.551901-1-arnd@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 34e4c79f 14-Apr-2023 Heiko Carstens <hca@linux.ibm.com>

s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()

Make use of the set_direct_map() calls for module allocations.
In particular:

- All changes to read-only permissions in kernel VA mappings are also
applied to the direct mapping. Note that execute permissions are
intentionally not applied to the direct mapping in order to make
sure that all allocated pages within the direct mapping stay
non-executable

- module_alloc() passes the VM_FLUSH_RESET_PERMS to __vmalloc_node_range()
to make sure that all implicit permission changes made to the direct
mapping are reset when the allocated vm area is freed again

Side effects: the direct mapping will be fragmented depending on how many
vm areas with VM_FLUSH_RESET_PERMS and/or explicit page permission changes
are allocated and freed again.

For example, just after boot of a system the direct mapping statistics look
like:

$cat /proc/meminfo
...
DirectMap4k: 111628 kB
DirectMap1M: 16665600 kB
DirectMap2G: 0 kB

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 0490d6d7 14-Apr-2023 Heiko Carstens <hca@linux.ibm.com>

s390/mm: enable ARCH_HAS_SET_DIRECT_MAP

Implement the set_direct_map_*() API, which allows to invalidate and set
default permissions to pages within the direct mapping.

Note that kernel_page_present(), which is also supposed to be part of this
API, is intentionally not implemented. The reason for this is that
kernel_page_present() is only used (and currently only makes sense) for
suspend/resume, which isn't supported on s390.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 81e84796 06-Apr-2023 Heiko Carstens <hca@linux.ibm.com>

s390/mm: fix direct map accounting

Commit bb1520d581a3 ("s390/mm: start kernel with DAT enabled") did not
implement direct map accounting in the early page table setup code. In
result the reported values are bogus now:

$cat /proc/meminfo
...
DirectMap4k: 5120 kB
DirectMap1M: 18446744073709546496 kB
DirectMap2G: 0 kB

Fix this by adding the missing accounting. The result looks sane again:

$cat /proc/meminfo
...
DirectMap4k: 6156 kB
DirectMap1M: 2091008 kB
DirectMap2G: 6291456 kB

Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# e4c31004 14-Feb-2023 Vasily Gorbik <gor@linux.ibm.com>

s390/mm,pageattr: allow KASAN shadow memory

Allow changing page table attributes for KASAN shadow memory ranges.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# 869a9dbc 21-Feb-2022 Heiko Carstens <hca@linux.ibm.com>

s390/mm,pageattr: don't use pte_val()/pXd_val() as lvalue

Convert pgtable code so pte_val()/pXd_val() aren't used as lvalue
anymore. This allows in later step to convert pte_val()/pXd_val() to
functions, which in turn makes it impossible to use these macros to
modify page table entries like they have been used before.

Therefore a construct like this:

pte_val(*pte) = __pa(addr) | prot;

which would directly write into a page table, isn't possible anymore
with the last step of this series.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# b8e3b379 21-Feb-2022 Heiko Carstens <hca@linux.ibm.com>

s390/mm: use set_pXd()/set_pte() helper functions everywhere

Use the new set_pXd()/set_pte() helper functions at all places where
page table entries are modified.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 273cd173 11-Jan-2021 Alexander Gordeev <agordeev@linux.ibm.com>

s390/pgtable: use physical address for Page-Table Origin

Instructions IPTE, IDTE and CRDTE accept Page-Table Origin
as one of the arguments, but instead the pgtable virtual
address is passed. Fix that and also update the crdte()
prototype to conform to csp() and cspg() friends.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# c4f0e5cf 24-Aug-2021 Heiko Carstens <hca@linux.ibm.com>

s390/mm,pageattr: fix walk_pte_level() early exit

In case of splitting to 4k mapping the early exit in walk_pte_level()
must only be taken iff flags is equal to SET_MEMORY_4K.
Currently the early exit is taken if the flag is set, and also others
might be set. This may lead to the situation that a mapping is split
but other changes are not done, like e.g. setting pages to R/W.

There is currently no such caller, but there might be in the future.

Fixes: b3e1a00c8fa4 ("s390/mm: implement set_memory_4k()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# cfafad6d 03-Aug-2021 Heiko Carstens <hca@linux.ibm.com>

s390/mm: use page_to_virt() in __kernel_map_pages()

Fix virtual vs physical address confusion (which currently are the same).

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# e41ba111 28-Jul-2021 Sven Schnelle <svens@linux.ibm.com>

s390: add support for KFENCE

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
[hca@linux.ibm.com: simplify/rework code]
Link: https://lore.kernel.org/r/20210728190254.3921642-4-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# b3e1a00c 28-Jul-2021 Heiko Carstens <hca@linux.ibm.com>

s390/mm: implement set_memory_4k()

Implement set_memory_4k() which will split any present large or huge
mapping in the given range to a 4k mapping.

Link: https://lore.kernel.org/r/20210728190254.3921642-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>


# da1694ad 07-Sep-2020 Heiko Carstens <hca@linux.ibm.com>

s390/mm,ptdump: hold cpa mutex while walking for kernel page table dump

This is currently only preventing that outdated information is
provided to user space. A concurrent split of huge/large pages does
modify the kernel page tables, however either the huge/large mapping
is reported or the split area is being walked.

This "fixes" also only a potential future bug, since split pages could
also be merged again if page permissions are the same for larger
memory areas.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 974b9b2c 08-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: consolidate pte_index() and pte_offset_*() definitions

All architectures define pte_index() as

(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)

and all architectures define pte_offset_kernel() as an entry in the array
of PTEs indexed by the pte_index().

For the most architectures the pte_offset_kernel() implementation relies
on the availability of pmd_page_vaddr() that converts a PMD entry value to
the virtual address of the page containing PTEs array.

Let's move x86 definitions of the PTE accessors to the generic place in
<linux/pgtable.h> and then simply drop the respective definitions from the
other architectures.

The architectures that didn't provide pmd_page_vaddr() are updated to have
that defined.

The generic implementation of pte_offset_kernel() can be overridden by an
architecture and alpha makes use of this because it has special ordering
requirements for its version of pte_offset_kernel().

[rppt@linux.ibm.com: v2]
Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
[rppt@linux.ibm.com: update]
Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
[akpm@linux-foundation.org: fix x86 warning]
[sfr@canb.auug.org.au: fix powerpc build]
Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e05c7b1f 08-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: pgtable: add shortcuts for accessing kernel PMD and PTE

The powerpc 32-bit implementation of pgtable has nice shortcuts for
accessing kernel PMD and PTE for a given virtual address. Make these
helpers available for all architectures.

[rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()]
Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org
[akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e31cf2f4 08-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: don't include asm/pgtable.h if linux/mm.h is already included

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

for f in $(git grep -l "include <linux/mm.h>") ; do
sed -i -e '/include <asm\/pgtable.h>/ d' $f
done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 39421627 18-Mar-2020 Heiko Carstens <hca@linux.ibm.com>

s390: remove broken hibernate / power management support

Hibernation is known to be broken for many years on s390. Given that
there aren't any real use cases, remove the code instead of spending
time to fix and maintain it.

Without hibernate support it doesn't make too much sense to keep power
management support; therefore remove it completely.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>


# 964c2c05 13-Jul-2018 Dominik Dingel <dingel@linux.vnet.ibm.com>

s390/mm: Clear huge page storage keys on enable_skey

When a guest starts using storage keys, we trap and set a default one
for its whole valid address space. With this patch we are now able to
do that for large pages.

To speed up the storage key insertion, we use
__storage_key_init_range, which in-turn will use sske_frame to set
multiple storage keys with one instruction. As it has been previously
used for debuging we have to get rid of the default key check and make
it quiescing.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
[replaced page_set_storage_key loop with __storage_key_init_range]
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# a01ef308 16-Jun-2017 Heiko Carstens <hca@linux.ibm.com>

s390/mm,vmem: simplify region and segment table allocation code

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 28c807e5 26-Jul-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: add guest ASCE TLB flush optimization

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 118bd31b 26-Jul-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: add no-dat TLB flush optimization

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 1aea9b3f 24-Apr-2017 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: implement 5 level pages tables

Add the logic to upgrade the page table for a 64-bit process to
five levels. This increases the TASK_SIZE from 8PB to 16EB-4K.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# e6c7c630 08-May-2017 Laura Abbott <labbott@redhat.com>

s390: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1366def3 24-Apr-2017 Heiko Carstens <hca@linux.ibm.com>

s390/pageattr: avoid unnecessary page table splitting

The kernel page table splitting code will split page tables even for
features the CPU does not support. E.g. a CPU may not support the NX
feature.
In order to avoid this, remove those bits from the flags parameter
that correlate with unsupported CPU features within __set_memory(). In
addition add an early exit if the flags parameter does not have any
bits set afterwards.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# ff24b07a 09-Feb-2017 Paul Gortmaker <paul.gortmaker@windriver.com>

s390: mm: Audit and remove any unnecessary uses of module.h

Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each change instance
for the presence of either and replace as needed. An instance
where module_param was used without moduleparam.h was also fixed,
as well as an implict use of asm/elf.h header.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 57d7f939 22-Mar-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390: add no-execute support

Bit 0x100 of a page table, segment table of region table entry
can be used to disallow code execution for the virtual addresses
associated with the entry.

There is one tricky bit, the system call to return from a signal
is part of the signal frame written to the user stack. With a
non-executable stack this would stop working. To avoid breaking
things the protection fault handler checks the opcode that caused
the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn)
and injects a system call. This is preferable to the alternative
solution with a stub function in the vdso because it works for
vdso=off and statically linked binaries as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 34eeaf37 13-Jun-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: merge local / non-local IPTE helper

Merge the __ptep_ipte and __ptep_ipte_local functions into a single
__ptep_ipte function with an additional parameter. The __pte_ipte_range
function is still extra as the while loops makes it hard to merge.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 4d81aaa5 08-Aug-2016 Heiko Carstens <hca@linux.ibm.com>

s390/pageattr: handle numpages parameter correctly

Both set_memory_ro() and set_memory_rw() will modify the page
attributes of at least one page, even if the numpages parameter is
zero.

The author expected that calling these functions with numpages == zero
would never happen. However with the new 444d13ff10fb ("modules: add
ro_after_init support") feature this happens frequently.

Therefore do the right thing and make these two functions return
gracefully if nothing should be done.

Fixes crashes on module load like this one:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 000003ff80008000 TEID: 000003ff80008407
Fault in home space mode while using kernel ASCE.
AS:0000000000d18007 R3:00000001e6aa4007 S:00000001e6a10800 P:00000001e34ee21d
Oops: 0004 ilc:3 [#1] SMP
Modules linked in: x_tables
CPU: 10 PID: 1 Comm: systemd Not tainted 4.7.0-11895-g3fa9045 #4
Hardware name: IBM 2964 N96 703 (LPAR)
task: 00000001e9118000 task.stack: 00000001e9120000
Krnl PSW : 0704e00180000000 00000000005677f8 (rb_erase+0xf0/0x4d0)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 000003ff80008b20 000003ff80008b20 000003ff80008b70 0000000000b9d608
000003ff80008b20 0000000000000000 00000001e9123e88 000003ff80008950
00000001e485ab40 000003ff00000000 000003ff80008b00 00000001e4858480
0000000100000000 000003ff80008b68 00000000001d5998 00000001e9123c28
Krnl Code: 00000000005677e8: ec1801c3007c cgij %r1,0,8,567b6e
00000000005677ee: e32010100020 cg %r2,16(%r1)
#00000000005677f4: a78401c2 brc 8,567b78
>00000000005677f8: e35010080024 stg %r5,8(%r1)
00000000005677fe: ec5801af007c cgij %r5,0,8,567b5c
0000000000567804: e30050000024 stg %r0,0(%r5)
000000000056780a: ebacf0680004 lmg %r10,%r12,104(%r15)
0000000000567810: 07fe bcr 15,%r14
Call Trace:
([<000003ff80008900>] __this_module+0x0/0xffffffffffffd700 [x_tables])
([<0000000000264fd4>] do_init_module+0x12c/0x220)
([<00000000001da14a>] load_module+0x24e2/0x2b10)
([<00000000001da976>] SyS_finit_module+0xbe/0xd8)
([<0000000000803b26>] system_call+0xd6/0x264)
Last Breaking-Event-Address:
[<000000000056771a>] rb_erase+0x12/0x4d0
Kernel panic - not syncing: Fatal exception: panic_on_oops

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: e8a97e42dc98 ("s390/pageattr: allow kernel page table splitting")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# de3fa841 13-Jun-2016 Heiko Carstens <hca@linux.ibm.com>

s390/mm: fix compile for PAGE_DEFAULT_KEY != 0

The usual problem for code that is ifdef'ed out is that it doesn't
compile after a while. That's also the case for the storage key
initialisation code, if it would be used (set PAGE_DEFAULT_KEY to
something not zero):

./arch/s390/include/asm/page.h: In function 'storage_key_init_range':
./arch/s390/include/asm/page.h:36:2: error: implicit declaration of function '__storage_key_init_range'

Since the code itself has been useful for debugging purposes several
times, remove the ifdefs and make sure the code gets compiler
coverage. The cost for this is eight bytes.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 37cd944c 20-May-2016 Heiko Carstens <hca@linux.ibm.com>

s390/pgtable: add mapping statistics

Add statistics that show how memory is mapped within the kernel
identity mapping. This is more or less the same like git
commit ce0c0e50f94e ("x86, generic: CPA add statistics about state
of direct mapping v4") for x86.

I also intentionally copied the lower case "k" within DirectMap4k vs
the upper case "M" and "G" within the two other lines. Let's have
consistent inconsistencies across architectures.

The output of /proc/meminfo now contains these additional lines:

DirectMap4k: 2048 kB
DirectMap1M: 3991552 kB
DirectMap2G: 4194304 kB

The implementation on s390 is lockless unlike the x86 version, since I
assume changes to the kernel mapping are a very rare event. Therefore
it really doesn't matter if these statistics could potentially be
inconsistent if read while kernel pages tables are being changed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# e8a97e42 17-May-2016 Heiko Carstens <hca@linux.ibm.com>

s390/pageattr: allow kernel page table splitting

set_memory_ro() and set_memory_rw() currently only work on 4k
mappings, which is good enough for module code aka the vmalloc area.

However we stumbled already twice into the need to make this also work
on larger mappings:
- the ro after init patch set
- the crash kernel resize code

Therefore this patch implements automatic kernel page table splitting
if e.g. set_memory_ro() would be called on parts of a 2G mapping.
This works quite the same as the x86 code, but is much simpler.

In order to make this work and to be architecturally compliant we now
always use the csp, cspg or crdte instructions to replace valid page
table entries. This means that set_memory_ro() and set_memory_rw()
will be much more expensive than before. In order to avoid huge
latencies the code contains a couple of cond_resched() calls.

The current code only splits page tables, but does not merge them if
it would be possible. The reason for this is that currently there is
no real life scenarion where this would really happen. All current use
cases that I know of only change access rights once during the life
time. If that should change we can still implement kernel page table
merging at a later time.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 3e76ee99 19-May-2016 Heiko Carstens <hca@linux.ibm.com>

s390/mm: always use PAGE_KERNEL when mapping pages

Always use PAGE_KERNEL when re-enabling pages within the kernel
mapping due to debug pagealloc. Without using this pgprot value
pte_mkwrite() and pte_wrprotect() won't work on such mappings after an
unmap -> map cycle anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 007ccec5 03-Feb-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/pageattr: do a single TLB flush for change_page_attr

The change of the access rights for an address range in the kernel
address space is currently done with a loop of IPTE + a store of the
modified PTE. Between the IPTE and the store the PTE will be invalid,
this intermediate state can cause problems with concurrent accesses.

Consider a change of a kernel area from read-write to read-only, a
concurrent reader of that area should be fine but with the invalid
PTE it might get an unexpected exception.

Remove the IPTEs for each PTE and do a global flush after all PTEs
have been modified.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 5a79859a 12-Feb-2015 Heiko Carstens <hca@linux.ibm.com>

s390: remove 31 bit support

Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.

The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e5826 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.

Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 031bc574 12-Dec-2014 Joonsoo Kim <iamjoonsoo.kim@lge.com>

mm/debug-pagealloc: make debug-pagealloc boottime configurable

Now, we have prepared to avoid using debug-pagealloc in boottime. So
introduce new kernel-parameter to disable debug-pagealloc in boottime, and
makes related functions to be disabled in this case.

Only non-intuitive part is change of guard page functions. Because guard
page is effective only if debug-pagealloc is enabled, turning off
according to debug-pagealloc is reasonable thing to do.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cfb0b241 23-Sep-2014 Heiko Carstens <hca@linux.ibm.com>

s390/mm: make use of ipte range facility

Invalidate several pte entries at once if the ipte range facility
is available. Currently this works only for DEBUG_PAGE_ALLOC where
several up to 2 ^ MAX_ORDER may be invalidated at once.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 127c1fef 06-Oct-2013 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: do not initialize storage keys

With dirty and referenced bits implemented in software it is unnecessary
to initialize the storage key for every page. With this patch not a single
storage key operation is done for a system that does not use KVM.
For KVM set_pte_at/pgste_set_key will do the initialization for the guest
view of the storage key when the mapping for the page is established in
the host.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# e5098611 23-Jul-2013 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: cleanup page table definitions

Improve the encoding of the different pte types and the naming of the
page, segment table and region table bits. Due to the different pte
encoding the hugetlbfs primitives need to be adapted as well. To improve
compatability with common code make the huge ptes use the encoding of
normal ptes. The conversion between the pte and pmd encoding for a huge
pte is done with set_huge_pte_at and huge_ptep_get.
Overall the code is now easier to understand.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# f7f8d7e5 14-Mar-2013 Heiko Carstens <hca@linux.ibm.com>

s390/mm: speedup storage key initialization

Use sske with multiple block control to initialize storage keys within
a 1 MB frame at once.
It turned out that the sske with mb=1 is an order of magnitude faster
than pfmf. This is only an issue for very large systems (several 100GB)
where storage key initialization could last more than a minute.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# abf09bed 07-Nov-2012 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/mm: implement software dirty bits

The s390 architecture is unique in respect to dirty page detection,
it uses the change bit in the per-page storage key to track page
modifications. All other architectures track dirty bits by means
of page table entries. This property of s390 has caused numerous
problems in the past, e.g. see git commit ef5d437f71afdf4a
"mm: fix XFS oops due to dirty pages without buffers on s390".

To avoid future issues in regard to per-page dirty bits convert
s390 to a fault based software dirty bit detection mechanism. All
user page table entries which are marked as clean will be hardware
read-only, even if the pte is supposed to be writable. A write by
the user process will trigger a protection fault which will cause
the user pte to be marked as dirty and the hardware read-only bit
is removed.

With this change the dirty bit in the storage key is irrelevant
for Linux as a host, but the storage key is still required for
KVM guests. The effect is that page_test_and_clear_dirty and the
related code can be removed. The referenced bit in the storage
key is still used by the page_test_and_clear_young primitive to
provide page age information.

For page cache pages of mappings with mapping_cap_account_dirty
there will not be any change in behavior as the dirty bit tracking
already uses read-only ptes to control the amount of dirty pages.
Only for swap cache pages and pages of mappings without
mapping_cap_account_dirty there can be additional protection faults.
To avoid an excessive number of additional faults the mk_pte
primitive checks for PageDirty if the pgprot value allows for writes
and pre-dirties the pte. That avoids all additional faults for
tmpfs and shmem pages until these pages are added to the swap cache.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 0a4ccc99 02-Nov-2012 Heiko Carstens <hca@linux.ibm.com>

s390/mm: move kernel_page_present/kernel_map_pages to page_attr.c

Keep related functions together and move to appropriate file.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 6b70a920 01-Nov-2012 Heiko Carstens <hca@linux.ibm.com>

s390/memory hotplug: use pfmf instruction to initialize storage keys

Move and rename init_storage_keys() to pageattr.c, so it can also be
used from the sclp memory hotplug code in order to initialize
storage keys.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 18da2369 08-Oct-2012 Heiko Carstens <hca@linux.ibm.com>

s390/mm,vmem: use 2GB frames for identity mapping

Use 2GB frames for indentity mapping if EDAT2 is
available to reduce TLB pressure.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# cc5b9a45 01-Oct-2012 Heiko Carstens <hca@linux.ibm.com>

s390/mm,pageattr: remove superfluous EXPORT_SYMBOLs

Remove some EXPORT_SYMBOLs. There is no module code anywhere
which calls these pageattr functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 5b1ba9e3 01-Oct-2012 Heiko Carstens <hca@linux.ibm.com>

s390/mm,pageattr: add more page table walk sanity checks

The current page table walk code in pageattr.c only checks for large pages
while walking the kernel page table, but happily assumes that everything
else is just fine.
Add more checks so we never access invalid memory regions.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 378b1e7a 30-Sep-2012 Heiko Carstens <hca@linux.ibm.com>

s390/mm: fix pmd_huge() usage for kernel mapping

pmd_huge() will always return 0 on !HUGETLBFS, however we use that helper
function when walking the kernel page tables to decide if we have a
1MB page frame or not.
Since we create 1MB frames for the kernel 1:1 mapping independently of
HUGETLBFS this can lead to incorrect storage accesses since the code
can assume that we have a pointer to a page table instead of a pointer
to a 1MB frame.

Fix this by adding a pmd_large() primitive like other architectures have
it already and remove all references to HUGETLBFS/HUGETLBPAGE from the
code that walks kernel page tables.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 638ad34a 30-Oct-2011 Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] sparse: fix sparse warnings about missing prototypes

Add prototypes and includes for functions used in different modules.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# b2fa47e6 23-May-2011 Martin Schwidefsky <schwidefsky@de.ibm.com>

[S390] refactor page table functions for better pgste support

Rework the architecture page table functions to access the bits in the
page table extension array (pgste). There are a number of changes:
1) Fix missing pgste update if the attach_count for the mm is <= 1.
2) For every operation that affects the invalid bit in the pte or the
rcp byte in the pgste the pcl lock needs to be acquired. The function
pgste_get_lock gets the pcl lock and returns the current pgste value
for a pte pointer. The function pgste_set_unlock stores the pgste
and releases the lock. Between these two calls the bits in the pgste
can be shuffled.
3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid
calling SetPageDirty and SetPageReferenced from pgtable.h. If the
host reference backup bit or the host change backup bit has been
set the dirty/referenced state is transfered to the pte. The common
code will pick up the state from the pte.
4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect.
5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel
pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate.
6) Rename kvm_s390_test_and_clear_page_dirty to
ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young.
7) Define mm_exclusive() and mm_has_pgste() helper to improve readability.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 448694a1 19-May-2011 Jan Glauber <jan.glauber@gmail.com>

module: undo module RONX protection correctly.

While debugging I stumbled over two problems in the code that protects module
pages.

First issue is that disabling the protection before freeing init or unload of
a module is not symmetric with the enablement. For instance, if pages are set
to RO the page range from module_core to module_core + core_ro_size is
protected. If a module is unloaded the page range from module_core to
module_core + core_size is set back to RW.
So pages that were not set to RO are also changed to RW.
This is not critical but IMHO it should be symmetric.

Second issue is that while set_memory_rw & set_memory_ro are used for
RO/RW changes only set_memory_nx is involved for NX/X. One would await that
the inverse function is called when the NX protection should be removed,
which is not the case here, unless I'm missing something.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# e4c031b4 20-Apr-2011 Jan Glauber <jan.glauber@gmail.com>

[S390] fix page table walk for changing page attributes

The page table walk for changing page attributes used the wrong
address for pgd/pud/pmd lookups if the range was bigger than
a pmd entry. Fix the lookup by using the correct address.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 305b1523 15-Mar-2011 Jan Glauber <jan.glauber@gmail.com>

[S390] Write protect module text and RO data

Implement write protection for kernel modules text and read-only
data sections by implementing set_memory_[ro|rw] on s390.
Since s390 has no execute bit in the pte's NX is not supported.

set_memory_[ro|rw] will only work on normal pages and not on
large pages, so in case a large page should be modified bail
out with a warning.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>