#
c05995b7 |
|
04-Mar-2024 |
Peter Xu <peterx@redhat.com> |
mm/treewide: align up pXd_leaf() retval across archs Even if pXd_leaf() API is defined globally, it's not clear on the retval, and there are three types used (bool, int, unsigned log). Always return a boolean for pXd_leaf() APIs. Link: https://lkml.kernel.org/r/20240305043750.93762-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
e72c7c2b |
|
04-Mar-2024 |
Peter Xu <peterx@redhat.com> |
mm/treewide: drop pXd_large() They're not used anymore, drop all of them. Link: https://lkml.kernel.org/r/20240305043750.93762-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
ce7a9de3 |
|
29-Jan-2024 |
David Hildenbrand <david@redhat.com> |
sparc/pgtable: define PFN_PTE_SHIFT We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Link: https://lkml.kernel.org/r/20240129124649.189745-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King (Oracle) <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
533c67e6 |
|
27-Dec-2023 |
Kinsey Ho <kinseyho@google.com> |
mm/mglru: add dummy pmd_dirty() Add dummy pmd_dirty() for architectures that don't provide it. This is similar to commit 6617da8fb565 ("mm: add dummy pmd_young() for architectures not having it"). Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/ Signed-off-by: Kinsey Ho <kinseyho@google.com> Suggested-by: Yu Zhao <yuzhao@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Donet Tom <donettom@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
2f0584f3 |
|
12-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma() The x86 Shadow stack feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these unusual properties is that shadow stack memory is writable, but only in limited ways. These limits are applied via a specific PTE bit combination. Nevertheless, the memory is writable, and core mm code will need to apply the writable permissions in the typical paths that call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so that the x86 implementation of it can know whether to create regular writable or shadow stack mappings. But there are a couple of challenges to this. Modifying the signatures of each arch pte_mkwrite() implementation would be error prone because some are generated with macros and would need to be re-implemented. Also, some pte_mkwrite() callers operate on kernel memory without a VMA. So this can be done in a three step process. First pte_mkwrite() can be renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite() added that just calls pte_mkwrite_novma(). Next callers without a VMA can be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers can be changed to take/pass a VMA. Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(), create a new arch config HAS_HUGE_PAGE that can be used to tell if pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the compiler would not be able to find pmd_mkwrite_novma(). No functional change. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/ Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
|
#
1a10a44d |
|
02-Aug-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
sparc64: implement the new page table range API Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and flush_icache_pages(). Convert the PG_dcache_dirty flag from being per-page to per-folio. Link: https://lkml.kernel.org/r/20230802151406.3735276-27-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
fa2e71a6 |
|
11-Apr-2023 |
David Hildenbrand <david@redhat.com> |
sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit On sparc64, there is no HW modified bit, therefore, SW tracks via a SW bit if the PTE is dirty via pte_mkdirty(). However, pte_mkdirty() currently also unconditionally sets the HW writable bit, which is wrong. pte_mkdirty() is not supposed to make a PTE actually writable, unless the SW writable bit -- pte_write() -- indicates that the PTE is not write-protected. Fortunately, sparc64 also defines a SW writable bit. For example, this already turned into a problem in the context of THP splitting as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty bit when thp splits on pmd""), and for page migration, as documented in commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64"). Also, we might want to use the dirty PTE bit in the context of KSM with shared zeropage [1], whereby setting the page writable would be problematic. But more general, any code that might end up setting a PTE/PMD dirty inside a VM without write permissions is possibly broken, Before this commit (sun4u in QEMU): root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty # [INFO] detected THP size: 8192 KiB TAP version 13 1..6 # [INFO] PTRACE write access not ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP not ok 2 SIGSEGV generated, page not modified # [INFO] Page migration ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY not ok 6 SIGSEGV generated, page not modified Bail out! 3 out of 6 tests failed # Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0 Test #3,#4,#5 pass ever since we added some MM workarounds, the underlying issue remains. Let's fix the remaining issues and prepare for reverting the workarounds by setting the HW writable bit only if both, the SW dirty bit and the SW writable bit are set. We have to move pte_dirty() and pte_write() up. The code patching mechanism and handling constants > 22bit is a bit special on sparc64. The ASM logic in pte_mkdirty() and pte_mkwrite() match the logic in pte_mkold() to create the mask depending on the machine type. The ASM logic in __pte_mkhwwrite() matches the logic in pte_present(), just using an "or" instead of an "and" instruction. With this commit (sun4u in QEMU): root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty # [INFO] detected THP size: 8192 KiB TAP version 13 1..6 # [INFO] PTRACE write access ok 1 SIGSEGV generated, page not modified # [INFO] PTRACE write access to THP ok 2 SIGSEGV generated, page not modified # [INFO] Page migration ok 3 SIGSEGV generated, page not modified # [INFO] Page migration of THP ok 4 SIGSEGV generated, page not modified # [INFO] PTE-mapping a THP ok 5 SIGSEGV generated, page not modified # [INFO] UFFDIO_COPY ok 6 SIGSEGV generated, page not modified # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 This handling seems to have been in place forever. [1] https://lkml.kernel.org/r/533a7c3d-3a48-b16b-b421-6e8386e0b142@redhat.com Link: https://lkml.kernel.org/r/20230411142512.438404-4-david@redhat.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
950fe885 |
|
13-Jan-2023 |
David Hildenbrand <david@redhat.com> |
mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE __HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that support swp PTEs, so let's drop it. Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
adf8e329 |
|
13-Jan-2023 |
David Hildenbrand <david@redhat.com> |
sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit was effectively unused. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-23-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
6617da8f |
|
30-Nov-2022 |
Juergen Gross <jgross@suse.com> |
mm: add dummy pmd_young() for architectures not having it In order to avoid #ifdeffery add a dummy pmd_young() implementation as a fallback. This is required for the later patch "mm: introduce arch_has_hw_nonleaf_pmd_young()". Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
25740d31 |
|
10-Jul-2022 |
Anshuman Khandual <anshuman.khandual@arm.com> |
sparc/mm: move protection_map[] inside the platform This moves protection_map[] inside the platform and while here, also enable ARCH_HAS_VM_GET_PAGE_PROT on 32 bit platforms via DECLARE_VM_GET_PAGE_PROT. Link: https://lkml.kernel.org/r/20220711070600.2378316-5-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Zankel <chris@zankel.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
dc4875f0 |
|
07-Jul-2021 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * No functional change in this patch. [aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot] Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9cf6fa24 |
|
07-Jul-2021 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * No functional change in this patch. [aneesh.kumar@linux.ibm.com: fix] Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com [sfr@canb.auug.org.au: another fix] Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1c2f7d14 |
|
30-Jun-2021 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm/thp: define default pmd_pgtable() Currently most platforms define pmd_pgtable() as pmd_page() duplicating the same code all over. Instead just define a default value i.e pmd_page() for pmd_pgtable() and let platforms override when required via <asm/pgtable.h>. All the existing platform that override pmd_pgtable() have been moved into their respective <asm/pgtable.h> header in order to precede before the new generic definition. This makes it much cleaner with reduced code. Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fac7757e |
|
30-Jun-2021 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm: define default value for FIRST_USER_ADDRESS Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the same code all over. Instead just define a generic default value (i.e 0UL) for FIRST_USER_ADDRESS and let the platforms override when required. This makes it much cleaner with reduced code. The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h> when the given platform overrides its value via <asm/pgtable.h>. Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [RISC-V] Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
79c1c594 |
|
30-Jun-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
mm/hugetlb: change parameters of arch_make_huge_pte() Patch series "Subject: [PATCH v2 0/5] Implement huge VMAP and VMALLOC on powerpc 8xx", v2. This series implements huge VMAP and VMALLOC on powerpc 8xx. Powerpc 8xx has 4 page sizes: - 4k - 16k - 512k - 8M At the time being, vmalloc and vmap only support huge pages which are leaf at PMD level. Here the PMD level is 4M, it doesn't correspond to any supported page size. For now, implement use of 16k and 512k pages which is done at PTE level. Support of 8M pages will be implemented later, it requires use of hugepd tables. To allow this, the architecture provides two functions: - arch_vmap_pte_range_map_size() which tells vmap_pte_range() what page size to use. A stub returning PAGE_SIZE is provided when the architecture doesn't provide this function. - arch_vmap_pte_supported_shift() which tells __vmalloc_node_range() what page shift to use for a given area size. A stub returning PAGE_SHIFT is provided when the architecture doesn't provide this function. This patch (of 5): At the time being, arch_make_huge_pte() has the following prototype: pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma, struct page *page, int writable); vma is used to get the pages shift or size. vma is also used on Sparc to get vm_flags. page is not used. writable is not used. In order to use this function without a vma, replace vma by shift and flags. Also remove the used parameters. Link: https://lkml.kernel.org/r/cover.1620795204.git.christophe.leroy@csgroup.eu Link: https://lkml.kernel.org/r/f4633ac6a7da2f22f31a04a89e0a7026bb78b15b.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e6e4f42e |
|
13-Nov-2020 |
Peter Zijlstra <peterz@infradead.org> |
sparc64/mm: Implement pXX_leaf_size() support Sparc64 has non-pagetable aligned large page support; wire up the pXX_leaf_size() functions to report the correct pagetable page size. This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate pagetable leaf sizes. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201126121121.301768209@infradead.org
|
#
974b9b2c |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: consolidate pte_index() and pte_offset_*() definitions All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For the most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr() that converts a PMD entry value to the virtual address of the page containing PTEs array. Let's move x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [rppt@linux.ibm.com: v2] Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org [akpm@linux-foundation.org: fix x86 warning] [sfr@canb.auug.org.au: fix powerpc build] Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ca5999fd |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: introduce include/linux/pgtable.h The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
251a0ffe |
|
10-Apr-2020 |
Arjun Roy <arjunroy@google.com> |
mm: bring sparc pte_index() semantics inline with other platforms pte_index() on platforms other than sparc return a numerical index. On sparc, it returns a pte_t*. This presents an issue for vm_insert_pages(), which relies on pte_index() to find the offset for a pte within a pmd, for batched inserts. This patch: 1. Modifies pte_index() for sparc to return a numerical index, like other platforms, 2. Defines pte_entry() for sparc which returns a pte_t* (as pte_index() used to), 3. Converts existing sparc callers for pte_index() to use pte_entry(). [sfr@canb.auug.org.au: remove pte_entry and just directly modified pte_offset_kernel instead] Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: David Miller <davem@davemloft.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Arjun Roy <arjunroy.kdev@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Link: http://lkml.kernel.org/r/20200227105045.6b421d9f@canb.auug.org.au Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
80942493 |
|
03-Feb-2020 |
Steven Price <steven.price@arm.com> |
sparc: mm: add p?d_leaf() definitions walk_page_range() is going to be allowed to walk page tables other than those of user space. For this it needs to know when it has reached a 'leaf' entry in the page tables. This information is provided by the p?d_leaf() functions/macros. For sparc 64 bit, pmd_large() and pud_large() are already provided, so add macros to provide the p?d_leaf names required by the generic code. Link: http://lkml.kernel.org/r/20191218162402.45610-10-steven.price@arm.com Signed-off-by: Steven Price <steven.price@arm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5637bc50 |
|
24-Nov-2019 |
Mike Rapoport <rppt@kernel.org> |
sparc64: add support for folded p4d page tables Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a22fea94 |
|
26-Sep-2019 |
Andrew Morton <akpm@linux-foundation.org> |
arch/sparc/include/asm/pgtable_64.h: fix build A last-minute fixlet which I'd failed to merge at the appropriate time had the predictable effect. Fixes: f672e2c217e2d4b2 ("lib: untag user pointers in strn*_user") Cc: Andrey Konovalov <andreyknvl@google.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
903f433f |
|
25-Sep-2019 |
Andrey Konovalov <andreyknvl@google.com> |
lib: untag user pointers in strn*_user Patch series "arm64: untag user pointers passed to the kernel", v19. === Overview arm64 has a feature called Top Byte Ignore, which allows to embed pointer tags into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces. Right now the kernel is already able to handle user faults with tagged pointers, due to these patches: 1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer") 2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers") 3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers") This patchset extends tagged pointer support to syscall arguments. As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details). For non-memory syscalls this is done by untaging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked, the tag is preserved as the pointer makes its way through the kernel and stays tagged when the kernel dereferences the pointer when perfoming user memory accesses. The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma. Other memory syscalls (mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel. === Other approaches One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical. An alternative approach to untagging pointers in memory syscalls prologues is to inspead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers. === Testing The following testing approaches has been taken to find potential issues with user pointer untagging: 1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done. 2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma. 3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros. 4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel. Based on the results of the testing the requried patches have been added to the patchset. === Notes This patchset is meant to be merged together with "arm64 relaxed ABI" [3]. This patchset is a prerequisite for ARM's memory tagging hardware feature support [4]. This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan. Thanks! [1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html [2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292 [3] https://lkml.org/lkml/2019/6/12/745 [4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a This patch (of 11) This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately. Untag user pointers passed to these functions. Note, that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses. [andreyknvl@google.com: fix sparc4 build] Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
782de70c |
|
23-Sep-2019 |
Mike Rapoport <rppt@kernel.org> |
mm: consolidate pgtable_cache_init() and pgd_cache_init() Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem cache for page table allocations on several architectures that do not use PAGE_SIZE tables for one or more levels of the page table hierarchy. Most architectures do not implement these functions and use __weak default NOP implementation of pgd_cache_init(). Since there is no such default for pgtable_cache_init(), its empty stub is duplicated among most architectures. Rename the definitions of pgd_cache_init() to pgtable_cache_init() and drop empty stubs of pgtable_cache_init(). Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7b9afb86 |
|
11-Jul-2019 |
Christoph Hellwig <hch@lst.de> |
sparc64: use the generic get_user_pages_fast code The sparc64 code is mostly equivalent to the generic one, minus various bugfixes and two arch overrides that this patch adds to pgtable.h. Link: http://lkml.kernel.org/r/20190625143715.1689-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5875509d |
|
11-Jul-2019 |
Christoph Hellwig <hch@lst.de> |
sparc64: define untagged_addr() Add a helper to untag a user pointer. This is needed for ADI support in get_user_pages_fast. Link: http://lkml.kernel.org/r/20190625143715.1689-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d8550790 |
|
11-Jul-2019 |
Christoph Hellwig <hch@lst.de> |
sparc64: add the missing pgd_page definition sparc64 only had pgd_page_vaddr, but not pgd_page. [hch@lst.de: fix sparc64 build] Link: http://lkml.kernel.org/r/20190626131318.GA5101@lst.de Link: http://lkml.kernel.org/r/20190625143715.1689-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: David Miller <davem@davemloft.net> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5470dea4 |
|
13-May-2019 |
Alexander Duyck <alexander.h.duyck@linux.intel.com> |
mm: use mm_zero_struct_page from SPARC on all 64b architectures Patch series "Deferred page init improvements", v7. This patchset is essentially a refactor of the page initialization logic that is meant to provide for better code reuse while providing a significant improvement in deferred page initialization performance. In my testing on an x86_64 system with 384GB of RAM I have seen the following. In the case of regular memory initialization the deferred init time was decreased from 3.75s to 1.38s on average. This amounts to a 172% improvement for the deferred memory initialization performance. I have called out the improvement observed with each patch. This patch (of 4): Use the same approach that was already in use on Sparc on all the architectures that support a 64b long. This is mostly motivated by the fact that 7 to 10 store/move instructions are likely always going to be faster than having to call into a function that is not specialized for handling page init. An added advantage to doing it this way is that the compiler can get away with combining writes in the __init_single_page call. As a result the memset call will be reduced to only about 4 write operations, or at least that is what I am seeing with GCC 6.2 as the flags, LRU pointers, and count/mapcount seem to be cancelling out at least 4 of the 8 assignments on my system. One change I had to make to the function was to reduce the minimum page size to 56 to support some powerpc64 configurations. This change should introduce no change on SPARC since it already had this code. In the case of x86_64 I saw a reduction from 3.75s to 2.80s when initializing 384GB of RAM per node. Pavel Tatashin tested on a system with Broadcom's Stingray CPU and 48GB of RAM and found that __init_single_page() takes 19.30ns / 64-byte struct page before this patch and with this patch it takes 17.33ns / 64-byte struct page. Mike Rapoport ran a similar test on a OpenPower (S812LC 8348-21C) with Power8 processor and 128GB or RAM. His results per 64-byte struct page were 4.68ns before, and 4.59ns after this patch. Link: http://lkml.kernel.org/r/20190405221213.12227.9392.stgit@localhost.localdomain Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <yi.z.zhang@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3010a5ea |
|
07-Jun-2018 |
Laurent Dufour <ldufour@linux.vnet.ibm.com> |
mm: introduce ARCH_HAS_PTE_SPECIAL Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <albert@sifive.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
74a04967 |
|
23-Feb-2018 |
Khalid Aziz <khalid.aziz@oracle.com> |
sparc64: Add support for ADI (Application Data Integrity) ADI is a new feature supported on SPARC M7 and newer processors to allow hardware to catch rogue accesses to memory. ADI is supported for data fetches only and not instruction fetches. An app can enable ADI on its data pages, set version tags on them and use versioned addresses to access the data pages. Upper bits of the address contain the version tag. On M7 processors, upper four bits (bits 63-60) contain the version tag. If a rogue app attempts to access ADI enabled data pages, its access is blocked and processor generates an exception. Please see Documentation/sparc/adi.txt for further details. This patch extends mprotect to enable ADI (TSTATE.mcde), enable/disable MCD (Memory Corruption Detection) on selected memory ranges, enable TTE.mcd in PTEs, return ADI parameters to userspace and save/restore ADI version tags on page swap out/in or migration. ADI is not enabled by default for any task. A task must explicitly enable ADI on a memory range and set version tag for ADI to be effective for the task. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Khalid Aziz <khalid@gonehiking.org> Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75037500 |
|
21-Feb-2018 |
Khalid Aziz <khalid.aziz@oracle.com> |
sparc64: Add support for ADI register fields, ASIs and traps SPARC M7 processor adds new control register fields, ASIs and a new trap to support the ADI (Application Data Integrity) feature. This patch adds definitions for these register fields, ASIs and a handler for the new precise memory corruption detected trap. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Khalid Aziz <khalid@gonehiking.org> Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8e654f0 |
|
31-Jan-2018 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: update pmdp_invalidate() to return old pmd value It's required to avoid losing dirty and accessed bits. [akpm@linux-foundation.org: add a `do' to the do-while loop] Link: http://lkml.kernel.org/r/20171213105756.69879-9-kirill.shutemov@linux.intel.com Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Miller <davem@davemloft.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e4e40e02 |
|
29-Nov-2017 |
Dan Williams <dan.j.williams@intel.com> |
mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
78c94366 |
|
15-Nov-2017 |
Pavel Tatashin <pasha.tatashin@oracle.com> |
sparc64: optimize struct page zeroing Add an optimized mm_zero_struct_page(), so struct page's are zeroed without calling memset(). We do eight to ten regular stores based on the size of struct page. Compiler optimizes out the conditions of switch() statement. SPARC-M6 with 15T of memory, single thread performance: BASE FIX OPTIMIZED_FIX bootmem_init 28.440467985s 2.305674818s 2.305161615s free_area_init_nodes 202.845901673s 225.343084508s 172.556506560s -------------------------------------------- Total 231.286369658s 227.648759326s 174.861668175s BASE: current linux FIX: This patch series without "optimized struct page zeroing" OPTIMIZED_FIX: This patch series including the current patch. bootmem_init() is where memory for struct pages is zeroed during allocation. Note, about two seconds in this function is a fixed time: it does not increase as memory is increased. Link: http://lkml.kernel.org/r/20171013173214.27300-11-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b2441318 |
|
01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
df7b2155 |
|
11-Aug-2017 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Add 16GB hugepage support Adds support for 16GB hugepage size. To use this page size use kernel parameters as: default_hugepagesz=16G hugepagesz=16G hugepages=10 Testing: Tested with the stream benchmark which allocates 48G of arrays backed by 16G hugepages and does RW operation on them in parallel. Orabug: 25362942 Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44382b01 |
|
11-Aug-2017 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Support huge PUD case in get_user_pages get_user_pages() is used to do direct IO. It already handles the case where the address range is backed by PMD huge pages. This patch now adds the case where the range could be backed by PUD huge pages. Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d9fbf53 |
|
10-Aug-2017 |
David S. Miller <davem@davemloft.net> |
sparc64: Revert 16GB huge page support. It overflows the amount of space available in the initial .text section of trap handler assembler in some configurations, resulting in build failures. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f10bb007 |
|
29-Jul-2017 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Add 16GB hugepage support Adds support for 16GB hugepage size. To use this page size use kernel parameters as: default_hugepagesz=16G hugepagesz=16G hugepages=10 Testing: Tested with the stream benchmark which allocates 48G of arrays backed by 16G hugepages and does RW operation on them in parallel. Orabug: 25362942 Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9a844c5 |
|
29-Jul-2017 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Support huge PUD case in get_user_pages get_user_pages() is used to do direct IO. It already handles the case where the address range is backed by PMD huge pages. This patch now adds the case where the range could be backed by PUD huge pages. Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ae34dbd |
|
31-Mar-2017 |
Tom Hromatka <tom.hromatka@oracle.com> |
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write() This commit moves sparc64's prototype of pmd_write() outside of the CONFIG_TRANSPARENT_HUGEPAGE ifdef. In 2013, commit a7b9403f0e6d ("sparc64: Encode huge PMDs using PTE encoding.") exposed a path where pmd_write() could be called without CONFIG_TRANSPARENT_HUGEPAGE defined. This can result in the panic below. The diff is awkward to read, but the changes are straightforward. pmd_write() was moved outside of #ifdef CONFIG_TRANSPARENT_HUGEPAGE. Also, __HAVE_ARCH_PMD_WRITE was defined. kernel BUG at include/asm-generic/pgtable.h:576! \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ oracle_8114_cdb(8114): Kernel bad sw trap 5 [#1] CPU: 120 PID: 8114 Comm: oracle_8114_cdb Not tainted 4.1.12-61.7.1.el6uek.rc1.sparc64 #1 task: fff8400700a24d60 ti: fff8400700bc4000 task.ti: fff8400700bc4000 TSTATE: 0000004411e01607 TPC: 00000000004609f8 TNPC: 00000000004609fc Y: 00000005 Not tainted TPC: <gup_huge_pmd+0x198/0x1e0> g0: 000000000001c000 g1: 0000000000ef3954 g2: 0000000000000000 g3: 0000000000000001 g4: fff8400700a24d60 g5: fff8001fa5c10000 g6: fff8400700bc4000 g7: 0000000000000720 o0: 0000000000bc5058 o1: 0000000000000240 o2: 0000000000006000 o3: 0000000000001c00 o4: 0000000000000000 o5: 0000048000080000 sp: fff8400700bc6ab1 ret_pc: 00000000004609f0 RPC: <gup_huge_pmd+0x190/0x1e0> l0: fff8400700bc74fc l1: 0000000000020000 l2: 0000000000002000 l3: 0000000000000000 l4: fff8001f93250950 l5: 000000000113f800 l6: 0000000000000004 l7: 0000000000000000 i0: fff8400700ca46a0 i1: bd0000085e800453 i2: 000000026a0c4000 i3: 000000026a0c6000 i4: 0000000000000001 i5: fff800070c958de8 i6: fff8400700bc6b61 i7: 0000000000460dd0 I7: <gup_pud_range+0x170/0x1a0> Call Trace: [0000000000460dd0] gup_pud_range+0x170/0x1a0 [0000000000460e84] get_user_pages_fast+0x84/0x120 [00000000006f5a18] iov_iter_get_pages+0x98/0x240 [00000000005fa744] do_direct_IO+0xf64/0x1e00 [00000000005fbbc0] __blockdev_direct_IO+0x360/0x15a0 [00000000101f74fc] ext4_ind_direct_IO+0xdc/0x400 [ext4] [00000000101af690] ext4_ext_direct_IO+0x1d0/0x2c0 [ext4] [00000000101af86c] ext4_direct_IO+0xec/0x220 [ext4] [0000000000553bd4] generic_file_read_iter+0x114/0x140 [00000000005bdc2c] __vfs_read+0xac/0x100 [00000000005bf254] vfs_read+0x54/0x100 [00000000005bf368] SyS_pread64+0x68/0x80 Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9849a569 |
|
09-Mar-2017 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
arch, mm: convert all architectures to use 5level-fixup.h If an architecture uses 4level-fixup.h we don't need to do anything as it includes 5level-fixup.h. If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK before inclusion of the header. It makes asm-generic code to use 5level-fixup.h. If an architecture has 4-level paging or folds levels on its own, include 5level-fixup.h directly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
589ee628 |
|
03-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to remove the <linux/mm_types.h> dependency from <linux/sched.h> Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c7d9f77d |
|
01-Feb-2017 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Multi-page size support Add support for using multiple hugepage sizes simultaneously on mainline. Currently, support for 256M has been added which can be used along with 8M pages. Page tables are set like this (e.g. for 256M page): VA + (8M * x) -> PA + (8M * x) (sz bit = 256M) where x in [0, 31] and TSB is set similarly: VA + (4M * x) -> PA + (4M * x) (sz bit = 256M) where x in [0, 63] - Testing Tested on Sonoma (which supports 256M pages) by running stream benchmark instances in parallel: one instance uses 8M pages and another uses 256M pages, consuming 48G each. Boot params used: default_hugepagesz=256M hugepagesz=256M hugepages=300 hugepagesz=8M hugepages=10000 Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acff7fdb |
|
09-Dec-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
sparc64: fix typo in pgd_clear() It really has to be pgdp, not pgd. It just happend to work since all callers have 'pgd' as an argument. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7bc3777c |
|
29-Jul-2016 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Trim page tables for 8M hugepages For PMD aligned (8M) hugepages, we currently allocate all four page table levels which is wasteful. We now allocate till PMD level only which saves memory usage from page tables. Also, when freeing page table for 8M hugepage backed region, make sure we don't try to access non-existent PTE level. Orabug: 22630259 Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
24e49ee3 |
|
30-Mar-2016 |
Nitin Gupta <nitin.m.gupta@oracle.com> |
sparc64: Reduce TLB flushes during hugepte changes During hugepage map/unmap, TSB and TLB flushes are currently issued at every PAGE_SIZE'd boundary which is unnecessary. We now issue the flush at REAL_HPAGE_SIZE boundaries only. Without this patch workloads which unmap a large hugepage backed VMA region get CPU lockups due to excessive TLB flush calls. Orabug: 22365539, 22643230, 22995196 Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd8cfd30 |
|
19-May-2016 |
Hugh Dickins <hughd@google.com> |
arch: fix has_transparent_hugepage() I've just discovered that the useful-sounding has_transparent_hugepage() is actually an architecture-dependent minefield: on some arches it only builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when not, but on some of those (arm and arm64) it then gives the wrong answer; and on mips alone it's marked __init, which would crash if called later (but so far it has not been called later). Straighten this out: make it available to all configs, with a sensible default in asm-generic/pgtable.h, removing its definitions from those arches (arc, arm, arm64, sparc, tile) which are served by the default, adding #define has_transparent_hugepage has_transparent_hugepage to those (mips, powerpc, s390, x86) which need to override the default at runtime, and removing the __init from mips (but maybe that kind of code should be avoided after init: set a static variable the first time it's called). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [arch/s390] Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
08f80073 |
|
04-Mar-2016 |
Adam Buchbinder <adam.buchbinder@gmail.com> |
sparc: Fix misspellings in comments. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
79cedb8f |
|
15-Jan-2016 |
Minchan Kim <minchan@kernel.org> |
arch/sparc/include/asm/pgtable_64.h: add pmd_[dirty|mkclean] for THP MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Shaohua Li <shli@kernel.org> Cc: <yalin.wang2010@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Evans <je@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttil <mika.penttila@nextfour.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rik van Riel <riel@redhat.com> Cc: Roland Dreier <roland@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Shaohua Li <shli@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
99f1bc01 |
|
15-Jan-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
sparc, thp: remove infrastructure for handling splitting PMDs With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8809aa2d |
|
24-Jun-2015 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
mm: clarify that the function operates on hugepage pte We have confusing functions to clear pmd, pmd_clear_* and pmd_clear. Add _huge_ to pmdp_clear functions so that we are clear that they operate on hugepage pte. We don't bother about other functions like pmdp_set_wrprotect, pmdp_clear_flush_young, because they operate on PTE bits and hence indicate they are operating on hugepage ptes Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
494e5b6f |
|
27-May-2015 |
Khalid Aziz <khalid.aziz@oracle.com> |
sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE Bit 9 of TTE is CV (Cacheable in V-cache) on sparc v9 processor while the same bit 9 is MCDE (Memory Corruption Detection Enable) on M7 processor. This creates a conflicting usage of the same bit. Kernel sets TTE.cv bit on all pages for sun4v architecture which works well for sparc v9 but enables memory corruption detection on M7 processor which is not the intent. This patch adds code to determine if kernel is running on M7 processor and takes steps to not enable memory corruption detection in TTE erroneously. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d016bf7e |
|
11-Feb-2015 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm: make FIRST_USER_ADDRESS unsigned long on all archs LKP has triggered a compiler warning after my recent patch "mm: account pmd page tables to the process": mm/mmap.c: In function 'exit_mmap': >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default] The code: > 2857 WARN_ON(mm_nr_pmds(mm) > 2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has the same type -- int. PUD_SHIFT. I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned long. On every arch for consistency. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6a8c4820 |
|
10-Feb-2015 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
sparc: drop pte_file()-related helpers We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. This patch also increase number of bits availble for swap offset. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c164e038 |
|
10-Dec-2014 |
Kirill A. Shutemov <kirill@shutemov.name> |
mm: fix huge zero page accounting in smaps report As a small zero page, huge zero page should not be accounted in smaps report as normal page. For small pages we rely on vm_normal_page() to filter out zero page, but vm_normal_page() is not designed to handle pmds. We only get here due hackish cast pmd to pte in smaps_pte_range() -- pte and pmd format is not necessary compatible on each and every architecture. Let's add separate codepath to handle pmds. follow_trans_huge_pmd() will detect huge zero page for us. We would need pmd_dirty() helper to do this properly. The patch adds it to THP-enabled architectures which don't yet have one. [akpm@linux-foundation.org: use do_div to fix 32-bit build] Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengwei Yin <yfw.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d195b71b |
|
27-Sep-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Kill unnecessary tables and increase MAX_BANKS. swapper_low_pmd_dir and swapper_pud_dir are actually completely useless and unnecessary. We just need swapper_pg_dir[]. Naturally the other page table chunks will be allocated on an as-needed basis. Since the kernel actually accesses these tables in the PAGE_OFFSET view, there is not even a TLB locality advantage of placing them in the kernel image. Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is naturally page aligned. Increase MAX_BANKS to 1024 in order to handle heavily fragmented virtual guests. Even with this MAX_BANKS increase, the kernel is 20K+ smaller. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
|
#
bb4e6e85 |
|
27-Sep-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Adjust vmalloc region size based upon available virtual address bits. In order to accomodate embedded per-cpu allocation with large numbers of cpus and numa nodes, we have to use as much virtual address space as possible for the vmalloc region. Otherwise we can get things like: PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000 So, once we select a value for PAGE_OFFSET, derive the size of the vmalloc region based upon that. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
|
#
7c0fa0f2 |
|
24-Sep-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. Make sure, at compile time, that the kernel can properly support whatever MAX_PHYS_ADDRESS_BITS is defined to. On M7 chips, use a max_phys_bits value of 49. Based upon a patch by Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
|
#
0dd5b7b0 |
|
24-Sep-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
|
#
ac55c768 |
|
26-Sep-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Switch to 4-level page tables. This has become necessary with chips that support more than 43-bits of physical addressing. Based almost entirely upon a patch by Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
|
#
f05a6865 |
|
16-May-2014 |
Sam Ravnborg <sam@ravnborg.org> |
sparc: drop use of extern for prototypes in arch/sparc/include/asm Drop extern for all prototypes and adjust alignment of parameters as required after the removal. In a few rare cases adjust linelength to conform to maximum 80 chars, and likewise in a few rare cases adjust alignment of parameters to static functions. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b18eb2d7 |
|
07-May-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus. Access to the TSB hash tables during TLB misses requires that there be an atomic 128-bit quad load available so that we fetch a matching TAG and DATA field at the same time. On cpus prior to UltraSPARC-III only virtual address based quad loads are available. UltraSPARC-III and later provide physical address based variants which are easier to use. When we only have virtual address based quad loads available this means that we have to lock the TSB into the TLB at a fixed virtual address on each cpu when it runs that process. We can't just access the PAGE_OFFSET based aliased mapping of these TSBs because we cannot take a recursive TLB miss inside of the TLB miss handler without risking running out of hardware trap levels (some trap combinations can be deep, such as those generated by register window spill and fill traps). Without huge pages it's working perfectly fine, but when the huge TSB got added another chunk of fixed virtual address space was not allocated for this second TSB mapping. So we were mapping both the 8K and 4MB TSBs to the same exact virtual address, causing multiple TLB matches which gives undefined behavior. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe866433 |
|
29-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR(). pte_ERROR() is not used anywhere, delete it. For pgd_ERROR() and pmd_ERROR(), output something similar to x86, giving the address of the pgd/pmd as well as it's value. Also provide the caller, since these macros are invoked from pgd_clear_bad() and pmd_clear_bad() which provides little context as to what high level operation was occuring when the BAD state was detected. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26cf4325 |
|
29-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Add basic validations to {pud,pmd}_bad(). Instead of returning false we should at least check the most basic things, otherwise page table corruptions will be very difficult to debug. PMD and PTE tables are of size PAGE_SIZE, so none of the sub-PAGE_SIZE bits should be set. We also complement this with a check that the physical address the pud/pmd points to is valid memory. PowerPC was used as a guide while implementating this. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0eef331a |
|
03-May-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Use 'ILOG2_4MB' instead of constant '22'. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee73887e |
|
29-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix range check in kern_addr_valid(). In commit b2d438348024b75a1ee8b66b85d77f569a5dfed8 ("sparc64: Make PAGE_OFFSET variable."), the MAX_PHYS_ADDRESS_BITS value was increased (to 47). This constant reference to '41UL' was missed. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eaf85da8 |
|
28-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Don't use _PAGE_PRESENT in pte_modify() mask. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2e4e676 |
|
27-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix hex values in comment above pte_modify(). When _PAGE_SPECIAL and _PAGE_PMD_HUGE were added to the mask, the comment was not updated. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04df419d |
|
25-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix bugs in get_user_pages_fast() wrt. THP. The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to decide if it needs to bail and return 0. pmd_large() should therefore just check _PAGE_PMD_HUGE. Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we just need to add a valid bit check. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51e5ef1b |
|
24-Apr-2014 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix huge PMD invalidation. On sparc64 "present" and "valid" are seperate PTE bits, this allows us to naturally distinguish between the user explicitly asking for PROT_NONE with mprotect() and other situations. However we weren't handling this properly in the huge PMD paths. First of all, the page table walker in the TSB miss path only checks for _PAGE_PMD_HUGE. So the generic pmdp_invalidate() would clear _PAGE_PRESENT but the TLB miss paths would still load it into the TLB as a valid huge PMD. Fix this by clearing the valid bit in pmdp_invalidate(), and also checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez" since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20841405 |
|
18-Dec-2013 |
Rik van Riel <riel@redhat.com> |
mm: fix TLB flush race between migration, and change_protection_range There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [*] flush TLB reload TLB from new entry read/write new page lose data [*] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a7b9403f |
|
26-Sep-2013 |
David S. Miller <davem@davemloft.net> |
sparc64: Encode huge PMDs using PTE encoding. Now that we have 64-bits for PMDs we can stop using special encodings for the huge PMD values, and just put real PTEs in there. We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and huge ones. It is the same for both 4U and 4V PTE layouts. We also use _PAGE_SPECIAL to indicate the splitting state, since a huge PMD cannot also be special. All of the PMD --> PTE translation code disappears, and most of the huge PMD bit modifications and tests just degenerate into the PTE operations. In particular USER_PGTABLE_CHECK_PMD_HUGE becomes trivial. As a side effect, normal PMDs don't shift the physical address around. This also speeds up the page table walks in the TLB miss paths since they don't have to do the shifts any more. Another non-trivial aspect is that pte_modify() has to be changed to preserve the _PAGE_PMD_HUGE bits as well as the page size field of the pte. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b77933c |
|
25-Sep-2013 |
David S. Miller <davem@davemloft.net> |
sparc64: Move to 64-bit PGDs and PMDs. To make the page tables compact, we were using 32-bit PGDs and PMDs. We only had to support <= 43 bits of physical addresses so this was quite feasible. In order to support larger physical addresses we have to move to 64-bit PGDs and PMDs. Most of the changes are straight-forward: 1) {pgd,pmd}_t --> unsigned long 2) Anything that tries to use plain "unsigned int" types with pgd/pmd values needs to be adjusted. In particular things like "0U" become "0UL". 3) {PGDIR,PMD}_BITS decrease by one. 4) In the assembler page table walkers, use "ldxa" instead of "lduwa" and adjust the low bit masks to clear out the low 3 bits instead of just the low 2 bits during pgd/pmd address formation. Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the swapper_{pg_dir,low_pmd_dir} arrays. This patch does not try to take advantage of having 64-bits in the PMDs to simplify the hugepage code, that will come in a subsequent change. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37b3a8ff |
|
25-Sep-2013 |
David S. Miller <davem@davemloft.net> |
sparc64: Move from 4MB to 8MB huge pages. The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40d158e6 |
|
10-May-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
consolidate io_remap_pfn_range definitions Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6b0b50b0 |
|
05-Jun-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
mm/THP: add pmd args to pgtable deposit and withdraw APIs This will be later used by powerpc THP support. In powerpc we want to use pgtable for storing the hash index values. So instead of adding them to mm_context list, we would like to store them in the second half of pmd Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
f36391d2 |
|
19-Apr-2013 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix race in TLB batch processing. As reported by Dave Kleikamp, when we emit cross calls to do batched TLB flush processing we have a race because we do not synchronize on the sibling cpus completing the cross call. So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.) and either flushes are missed or flushes will flush the wrong addresses. Fix this by using generic infrastructure to synchonize on the completion of the cross call. This first required getting the flush_tlb_pending() call out from switch_to() which operates with locks held and interrupts disabled. The problem is that smp_call_function_many() cannot be invoked with IRQs disabled and this is explicitly checked for with WARN_ON_ONCE(). We get the batch processing outside of locked IRQ disabled sections by using some ideas from the powerpc port. Namely, we only batch inside of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a region, we flush TLBs synchronously. 1) Get rid of xcall_flush_tlb_pending and per-cpu type implementations. 2) Do TLB batch cross calls instead via: smp_call_function_many() tlb_pending_func() __flush_tlb_pending() 3) Batch only in lazy mmu sequences: a) Add 'active' member to struct tlb_batch b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE c) Set 'active' in arch_enter_lazy_mmu_mode() d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode() e) Check 'active' in tlb_batch_add_one() and do a synchronous flush if it's clear. 4) Add infrastructure for synchronous TLB page flushes. a) Implement __flush_tlb_page and per-cpu variants, patch as needed. b) Likewise for xcall_flush_tlb_page. c) Implement smp_flush_tlb_page() to invoke the cross-call. d) Wire up global_flush_tlb_page() to the right routine based upon CONFIG_SMP 5) It turns out that singleton batches are very common, 2 out of every 3 batch flushes have only a single entry in them. The batch flush waiting is very expensive, both because of the poll on sibling cpu completeion, as well as because passing the tlb batch pointer to the sibling cpus invokes a shared memory dereference. Therefore, in flush_tlb_pending(), if there is only one entry in the batch perform a completely asynchronous global_flush_tlb_page() instead. Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
#
89a77915 |
|
13-Feb-2013 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix get_user_pages_fast() wrt. THP. Mostly mirrors the s390 logic, as unlike x86 we don't need the SetPageReferenced() bits. On sparc64 we also lack a user/privileged bit in the huge PMDs. In order to make this work for THP and non-THP builds, some header file adjustments were necessary. Namely, provide the PMD_HUGE_* bit defines and the pmd_large() inline unconditionally rather than protected by TRANSPARENT_HUGEPAGE. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4a9d1946 |
|
18-Dec-2012 |
David S. Miller <davem@davemloft.net> |
sparc64: Define pte_accessible() We can elide flush_tlb_*() calls when _PAGE_VALID is clear as that is the test used to determine whether or not to queue up a TLB flush in set_pte_at(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e695d2e |
|
08-Oct-2012 |
David Miller <davem@davemloft.net> |
sparc64: Support transparent huge pages. This is relatively easy since PMD's now cover exactly 4MB of memory. Our PMD entries are 32-bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dbc9fdf0 |
|
08-Oct-2012 |
David Miller <davem@davemloft.net> |
sparc64: Document PGD and PMD layout. We're going to be messing around with the PMD interpretation and layout for the sake of transparent huge pages, so we better clearly document what we're starting with. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
56a70b8c |
|
08-Oct-2012 |
David Miller <davem@davemloft.net> |
sparc64: Halve the size of PTE tables The reason we want to do this is to facilitate transparent huge page support. Right now PMD's cover 8MB of address space, and our huge page size is 4MB. The current transparent hugepage support is not able to handle HPAGE_SIZE != PMD_SIZE. So make PTE tables be sized to half of a page instead of a full page. We can still map properly the whole supported virtual address range which on sparc64 requires 44 bits. Add a compile time CPP test which ensures that this requirement is always met. There is a minor inefficiency added by this change. We only use half of the page for PTE tables. It's not trivial to use only half of the page yet still get all of the pgtable_page_{ctor,dtor}() stuff working properly. It is doable, and that will come in a subsequent change. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
15b9350a |
|
08-Oct-2012 |
David Miller <davem@davemloft.net> |
sparc64: Only support 4MB huge pages and 8KB base pages. Narrowing the scope of the page size configurations will make the transparent hugepage changes much simpler. In the end what we really want to do is have the kernel support multiple huge page sizes and use whatever is appropriate as the context dictactes. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
679bea5e |
|
13-May-2012 |
David S. Miller <davem@davemloft.net> |
sparc: Kill mmu_{un,}lockarea(). These were used on sun4c during floppy data transfers since on that chip we had to lock the cpu mappings into the TLB because we cannot take a TLB miss during the assembler floppy interrupt handler that does the data transfer. That is no longer necessary since we've removed sun4c support, thus this stuff can disappear completely. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2533e824 |
|
01-Apr-2012 |
Aaro Koskinen <aaro.koskinen@iki.fi> |
sparc: pgtable_64: change include order Fix the following build breakage in v3.4-rc1: CC arch/sparc/kernel/cpu.o In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:15:0, from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4, from arch/sparc/kernel/cpu.c:15: include/asm-generic/pgtable-nopud.h:13:16: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:25:28: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:26:27: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:27:31: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:28:30: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:38:34: error: unknown type name 'pgd_t' In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:783:0, from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4, from arch/sparc/kernel/cpu.c:15: include/asm-generic/pgtable.h: In function 'pgd_none_or_clear_bad': include/asm-generic/pgtable.h:258:2: error: implicit declaration of function 'pgd_none' [-Werror=implicit-function-declaration] include/asm-generic/pgtable.h:260:2: error: implicit declaration of function 'pgd_bad' [-Werror=implicit-function-declaration] include/asm-generic/pgtable.h: In function 'pud_none_or_clear_bad': include/asm-generic/pgtable.h:269:6: error: request for member 'pgd' in something not a structure or union Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d550bbd4 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for Sparc. Signed-off-by: David Howells <dhowells@redhat.com> cc: sparclinux@vger.kernel.org
|
#
3e37fd31 |
|
17-Nov-2011 |
David S. Miller <davem@davemloft.net> |
sparc: Kill custom io_remap_pfn_range(). To handle the large physical addresses, just make a simple wrapper around remap_pfn_range() like MIPS does. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
683d2fa6 |
|
25-Jul-2011 |
David S. Miller <davem@davemloft.net> |
sparc64: add support for _PAGE_SPECIAL Luckily there are still a few software PTE bits remaining and they even match up in both the sun4u and sun4v pte layouts. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
90f08e39 |
|
24-May-2011 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
sparc: mmu_gather rework Rework the sparc mmu_gather usage to conform to the new world order :-) Sparc mmu_gather does two things: - tracks vaddrs to unhash - tracks pages to free Split these two things like powerpc has done and keep the vaddrs in per-cpu data structures and flush them on context switch. The remaining bits can then use the generic mmu_gather. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cb1b8209 |
|
21-Apr-2011 |
Sam Ravnborg <sam@ravnborg.org> |
sparc: consolidate show_cpuinfo in cpu.c We have all the cpu related info in cpu.c - so move the remaining functions to support /proc/cpuinfo to this file. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ece0e2b6 |
|
26-Oct-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
mm: remove pte_*map_nested() Since we no longer need to provide KM_type, the whole pte_*map_nested() API is now redundant, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4b3073e1 |
|
18-Dec-2009 |
Russell King <rmk+kernel@arm.linux.org.uk> |
MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
#
1b6b9d62 |
|
28-Sep-2009 |
David S. Miller <davem@davemloft.net> |
sparc64: Increase vmalloc size to fix percpu regressions. Since we now use the embedding percpu allocator we have to make the vmalloc area at least as large as the stretch can be between nodes. Besides some minor asm adjustments, this turned out to be pretty trivial. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d8ed1d43 |
|
25-Aug-2009 |
David S. Miller <davem@davemloft.net> |
sparc64: Validate linear D-TLB misses. When page alloc debugging is not enabled, we essentially accept any virtual address for linear kernel TLB misses. But with kgdb, kernel address probing, and other facilities we can try to access arbitrary crap. So, make sure the address we miss on will translate to physical memory that actually exists. In order to make this work we have to embed the valid address bitmap into the kernel image. And in order to make that less expensive we make an adjustment, in that the max physical memory address is decreased to "1 << 41", even on the chips that support a 42-bit physical address space. We can do this because bit 41 indicates "I/O space" and thus covers non-memory ranges. The result of this is that: 1) kpte_linear_bitmap shrinks from 2K to 1K in size 2) we need 64K more for the valid address bitmap We can't let the valid address bitmap be dynamically allocated once we start using it to validate TLB misses, otherwise we have crazy issues to deal with wrt. recursive TLB misses and such. If we're in a TLB miss it could be the deepest trap level that's legal inside of the cpu. So if we TLB miss referencing the bitmap, the cpu will be out of trap levels and enter RED state. To guard against out-of-range accesses to the bitmap, we have to check to make sure no bits in the physical address above bit 40 are set. We could export and use last_valid_pfn for this check, but that's just an unnecessary extra memory reference. On the plus side of all this, since we load all of these translations into the special 4MB mapping TSB, and we check the TSB first for TLB misses, there should be absolutely no real cost for these new checks in the TLB miss path. Reported-by: heyongli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b539c467 |
|
12-Sep-2008 |
David S. Miller <davem@davemloft.net> |
sparc64: Fix sparse warnings in fault.c 1) set_brkpt() is referenced by nothing and hasn't been used by anyone to my knowledge for many many years. So just delete it. 2) add extern decl for do_sparc64_fault() in asm/pgtable_64.h Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a439fe51 |
|
27-Jul-2008 |
Sam Ravnborg <sam@ravnborg.org> |
sparc, sparc64: use arch/sparc/include The majority of this patch was created by the following script: *** ASM=arch/sparc/include/asm mkdir -p $ASM git mv include/asm-sparc64/ftrace.h $ASM git rm include/asm-sparc64/* git mv include/asm-sparc/* $ASM sed -ie 's/asm-sparc64/asm/g' $ASM/* sed -ie 's/asm-sparc/asm/g' $ASM/* *** The rest was an update of the top-level Makefile to use sparc for header files when sparc64 is being build. And a small fixlet to pick up the correct unistd.h from sparc64 code. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|