History log of /linux-master/drivers/misc/cxl/fault.c
Revision Date Author Comments
# d9fa0e37 06-Sep-2022 Matthew Wilcox (Oracle) <willy@infradead.org>

cxl: remove vma linked list walk

Use the VMA iterator instead. This requires a little restructuring of the
surrounding code to hoist the mm to the caller. That turns
cxl_prefault_one() into a trivial function, so call cxl_fault_segment()
directly.

Link: https://lkml.kernel.org/r/20220906194824.2110408-38-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 2d9f69bc 10-Mar-2021 Laurent Dufour <ldufour@linux.ibm.com>

cxl: don't manipulate the mm.mm_users field directly

It is better to rely on the API provided by the MM layer instead of
directly manipulating the mm_users field.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210310174405.51044-1-ldufour@linux.ibm.com


# 615d2ef0 10-Mar-2021 Laurent Dufour <ldufour@linux.ibm.com>

cxl: don't manipulate the mm.mm_users field directly

It is better to rely on the API provided by the MM layer instead of
directly manipulating the mm_users field.

Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Link: https://lore.kernel.org/r/20210310174405.51044-1-ldufour@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d8ed45c5 08-Jun-2020 Michel Lespinasse <walken@google.com>

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0034d395 17-Apr-2019 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

powerpc/mm/hash64: Map all the kernel regions in the same 0xc range

This patch maps vmalloc, IO and vmemap regions in the 0xc address range
instead of the current 0xd and 0xf range. This brings the mapping closer
to radix translation mode.

With hash 64K page size each of this region is 512TB whereas with 4K config
we are limited by the max page table range of 64TB and hence there regions
are of 16TB size.

The kernel mapping is now:

On 4K hash

kernel_region_map_size = 16TB
kernel vmalloc start = 0xc000100000000000
kernel IO start = 0xc000200000000000
kernel vmemmap start = 0xc000300000000000

64K hash, 64K radix and 4k radix:

kernel_region_map_size = 512TB
kernel vmalloc start = 0xc008000000000000
kernel IO start = 0xc00a000000000000
kernel vmemmap start = 0xc00c000000000000

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 50a7ca3c 17-Aug-2018 Souptick Joarder <jrdr.linux@gmail.com>

mm: convert return type of handle_mm_fault() caller to vm_fault_t

Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t")

In this patch all the caller of handle_mm_fault() are changed to return
vm_fault_t type.

Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a0ac3687 03-Aug-2018 Parth Y Shah <sparth1292@gmail.com>

misc: cxl: changed asterisk position

Resolved <"foo* bar" should be "foo *bar"> error

Signed-off-by: Parth Y Shah <sparth1292@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 01711142 03-Aug-2018 Parth Y Shah <sparth1292@gmail.com>

misc: cxl: changed asterisk position

Resolved <"foo* bar" should be "foo *bar"> error

Signed-off-by: Parth Y Shah <sparth1292@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 56328743 08-Sep-2017 Christophe Lombard <clombard@linux.vnet.ibm.com>

cxl: Add support for POWER9 DD2

The PSL initialization sequence has been updated to DD2.
This patch adapts to the changes, retaining compatibility with DD1.
The patch includes some changes to DD1 fix-ups as well.

Tests performed on some of the old/new hardware.

The function is_page_fault(), for POWER9, lists the Translation Checkout
Responses where the page fault will be handled by copro_handle_mm_fault().
This list is too restrictive and not necessary.

This patches removes this restriction and all page faults, whatever the
reason, will be handled. In this case, the interruption is always
acknowledged.

The following features will be added soon:
- phb reset when switching to capi mode.
- cxllib update to support new functions.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>

Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 22259a6e 28-Aug-2017 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

powerpc/mm/cxl: Add barrier when setting mm cpumask

We need to add memory barrier so that the page table walk doesn't happen
before the cpumask is set and made visible to the other cpus. We need
to use a sync here instead of lwsync because lwsync is not sufficient for
store/load ordering.

We also need to add an if (mm) check so that we do the right thing when called
with a kernel context. For kernel context, we have mm = NULL. W.r.t kernel
address we can skip setting the mm cpumask.

Fixes: 0f4bc0932e ("powerpc/mm/cxl: Add the fault handling cpu to mm cpumask")
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0f4bc093 27-Jul-2017 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

powerpc/mm/cxl: Add the fault handling cpu to mm cpumask

We use mm cpumask for serializing against lockless page table walk.
Anybody who is doing a lockless page table walk is expected to disable
irq and only cpus in mm cpumask is expected do the lockless walk. This
ensure that a THP split can send IPI to only cpus in the mm cpumask,
to make sure there are no parallel lockless page table walk.

Add the CAPI fault handling cpu to the mm cpumask so that we can do
the lockless page table walk while inserting hash page table entries.

Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 3ced8d73 22-Jun-2017 Christophe Lombard <clombard@linux.vnet.ibm.com>

cxl: Export library to support IBM XSL

This patch exports a in-kernel 'library' API which can be called by
other drivers to help interacting with an IBM XSL on a POWER9 system.

The XSL (Translation Service Layer) is a stripped down version of the
PSL (Power Service Layer) used in some cards such as the Mellanox CX5.
Like the PSL, it implements the CAIA architecture, but has a number
of differences, mostly in it's implementation dependent registers.

The XSL also uses a special DMA cxl mode, which uses a slightly
different init sequence for the CAPP and PHB.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 797625de 13-Jun-2017 Christophe Lombard <clombard@linux.vnet.ibm.com>

cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

A previous set of patches "cxl: Add support for Coherent Accelerator
Interface Architecture 2.0" has introduced a new support for the CAPI
cards. These patches have been tested on Simulation environment and
quite a bit of them have been tested on real hardware.

This patch brings new fixes after a series of tests carried out on new
equipment:
- Add POWER9 definition.
- Re-enable any masked interrupts when the AFU is not activated
after resetting the AFU.
- Remove the api cxl_is_psl8/9 which is no longer useful.
- Do not dump CAPI1 registers.
- Rewrite cxl_is_page_fault() function.
- Do not register slb callack on P9.

Fixes: f24be42aab37 ("cxl: Add psl9 specific code")
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f24be42a 12-Apr-2017 Christophe Lombard <clombard@linux.vnet.ibm.com>

cxl: Add psl9 specific code

The new Coherent Accelerator Interface Architecture, level 2, for the
IBM POWER9 brings new content and features:
- POWER9 Service Layer
- Registers
- Radix mode
- Process element entry
- Dedicated-Shared Process Programming Model
- Translation Fault Handling
- CAPP
- Memory Context ID
If a valid mm_struct is found the memory context id is used for each
transaction associated with the process handle. The PSL uses the
context ID to find the corresponding process element.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
[mpe: Fixup comment formatting, unsplit long strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6dd2d234 07-Apr-2017 Christophe Lombard <clombard@linux.vnet.ibm.com>

cxl: Keep track of mm struct associated with a context

The mm_struct corresponding to the current task is acquired each time
an interrupt is raised. So to simplify the code, we only get the
mm_struct when attaching an AFU context to the process.
The mm_count reference is increased to ensure that the mm_struct can't
be freed. The mm_struct will be released when the context is detached.
A reference on mm_users is not kept to avoid a circular dependency if
the process mmaps its cxl mmio and forget to unmap before exiting.
The field glpid (pid of the group leader associated with the pid), of
the structure cxl_context, is removed because it's no longer useful.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 174cd4b1 02-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6e84f315 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>

We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ac29c640 29-Apr-2016 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED

_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
pages are now marked by clearing _PAGE_PRIVILEGED bit.

Previously we allowed the kernel to have a privileged page in the lower
address range (USER_REGION). With this patch such access is denied.

We also prevent a kernel access to a non-privileged page in higher
address range (ie, REGION_ID != 0).

Both the above access scenarios should never happen.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c7d54842 29-Apr-2016 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

powerpc/mm: Use _PAGE_READ to indicate Read access

This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
removes the dependency on _PAGE_USER for implying read only. Few things
to note here is that, we have read implied with write and execute
permission. Hence we should always find _PAGE_READ set on hash pte
fault.

We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on
marking a prot none pte _PAGE_WRITE. (For more details look at
b191f9b106ea "mm: numa: preserve PTE write permissions across a NUMA
hinting fault")

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 3b1dbfa1 20-Apr-2016 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

cxl: Fix DAR check & use REGION_ID instead of opencoding

The current code will set _PAGE_USER to the access flags for any
fault address, because the ~ operation will be true for all address we
take a fault on. But setting _PAGE_USER also means that the fault will
be handled only if the page table have _PAGE_USER set. Hence there is
no security hole with the current code.

Now if it is an user space access, then the change in this patch really
don't have an impact because we have (!ctx->kernel) set true
and we take the if condition true.

Now kernel context created fault on an address in the kernel range
will result in a fault loop because we will not insert the
hash pte due to access and pte permission mismatch. This patch fix
the above issue.

Fixes: f204e0b8cedd ("cxl: Driver code for powernv PCIe based cards for userspace access")
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ea2d1f95 03-Mar-2016 Frederic Barrat <fbarrat@linux.vnet.ibm.com>

cxl: Isolate a few bare-metal-specific calls

A few functions are mostly common between bare-metal and guest and
just need minor tuning. To avoid crowding the backend API, introduce a
few 'if' based on the CPU being in HV mode.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5be587b1 03-Mar-2016 Frederic Barrat <fbarrat@linux.vnet.ibm.com>

cxl: Introduce implementation-specific API

The backend API (in cxl.h) lists some low-level functions whose
implementation is different on bare-metal and in a guest. Each
environment implements its own functions, and the common code uses
them through function pointers, defined in cxl_backend_ops

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 7b8ad495 24-Nov-2015 Vaibhav Jain <vaibhav@linux.vnet.ibm.com>

cxl: Fix DSI misses when the context owning task exits

Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
store the pid of the current task_struct and use it to get pointer to
the mm_struct of the process, while processing page or segment faults
from the capi card. However this causes issues when the thread that had
originally issued the start-work ioctl exits in which case the stored
pid is no more valid and the cxl driver is unable to handle faults as
the mm_struct corresponding to process is no more accessible.

This patch fixes this issue by using the mm_struct of the next alive
task in the thread group. This is done by iterating over all the tasks
in the thread group starting from thread group leader and calling
get_task_mm on each one of them. When a valid mm_struct is obtained the
pid of the associated task is stored in the context replacing the
exiting one for handling future faults.

The patch introduces a new function named get_mem_context that checks if
the current task pointed to by ctx->pid is dead? If yes it performs the
steps described above. Also a new variable cxl_context.glpid is
introduced which stores the pid of the thread group leader associated
with the context owning task.

Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reported-by: Frank Haverkamp <HAVERKAM@de.ibm.com>
Suggested-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a6b07d82 27-May-2015 Michael Neuling <mikey@neuling.org>

cxl: Only check pid for userspace contexts

We only need to check the pid attached to this context for userspace contexts.
Kernel contexts can skip this check.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9bcf28cd 09-Jan-2015 Ian Munsie <imunsie@au1.ibm.com>

cxl: Add tracepoints

This patch adds tracepoints throughout the cxl driver, which can provide
insight into:

- Context lifetimes
- Commands sent to the PSL and AFU and their completion status
- Segment and page table misses and their resolution
- PSL and AFU interrupts
- slbia calls from the powerpc copro_fault code

These tracepoints are mostly intended to aid in debugging (particularly
for new AFU designs), and may be useful standalone or in conjunction
with hardware traces collected by the PSL (read out via the trace
interface in debugfs) and AFUs.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 13da7046 08-Dec-2014 Ian Munsie <imunsie@au1.ibm.com>

cxl: Early return from cxl_handle_fault for a shut down context

If a context is being detached and we get a translation fault for it
there is little point getting it's mm and handling the fault, so just
respond with an address error and return earlier.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# aefa5688 03-Dec-2014 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault

upatepp can get called for a nohpte fault when we find from the linux
page table that the translation was hashed before. In that case
we are sure that there is no existing translation, hence we could
avoid doing tlbie.

We could possibly race with a parallel fault filling the TLB. But
that should be ok because updatepp is only ever relaxing permissions.
We also look at linux pte permission bits when filling hash pte
permission bits. We also hold the linux pte busy bits while
inserting/updating a hashpte entry, hence a paralle update of
linux pte is not possible. On the other hand mprotect involves
ptep_modify_prot_start which cause a hpte invalidate and not updatepp.

Performance number:
We use randbox_access_bench written by Anton.

Kernel with THP disabled and smaller hash page table size.

86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp
2.10% random_access_b random_access_bench [.] doit
1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock
1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range
1.18% random_access_b [kernel.kallsyms] [k] .__delay
0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page
0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return
0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm

With Fix:

27.54% random_access_b random_access_bench [.] doit
22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return
5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm
3.31% random_access_b [kernel.kallsyms] [k] data_access_common
1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# eb01d4c2 27-Oct-2014 Ian Munsie <imunsie@au1.ibm.com>

cxl: Fix PSL error due to duplicate segment table entries

In certain circumstances the PSL (Power Service Layer, which provides
translation services for CXL hardware) can send an interrupt for a
segment miss that the kernel has already handled. This can happen if
multiple translations for the same segment are queued in the PSL before
the kernel has restarted the first translation.

The CXL driver does not expect this situation and does not check if a
segment had already been handled. This could cause a duplicate segment
table entry which in turn caused a PSL error taking down the card.

This patch fixes the issue by checking for existing entries in the
segment table that match the segment we are trying to insert, so as to
avoid inserting duplicate entries.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b03a7f57 27-Oct-2014 Ian Munsie <imunsie@au1.ibm.com>

cxl: Refactor cxl_load_segment() and find_free_sste()

This moves the segment table hash calculation from cxl_load_segment()
into find_free_sste() since that is the only place it is actually used.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5100a9d6 27-Oct-2014 Ian Munsie <imunsie@au1.ibm.com>

cxl: Disable secondary hash in segment table

This patch simplifies the process of finding a free segment table entry
by disabling the secondary hash. This reduces the number of possible
entries in the segment table for a given address from 16 to 8.

Due to the large segment sizes we use it is extremely unlikely that the
secondary hash would ever have been used in practice, so this should not
have any negative impacts and may even improve performance due to the
reduced number of comparisons that software & hardware need to perform.

This patch clears the SC bit in the hardware's state register
(CXL_PSL_SR_An) to disable the secondary hash in the hardware since we
can no longer fill out entries using it.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f204e0b8 08-Oct-2014 Ian Munsie <imunsie@au1.ibm.com>

cxl: Driver code for powernv PCIe based cards for userspace access

This is the core of the cxl driver.

It adds support for using cxl cards in the powernv environment only (ie POWER8
bare metal). It allows access to cxl accelerators by userspace using the
/dev/cxl/afuM.N char devices.

The kernel driver has no knowledge of the function implemented by the
accelerator. It provides services to userspace via the /dev/cxl/afuM.N
devices. When a program opens this device and runs the start work IOCTL, the
accelerator will have coherent access to that processes memory using the same
virtual addresses. That process may mmap the device to access any MMIO space
the accelerator provides. Also, reads on the device will allow interrupts to
be received. These services are further documented in a later patch in
Documentation/powerpc/cxl.txt.

Documentation of the cxl hardware architecture and userspace API is provided in
subsequent patches.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>