History log of /linux-master/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
Revision Date Author Comments
# 46dec616 08-Dec-2023 Thierry Reding <treding@nvidia.com>

drm/nouveau: Fixup gk20a instobj hierarchy

Commit 12c9b05da918 ("drm/nouveau/imem: support allocations not
preserved across suspend") uses container_of() to cast from struct
nvkm_memory to struct nvkm_instobj, assuming that all instance objects
are derived from struct nvkm_instobj. For the gk20a family that's not
the case and they are derived from struct nvkm_memory instead. This
causes subtle data corruption (nvkm_instobj.preserve ends up mapping
onto gk20a_instobj.vaddr), which leads to a NULL pointer dereference in
gk20a_instobj_acquire_iommu() (and possibly elsewhere) and also prevents
suspend/resume from working.

Fix this by making struct gk20a_instobj derive from struct nvkm_instobj
instead.
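
For reference, a simplified sketch of the hierarchy problem (struct
layouts reduced to the members mentioned above, not the actual nouveau
definitions):

    /* container_of() from the embedded nvkm_memory back to an
     * nvkm_instobj is only valid if the object really embeds one. */
    struct nvkm_instobj {
            struct nvkm_memory memory;
            bool preserve;
    };

    /* Before the fix, gk20a objects embedded nvkm_memory directly... */
    struct gk20a_instobj {
            struct nvkm_memory memory;
            void *vaddr;
    };

    /* ...so this cast in the generic instmem code aliased 'preserve'
     * onto whatever followed 'memory' in gk20a_instobj (here, vaddr): */
    struct nvkm_instobj *iobj =
            container_of(memory, struct nvkm_instobj, memory);
    iobj->preserve = true;  /* silently corrupts gk20a_instobj.vaddr */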

Fixes: 12c9b05da918 ("drm/nouveau/imem: support allocations not preserved across suspend")
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231208104653.1917055-1-thierry.reding@gmail.com


# 624c6f78 18-Sep-2023 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem/tu102-: prepare for GSP-RM

- move suspend/resume paths to HW-specific code
- allow (future) RM paths to be based on nv50_instmem

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-15-skeggsb@gmail.com


# 1369459b 23-Jan-2023 Jason Gunthorpe <jgg@ziepe.ca>

iommu: Add a gfp parameter to iommu_map()

The internal mechanisms support this, but instead of exposing the gfp to
the caller it is wrapped up inside iommu_map() and iommu_map_atomic().

Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT.
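
As a minimal illustration of the resulting interface (example_map_one()
and its arguments are placeholders, not code from this series):

    #include <linux/iommu.h>

    static int example_map_one(struct iommu_domain *domain,
                               unsigned long iova, phys_addr_t paddr)
    {
            /* Sleepable callers pass GFP_KERNEL, atomic callers GFP_ATOMIC,
             * accounted ones GFP_KERNEL_ACCOUNT -- no extra variants needed. */
            return iommu_map(domain, iova, paddr, PAGE_SIZE,
                             IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
    }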

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/1-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>


# d9691a22 03-Dec-2020 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/instmem: switch to instanced constructor

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>


# e0ec8a4d 17-Aug-2020 Christoph Hellwig <hch@lst.de>

drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT

DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS
configs, so don't set it in this ARM specific driver part.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 632b740c 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/mmu: remove old vmm frontend

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 9202d732 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem/nv50-: use new interfaces for vmm operations

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# f9463a4b 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/mmu: implement new vmm frontend

These are the new privileged interfaces to the VMM backends, and they expose
some functionality that wasn't previously available.

It's now possible to allocate a chunk of address-space (even all of it),
without causing page tables to be allocated up-front, and then map into
it at arbitrary locations. This is the basic primitive used to support
features such as sparse mapping, or to allow userspace control over its
own address-space, or HMM (where the GPU driver isn't in control of the
address-space layout).

Rather than being tied to a subtle combination of memory object and VMA
properties, arguments that control map flags (ro, kind, etc) are passed
explicitly at map time.

The compatibility hacks that implemented the old frontend on top of the
new driver backends have been replaced with a similar layer that
implements the old frontend's interfaces on top of the new frontend.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# bd275f1d 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: wrap nvkm_mem objects in nvkm_memory interfaces

This is a transition step, to enable finer-grained commits while
transitioning to new MMU interfaces.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 19a82e49 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/core/memory: change map interface to support upcoming mmu changes

Map flags (access, kind, etc) are currently defined in either the VMA,
or the memory object, which turns out to not be ideal for things like
suballocated buffers, etc.

These will become per-map flags instead, so we need to support passing
these arguments in nvkm_memory_map().

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 9ce523cc 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: separate buffer object backing memory from nvkm structures

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 71370e62 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem: remove now-unused wrapper for backend objects

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 07bbc1c5 31-Oct-2017 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/core/memory: split info pointers from accessor pointers

The accessor functions can change as a result of acquire()/release() calls,
and are protected by any refcounting done there.

Other functions must remain constant, as they can be called any time.
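
Schematically, the split looks roughly like this (struct and member
names approximate, not copied from the tree):

    struct nvkm_memory {
            const struct nvkm_memory_func *func; /* constant: size, target, map, ... */
            const struct nvkm_memory_ptrs *ptrs; /* accessors; may be swapped
                                                  * by acquire()/release() */
    };

    struct nvkm_memory_ptrs {
            u32  (*rd32)(struct nvkm_memory *, u64 offset);
            void (*wr32)(struct nvkm_memory *, u64 offset, u32 data);
    };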

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# e5ffa727 30-Jan-2017 Thierry Reding <treding@nvidia.com>

drm/nouveau/imem/gk20a: Turn instmem lock into mutex

The gk20a implementation of instance memory uses vmap()/vunmap() to map
memory regions into the kernel's virtual address space. These functions
may sleep, so protecting them with a spinlock is not safe and triggers a
warning if the DEBUG_ATOMIC_SLEEP Kconfig option is enabled. Fix this by
using a mutex instead.
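
The shape of the fix, as a sketch (the lock name, surrounding fields and
page protection are illustrative):

    mutex_lock(&imem->lock);                 /* was: a spinlock */
    vaddr = vmap(pages, npages, VM_MAP,
                 pgprot_writecombine(PAGE_KERNEL));  /* may sleep */
    mutex_unlock(&imem->lock);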

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# d2ee3605 09-May-2016 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/core/memory: distinguish between coherent/non-coherent targets

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 134fdc1a 03-Oct-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/core/mm: replace region list with next pointer

We never need a doubly-linked list here, and as there are generally a
large number of these objects, replace it with a singly-linked list in
order to save some memory.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 00085f1e 03-Aug-2016 Krzysztof Kozlowski <krzk@kernel.org>

dma-mapping: use unsigned long for dma_attrs

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However, the attributes do not have to be a bitfield; an unsigned long
will do just fine:

1. It is simpler, both when reading the code and when setting
attributes. Instead of initializing local attributes on the stack and
passing a pointer to them to dma_set_attr(), just set the bits.

2. It is safer and effectively const-correct, because the attributes
are passed by value.

Semantic patches for this change (at least most of them):

virtual patch
virtual context

@r@
identifier f, attrs;

@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}

@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)

and

// Options: --all-includes
virtual patch
virtual context

@r@
identifier f, attrs;
type t;

@@
t f(..., struct dma_attrs *attrs);

@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
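
For a typical caller the conversion therefore looks like this (dev, size
and handle are placeholders):

    /* Before: attributes carried in a struct dma_attrs on the stack. */
    DEFINE_DMA_ATTRS(attrs);
    dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
    buf = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL, &attrs);

    /* After: attributes are just bits in an unsigned long. */
    buf = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL,
                          DMA_ATTR_WRITE_COMBINE);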

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e02d586d 03-Mar-2016 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: add write barrier when releasing DMA object

When using the DMA-API for instmem, we may obtain a write-combined
mapping. For such cases, add a write barrier in
gk20a_instobj_release_dma() to make sure that all writes have reached
memory at this time.
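
The change amounts to a single barrier; roughly (placement paraphrased,
not the verbatim diff):

    /* in gk20a_instobj_release_dma(): drain the CPU's write-combining
     * buffers so every write made through the mapping is visible to the
     * GPU before the object is handed back. */
    wmb();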

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# b306712d 11-Nov-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: use DMA API CPU mapping

Commit 69c4938249fb ("drm/nouveau/instmem/gk20a: use direct CPU access")
tried to be smart while using the DMA-API by managing the CPU mappings of
buffers allocated with the DMA-API by itself. In doing so, it relied
on dma_to_phys() which is an architecture-private function not
available everywhere. This broke the build on several architectures.

Since there is no reliable and portable way to obtain the physical
address of a DMA-API buffer, stop trying to be smart and just use the
CPU mapping that the DMA-API can provide. This means that buffers will
be CPU-mapped for all their life as opposed to when we need them, but
anyway using the DMA-API here is a fallback for when no IOMMU is
available so we should not expect optimal behavior.

This makes the IOMMU and DMA-API implementations of instmem diverge
enough that we should maybe put them into separate files...

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 338840ee 09-Nov-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: fix race conditions

The LRU list used for recycling CPU mappings handled concurrency very
poorly. For instance, if an instobj was acquired twice before being
released once, it would end up on the LRU list even though a client was
still accessing it.

This patch fixes this by properly counting how many clients are
currently using a given instobj.

While at it, we also raise errors when inconsistencies are detected, and
factor out some code.
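
A sketch of the accounting this introduces (field names illustrative,
locking and error handling trimmed):

    /* acquire(): an instobj in active use must never sit on the LRU. */
    if (node->use_count++ == 0)
            list_del_init(&node->lru_node);   /* off the LRU, if queued */

    /* release(): only the last user may offer the mapping for recycling. */
    if (WARN_ON(node->use_count == 0))
            return;
    if (--node->use_count == 0)
            list_add_tail(&node->lru_node, &imem->lru);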

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 10855aeb 09-Nov-2015 Dave Airlie <airlied@redhat.com>

drm/nouveau: fix build failures on all non ARM.

gk20a is an ARM-only GPU, so we can just do the correct thing on ARM
and fail on other architectures. The other option was to key off the
SWIOTLB define, which implies that phys_to_page exists, but this seems
clearer.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 68b56653 04-Sep-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: make use of the IOMMU bit

Use the IOMMU bit specified in platform data instead of hardcoding it to
the bit used by current Tegra GPUs.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 69c49382 04-Sep-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: use direct CPU access

The Great Nouveau Refactoring Take II brought us a lot of goodness,
including acquire/release methods that are called before and after an
instobj is modified. These functions can be used as synchronization
points to manage CPU/GPU coherency if we modify an instobj using the
CPU.

This patch replaces the legacy and slow PRAMIN access for gk20a instmem
with CPU mappings and writes. An LRU list is used to unmap unused
mappings after a certain threshold (currently 1MB) of mapped instobjs is
reached. This allows mappings to be reused most of the time.

Accessing instobjs using the CPU requires maintaining the GPU L2 cache,
which we do in the acquire/release functions. This triggers a lot of L2
flushes/invalidates, but most of them are performed on an empty cache
(and thus return immediately), and overall context setup performance
greatly benefits from this (from 250ms to 160ms on Jetson TK1 for a
simple libdrm program).

Making L2 management more explicit should allow us to grab some more
performance in the future.
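
A rough sketch of the access pattern described above (all names below
are illustrative rather than lifted from the patch):

    /* acquire(): flush the GPU L2 so CPU reads see up-to-date data,
     * then reuse (or create) a kernel mapping for the object. */
    gpu_l2_flush();                        /* illustrative; an LTC flush in nvkm */
    vaddr = get_or_create_vmap(node);

    /* ... CPU reads/writes through vaddr ... */

    /* release(): invalidate the GPU L2 so it refetches what the CPU wrote,
     * and let cold mappings age out once ~1MB worth of instobjs is mapped. */
    gpu_l2_invalidate();                   /* illustrative */
    lru_recycle_if_over_threshold(imem);   /* illustrative */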

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 43a70661 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/tegra: merge platform setup from nouveau drm

The copyright header in nvkm/engine/device/platform.c has been replaced
with the NVIDIA one from drm/nouveau_platform.c, as most of the actual
code is now theirs.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 26c9e8ef 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/device: remove pci/platform_device from common struct

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# b7a2bc18 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem: convert to new-style nvkm_subdev

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# d8e83994 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem: improve management of instance memory

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 47b2505e 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/platform: remove subclassing of nvkm_device

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 00c55507 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem: switch to subdev printk macros

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# d5c5bcf6 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem: switch to device pri macros

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# c44c06ae 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/imem: cosmetic changes

This is purely preparation for upcoming commits, there should be no
code changes here.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 9ace404b 19-Aug-2015 Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/device: include core/device.h automatically for subdevs/engines

Pretty much every subdev/engine is going to need access to nvkm_device
shortly to touch registers and/or output messages.

The odd placement of the includes is necessary to work around some
inter-dependencies that currently exist. This will be fixed later.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# df16896b 10-Mar-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: fix crash during error path

If a memory allocation fails when using the DMA allocator,
gk20a_instobj_dtor_dma() will be called on the failed instmem object.
At this time, node->handle might not be NULL despite the call to
dma_alloc_attrs() having failed. node->cpuaddr is the right member to
check for such a failure, so use it instead.
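
The gist of the resulting check (simplified; dev, size and attrs stand
in for the object's real bookkeeping):

    /* dma_alloc_attrs() signals failure through the returned CPU address,
     * so that is what the destructor has to test before freeing. */
    if (unlikely(!node->cpuaddr))
            return;      /* allocation never succeeded; nothing to free */

    dma_free_attrs(dev, size, node->cpuaddr, node->handle, attrs);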

Reported-by: Vince Hsu <vinceh@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# a7f6da6e 20-Feb-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: add IOMMU support

Let GK20A's instmem take advantage of the IOMMU if it is present. Having
an IOMMU means that instmem is no longer allocated using the DMA API,
but instead obtained through page_alloc and made contiguous to the GPU
by IOMMU mappings.
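
Schematically, the allocation then looks something like this
(simplified, error handling omitted; the modern gfp-taking iommu_map()
signature is assumed):

    /* Grab ordinary, possibly scattered pages... */
    for (i = 0; i < npages; i++)
            pages[i] = alloc_page(GFP_KERNEL);

    /* ...and stitch them into one GPU-contiguous range via the IOMMU. */
    for (i = 0; i < npages; i++)
            iommu_map(domain, base + i * PAGE_SIZE, page_to_phys(pages[i]),
                      PAGE_SIZE, IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);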

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# 5dc240bc 20-Feb-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: use DMA attributes

instmem for GK20A is allocated using dma_alloc_coherent(), which
provides us with a coherent CPU mapping that we never use because
instmem objects are accessed through PRAMIN. Switch to
dma_alloc_attrs() which gives us the option to dismiss that CPU mapping
and free up some CPU virtual space.
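
The attribute that requests exactly this behaviour is
DMA_ATTR_NO_KERNEL_MAPPING; a minimal sketch using the struct dma_attrs
interface of that era (dev, size and handle are placeholders):

    DEFINE_DMA_ATTRS(attrs);
    dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &attrs);

    /* No kernel virtual mapping is created; the return value is an opaque
     * cookie for dma_free_attrs()/dma_mmap_attrs(), which is fine here
     * because the object is reached through PRAMIN, not the CPU. */
    cookie = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL, &attrs);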

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>


# a6ff85d3 20-Feb-2015 Alexandre Courbot <acourbot@nvidia.com>

drm/nouveau/instmem/gk20a: move memory allocation to instmem

GK20A does not have dedicated RAM, so having a RAM device for it does
not make sense. Move the contiguous physical memory allocation to
instmem.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>