#
e4ae85e3 |
|
07-Nov-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Add ability for tracking buffer objects per client In order to show per client memory usage lets add some infrastructure which enables tracking buffer objects owned by clients. We add a per client list protected by a new per client lock and to support delayed destruction (post client exit) we make tracked objects hold references to the owning client. Also, object memory region teardown is moved to the existing RCU free callback to allow safe dereference from the fdinfo RCU read section. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107101806.608990-1-tvrtko.ursulin@linux.intel.com
|
#
d6c531ab |
|
01-Aug-2023 |
Chris Wilson <chris.p.wilson@linux.intel.com> |
drm/i915: Invalidate the TLBs on each GT With multi-GT devices, the object may have been bound on each GT. Invalidate the TLBs across all GT before releasing the pages back to the system. Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230801141955.383305-4-andi.shyti@linux.intel.com
|
#
72e31c0a |
|
27-Jul-2023 |
Jouni Högander <jouni.hogander@intel.com> |
drm/i915: Add macros to get i915 device from i915_gem_object We want to stop touching directly i915_gem_object struct members in intel_frontbuffer code. As a part of this we add helper macro to get i915 device from i915_gem_object. v2: operate on and return pointer in defined macros Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230727064142.751976-2-jouni.hogander@intel.com
|
#
9275277d |
|
09-May-2023 |
Fei Yang <fei.yang@intel.com> |
drm/i915: use pat_index instead of cache_level Currently the KMD is using enum i915_cache_level to set caching policy for buffer objects. This is flaky because the PAT index which really controls the caching behavior in PTE has far more levels than what's defined in the enum. In addition, the PAT index is platform dependent, having to translate between i915_cache_level and PAT index is not reliable, and makes the code more complicated. From UMD's perspective there is also a necessity to set caching policy for performance fine tuning. It's much easier for the UMD to directly use PAT index because the behavior of each PAT index is clearly defined in Bspec. Having the abstracted i915_cache_level sitting in between would only cause more ambiguity. PAT is expected to work much like MOCS already works today, and by design userspace is expected to select the index that exactly matches the desired behavior described in the hardware specification. For these reasons this patch replaces i915_cache_level with PAT index. Also note, the cache_level is not completely removed yet, because the KMD still has the need of creating buffer objects with simple cache settings such as cached, uncached, or writethrough. For kernel objects, cache_level is used for simplicity and backward compatibility. For Pre-gen12 platforms PAT can have 1:1 mapping to i915_cache_level, so these two are interchangeable. see the use of LEGACY_CACHELEVEL. One consequence of this change is that gen8_pte_encode is no longer working for gen12 platforms due to the fact that gen12 platforms has different PAT definitions. In the meantime the mtl_pte_encode introduced specfically for MTL becomes generic for all gen12 platforms. This patch renames the MTL PTE encode function into gen12_pte_encode and apply it to all gen12. Even though this change looks unrelated, but separating them would temporarily break gen12 PTE encoding, thus squash them in one patch. Special note: this patch changes the way caching behavior is controlled in the sense that some objects are left to be managed by userspace. For such objects we need to be careful not to change the userspace settings.There are kerneldoc and comments added around obj->cache_coherent, cache_dirty, and how to bypass the checkings by i915_gem_object_has_cache_level. For full understanding, these changes need to be looked at together with the two follow-up patches, one disables the {set|get}_caching ioctl's and the other adds set_pat extension to the GEM_CREATE uAPI. Bspec: 63019 Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com
|
#
5e352e32 |
|
09-May-2023 |
Fei Yang <fei.yang@intel.com> |
drm/i915: preparation for using PAT index This patch is a preparation for replacing enum i915_cache_level with PAT index. Caching policy for buffer objects is set through the PAT index in PTE, the old i915_cache_level is not sufficient to represent all caching modes supported by the hardware. Preparing the transition by adding some platform dependent data structures and helper functions to translate the cache_level to pat_index. cachelevel_to_pat: a platform dependent array mapping cache_level to pat_index. max_pat_index: the maximum PAT index recommended in hardware specification Needed for validating the PAT index passed in from user space. i915_gem_get_pat_index: function to convert cache_level to PAT index. obj_to_i915(obj): macro moved to header file for wider usage. I915_MAX_CACHE_LEVEL: upper bound of i915_cache_level for the convenience of coding. Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-2-fei.yang@intel.com
|
#
ddb24fc5 |
|
04-Apr-2023 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915/ttm: Add I915_BO_PREALLOC Add a mechanism to preserve existing data when creating a TTM object with the I915_BO_ALLOC_USER flag. This will be used in the subsequent patch where the I915_BO_ALLOC_USER flag will be applied to the framebuffer object. For a pre-allocated framebuffer without the I915_BO_PREALLOC flag. TTM would clear the content, which is not desirable. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230404143100.10452-1-nirmoy.das@intel.com
|
#
779cb5ba |
|
20-Mar-2023 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915/dpt: Treat the DPT BO as a framebuffer Currently i915_gem_object_is_framebuffer() doesn't treat the BO containing the framebuffer's DPT as a framebuffer itself. This means eg. that the shrinker can evict the DPT BO while leaving the actual FB BO bound, when the DPT is allocated from regular shmem. That causes an immediate oops during hibernate as we try to rewrite the PTEs inside the already evicted DPT obj. TODO: presumably this might also be the reason for the DPT related display faults under heavy memory pressure, but I'm still not sure how that would happen as the object should be pinned by intel_dpt_pin() while in active use by the display engine... Cc: stable@vger.kernel.org Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Imre Deak <imre.deak@intel.com> Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230320090522.9909-2-ville.syrjala@linux.intel.com Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
|
#
3413881e |
|
20-Mar-2023 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915/dpt: Treat the DPT BO as a framebuffer Currently i915_gem_object_is_framebuffer() doesn't treat the BO containing the framebuffer's DPT as a framebuffer itself. This means eg. that the shrinker can evict the DPT BO while leaving the actual FB BO bound, when the DPT is allocated from regular shmem. That causes an immediate oops during hibernate as we try to rewrite the PTEs inside the already evicted DPT obj. TODO: presumably this might also be the reason for the DPT related display faults under heavy memory pressure, but I'm still not sure how that would happen as the object should be pinned by intel_dpt_pin() while in active use by the display engine... Cc: stable@vger.kernel.org Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Imre Deak <imre.deak@intel.com> Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230320090522.9909-2-ville.syrjala@linux.intel.com Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> (cherry picked from commit 779cb5ba64ec7df80675a956c9022929514f517a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
95df9cc2 |
|
12-Dec-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: consider CCS for backup objects It seems we can have one or more framebuffers that are still pinned when suspending lmem, in such a case we end up creating a shmem backup object, instead of evicting the object directly, but this will skip copying the CCS aux state, since we don't allocate the extra storage for the CCS pages as part of the ttm_tt construction. Since we can already deal with pinned objects just fine, it doesn't seem too nasty to just extend to support dealing with the CCS aux state, if the object is a pinned framebuffer. This fixes display corruption (like in gnome-shell) seen on DG2 when returning from suspend. Fixes: da0595ae91da ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: <stable@vger.kernel.org> # v5.19+ Tested-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221212171958.82593-2-matthew.auld@intel.com
|
#
a3185f91 |
|
09-May-2022 |
Christian König <christian.koenig@amd.com> |
drm/ttm: merge ttm_bo_api.h and ttm_bo_driver.h v2 Merge and cleanup the two headers into a single description of the object API. Also move all the documentation to the implementation and drop unnecessary includes from the header. No functional change. v2: minimal checkpatch.pl cleanup Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-4-christian.koenig@amd.com
|
#
ad0fca2d |
|
12-Dec-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: consider CCS for backup objects It seems we can have one or more framebuffers that are still pinned when suspending lmem, in such a case we end up creating a shmem backup object, instead of evicting the object directly, but this will skip copying the CCS aux state, since we don't allocate the extra storage for the CCS pages as part of the ttm_tt construction. Since we can already deal with pinned objects just fine, it doesn't seem too nasty to just extend to support dealing with the CCS aux state, if the object is a pinned framebuffer. This fixes display corruption (like in gnome-shell) seen on DG2 when returning from suspend. Fixes: da0595ae91da ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: <stable@vger.kernel.org> # v5.19+ Tested-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221212171958.82593-2-matthew.auld@intel.com (cherry picked from commit 95df9cc24bee8a09d39c62bcef4319b984814e18) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
999f4562 |
|
04-Oct-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: allow control over the flags when migrating In the next patch we want to move the object (if the current resource is not compatible), to the mappable part of lmem for some display buffers. Currently that requires being able to unset the I915_BO_ALLOC_GPU_ONLY hint. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jianshui Yu <jianshui.yu@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221004131916.233474-3-matthew.auld@intel.com
|
#
695ddc93 |
|
04-Oct-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: allow control over the flags when migrating In the next patch we want to move the object (if the current resource is not compatible), to the mappable part of lmem for some display buffers. Currently that requires being able to unset the I915_BO_ALLOC_GPU_ONLY hint. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jianshui Yu <jianshui.yu@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221004131916.233474-3-matthew.auld@intel.com (cherry picked from commit 999f4562077208b683f0519e5f1aa1e5c2fd2191) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
ad74457a |
|
13-Sep-2022 |
Anshuman Gupta <anshuman.gupta@intel.com> |
drm/i915/dgfx: Release mmap on rpm suspend Release all mmap mapping for all lmem objects which are associated with userfault such that, while pcie function in D3hot, any access to memory mappings will raise a userfault. Runtime resume the dgpu(when gem object lies in lmem). This will transition the dgpu graphics function to D0 state if it was in D3 in order to access the mmap memory mappings. v2: - Squashes the patches. [Matt Auld] - Add adequate locking for lmem_userfault_list addition. [Matt Auld] - Reused obj->userfault_count to avoid double addition. [Matt Auld] - Added i915_gem_object_lock to check i915_gem_object_is_lmem. [Matt Auld] v3: - Use i915_ttm_cpu_maps_iomem. [Matt Auld] - Fix 'ret == 0 to ret == VM_FAULT_NOPAGE'. [Matt Auld] - Reuse obj->userfault_count as a bool 0 or 1. [Matt Auld] - Delete the mmaped obj from lmem_userfault_list in obj destruction path. [Matt Auld] - Get a wakeref for object destruction patch. [Matt Auld] - Use intel_wakeref_auto to delay runtime PM. [Matt Auld] v4: - Avoid using mmo offset to get the vma_node. [Matt Auld] - Added comment to use the lmem_userfault_lock. [Matt Auld] - Get lmem_userfault_lock in i915_gem_object_release_mmap_offset. [Matt Auld] - Fixed kernel test robot generated warning. v5: - Addressed the cosmetics comments. [Andi] - Changed i915_gem_runtime_pm_object_release_mmap_offset() name to i915_gem_object_runtime_pm_release_mmap_offset() to be rhythmic. PCIe Specs 5.3.1.4.1 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6331 Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220913152714.16541-3-anshuman.gupta@intel.com
|
#
5d36acb7 |
|
27-Jul-2022 |
Chris Wilson <chris.p.wilson@intel.com> |
drm/i915/gt: Batch TLB invalidations Invalidate TLB in batches, in order to reduce performance regressions. Currently, every caller performs a full barrier around a TLB invalidation, ignoring all other invalidations that may have already removed their PTEs from the cache. As this is a synchronous operation and can be quite slow, we cause multiple threads to contend on the TLB invalidate mutex blocking userspace. We only need to invalidate the TLB once after replacing our PTE to ensure that there is no possible continued access to the physical address before releasing our pages. By tracking a seqno for each full TLB invalidate we can quickly determine if one has been performed since rewriting the PTE, and only if necessary trigger one for ourselves. That helps to reduce the performance regression introduced by TLB invalidate logic. [mchehab: rebased to not require moving the code to a separate file] Cc: stable@vger.kernel.org Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org
|
#
59eda6ce |
|
27-Jul-2022 |
Chris Wilson <chris.p.wilson@intel.com> |
drm/i915/gt: Batch TLB invalidations Invalidate TLB in batches, in order to reduce performance regressions. Currently, every caller performs a full barrier around a TLB invalidation, ignoring all other invalidations that may have already removed their PTEs from the cache. As this is a synchronous operation and can be quite slow, we cause multiple threads to contend on the TLB invalidate mutex blocking userspace. We only need to invalidate the TLB once after replacing our PTE to ensure that there is no possible continued access to the physical address before releasing our pages. By tracking a seqno for each full TLB invalidate we can quickly determine if one has been performed since rewriting the PTE, and only if necessary trigger one for ourselves. That helps to reduce the performance regression introduced by TLB invalidate logic. [mchehab: rebased to not require moving the code to a separate file] Cc: stable@vger.kernel.org Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org (cherry picked from commit 5d36acb7198b0e5eb88e6b701f9ad7b9448f8df9) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
bfe53be2 |
|
29-Jun-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: handle blitter failure on DG2 If the move or clear operation somehow fails, and the memory underneath is not cleared, like when moving to lmem, then we currently fallback to memcpy or memset. However with small-BAR systems this fallback might no longer be possible. For now we use the set_wedged sledgehammer if we ever encounter such a scenario, and mark the object as borked to plug any holes where access to the memory underneath can happen. Add some basic selftests to exercise this. v2: - In the selftests make sure we grab the runtime pm around the reset. Also make sure we grab the reset lock before checking if the device is wedged, since the wedge might still be in-progress and hence the bit might not be set yet. - Don't wedge or put the object into an unknown state, if the request construction fails (or similar). Just returning an error and skipping the fallback should be safe here. - Make sure we wedge each gt. (Thomas) - Peek at the unknown_state in io_reserve, that way we don't have to export or hand roll the fault_wait_for_idle. (Thomas) - Add the missing read-side barriers for the unknown_state. (Thomas) - Some kernel-doc fixes. (Thomas) v3: - Tweak the ordering of the set_wedged, also add FIXME. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-11-matthew.auld@intel.com
|
#
ecbf2060 |
|
15-Mar-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: wire up the object offset For the ttm backend we can use existing placements fpfn and lpfn to force the allocator to place the object at the requested offset, potentially evicting stuff if the spot is currently occupied. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220315181425.576828-5-matthew.auld@intel.com
|
#
30b9d1b3 |
|
25-Feb-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: add I915_BO_ALLOC_GPU_ONLY If the user doesn't require CPU access for the buffer, then ALLOC_GPU_ONLY should be used, in order to prioritise allocating in the non-mappable portion of LMEM, on devices with small BAR. v2(Thomas): - The BO_ALLOC_TOPDOWN naming here is poor, since this is pure lies on systems that don't even have small BAR. A better name is GPU_ONLY, which is accurate regardless of the configuration. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-3-matthew.auld@intel.com
|
#
39a2bd34 |
|
10-Jan-2022 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Use the vma resource as argument for gtt binding / unbinding When introducing asynchronous unbinding, the vma itself may no longer be alive when the actual binding or unbinding takes place. Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource instead of a struct i915_vma for the bind_vma() and unbind_vma() ops. Similarly change the insert_entries() op for struct i915_address_space. Replace a couple of i915_vma_snapshot members with their newly introduced i915_vma_resource counterparts, since they have the same lifetime. Also make sure to avoid changing the struct i915_vma_flags (in particular the bind flags) async. That should now only be done sync under the vm mutex. v2: - Update the vma_res::bound_flags when binding to the aliased ggtt v6: - Remove I915_VMA_ALLOC_BIT (Matthew Auld) - Change some members of struct i915_vma_resource from unsigned long to u64 (Matthew Auld) v7: - Fix vma resource size parameters to be u64 rather than unsigned long (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-3-thomas.hellstrom@linux.intel.com
|
#
903e0387 |
|
06-Jan-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: add unmap_virtual callback Ensure we call ttm_bo_unmap_virtual when releasing the pages. Importantly this should now handle the ttm swapping case, and all other places that already call into i915_ttm_move_notify(). v2: fix up the selftest Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-3-matthew.auld@intel.com
|
#
ffa3fe08 |
|
15-Dec-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: clean up shrinker_release_pages Add some proper flags for the different modes, and shorten the name to something more snappy. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211215110746.865-2-matthew.auld@intel.com
|
#
93544177 |
|
15-Dec-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: remove writeback hook Ditch the writeback hook and drop i915_gem_object_writeback(). We already support the shrinker_release_pages hook which can just call shmem_writeback directly. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211215110746.865-1-matthew.auld@intel.com
|
#
7938d615 |
|
19-Oct-2021 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Flush TLBs before releasing backing store We need to flush TLBs before releasing backing store otherwise userspace is able to encounter stale entries if a) it is not declaring access to certain buffers and b) it races with the backing store release from a such undeclared execution already executing on the GPU in parallel. The approach taken is to mark any buffer objects which were ever bound to the GPU and to trigger a serialized TLB flush when their backing store is released. Alternatively the flushing could be done on VMA unbind, at which point we would be able to ascertain whether there is potential a parallel GPU execution (which could race), but essentially it boils down to paying the cost of TLB flushes potentially needlessly at VMA unbind time (when the backing store is not known to be going away so not needed for safety), versus potentially needlessly at backing store relase time (since we at that point cannot tell whether there is anything executing on the GPU which uses that object). Thereforce simplicity of implementation has been chosen for now with scope to benchmark and refine later as required. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reported-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8ee262ba |
|
06-Jan-2022 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: add unmap_virtual callback Ensure we call ttm_bo_unmap_virtual when releasing the pages. Importantly this should now handle the ttm swapping case, and all other places that already call into i915_ttm_move_notify(). v2: fix up the selftest Fixes: cf3e3e86d779 ("drm/i915: Use ttm mmap handling for ttm bo's.") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220106174910.280616-3-matthew.auld@intel.com (cherry picked from commit 903e0387270eef14a711c0feb23b7bf62d2480df) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
004746e4 |
|
22-Nov-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915/ttm: Correctly handle waiting for gpu when shrinking With async migration, the shrinker may end up wanting to release the pages of an object while the migration blit is still running, since the GT migration code doesn't set up VMAs and the shrinker is thus oblivious to the fact that the GPU is still using the pages. Add waiting for gpu in the shrinker_release_pages() op and an argument to that function indicating whether the shrinker expects it to not wait for gpu. In the latter case the shrinker_release_pages() op will return -EBUSY if the object is not idle. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211122214554.371864-5-thomas.hellstrom@linux.intel.com
|
#
cad7109a |
|
01-Nov-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Introduce refcounted sg-tables As we start to introduce asynchronous failsafe object migration, where we update the object state and then submit asynchronous commands we need to record what memory resources are actually used by various part of the command stream. Initially for three purposes: 1) Error capture. 2) Asynchronous migration error recovery. 3) Asynchronous vma bind. At the time where these happens, the object state may have been updated to be several migrations ahead and object sg-tables discarded. In order to make it possible to keep sg-tables with memory resource information for these operations, introduce refcounted sg-tables that aren't freed until the last user is done with them. The alternative would be to reference information sitting on the corresponding ttm_resources which typically have the same lifetime as these refcountes sg_tables, but that leads to other awkward constructs: Due to the design direction chosen for ttm resource managers that would lead to diamond-style inheritance, the LMEM resources may sometimes be prematurely freed, and finally the subclassed struct ttm_resource would have to bleed into the asynchronous vma bind code. v3: - Address a number of style issues (Matthew Auld) v4: - Dont check for st->sgl being NULL in i915_ttm_tt__shmem_unpopulate(), that should never happen. (Matthew Auld) v5: - Fix a Potential double-free (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211101122444.114607-1-thomas.hellstrom@linux.intel.com
|
#
ebd4a8ec |
|
18-Oct-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: move shrinker management into adjust_lru We currently just evict lmem objects to system memory when under memory pressure. For this case we might lack the usual object mm.pages, which effectively hides the pages from the i915-gem shrinker, until we actually "attach" the TT to the object, or in the case of lmem-only objects it just gets migrated back to lmem when touched again. For all cases we can just adjust the i915 shrinker LRU each time we also adjust the TTM LRU. The two cases we care about are: 1) When something is moved by TTM, including when initially populating an object. Importantly this covers the case where TTM moves something from lmem <-> smem, outside of the normal get_pages() interface, which should still ensure the shmem pages underneath are reclaimable. 2) When calling into i915_gem_object_unlock(). The unlock should ensure the object is removed from the shinker LRU, if it was indeed swapped out, or just purged, when the shrinker drops the object lock. v2(Thomas): - Handle managing the shrinker LRU in adjust_lru, where it is always safe to touch the object. v3(Thomas): - Pretty much a re-write. This time piggy back off the shrink_pin stuff, which actually seems to fit quite well for what we want here. v4(Thomas): - Just use a simple boolean for tracking ttm_shrinkable. v5: - Ensure we call adjust_lru when faulting the object, to ensure the pages are visible to the shrinker, if needed. - Add back the adjust_lru when in i915_ttm_move (Thomas) v6(Reported-by: kernel test robot <lkp@intel.com>): - Remove unused i915_tt Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v4 Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-6-matthew.auld@intel.com
|
#
e25d1ea4 |
|
18-Oct-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: add some kernel-doc for shrink_pin and friends Attempt to document shrink_pin and the other relevant interfaces that interact with it, before we start messing with it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-5-matthew.auld@intel.com
|
#
7ae03459 |
|
18-Oct-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: add tt shmem backend For cached objects we can allocate our pages directly in shmem. This should make it possible(in a later patch) to utilise the existing i915-gem shrinker code for such objects. For now this is still disabled. v2(Thomas): - Add optional try_to_writeback hook for objects. Importantly we need to check if the object is even still shrinkable; in between us dropping the shrinker LRU lock and acquiring the object lock it could for example have been moved. Also we need to differentiate between "lazy" shrinking and the immediate writeback mode. Also later we need to handle objects which don't even have mm.pages, so bundling this into put_pages() would require somehow handling that edge case, hence just letting the ttm backend handle everything in try_to_writeback doesn't seem too bad. v3(Thomas): - Likely a bad idea to touch the object from the unpopulate hook, since it's not possible to hold a reference, without also creating circular dependency, so likely this is too fragile. For now just ensure we at least mark the pages as dirty/accessed when called from the shrinker on WILLNEED objects. - s/try_to_writeback/shrinker_release_pages, since this can do more than just writeback. - Get rid of do_backup boolean and just set the SWAPPED flag prior to calling unpopulate. - Keep shmem_tt as lowest priority for the TTM LRU bo_swapout walk, since these just get skipped anyway. We can try to come up with something better later. v4(Thomas): - s/PCI_DMA/DMA/. Also drop NO_KERNEL_MAPPING and NO_WARN, which apparently doesn't do anything with streaming mappings. - Just pass along the error for ->truncate, and assume nothing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Oak Zeng <oak.zeng@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-2-matthew.auld@intel.com
|
#
df94fd05 |
|
18-Oct-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: expand on the kernel-doc for cache_dirty Add some details around non-LLC platforms and cflushing, when dealing with the flush-on-acquire, which is potentially security sensitive. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-7-matthew.auld@intel.com
|
#
d3ac8d42 |
|
24-Sep-2021 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/pxp: interfaces for using protected objects This api allow user mode to create protected buffers and to mark contexts as making use of such objects. Only when using contexts marked in such a way is the execution guaranteed to work as expected. Contexts can only be marked as using protected content at creation time (i.e. the parameter is immutable) and they must be both bannable and not recoverable. Given that the protected session gets invalidated on suspend, contexts created this way hold a runtime pm wakeref until they're either destroyed or invalidated. All protected objects and contexts will be considered invalid when the PXP session is destroyed and all new submissions using them will be rejected. All intel contexts within the invalidated gem contexts will be marked banned. Userspace can detect that an invalidation has occurred via the RESET_STATS ioctl, where we report it the same way as a ban due to a hang. v5: squash patches, rebase on proto_ctx, update kerneldoc v6: rebase on obj create_ext changes v7: Use session counter to check if an object it valid, hold wakeref in context, don't add a new flag to RESET_STATS (Daniel) v8: don't increase guilty count for contexts banned during pxp invalidation (Rodrigo) v9: better comments, avoid wakeref put race between pxp_inval and context_close, add usage examples (Rodrigo) v10: modify internal set/get-protected-context functions to not return -ENODEV when setting PXP param to false or getting param when running on pxp-unsupported hw or getting param when i915 was built with CONFIG_PXP off Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210924191452.1539378-11-alan.previn.teres.alexis@intel.com
|
#
a259cc14 |
|
22-Sep-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Reduce the number of objects subject to memcpy recover We really only need memcpy restore for objects that affect the operability of the migrate context. That is, primarily the page-table objects of the migrate VM. Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early restores using memcpy and a way to assign LMEM page-table object flags to be used by the vms. Restore objects without this flag with the gpu blitter and only objects carrying the flag using TTM memcpy. Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and defer for a later audit which vms actually need it. Most importantly, user- allocated vms with pinned page-table objects can be restored using the blitter. Performance-wise memcpy restore is probably as fast as gpu restore if not faster, but using gpu restore will help tackling future restrictions in mappable LMEM size. v4: - Don't mark the aliasing ppgtt page table flags for early resume, but rather the ggtt page table flags as intended. (Matthew Auld) - The check for user buffer objects during early resume is pointless, since they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld) v5: - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored before we fire up the migrate context. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com
|
#
0d8ee5ba |
|
22-Sep-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Don't back up pinned LMEM context images and rings during suspend Pinned context images are now reset during resume. Don't back them up, and assuming that rings can be assumed empty at suspend, don't back them up either. Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an object is allowed to lose its content on suspend. v3: - Slight documentation clarification (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-7-thomas.hellstrom@linux.intel.com
|
#
c56ce956 |
|
22-Sep-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915 Implement LMEM backup and restore for suspend / resume Just evict unpinned objects to system. For pinned LMEM objects, make a backup system object and blit the contents to that. Backup is performed in three steps, 1: Opportunistically evict evictable objects using the gpu blitter. 2: After gt idle, evict evictable objects using the gpu blitter. This will be modified in an upcoming patch to backup pinned objects that are not used by the blitter itself. 3: Backup remaining pinned objects using memcpy. Also move uC suspend to after 2) to make sure we have a functional GuC during 2) if using GuC submission. v2: - Major refactor to make sure gem_exec_suspend@hang-SX subtests work, and suspend / resume works with a slightly modified GuC submission enabling patch series. v3: - Fix a potential use-after-free (Matthew Auld) - Use i915_gem_object_create_shmem() instead of i915_gem_object_create_region (Matthew Auld) - Minor simplifications (Matthew Auld) - Fix up kerneldoc for i195_ttm_restore_region(). - Final lmem_suspend() call moved to i915_gem_backup_suspend from i915_gem_suspend_late, since the latter gets called at driver unload and we don't unnecessarily want to run it at that time. v4: - Interface change of ttm- & lmem suspend / resume functions to use flags rather than bools. (Matthew Auld) - Completely drop the i915_gem_backup_suspend change (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-5-thomas.hellstrom@linux.intel.com
|
#
13d29c82 |
|
23-Jul-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ehl: unconditionally flush the pages on acquire EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it possible for userspace to bypass the GTT caching bits set by the kernel, as per the given object cache_level. This is troublesome since the heavy flush we apply when first acquiring the pages is skipped if the kernel thinks the object is coherent with the GPU. As a result it might be possible to bypass the cache and read the contents of the page directly, which could be stale data. If it's just a case of userspace shooting themselves in the foot then so be it, but since i915 takes the stance of always zeroing memory before handing it to userspace, we need to prevent this. v2: this time actually set cache_dirty in put_pages() v3: move to get_pages() which looks simpler BSpec: 34007 References: 046091758b50 ("Revert "drm/i915/ehl: Update MOCS table for EHL"") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Cc: Francisco Jerez <francisco.jerez.plata@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210723105045.400841-2-matthew.auld@intel.com
|
#
3821cc7f |
|
23-Jul-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: document caching related bits Try to document the object caching related bits, like cache_coherent and cache_dirty. v2(Ville): - As pointed out by Ville, fix the completely incorrect assumptions about the "partial" coherency on shared LLC platforms. v3(Daniel): - Fix nonsense about "dirtying" the cache with reads. v4(Daniel): - Various improvements, including adding some more details for WT. Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210723105045.400841-1-matthew.auld@intel.com
|
#
7961c5b6 |
|
14-Jul-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add TTM offset argument to mmap. The FIXED mapping is only used for ttm, and tells userspace that the mapping type is pre-defined. This disables the other type of mmap offsets when discrete memory is used, so fix the selftests as well. Document the struct as well, so it shows up in docbook. Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [mauld: Included minor fixes from the review comments] Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210714122833.766586-1-maarten.lankhorst@linux.intel.com
|
#
b6e913e1 |
|
29-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915/gem: Implement object migration Introduce an interface to migrate objects between regions. This is primarily intended to migrate objects to LMEM for display and to SYSTEM for dma-buf, but might be reused in one form or another for performance-based migration. v2: - Verify that the memory region given as an id really exists. (Reported by Matthew Auld) - Call i915_gem_object_{init,release}_memory_region() when switching region to handle also switching region lists. (Reported by Matthew Auld) v3: - Fix i915_gem_object_can_migrate() to return true if object is already in the correct region, even if the object ops doesn't have a migrate() callback. - Update typo in commit message. - Fix kerneldoc of i915_gem_object_wait_migration(). v4: - Improve documentation (Suggested by Mattew Auld and Michael Ruhl) - Always assume TTM migration hits a TTM move and unsets the pages through move_notify. (Reported by Matthew Auld) - Add a dma_fence_might_wait() annotation to i915_gem_object_wait_migration() (Suggested by Daniel Vetter) v5: - Re-add might_sleep() instead of __dma_fence_might_wait(), Sent v4 with the wrong version, didn't compile and __dma_fence_might_wait() is not exported. - Added an R-B. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210629151203.209465-2-thomas.hellstrom@linux.intel.com
|
#
0ff37575 |
|
24-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Update object placement flags to be mutable The object ops i915_GEM_OBJECT_HAS_IOMEM and the object I915_BO_ALLOC_STRUCT_PAGE flags are considered immutable by much of our code. Introduce a new mem_flags member to hold these and make sure checks for these flags being set are either done under the object lock or with pages properly pinned. The flags will change during migration under the object lock. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210624084240.270219-2-thomas.hellstrom@linux.intel.com
|
#
687c7d0f |
|
16-Jun-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/ttm: remove node usage in our naming Now that ttm_resource_manager just returns a generic ttm_resource we don't need to reference the mm_node stuff anymore which mostly only makes sense for drm_mm_node. In the next few patches we want switch over to the ttm_buddy_man which is just another type of ttm_resource so reflect that in the naming. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210616152501.394518-5-matthew.auld@intel.com
|
#
cf3e3e86d |
|
10-Jun-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Use ttm mmap handling for ttm bo's. Use the ttm handlers for servicing page faults, and vm_access. We do our own validation of read-only access, otherwise use the ttm handlers as much as possible. Because the ttm handlers expect the vma_node at vma->base, we slightly need to massage the mmap handlers to look at vma_node->driver_private to fetch the bo, if it's NULL, we assume i915's normal mmap_offset uapi is used. This is the easiest way to achieve compatibility without changing ttm's semantics. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210610070152.572423-5-thomas.hellstrom@linux.intel.com
|
#
213d5092 |
|
10-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915/ttm: Introduce a TTM i915 gem object backend Most logical place to introduce TTM buffer objects is as an i915 gem object backend. We need to add some ops to account for added functionality like delayed delete and LRU list manipulation. Initially we support only LMEM and SYSTEM memory, but SYSTEM (which in this case means evicted LMEM objects) is not visible to i915 GEM yet. The plan is to move the i915 gem system region over to the TTM system memory type in upcoming patches. We set up GPU bindings directly both from LMEM and from the system region, as there is no need to use the legacy TTM_TT memory type. We reserve that for future porting of GGTT bindings to TTM. Remove the old lmem backend. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210610070152.572423-2-thomas.hellstrom@linux.intel.com
|
#
f4db23f2 |
|
02-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915/ttm: Embed a ttm buffer object in the i915 gem object Embed a struct ttm_buffer_object into the i915 gem object, making sure we alias the gem object part. It's a bit unfortunate that the struct ttm_buffer_ojbect embeds a gem object since we otherwise could make the TTM part private to the TTM backend, and use the usual i915 gem object for the other backends. To make this a bit more storage efficient for the other backends, we'd have to use a pointer for the gem object which would require a lot of changes in the driver. We postpone that for later. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210602083818.241793-3-thomas.hellstrom@linux.intel.com
|
#
d1487389 |
|
02-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915/ttm Initialize the ttm device and memory managers Temporarily remove the buddy allocator and related selftests and hook up the TTM range manager for i915 regions. Also modify the mock region selftests somewhat to account for a fragmenting manager. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210602083818.241793-2-thomas.hellstrom@linux.intel.com
|
#
4d8151ae |
|
01-Jun-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Don't free shared locks while shared We are currently sharing the VM reservation locks across a number of gem objects with page-table memory. Since TTM will individiualize the reservation locks when freeing objects, including accessing the shared locks, make sure that the shared locks are not freed until that is done. For PPGTT we add an additional refcount, for GGTT we take additional measures to make sure objects sharing the GGTT reservation lock are freed at GGTT takedown Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210601074654.3103-3-thomas.hellstrom@linux.intel.com
|
#
4f869f1d |
|
29-Apr-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/lmem: support optional CPU clearing for special internal use For some internal device local-memory objects it would be useful to have an option to CPU clear the pages upon gathering the backing store. Note that this might be before the blitter is useable, which is the case for some internal GuC objects. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Dave Airlie <airlied@gmail.com> Cc: dri-devel@lists.freedesktop.org Cc: mesa-dev@lists.freedesktop.org Acked-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210429103056.407067-7-matthew.auld@intel.com
|
#
2459e56f |
|
29-Apr-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/uapi: implement object placement extension Add new extension to support setting an immutable-priority-list of potential placements, at creation time. If we use the normal gem_create or gem_create_ext without the extensions/placements then we still get the old behaviour with only placing the object in system memory. v2(Daniel & Jason): - Add a bunch of kernel-doc - Simplify design for placements extension Testcase: igt/gem_create/create-ext-placement-sanity-check Testcase: igt/gem_create/create-ext-placement-each Testcase: igt/gem_create/create-ext-placement-all Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Dave Airlie <airlied@gmail.com> Cc: dri-devel@lists.freedesktop.org Cc: mesa-dev@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210429103056.407067-6-matthew.auld@intel.com
|
#
cf41a8f1 |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Finally remove obj->mm.lock. With all callers and selftests fixed to use ww locking, we can now finally remove this lock. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-62-maarten.lankhorst@linux.intel.com
|
#
ed29c269 |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7. Instead of doing what we do currently, which will never work with PROVE_LOCKING, do the same as AMD does, and something similar to relocation slowpath. When all locks are dropped, we acquire the pages for pinning. When the locks are taken, we transfer those pages in .get_pages() to the bo. As a final check before installing the fences, we ensure that the mmu notifier was not called; if it is, we return -EAGAIN to userspace to signal it has to start over. Changes since v1: - Unbinding is done in submit_init only. submit_begin() removed. - MMU_NOTFIER -> MMU_NOTIFIER Changes since v2: - Make i915->mm.notifier a spinlock. Changes since v3: - Add WARN_ON if there are any page references left, should have been 0. - Return 0 on success in submit_init(), bug from spinlock conversion. - Release pvec outside of notifier_lock (Thomas). Changes since v4: - Mention why we're clearing eb->[i + 1].vma in the code. (Thomas) - Actually check all invalidations in eb_move_to_gpu. (Thomas) - Do not wait when process is exiting to fix gem_ctx_persistence.userptr. Changes since v5: - Clarify why check on PF_EXITING is (temporarily) required. Changes since v6: - Ensure userptr validity is checked in set_domain through a special path. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> [danvet: s/kfree/kvfree/ in i915_gem_object_userptr_drop_ref in the previous review round, but which got lost. The other open questions around page refcount are imo better discussed in a separate series, with amdgpu folks involved]. Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-17-maarten.lankhorst@linux.intel.com
|
#
20ee27bd |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Make compilation of userptr code depend on MMU_NOTIFIER. Now that unsynchronized mappings are removed, the only time userptr works is when the MMU notifier is enabled. Put all of the userptr code behind a mmu notifier ifdef. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-16-maarten.lankhorst@linux.intel.com
|
#
c471748d |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Move HAS_STRUCT_PAGE to obj->flags We want to remove the changing of ops structure for attaching phys pages, so we need to kill off HAS_STRUCT_PAGE from ops->flags, and put it in the bo. This will remove a potential race of dereferencing the wrong obj->ops without ww mutex held. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> [danvet: apply with wiggle] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-8-maarten.lankhorst@linux.intel.com
|
#
0175969e |
|
19-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Use shrinkable status for unknown swizzle quirks Give obj->mm.quirked a name much more reflective of its purpose (i915_gem_object_has_tiling_quirk) and move it from the obj->mm field as it doesn't denote a quirk of the backing store, but a quirk in the object in its treatment of the backing pages, similar to tiling modes. Then instead of abusing the pinned status of the buffer to protect it from the shrinker, we can instead hide the buffer from the shrinker so it is never considered for being swapped. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-4-chris@chris-wilson.co.uk
|
#
41a9c75d |
|
19-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Move stolen node into GEM object union The obj->stolen is currently used to identify an object allocated from stolen memory. This dates back to when there were just 1.5 types of objects, an object backed by shmemfs and an object backed by shmemfs with a contiguous physical address. Now that we have several different types of objects, we no longer want to treat stolen objects as a special case. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-3-chris@chris-wilson.co.uk
|
#
e2f4367a |
|
19-Jan-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: move i915_map_type into i915_gem_object_types.h Looks like it belongs there anyway, otherwise we have to include the entirety of i915_gem_object.h just to get at the enum. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210119133106.66294-3-matthew.auld@intel.com
|
#
0a1db6f0 |
|
05-Nov-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/gem: Allow backends to override pread implementation As there are more and more complicated interactions between the different backing stores and userspace, push the control into the backends rather than accumulate them all inside the ioctl handlers. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-1-chris@chris-wilson.co.uk (cherry picked from commit 0049b688459b846f819b6e51c24cd0781fcfde41) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0049b688 |
|
05-Nov-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/gem: Allow backends to override pread implementation As there are more and more complicated interactions between the different backing stores and userspace, push the control into the backends rather than accumulate them all inside the ioctl handlers. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-1-chris@chris-wilson.co.uk
|
#
934941ed |
|
06-Oct-2020 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Fix DMA mapped scatterlist lookup As the previous patch fixed the places where we walk the whole scatterlist for DMA addresses, this patch fixes the random lookup functionality. To achieve this we have to add a second lookup iterator and add a i915_gem_object_get_sg_dma helper, to be used analoguous to existing i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per object and they are flushed at the same point for simplicity. (Strictly speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages, but today this conincides with unsetting of the pages in general.) Partial VMA view is then fixed to use the new DMA lookup and properly query sg length. v2: * Checkpatch. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Tom Murphy <murphyt7@tcd.ie> Cc: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
|
#
80f0b679 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2. i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but lets start adding the definition first. To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini. Changes since v1: - Change ww_ctx and obj order in locking functions (Jonas Lahtinen) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
89351925 |
|
29-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Switch to object allocations for page directories The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system memory, we need to ditch using plain struct page wrappers. In the process, we drop the WC mapping as we ended up clflushing everything anyway due to various issues across a wider range of platforms. Though in a future step, we need to drop the kmap_atomic approach which suggests we need to pre-map all the pages and keep them mapped. v2: Verify our large scratch page is suitably DMA aligned; and manually clear the scratch since we are allocating plain struct pages full of prior content. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
096a42dd |
|
01-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Move obj->lut_list under its own lock The obj->lut_list is traversed when the object is closed as the file table is destroyed during process termination. As this occurs before we kill any outstanding context if, due to some bug or another, the closure is blocked, then we fail to shootdown any inflight operations potentially leaving the GPU spinning forever. As we only need to guard the list against concurrent closures and insertions, the hold is short and merits being treated as a simple spinlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200701084439.17025-1-chris@chris-wilson.co.uk
|
#
7d192daa |
|
29-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Give each object class a friendly name Name the object classes and their offspring for easier lockdep debugging. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200529183204.16850-2-chris@chris-wilson.co.uk
|
#
9da0ea09 |
|
01-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Drop cached obj->bind_count We cached the number of vma bound to the object in order to speed up shrinker decisions. This has been superseded by being more proactive in removing objects we cannot shrink from the shrinker lists, and so we can drop the clumsy attempt at atomically counting the bind count and comparing it to the number of pinned mappings of the object. This will only get more clumsier with asynchronous binding and unbinding. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200401223924.16667-1-chris@chris-wilson.co.uk
|
#
aa314619 |
|
02-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wean off drm_pci_alloc/drm_pci_free drm_pci_alloc and drm_pci_free are just very thin wrappers around dma_alloc_coherent, with a note that we should be removing them. Furthermore since commit de09d31dd38a50fdce106c15abd68432eebbd014 Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Date: Fri Jan 15 16:51:42 2016 -0800 page-flags: define PG_reserved behavior on compound pages As far as I can see there's no users of PG_reserved on compound pages. Let's use PF_NO_COMPOUND here. drm_pci_alloc has been declared broken since it mixes GFP_COMP and SetPageReserved. Avoid this conflict by weaning ourselves off using the abstraction and using the dma functions directly. Reported-by: Taketo Kabe Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027 Fixes: de09d31dd38a ("page-flags: define PG_reserved behavior on compound pages") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v4.5+ Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk (cherry picked from commit c6790dc22312f592c1434577258b31c48c72d52a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
07ccd6bd |
|
20-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list Currently we create a new mmap_offset for every call to mmap_offset_ioctl. This exposes ourselves to an abusive client that may simply create new mmap_offsets ad infinitum, which will exhaust physical memory and the virtual address space. In addition to the exhaustion, a very long linear list of mmap_offsets causes other clients using the object to incur long list walks -- these long lists can also be generated by simply having many clients generate their own mmap_offset. However, we can simply use the drm_vma_node itself to manage the file association (allow/revoke) dropping our need to keep an mmo per-file. Then if we keep a small rbtree of per-type mmap_offsets, we can lookup duplicate requests quickly. Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200120104924.4000706-3-chris@chris-wilson.co.uk (cherry picked from commit 7865559872074a9ab169c87915504661d630addf) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
f6c26b55 |
|
04-Feb-2020 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915: Never allow userptr into the new mapping types Commit 4f2a572eda67 ("drm/i915/userptr: Never allow userptr into the mappable GGTT") made I915_GEM_MMAP_GTT IOCTLs to fail when attempted on a userptr object in order to protect from a lockdep splat. Later on, new mapping types were introduced by commit cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET"). Those new mapping types suffer from the same lockdep splat issue but they now succeed when tried on top of a userptr object. Fix it. v2: Don't play with the -ENODEV driver response (Chris) Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200204162302.1299516-1-chris@chris-wilson.co.uk
|
#
c6790dc2 |
|
02-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wean off drm_pci_alloc/drm_pci_free drm_pci_alloc and drm_pci_free are just very thin wrappers around dma_alloc_coherent, with a note that we should be removing them. Furthermore since commit de09d31dd38a50fdce106c15abd68432eebbd014 Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Date: Fri Jan 15 16:51:42 2016 -0800 page-flags: define PG_reserved behavior on compound pages As far as I can see there's no users of PG_reserved on compound pages. Let's use PF_NO_COMPOUND here. drm_pci_alloc has been declared broken since it mixes GFP_COMP and SetPageReserved. Avoid this conflict by weaning ourselves off using the abstraction and using the dma functions directly. Reported-by: Taketo Kabe Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027 Fixes: de09d31dd38a ("page-flags: define PG_reserved behavior on compound pages") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <stable@vger.kernel.org> # v4.5+ Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk
|
#
78655598 |
|
20-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Store mmap_offsets in an rbtree rather than a plain list Currently we create a new mmap_offset for every call to mmap_offset_ioctl. This exposes ourselves to an abusive client that may simply create new mmap_offsets ad infinitum, which will exhaust physical memory and the virtual address space. In addition to the exhaustion, a very long linear list of mmap_offsets causes other clients using the object to incur long list walks -- these long lists can also be generated by simply having many clients generate their own mmap_offset. However, we can simply use the drm_vma_node itself to manage the file association (allow/revoke) dropping our need to keep an mmo per-file. Then if we keep a small rbtree of per-type mmap_offsets, we can lookup duplicate requests quickly. Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200120104924.4000706-3-chris@chris-wilson.co.uk
|
#
e85ade1f |
|
18-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold reference to intel_frontbuffer as we track activity Since obj->frontbuffer is no longer protected by the struct_mutex, as we are processing the execbuf, it may be removed. Mark the intel_frontbuffer as rcu protected, and so acquire a reference to the struct as we track activity upon it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/827 Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191218104043.3539458-1-chris@chris-wilson.co.uk (cherry picked from commit da42104f589d979bbe402703fd836cec60befae1) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
da42104f |
|
18-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold reference to intel_frontbuffer as we track activity Since obj->frontbuffer is no longer protected by the struct_mutex, as we are processing the execbuf, it may be removed. Mark the intel_frontbuffer as rcu protected, and so acquire a reference to the struct as we track activity upon it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/827 Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191218104043.3539458-1-chris@chris-wilson.co.uk
|
#
cc662126 |
|
03-Dec-2019 |
Abdiel Janulgue <abdiel.janulgue@linux.intel.com> |
drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature comes from the value returned by this ioctl which is the offset into the device fd which userpace uses with mmap(2). mmap_gtt was our initial mmap_offset implementation, this extends our CPU mmap support to allow additional fault handlers that depends on the object's backing pages. Note that we multiplex mmap_gtt and mmap_offset through the same ioctl, and use the zero extending behaviour of drm to differentiate between them, when we inspect the flags. To support multiple mmap types on an object we need to support multiple mmap_offsets for an object (each offset in the global device address space corresponding to a unique instance of the object for a file + mmap type). As we drop the simplified drm core idea of a single mmap_offset, we need to provide replacement hooks for the dumb mmap interface as well. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675 Testcase: igt/gem_mmap_offset Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk
|
#
f86dbacb |
|
05-Nov-2019 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Switch obj->mm.lock lockdep annotations on its head The trouble with having a plain nesting flag for locks which do not naturally nest (unlike block devices and their partitions, which is the original motivation for nesting levels) is that lockdep will never spot a true deadlock if you screw up. This patch is an attempt at trying better, by highlighting a bit more of the actual nature of the nesting that's going on. Essentially we have two kinds of objects: - objects without pages allocated, which cannot be on any lru and are hence inaccessible to the shrinker. - objects which have pages allocated, which are on an lru, and which the shrinker can decide to throw out. For the former type of object, memory allocations while holding obj->mm.lock are permissible. For the latter they are not. And get/put_pages transitions between the two types of objects. This is still not entirely fool-proof since the rules might change. But as long as we run such a code ever at runtime lockdep should be able to observe the inconsistency and complain (like with any other lockdep class that we've split up in multiple classes). But there are a few clear benefits: - We can drop the nesting flag parameter from __i915_gem_object_put_pages, because that function by definition is never going allocate memory, and calling it on an object which doesn't have its pages allocated would be a bug. - We strictly catch more bugs, since there's not only one place in the entire tree which is annotated with the special class. All the other places that had explicit lockdep nesting annotations we're now going to leave up to lockdep again. - Specifically this catches stuff like calling get_pages from put_pages (which isn't really a good idea, if we can call get_pages so could the shrinker). I've seen patches do exactly that. Of course I fully expect CI will show me for the fool I am with this one here :-) v2: There can only be one (lockdep only has a cache for the first subclass, not for deeper ones, and we don't want to make these locks even slower). Still separate enums for better documentation. Real fix: don't forget about phys objs and pin_map(), and fix the shrinker to have the right annotations ... silly me. v3: Forgot usertptr too ... v4: Improve comment for pages_pin_count, drop the IMPORTANT comment and instead prime lockdep (Chris). v5: Appease checkpatch, no double empty lines (Chris) v6: More rebasing over selftest changes. Also somehow I forgot to push this patch :-/ Also format comments consistently while at it. v7: Fix typo in commit message (Joonas) Also drop the priming, with the lmem merge we now have allocations while holding the lmem lock, which wreaks the generic priming I've done in earlier patches. Should probably be resurrected when lmem is fixed. See commit 232a6ebae419193f5b8da4fa869ae5089ab105c2 Author: Matthew Auld <matthew.auld@intel.com> Date: Tue Oct 8 17:01:14 2019 +0100 drm/i915: introduce intel_memory_region I'm keeping the priming patch locally so it wont get lost. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Tang, CQ" <cq.tang@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v6) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191105090148.30269-1-daniel.vetter@ffwll.ch [mlankhorst: Fix commit typos pointed out by Michael Ruhl]
|
#
01377a0d |
|
25-Oct-2019 |
Abdiel Janulgue <abdiel.janulgue@linux.intel.com> |
drm/i915/lmem: support kernel mapping We can create LMEM objects, but we also need to support mapping them into kernel space for internal use. Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Steve Hampson <steven.t.hampson@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191025153728.23689-3-chris@chris-wilson.co.uk
|
#
4f2a572e |
|
28-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/userptr: Never allow userptr into the mappable GGTT Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to invalidate userptr objects which also happen to be pulled into GGTT mmaps. That is when we unbind the userptr object (on mmu invalidation), we revoke all CPU mmaps, which may then recurse into mmu invalidation. We looked for ways of breaking the cycle, but the revocation on invalidation is required and cannot be avoided. The only solution we could see was to not allow such GGTT bindings of userptr objects in the first place. In practice, no one really wants to use a GGTT mmapping of a CPU pointer... Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got a genuine blip from CI: <4>[ 246.793958] ====================================================== <4>[ 246.793972] WARNING: possible circular locking dependency detected <4>[ 246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G U <4>[ 246.794003] ------------------------------------------------------ <4>[ 246.794017] kswapd0/145 is trying to acquire lock: <4>[ 246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915] <4>[ 246.794250] but task is already holding lock: <4>[ 246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0 <4>[ 246.794291] which lock already depends on the new lock. <4>[ 246.794307] the existing dependency chain (in reverse order) is: <4>[ 246.794322] -> #3 (&anon_vma->rwsem){++++}: <4>[ 246.794344] down_write+0x33/0x70 <4>[ 246.794357] __vma_adjust+0x3d9/0x7b0 <4>[ 246.794370] __split_vma+0x16a/0x180 <4>[ 246.794385] mprotect_fixup+0x2a5/0x320 <4>[ 246.794399] do_mprotect_pkey+0x208/0x2e0 <4>[ 246.794413] __x64_sys_mprotect+0x16/0x20 <4>[ 246.794429] do_syscall_64+0x55/0x1c0 <4>[ 246.794443] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794456] -> #2 (&mapping->i_mmap_rwsem){++++}: <4>[ 246.794478] down_write+0x33/0x70 <4>[ 246.794493] unmap_mapping_pages+0x48/0x130 <4>[ 246.794519] i915_vma_revoke_mmap+0x81/0x1b0 [i915] <4>[ 246.794519] i915_vma_unbind+0x11d/0x4a0 [i915] <4>[ 246.794519] i915_vma_destroy+0x31/0x300 [i915] <4>[ 246.794519] __i915_gem_free_objects+0xb8/0x4b0 [i915] <4>[ 246.794519] drm_file_free.part.0+0x1e6/0x290 <4>[ 246.794519] drm_release+0xa6/0xe0 <4>[ 246.794519] __fput+0xc2/0x250 <4>[ 246.794519] task_work_run+0x82/0xb0 <4>[ 246.794519] do_exit+0x35b/0xdb0 <4>[ 246.794519] do_group_exit+0x34/0xb0 <4>[ 246.794519] __x64_sys_exit_group+0xf/0x10 <4>[ 246.794519] do_syscall_64+0x55/0x1c0 <4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794519] -> #1 (&vm->mutex){+.+.}: <4>[ 246.794519] i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915] <4>[ 246.794519] i915_address_space_init+0x9f/0x160 [i915] <4>[ 246.794519] i915_ggtt_init_hw+0x55/0x170 [i915] <4>[ 246.794519] i915_driver_probe+0xc9f/0x1620 [i915] <4>[ 246.794519] i915_pci_probe+0x43/0x1b0 [i915] <4>[ 246.794519] pci_device_probe+0x9e/0x120 <4>[ 246.794519] really_probe+0xea/0x3d0 <4>[ 246.794519] driver_probe_device+0x10b/0x120 <4>[ 246.794519] device_driver_attach+0x4a/0x50 <4>[ 246.794519] __driver_attach+0x97/0x130 <4>[ 246.794519] bus_for_each_dev+0x74/0xc0 <4>[ 246.794519] bus_add_driver+0x13f/0x210 <4>[ 246.794519] driver_register+0x56/0xe0 <4>[ 246.794519] do_one_initcall+0x58/0x300 <4>[ 246.794519] do_init_module+0x56/0x1f6 <4>[ 246.794519] load_module+0x25bd/0x2a40 <4>[ 246.794519] __se_sys_finit_module+0xd3/0xf0 <4>[ 246.794519] do_syscall_64+0x55/0x1c0 <4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794519] -> #0 (&dev->struct_mutex/1){+.+.}: <4>[ 246.794519] __lock_acquire+0x15d8/0x1e90 <4>[ 246.794519] lock_acquire+0xa6/0x1c0 <4>[ 246.794519] __mutex_lock+0x9d/0x9b0 <4>[ 246.794519] userptr_mn_invalidate_range_start+0x18f/0x220 [i915] <4>[ 246.794519] __mmu_notifier_invalidate_range_start+0x85/0x110 <4>[ 246.794519] try_to_unmap_one+0x76b/0x860 <4>[ 246.794519] rmap_walk_anon+0x104/0x280 <4>[ 246.794519] try_to_unmap+0xc0/0xf0 <4>[ 246.794519] shrink_page_list+0x561/0xc10 <4>[ 246.794519] shrink_inactive_list+0x220/0x440 <4>[ 246.794519] shrink_node_memcg+0x36e/0x740 <4>[ 246.794519] shrink_node+0xcb/0x490 <4>[ 246.794519] balance_pgdat+0x241/0x580 <4>[ 246.794519] kswapd+0x16c/0x530 <4>[ 246.794519] kthread+0x119/0x130 <4>[ 246.794519] ret_from_fork+0x24/0x50 <4>[ 246.794519] other info that might help us debug this: <4>[ 246.794519] Chain exists of: &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem <4>[ 246.794519] Possible unsafe locking scenario: <4>[ 246.794519] CPU0 CPU1 <4>[ 246.794519] ---- ---- <4>[ 246.794519] lock(&anon_vma->rwsem); <4>[ 246.794519] lock(&mapping->i_mmap_rwsem); <4>[ 246.794519] lock(&anon_vma->rwsem); <4>[ 246.794519] lock(&dev->struct_mutex/1); <4>[ 246.794519] *** DEADLOCK *** v2: Say no to mmap_ioctl Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190928082546.3473-1-chris@chris-wilson.co.uk (cherry picked from commit a4311745bba9763e3c965643d4531bd5765b0513) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
7c98501a |
|
08-Oct-2019 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/region: support volatile objects Volatile objects are marked as DONTNEED while pinned, therefore once unpinned the backing store can be discarded. This is limited to kernel internal objects. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: CQ Tang <cq.tang@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191008160116.18379-4-matthew.auld@intel.com
|
#
2f0b97ca |
|
08-Oct-2019 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/region: support contiguous allocations Some kernel internal objects may need to be allocated as a contiguous block, also thinking ahead the various kernel io_mapping interfaces seem to expect it, although this is purely a limitation in the kernel API...so perhaps something to be improved. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: Michael J Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191008160116.18379-3-matthew.auld@intel.com
|
#
232a6eba |
|
08-Oct-2019 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: introduce intel_memory_region Support memory regions, as defined by a given (start, end), and allow creating GEM objects which are backed by said region. The immediate goal here is to have something to represent our device memory, but later on we also want to represent every memory domain with a region, so stolen, shmem, and of course device. At some point we are probably going to want use a common struct here, such that we are better aligned with say TTM. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191008160116.18379-2-matthew.auld@intel.com
|
#
b1e3177b |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Coordinate i915_active with its own mutex Forgo the struct_mutex serialisation for i915_active, and interpose its own mutex handling for active/retire. This is a multi-layered sleight-of-hand. First, we had to ensure that no active/retire callbacks accidentally inverted the mutex ordering rules, nor assumed that they were themselves serialised by struct_mutex. More challenging though, is the rule over updating elements of the active rbtree. Instead of the whole i915_active now being serialised by struct_mutex, allocations/rotations of the tree are serialised by the i915_active.mutex and individual nodes are serialised by the caller using the i915_timeline.mutex (we need to use nested spinlocks to interact with the dma_fence callback lists). The pain point here is that instead of a single mutex around execbuf, we now have to take a mutex for active tracker (one for each vma, context, etc) and a couple of spinlocks for each fence update. The improvement in fine grained locking allowing for multiple concurrent clients (eventually!) should be worth it in typical loads. v2: Add some comments that barely elucidate anything :( Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
|
#
a4311745 |
|
28-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/userptr: Never allow userptr into the mappable GGTT Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to invalidate userptr objects which also happen to be pulled into GGTT mmaps. That is when we unbind the userptr object (on mmu invalidation), we revoke all CPU mmaps, which may then recurse into mmu invalidation. We looked for ways of breaking the cycle, but the revocation on invalidation is required and cannot be avoided. The only solution we could see was to not allow such GGTT bindings of userptr objects in the first place. In practice, no one really wants to use a GGTT mmapping of a CPU pointer... Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got a genuine blip from CI: <4>[ 246.793958] ====================================================== <4>[ 246.793972] WARNING: possible circular locking dependency detected <4>[ 246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G U <4>[ 246.794003] ------------------------------------------------------ <4>[ 246.794017] kswapd0/145 is trying to acquire lock: <4>[ 246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915] <4>[ 246.794250] but task is already holding lock: <4>[ 246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0 <4>[ 246.794291] which lock already depends on the new lock. <4>[ 246.794307] the existing dependency chain (in reverse order) is: <4>[ 246.794322] -> #3 (&anon_vma->rwsem){++++}: <4>[ 246.794344] down_write+0x33/0x70 <4>[ 246.794357] __vma_adjust+0x3d9/0x7b0 <4>[ 246.794370] __split_vma+0x16a/0x180 <4>[ 246.794385] mprotect_fixup+0x2a5/0x320 <4>[ 246.794399] do_mprotect_pkey+0x208/0x2e0 <4>[ 246.794413] __x64_sys_mprotect+0x16/0x20 <4>[ 246.794429] do_syscall_64+0x55/0x1c0 <4>[ 246.794443] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794456] -> #2 (&mapping->i_mmap_rwsem){++++}: <4>[ 246.794478] down_write+0x33/0x70 <4>[ 246.794493] unmap_mapping_pages+0x48/0x130 <4>[ 246.794519] i915_vma_revoke_mmap+0x81/0x1b0 [i915] <4>[ 246.794519] i915_vma_unbind+0x11d/0x4a0 [i915] <4>[ 246.794519] i915_vma_destroy+0x31/0x300 [i915] <4>[ 246.794519] __i915_gem_free_objects+0xb8/0x4b0 [i915] <4>[ 246.794519] drm_file_free.part.0+0x1e6/0x290 <4>[ 246.794519] drm_release+0xa6/0xe0 <4>[ 246.794519] __fput+0xc2/0x250 <4>[ 246.794519] task_work_run+0x82/0xb0 <4>[ 246.794519] do_exit+0x35b/0xdb0 <4>[ 246.794519] do_group_exit+0x34/0xb0 <4>[ 246.794519] __x64_sys_exit_group+0xf/0x10 <4>[ 246.794519] do_syscall_64+0x55/0x1c0 <4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794519] -> #1 (&vm->mutex){+.+.}: <4>[ 246.794519] i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915] <4>[ 246.794519] i915_address_space_init+0x9f/0x160 [i915] <4>[ 246.794519] i915_ggtt_init_hw+0x55/0x170 [i915] <4>[ 246.794519] i915_driver_probe+0xc9f/0x1620 [i915] <4>[ 246.794519] i915_pci_probe+0x43/0x1b0 [i915] <4>[ 246.794519] pci_device_probe+0x9e/0x120 <4>[ 246.794519] really_probe+0xea/0x3d0 <4>[ 246.794519] driver_probe_device+0x10b/0x120 <4>[ 246.794519] device_driver_attach+0x4a/0x50 <4>[ 246.794519] __driver_attach+0x97/0x130 <4>[ 246.794519] bus_for_each_dev+0x74/0xc0 <4>[ 246.794519] bus_add_driver+0x13f/0x210 <4>[ 246.794519] driver_register+0x56/0xe0 <4>[ 246.794519] do_one_initcall+0x58/0x300 <4>[ 246.794519] do_init_module+0x56/0x1f6 <4>[ 246.794519] load_module+0x25bd/0x2a40 <4>[ 246.794519] __se_sys_finit_module+0xd3/0xf0 <4>[ 246.794519] do_syscall_64+0x55/0x1c0 <4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 246.794519] -> #0 (&dev->struct_mutex/1){+.+.}: <4>[ 246.794519] __lock_acquire+0x15d8/0x1e90 <4>[ 246.794519] lock_acquire+0xa6/0x1c0 <4>[ 246.794519] __mutex_lock+0x9d/0x9b0 <4>[ 246.794519] userptr_mn_invalidate_range_start+0x18f/0x220 [i915] <4>[ 246.794519] __mmu_notifier_invalidate_range_start+0x85/0x110 <4>[ 246.794519] try_to_unmap_one+0x76b/0x860 <4>[ 246.794519] rmap_walk_anon+0x104/0x280 <4>[ 246.794519] try_to_unmap+0xc0/0xf0 <4>[ 246.794519] shrink_page_list+0x561/0xc10 <4>[ 246.794519] shrink_inactive_list+0x220/0x440 <4>[ 246.794519] shrink_node_memcg+0x36e/0x740 <4>[ 246.794519] shrink_node+0xcb/0x490 <4>[ 246.794519] balance_pgdat+0x241/0x580 <4>[ 246.794519] kswapd+0x16c/0x530 <4>[ 246.794519] kthread+0x119/0x130 <4>[ 246.794519] ret_from_fork+0x24/0x50 <4>[ 246.794519] other info that might help us debug this: <4>[ 246.794519] Chain exists of: &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem <4>[ 246.794519] Possible unsafe locking scenario: <4>[ 246.794519] CPU0 CPU1 <4>[ 246.794519] ---- ---- <4>[ 246.794519] lock(&anon_vma->rwsem); <4>[ 246.794519] lock(&mapping->i_mmap_rwsem); <4>[ 246.794519] lock(&anon_vma->rwsem); <4>[ 246.794519] lock(&dev->struct_mutex/1); <4>[ 246.794519] *** DEADLOCK *** v2: Say no to mmap_ioctl Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190928082546.3473-1-chris@chris-wilson.co.uk
|
#
99013b10 |
|
10-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make shrink/unshrink be atomic Add an atomic counter and always take the spinlock around the pin/unpin events, so that we can perform the list manipulation concurrently. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190910212204.17190-1-chris@chris-wilson.co.uk
|
#
fd521d3b |
|
09-Sep-2019 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: include GTT page-size info in error state It might prove useful in the future to know if the vma is utilising huge-GTT-pages. Related to this is the GTT cache, where there is some HW "quirkiness" where it must be disabled if using 2M pages, so include that for good measure. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190909171646.22090-1-matthew.auld@intel.com
|
#
5a90606d |
|
01-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace obj->pin_global with obj->frontbuffer obj->pin_global was originally used as a means to keep the shrinker off the active scanout, but we use the vma->pin_count itself for that and the obj->frontbuffer to delay shrinking active framebuffers. The other role that obj->pin_global gained was for spotting display objects inside GEM and working harder to keep those coherent; for which we can again simply inspect obj->frontbuffer directly. Coming up next, we will want to manipulate the pin_global counter outside of the principle locks, so would need to make pin_global atomic. However, since obj->frontbuffer is already managed atomically, it makes sense to use that the primary key for display objects instead of having pin_global. Ville pointed out the principle difference is that obj->frontbuffer is set for as long as an intel_framebuffer is attached to an object, but obj->pin_global was only raised for as long as the object was active. In practice, this means that we consider the object as being on the scanout for longer than is strictly required, causing us to be more proactive in flushing -- though it should be true that we would have flushed eventually when the back became the front, except that on the flip path that flush is async but when hit from another ioctl it will be synchronous. v2: i915_gem_object_is_framebuffer() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190902040303.14195-5-chris@chris-wilson.co.uk
|
#
8e7cb179 |
|
16-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Extract intel_frontbuffer active tracking Move the active tracking for the frontbuffer operations out of the i915_gem_object and into its own first class (refcounted) object. In the process of detangling, we switch from low level request tracking to the easier i915_active -- with the plan that this avoids any potential atomic callbacks as the frontbuffer tracking wishes to sleep as it flushes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190816074635.26062-1-chris@chris-wilson.co.uk
|
#
b40d7378 |
|
04-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace struct_mutex for batch pool serialisation Switch to tracking activity via i915_active on individual nodes, only keeping a list of retired objects in the cache, and reaping the cache when the engine itself idles. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190804124826.30272-2-chris@chris-wilson.co.uk
|
#
a93615f9 |
|
21-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Throw away the active object retirement complexity Remove the accumulated optimisations that we have for i915_vma_retire and reduce it to the bare essential of tracking the active object reference. This allows us to only use atomic operations, and so will be able to avoid the struct_mutex requirement. The principal loss here is the shrinker MRU bumping, so now if we have to shrink, we will do so in much more random order and more likely to try and shrink recently used objects. That is a nuisance, but shrinking active objects is a second step we try to avoid and will always be a system-wide performance issue. The other loss is here is in the automatic pruning of the reservation_object when idling. This is not as large an issue as upon reservation_object introduction as now adding new fences into the object replaces already signaled fences, keeping the array compact. But we do lose the auto-expiration of stale fences and unused arrays. That may be a noticeable problem for which we need to re-implement autopruning. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-3-chris@chris-wilson.co.uk
|
#
ef78f7b1 |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use drm_gem_object.resv Since commit 1ba627148ef5 ("drm: Add reservation_object to drm_gem_object"), struct drm_gem_object grew its own builtin reservation_object rendering our own private one bloat. Remove our redundant reservation_object and point into obj->base.resv instead. References: 1ba627148ef5 ("drm: Add reservation_object to drm_gem_object") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618125858.7295-1-chris@chris-wilson.co.uk
|
#
ecab9be1 |
|
12-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Combine unbound/bound list tracking for objects With async binding, we don't want to manage a bound/unbound list as we may end up running before we even acquire the pages. All that is required is keeping track of shrinkable objects, so reduce it to the minimum list. Fixes: 6951e5893b48 ("drm/i915: Move GEM object domain management from struct_mutex to local") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190612105720.30310-1-chris@chris-wilson.co.uk
|
#
155ab883 |
|
05-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move object close under its own lock Use i915_gem_object_lock() to guard the LUT and active reference to allow us to break free of struct_mutex for handling GEM_CLOSE. Testcase: igt/gem_close_race Testcase: igt/gem_exec_parallel Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190606112320.9704-1-chris@chris-wilson.co.uk
|
#
c017cf6b |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop the deferred active reference An old optimisation to reduce the number of atomics per batch sadly relies on struct_mutex for coordination. In order to remove struct_mutex from serialising object/context closing, always taking and releasing an active reference on first use / last use greatly simplifies the locking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-15-chris@chris-wilson.co.uk
|
#
f033428d |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move phys objects to its own file Continuing the decluttering of i915_gem.c, this time the legacy physical object. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-5-chris@chris-wilson.co.uk
|
#
5e5d2e20 |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split GEM object type definition to its own header For convenience in avoiding inline spaghetti, keep the type definition as a separate header. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-1-chris@chris-wilson.co.uk
|