#
5f2ec909 |
|
10-Feb-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: don't include drm_cache.h in i915_drv.h Include it only in files that use it. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/14edab4a193ea3f73f387a88e3836c8555401871.1644507885.git.jani.nikula@intel.com
|
#
b508d01f |
|
10-Feb-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: split out i915_gem_internal.h from i915_drv.h We already have the i915_gem_internal.c file. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6715d1f3232c445990630bb3aac00f279f516fee.1644507885.git.jani.nikula@intel.com
|
#
9ced1218 |
|
13-Oct-2021 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915: Catch yet another unconditioal clflush Replace the unconditional clflush() with drm_clflush_virt_range() which does the wbinvd() fallback when clflush is not available. This time no justification is given for the clflush in the offending commit. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 2c8ab3339e39 ("drm/i915: Pin timeline map after first timeline pin, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-4-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com>
|
#
af7b6d23 |
|
13-Oct-2021 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915: Convert unconditional clflush to drm_clflush_virt_range() This one is apparently a "clflush for good measure", so bit more justification (if you can call it that) than some of the others. Convert to drm_clflush_virt_range() again so that machines without clflush will survive the ordeal. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Fixes: 12ca695d2c1e ("drm/i915: Do not share hwsp across contexts any more, v8.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com>
|
#
faf89098 |
|
30-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Fix syncmap memory leak A small race exists between intel_gt_retire_requests_timeout and intel_timeline_exit which could result in the syncmap not getting free'd. Rather than work to hard to seal this race, simply cleanup the syncmap on fini. unreferenced object 0xffff88813bc53b18 (size 96): comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 ................ 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 ........kkkk.... backtrace: [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915] [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915] [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm] [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm] [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0 [<0000000076c362a4>] do_syscall_64+0x33/0x80 [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") Cc: <stable@vger.kernel.org> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthew.brost@intel.com
|
#
9761ffb8 |
|
13-Oct-2021 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915: Catch yet another unconditioal clflush Replace the unconditional clflush() with drm_clflush_virt_range() which does the wbinvd() fallback when clflush is not available. This time no justification is given for the clflush in the offending commit. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 2c8ab3339e39 ("drm/i915: Pin timeline map after first timeline pin, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-4-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit 9ced12182d0d8401d821e9602e56e276459900fc) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
fcf918ff |
|
13-Oct-2021 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915: Convert unconditional clflush to drm_clflush_virt_range() This one is apparently a "clflush for good measure", so bit more justification (if you can call it that) than some of the others. Convert to drm_clflush_virt_range() again so that machines without clflush will survive the ordeal. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Fixes: 12ca695d2c1e ("drm/i915: Do not share hwsp across contexts any more, v8.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit af7b6d234eefa30c461cc16912bafb32b9e6141c) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
a63bcf08 |
|
30-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Fix syncmap memory leak A small race exists between intel_gt_retire_requests_timeout and intel_timeline_exit which could result in the syncmap not getting free'd. Rather than work to hard to seal this race, simply cleanup the syncmap on fini. unreferenced object 0xffff88813bc53b18 (size 96): comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 ................ 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 ........kkkk.... backtrace: [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915] [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915] [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm] [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm] [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0 [<0000000076c362a4>] do_syscall_64+0x33/0x80 [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") Cc: <stable@vger.kernel.org> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthew.brost@intel.com (cherry picked from commit faf890985e30d5e88cc3a7c50c1bcad32f89ab7c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
c3b14760 |
|
04-May-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: drop the __i915_active_call pointer packing We use some of the lower bits of the retire function pointer for potential flags, which is quite thorny, since the caller needs to remember to give the function the correct alignment with __i915_active_call, otherwise we might incorrectly unpack the pointer and jump to some garbage address later. Instead of all this let's just pass the flags along as a separate parameter. Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Suggested-by: Daniel Vetter <daniel@ffwll.ch> References: ca419f407b43 ("drm/i915: Fix crash in auto_retire") References: d8e44e4dd221 ("drm/i915/overlay: Fix active retire callback alignment") References: fd5f262db118 ("drm/i915/selftests: Fix active retire callback alignment") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210504164136.96456-1-matthew.auld@intel.com
|
#
24f90d66 |
|
22-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: SPDX cleanup Clean up the SPDX licence declarations to comply with checkpatch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210122192913.4518-1-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
2c8ab333 |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Pin timeline map after first timeline pin, v4. We're starting to require the reservation lock for pinning, so wait until we have that. Update the selftests to handle this correctly, and ensure pin is called in live_hwsp_rollover_user() and mock_hwsp_freelist(). Changes since v1: - Fix NULL + XX arithmatic, use casts. (kbuild) Changes since v2: - Clear entire cacheline when pinning. Changes since v3: - CACHELINE_BYTES -> TIMELINE_SEQNO_BYTES. (jekstrand) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-3-maarten.lankhorst@linux.intel.com
|
#
12ca695d |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Do not share hwsp across contexts any more, v8. Instead of sharing pages with breadcrumbs, give each timeline a single page. This allows unrelated timelines not to share locks any more during command submission. As an additional benefit, seqno wraparound no longer requires i915_vma_pin, which means we no longer need to worry about a potential -EDEADLK at a point where we are ready to submit. Changes since v1: - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle). - Extra check for completion in intel_read_hwsp(). Changes since v2: - Fix inconsistent indent in hwsp_alloc() (kbuild) - memset entire cacheline to 0. Changes since v3: - Do same in intel_timeline_reset_seqno(), and clflush for good measure. Changes since v4: - Use refcounting on timeline, instead of relying on i915_active. - Fix waiting on kernel requests. Changes since v5: - Bump amount of slots to maximum (256), for best wraparounds. - Add hwsp_offset to i915_request to fix potential wraparound hang. - Ensure timeline wrap test works with the changes. - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to fix a hang. Changes since v6: - Rename i915_request_active_offset to i915_request_active_seqno(), and elaborate the function. (tvrtko) Changes since v7: - Move hunk to where it belongs. (jekstrand) - Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com
|
#
163433e5 |
|
14-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark up protected uses of 'i915_request_completed' When we know that we are inside the timeline mutex, or inside the submission flow (under active.lock or the holder's rcu lock), we know that the rq->hwsp is stable and we can use the simpler direct version. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-1-chris@chris-wilson.co.uk
|
#
b436a5f8 |
|
22-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Track all timelines created using the HWSP We assume that the contents of the HWSP are lost across suspend, and so upon resume we must restore critical values such as the timeline seqno. Keep track of every timeline allocated that uses the HWSP as its storage and so we can then reset all seqno values by walking that list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201222104242.10993-1-chris@chris-wilson.co.uk
|
#
9bb36cf6 |
|
17-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Check for rq->hwsp validity after acquiring RCU lock Since we allow removing the timeline map at runtime, there is a risk that rq->hwsp points into a stale page. To control that risk, we hold the RCU read lock while reading *rq->hwsp, but we missed a couple of important barriers. First, the unpinning / removal of the timeline map must be after all RCU readers into that map are complete, i.e. after an rcu barrier (in this case courtesy of call_rcu()). Secondly, we must make sure that the rq->hwsp we are about to dereference under the RCU lock is valid. In this case, we make the rq->hwsp pointer safe during i915_request_retire() and so we know that rq->hwsp may become invalid only after the request has been signaled. Therefore is the request is not yet signaled when we acquire rq->hwsp under the RCU, we know that rq->hwsp will remain valid for the duration of the RCU read lock. This is a very small window that may lead to either considering the request not completed (causing a delay until the request is checked again, any wait for the request is not affected) or dereferencing an invalid pointer. Fixes: 3adac4689f58 ("drm/i915: Introduce concept of per-timeline (context) HWSP") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.1+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk
|
#
0986317a |
|
19-Nov-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Show all active timelines for debugging Include the active timelines for debugfs/i915_engine_info, so that we can see which have unready requests inflight which are not shown otherwise. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-4-chris@chris-wilson.co.uk
|
#
45db630e |
|
18-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Check for rq->hwsp validity after acquiring RCU lock Since we allow removing the timeline map at runtime, there is a risk that rq->hwsp points into a stale page. To control that risk, we hold the RCU read lock while reading *rq->hwsp, but we missed a couple of important barriers. First, the unpinning / removal of the timeline map must be after all RCU readers into that map are complete, i.e. after an rcu barrier (in this case courtesy of call_rcu()). Secondly, we must make sure that the rq->hwsp we are about to dereference under the RCU lock is valid. In this case, we make the rq->hwsp pointer safe during i915_request_retire() and so we know that rq->hwsp may become invalid only after the request has been signaled. Therefore is the request is not yet signaled when we acquire rq->hwsp under the RCU, we know that rq->hwsp will remain valid for the duration of the RCU read lock. This is a very small window that may lead to either considering the request not completed (causing a delay until the request is checked again, any wait for the request is not affected) or dereferencing an invalid pointer. Fixes: 3adac4689f58 ("drm/i915: Introduce concept of per-timeline (context) HWSP") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.1+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk (cherry picked from commit 9bb36cf66091ddf2d8840e5aa705ad3c93a6279b) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210118101755.476744-1-chris@chris-wilson.co.uk
|
#
8ce70996 |
|
22-Oct-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Use the local HWSP offset during submission We wrap the timeline on construction of the next request, but there may still be requests in flight that have not yet finalized the breadcrumb. (The breadcrumb is delayed as we need engine-local offsets, and for the virtual engine that is not known until execution.) As such, by the time we write to the timeline's HWSP offset it may have changed, and we should use the value we preserved in the request instead. Though the window is small and infrequent (at full flow we can expect a timeline's seqno to wrap once every 30 minutes), the impact of writing the old seqno into the new HWSP is severe: the old requests are never completed, and the new requests are completed before they are even submitted. Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk (cherry picked from commit c10f6019d0b2dc8a6a62b55459f3ada5bc4e5e1a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
c10f6019 |
|
22-Oct-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Use the local HWSP offset during submission We wrap the timeline on construction of the next request, but there may still be requests in flight that have not yet finalized the breadcrumb. (The breadcrumb is delayed as we need engine-local offsets, and for the virtual engine that is not known until execution.) As such, by the time we write to the timeline's HWSP offset it may have changed, and we should use the value we preserved in the request instead. Though the window is small and infrequent (at full flow we can expect a timeline's seqno to wrap once every 30 minutes), the impact of writing the old seqno into the new HWSP is severe: the old requests are never completed, and the new requests are completed before they are even submitted. Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
|
#
47b08693 |
|
19-Aug-2020 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin. As a preparation step for full object locking and wait/wound handling during pin and object mapping, ensure that we always pass the ww context in i915_gem_execbuffer.c to i915_vma_pin, use lockdep to ensure this happens. This also requires changing the order of eb_parse slightly, to ensure we pass ww at a point where we could still handle -EDEADLK safely. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
5d934137 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Export a preallocate variant of i915_active_acquire() Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. v2: Refactor active_lookup() so it can be called again before/after locking to resolve contention. Since we protect the rbtree until we idle, we can do a lockfree lookup, with the caveat that if another thread performs a concurrent insertion, the rotations from the insert may cause us to not find our target. A second pass holding the treelock will find the target if it exists, or the place to perform our insertion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
d1bf5dd8 |
|
30-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Support multiple pinned timelines We may need to allocate more than one pinned context/timeline for each engine which can utilise the per-engine HWSP, so we need to give each a different offset within it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200730183906.25422-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
d45171ac |
|
14-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Trace placement of timeline HWSP Track the position of the HWSP for each timeline. References: https://gitlab.freedesktop.org/drm/intel/-/issues/2169 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200714135002.17508-1-chris@chris-wilson.co.uk
|
#
e31fe02e |
|
11-May-2020 |
Mika Kuoppala <mika.kuoppala@linux.intel.com> |
drm/i915: Make intel_timeline_init static Commit fb5970da1b42 ("drm/i915/gt: Use the kernel_context to measure the breadcrumb size") removed the last external user for intel_timeline_init. Mark it static. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200511102201.9275-1-mika.kuoppala@linux.intel.com
|
#
2abaad4e |
|
27-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Check cacheline is valid before acquiring The hwsp_cacheline pointer from i915_request is very, very flimsy. The i915_request.timeline (and the hwsp_cacheline) are lost upon retiring (after an RCU grace). Therefore we need to confirm that once we have the right pointer for the cacheline, it is not in the process of being retired and disposed of before we attempt to acquire a reference to the cacheline. <3>[ 547.208237] BUG: KASAN: use-after-free in active_debug_hint+0x6a/0x70 [i915] <3>[ 547.208366] Read of size 8 at addr ffff88822a0d2710 by task gem_exec_parall/2536 <4>[ 547.208547] CPU: 3 PID: 2536 Comm: gem_exec_parall Tainted: G U 5.7.0-rc2-ged7a286b5d02d-kasan_117+ #1 <4>[ 547.208556] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016 <4>[ 547.208564] Call Trace: <4>[ 547.208579] dump_stack+0x96/0xdb <4>[ 547.208707] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208719] print_address_description.constprop.6+0x16/0x310 <4>[ 547.208841] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208963] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208975] __kasan_report+0x137/0x190 <4>[ 547.209106] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.209127] kasan_report+0x32/0x50 <4>[ 547.209257] ? i915_gemfs_fini+0x40/0x40 [i915] <4>[ 547.209376] active_debug_hint+0x6a/0x70 [i915] <4>[ 547.209389] debug_print_object+0xa7/0x220 <4>[ 547.209405] ? lockdep_hardirqs_on+0x348/0x5f0 <4>[ 547.209426] debug_object_assert_init+0x297/0x430 <4>[ 547.209449] ? debug_object_free+0x360/0x360 <4>[ 547.209472] ? lock_acquire+0x1ac/0x8a0 <4>[ 547.209592] ? intel_timeline_read_hwsp+0x4f/0x840 [i915] <4>[ 547.209737] ? i915_active_acquire_if_busy+0x66/0x120 [i915] <4>[ 547.209861] i915_active_acquire_if_busy+0x66/0x120 [i915] <4>[ 547.209990] ? __live_alloc.isra.15+0xc0/0xc0 [i915] <4>[ 547.210005] ? rcu_read_lock_sched_held+0xd0/0xd0 <4>[ 547.210017] ? print_usage_bug+0x580/0x580 <4>[ 547.210153] intel_timeline_read_hwsp+0xbc/0x840 [i915] <4>[ 547.210284] __emit_semaphore_wait+0xd5/0x480 [i915] <4>[ 547.210415] ? i915_fence_get_timeline_name+0x110/0x110 [i915] <4>[ 547.210428] ? lockdep_hardirqs_on+0x348/0x5f0 <4>[ 547.210442] ? _raw_spin_unlock_irq+0x2a/0x40 <4>[ 547.210567] ? __await_execution.constprop.51+0x2e0/0x570 [i915] <4>[ 547.210706] i915_request_await_dma_fence+0x8f7/0xc70 [i915] Fixes: 85bedbf191e8 ("drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200427093038.29219-1-chris@chris-wilson.co.uk (cherry picked from commit 2759e395358b2b909577928894f856ab75bea41a) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
2759e395 |
|
27-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Check cacheline is valid before acquiring The hwsp_cacheline pointer from i915_request is very, very flimsy. The i915_request.timeline (and the hwsp_cacheline) are lost upon retiring (after an RCU grace). Therefore we need to confirm that once we have the right pointer for the cacheline, it is not in the process of being retired and disposed of before we attempt to acquire a reference to the cacheline. <3>[ 547.208237] BUG: KASAN: use-after-free in active_debug_hint+0x6a/0x70 [i915] <3>[ 547.208366] Read of size 8 at addr ffff88822a0d2710 by task gem_exec_parall/2536 <4>[ 547.208547] CPU: 3 PID: 2536 Comm: gem_exec_parall Tainted: G U 5.7.0-rc2-ged7a286b5d02d-kasan_117+ #1 <4>[ 547.208556] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016 <4>[ 547.208564] Call Trace: <4>[ 547.208579] dump_stack+0x96/0xdb <4>[ 547.208707] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208719] print_address_description.constprop.6+0x16/0x310 <4>[ 547.208841] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208963] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.208975] __kasan_report+0x137/0x190 <4>[ 547.209106] ? active_debug_hint+0x6a/0x70 [i915] <4>[ 547.209127] kasan_report+0x32/0x50 <4>[ 547.209257] ? i915_gemfs_fini+0x40/0x40 [i915] <4>[ 547.209376] active_debug_hint+0x6a/0x70 [i915] <4>[ 547.209389] debug_print_object+0xa7/0x220 <4>[ 547.209405] ? lockdep_hardirqs_on+0x348/0x5f0 <4>[ 547.209426] debug_object_assert_init+0x297/0x430 <4>[ 547.209449] ? debug_object_free+0x360/0x360 <4>[ 547.209472] ? lock_acquire+0x1ac/0x8a0 <4>[ 547.209592] ? intel_timeline_read_hwsp+0x4f/0x840 [i915] <4>[ 547.209737] ? i915_active_acquire_if_busy+0x66/0x120 [i915] <4>[ 547.209861] i915_active_acquire_if_busy+0x66/0x120 [i915] <4>[ 547.209990] ? __live_alloc.isra.15+0xc0/0xc0 [i915] <4>[ 547.210005] ? rcu_read_lock_sched_held+0xd0/0xd0 <4>[ 547.210017] ? print_usage_bug+0x580/0x580 <4>[ 547.210153] intel_timeline_read_hwsp+0xbc/0x840 [i915] <4>[ 547.210284] __emit_semaphore_wait+0xd5/0x480 [i915] <4>[ 547.210415] ? i915_fence_get_timeline_name+0x110/0x110 [i915] <4>[ 547.210428] ? lockdep_hardirqs_on+0x348/0x5f0 <4>[ 547.210442] ? _raw_spin_unlock_irq+0x2a/0x40 <4>[ 547.210567] ? __await_execution.constprop.51+0x2e0/0x570 [i915] <4>[ 547.210706] i915_request_await_dma_fence+0x8f7/0xc70 [i915] Fixes: 85bedbf191e8 ("drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200427093038.29219-1-chris@chris-wilson.co.uk
|
#
bd3ec9e7 |
|
21-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Poison residual state [HWSP] across resume. Since we may lose the content of any buffer when we relinquish control of the system (e.g. suspend/resume), we have to be careful not to rely on regaining control. A good method to detect when we might be using garbage is by always injecting that garbage prior to first use on load/resume/etc. v2: Drop sanitize callback on cleanup v3: Move seqno reset to timeline enter, so we reset all timelines. However, this is done on every activation during runtime and not reset. The similar level of paranoia we apply to correcting context state after a period of inactivity. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Venkata Ramana Nayana <venkata.ramana.nayana@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200421092504.7416-1-chris@chris-wilson.co.uk
|
#
8e87e013 |
|
23-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Since we take advantage of RCU for some i915_active objects, like the intel_timeline_cacheline, we need to delay the i915_active_fini until after the RCU grace period and we perform the kfree -- that is until after all RCU protected readers. <3> [108.204873] ODEBUG: assert_init not available (active state 0) object type: i915_active hint: __cacheline_active+0x0/0x80 [i915] <4> [108.207377] WARNING: CPU: 3 PID: 2342 at lib/debugobjects.c:488 debug_print_object+0x67/0x90 <4> [108.207400] Modules linked in: vgem snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec ax88179_178a snd_hwdep usbnet btusb snd_hda_core btrtl mii btbcm btintel snd_pcm bluetooth ecdh_generic ecc i915 i2c_hid pinctrl_sunrisepoint pinctrl_intel intel_lpss_pci prime_numbers <4> [108.207587] CPU: 3 PID: 2342 Comm: gem_exec_parall Tainted: G U 5.6.0-rc6-CI-Patchwork_17047+ #1 <4> [108.207609] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019 <4> [108.207639] RIP: 0010:debug_print_object+0x67/0x90 <4> [108.207668] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 87 d2 8a 02 8b 53 10 4c 89 e6 48 c7 c7 38 2b 32 82 48 8b 14 d5 80 2f 07 82 e8 49 d5 b7 ff <0f> 0b 5b 83 05 c3 f6 22 01 01 5d 41 5c c3 83 05 b8 f6 22 01 01 c3 <4> [108.207692] RSP: 0018:ffffc90000e7f890 EFLAGS: 00010282 <4> [108.207723] RAX: 0000000000000000 RBX: ffffc90000e7f8b0 RCX: 0000000000000001 <4> [108.207747] RDX: 0000000080000001 RSI: ffff88817ada8cb8 RDI: 00000000ffffffff <4> [108.207770] RBP: ffffffffa0341cc0 R08: ffff88816b5a8948 R09: 0000000000000000 <4> [108.207792] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82322d54 <4> [108.207814] R13: ffffffffa0341cc0 R14: ffffffff83df9568 R15: ffff88816064f400 <4> [108.207839] FS: 00007f437d753700(0000) GS:ffff88817ad80000(0000) knlGS:0000000000000000 <4> [108.207863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [108.207887] CR2: 00007f2ad1fb5000 CR3: 00000001725d8004 CR4: 00000000003606e0 <4> [108.207907] Call Trace: <4> [108.207959] debug_object_assert_init+0x15c/0x180 <4> [108.208475] ? i915_active_acquire_if_busy+0x10/0x50 [i915] <4> [108.208513] ? rcu_read_lock_held+0x4d/0x60 <4> [108.208970] i915_active_acquire_if_busy+0x10/0x50 [i915] <4> [108.209380] intel_timeline_read_hwsp+0x81/0x540 [i915] <4> [108.210262] __emit_semaphore_wait+0x45/0x1b0 [i915] <4> [108.210726] ? i915_request_await_dma_fence+0x143/0x560 [i915] <4> [108.211156] i915_request_await_dma_fence+0x28a/0x560 [i915] <4> [108.211633] i915_request_await_object+0x24a/0x3f0 [i915] <4> [108.212102] eb_submit.isra.47+0x58f/0x920 [i915] <4> [108.212622] i915_gem_do_execbuffer+0x1706/0x2c70 [i915] <4> [108.213071] ? i915_gem_execbuffer2_ioctl+0xc0/0x470 [i915] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-1-chris@chris-wilson.co.uk
|
#
8ea6bb8e |
|
06-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Close race between cacheline_retire and free If the cacheline may still be busy, atomically mark it for future release, and only if we can determine that it will never be used again, immediately free it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1392 Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Link: https://patchwork.freedesktop.org/patch/msgid/20200306154647.3528345-1-chris@chris-wilson.co.uk (cherry picked from commit 2d4bd971f5baa51418625f379a69f5d58b5a0450) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
2d4bd971 |
|
06-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Close race between cacheline_retire and free If the cacheline may still be busy, atomically mark it for future release, and only if we can determine that it will never be used again, immediately free it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1392 Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Link: https://patchwork.freedesktop.org/patch/msgid/20200306154647.3528345-1-chris@chris-wilson.co.uk
|
#
8faa7251 |
|
03-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Warn about the hidden i915_vma_pin in timeline_get_seqno On seqno rollover, we need to allocate ourselves a new cacheline. This might incur grabbing a new page and pinning it into the GGTT, with some rather unfortunate lockdep implications. To avoid a mutex, and more specifically pinning in the GGTT from inside the kernel context being used to flush the GGTT in emergencies, we will likely need to lift the next-cacheline allocation to a pre-reservation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-3-chris@chris-wilson.co.uk
|
#
e3793468 |
|
30-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex On Braswell and Broxton (also known as Valleyview and Apollolake), we need to serialise updates of the GGTT using the big stop_machine() hammer. This has the side effect of appearing to lockdep as a possible reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu allocations). However, we want to use vm->mutex (including ggtt->mutex) from within the shrinker and so must avoid such possible taints. For this purpose, we introduced the asynchronous vma binding and we can apply it to the PIN_GLOBAL so long as take care to add the necessary waits for the worker afterwards. Closes: https://gitlab.freedesktop.org/drm/intel/issues/211 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk
|
#
6e8b0f53 |
|
07-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Drop a defunct timeline assertion intel_timeline_enter() has been decoupled from intel_timeline_pin() and both enter/exit & pin/unpin are allowed [read expected] to run concurrently with one another. The assertion that they had better not is stale. Closes: https://gitlab.freedesktop.org/drm/intel/issues/940 References: a6edbca74b30 ("drm/i915/gt: Close race between engine_park and intel_gt_retire_requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200107143826.3298401-1-chris@chris-wilson.co.uk
|
#
85bedbf1 |
|
16-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp As we stash a pointer to the HWSP cacheline on the request, when reading it we only need confirm that the cacheline is still valid by checking that the request and timeline are still intact. v2: Protect hwsp_cachline with RCU Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217011659.3092130-1-chris@chris-wilson.co.uk
|
#
f1925f33 |
|
13-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use EAGAIN for trylock failures While not good behaviour, it is, however, established behaviour that we can punt EAGAIN to userspace if we need to retry the ioctl. When trying to acquire a mutex, prefer to use EAGAIN to propagate losing the race so that if it does end up back in userspace, we try again. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Closes: https://gitlab.freedesktop.org/drm/intel/issues/800 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191213160347.1789004-1-chris@chris-wilson.co.uk
|
#
df9f85d8 |
|
27-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialise i915_active_fence_set() with itself The expected downside to commit 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context() to a trylock") was that it would need to return -EAGAIN to userspace in order to resolve potential mutex inversion. Such an unsightly round trip is unnecessary if we could atomically insert a barrier into the i915_active_fence, so make it happen. Currently, we use the timeline->mutex (or some other named outer lock) to order insertion into the i915_active_fence (and so individual nodes of i915_active). Inside __i915_active_fence_set, we only need then serialise with the interrupt handler in order to claim the timeline for ourselves. However, if we remove the outer lock, we need to ensure the order is intact between not only multiple threads trying to insert themselves into the timeline, but also with the interrupt handler completing the previous occupant. We use xchg() on insert so that we have an ordered sequence of insertions (and each caller knows the previous fence on which to wait, preserving the chain of all fences in the timeline), but we then have to cmpxchg() in the interrupt handler to avoid overwriting the new occupant. The only nasty side-effect is having to temporarily strip off the RCU-annotations to apply the atomic operations, otherwise the rules are much more conventional! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402 Fixes: 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context() to a trylock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
|
#
31177017 |
|
25-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Schedule request retirement when timeline idles The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA") is that it disables RC6 while Skylake (and friends) is active, and we do not consider the GPU idle until all outstanding requests have been retired and the engine switched over to the kernel context. If userspace is idle, this task falls onto our background idle worker, which only runs roughly once a second, meaning that userspace has to have been idle for a couple of seconds before we enable RC6 again. Naturally, this causes us to consume considerably more energy than before as powersaving is effectively disabled while a display server (here's looking at you Xorg) is running. As execlists will get a completion event as each context is completed, we can use this interrupt to queue a retire worker bound to this engine to cleanup idle timelines. We will then immediately notice the idle engine (without userspace intervention or the aid of the background retire worker) and start parking the GPU. Thus during light workloads, we will do much more work to idle the GPU faster... Hopefully with commensurate power saving! v2: Watch context completions and only look at those local to the engine when retiring to reduce the amount of excess work we perform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315 References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA") References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk (cherry picked from commit 4f88f8747fa43c97c3b3712d8d87295ea757cc51) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
ca1711d1 |
|
20-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Close race between engine_park and intel_gt_retire_requests The general concept was that intel_timeline.active_count was locked by the intel_timeline.mutex. The exception was for power management, where the engine->kernel_context->timeline could be manipulated under the global wakeref.mutex. This was quite solid, as we always manipulated the timeline only while we held an engine wakeref. And then we started retiring requests outside of struct_mutex, only using the timelines.active_list and the timeline->mutex. There we started manipulating intel_timeline.active_count outside of an engine wakeref, and so introduced a race between __engine_park() and intel_gt_retire_requests(), a race that could result in the engine->kernel_context not being added to the active timelines and so losing requests, which caused us to keep the system permanently powered up [and unloadable]. The race would be easy to close if we could take the engine wakeref for the timeline before we retire -- except timelines are not bound to any engine and so we would need to keep all active engines awake. The alternative is to guard intel_timeline_enter/intel_timeline_exit for use outside of the timeline->mutex. Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk (cherry picked from commit a6edbca74b305adc165e67065d7ee766006e6a48) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
4f88f874 |
|
25-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Schedule request retirement when timeline idles The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA") is that it disables RC6 while Skylake (and friends) is active, and we do not consider the GPU idle until all outstanding requests have been retired and the engine switched over to the kernel context. If userspace is idle, this task falls onto our background idle worker, which only runs roughly once a second, meaning that userspace has to have been idle for a couple of seconds before we enable RC6 again. Naturally, this causes us to consume considerably more energy than before as powersaving is effectively disabled while a display server (here's looking at you Xorg) is running. As execlists will get a completion event as each context is completed, we can use this interrupt to queue a retire worker bound to this engine to cleanup idle timelines. We will then immediately notice the idle engine (without userspace intervention or the aid of the background retire worker) and start parking the GPU. Thus during light workloads, we will do much more work to idle the GPU faster... Hopefully with commensurate power saving! v2: Watch context completions and only look at those local to the engine when retiring to reduce the amount of excess work we perform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315 References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA") References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
|
#
88cec497 |
|
20-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Declare timeline.lock to be irq-free Now that we never allow the intel_wakeref callbacks to be invoked from interrupt context, we do not need the irqsafe spinlock for the timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120170858.3965380-1-chris@chris-wilson.co.uk
|
#
a6edbca7 |
|
20-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Close race between engine_park and intel_gt_retire_requests The general concept was that intel_timeline.active_count was locked by the intel_timeline.mutex. The exception was for power management, where the engine->kernel_context->timeline could be manipulated under the global wakeref.mutex. This was quite solid, as we always manipulated the timeline only while we held an engine wakeref. And then we started retiring requests outside of struct_mutex, only using the timelines.active_list and the timeline->mutex. There we started manipulating intel_timeline.active_count outside of an engine wakeref, and so introduced a race between __engine_park() and intel_gt_retire_requests(), a race that could result in the engine->kernel_context not being added to the active timelines and so losing requests, which caused us to keep the system permanently powered up [and unloadable]. The race would be easy to close if we could take the engine wakeref for the timeline before we retire -- except timelines are not bound to any engine and so we would need to keep all active engines awake. The alternative is to guard intel_timeline_enter/intel_timeline_exit for use outside of the timeline->mutex. Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk
|
#
1683d24c |
|
19-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move new timelines to the end of active_list When adding a new active timeline, place it at the end of the list. This allows for intel_gt_retire_requests() to pick up the newcomer more quickly and hopefully complete the retirement sooner. A miniscule optimisation. References: 7936a22dd466 ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191119162559.3313003-1-chris@chris-wilson.co.uk
|
#
4605bb73 |
|
01-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Pull timeline initialise to intel_gt_init_early Our timelines are currently contained within an intel_gt, and we only need to perform list/spinlock initialisation, so we can pull the intel_timelines_init() into our intel_gt_init_early(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101130406.4142-1-chris@chris-wilson.co.uk
|
#
2871ea85 |
|
24-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
|
#
b1e3177b |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Coordinate i915_active with its own mutex Forgo the struct_mutex serialisation for i915_active, and interpose its own mutex handling for active/retire. This is a multi-layered sleight-of-hand. First, we had to ensure that no active/retire callbacks accidentally inverted the mutex ordering rules, nor assumed that they were themselves serialised by struct_mutex. More challenging though, is the rule over updating elements of the active rbtree. Instead of the whole i915_active now being serialised by struct_mutex, allocations/rotations of the tree are serialised by the i915_active.mutex and individual nodes are serialised by the caller using the i915_timeline.mutex (we need to use nested spinlocks to interact with the dma_fence callback lists). The pain point here is that instead of a single mutex around execbuf, we now have to take a mutex for active tracker (one for each vma, context, etc) and a couple of spinlocks for each fence update. The improvement in fine grained locking allowing for multiple concurrent clients (eventually!) should be worth it in typical loads. v2: Add some comments that barely elucidate anything :( Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
|
#
274cbf20 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Push the i915_active.retire into a worker As we need to use a mutex to serialise i915_active activation (because we want to allow the callback to sleep), we need to push the i915_active.retire into a worker callback in case we get need to retire from an atomic context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-5-chris@chris-wilson.co.uk
|
#
9eee0dd7 |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect timeline->hwsp dereferencing As not only is the signal->timeline volatile, so will be acquiring the timeline's HWSP. We must first carefully acquire the timeline from the signaling request and then lock the timeline. With the removal of the struct_mutex serialisation of request construction, we can have multiple timelines active at once, and so we must avoid using the nested mutex lock as it is quite possible for both timelines to be establishing semaphores on the other and so deadlock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-3-chris@chris-wilson.co.uk
|
#
d19d71fc |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark i915_request.timeline as a volatile, rcu pointer The request->timeline is only valid until the request is retired (i.e. before it is completed). Upon retiring the request, the context may be unpinned and freed, and along with it the timeline may be freed. We therefore need to be very careful when chasing rq->timeline that the pointer does not disappear beneath us. The vast majority of users are in a protected context, either during request construction or retirement, where the timeline->mutex is held and the timeline cannot disappear. It is those few off the beaten path (where we access a second timeline) that need extra scrutiny -- to be added in the next patch after first adding the warnings about dangerous access. One complication, where we cannot use the timeline->mutex itself, is during request submission onto hardware (under spinlocks). Here, we want to check on the timeline to finalize the breadcrumb, and so we need to impose a second rule to ensure that the request->timeline is indeed valid. As we are submitting the request, it's context and timeline must be pinned, as it will be used by the hardware. Since it is pinned, we know the request->timeline must still be valid, and we cannot submit the idle barrier until after we release the engine->active.lock, ergo while submitting and holding that spinlock, a second thread cannot release the timeline. v2: Don't be lazy inside selftests; hold the timeline->mutex for as long as we need it, and tidy up acquiring the timeline with a bit of refactoring (i915_active_add_request) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
|
#
ff36c5c4 |
|
23-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold irq-off for the entire fake lock period Sadly lockdep records when the irqs are re-enabled and then marks up the fake lock as being irq-unsafe. Our hand is forced and so we must mark up the entire fake lock critical section as irq-off. Hopefully this is the last tweak required! v2: Not quite, we need to mark the timeline spinlock as irqsafe. That was a genuine bug being hidden by the earlier lockdep splat. Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk (cherry picked from commit 6dcb85a0ad990455ae7c596e3fc966ad9c1ba9c5) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
6dcb85a0 |
|
23-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold irq-off for the entire fake lock period Sadly lockdep records when the irqs are re-enabled and then marks up the fake lock as being irq-unsafe. Our hand is forced and so we must mark up the entire fake lock critical section as irq-off. Hopefully this is the last tweak required! v2: Not quite, we need to mark the timeline spinlock as irqsafe. That was a genuine bug being hidden by the earlier lockdep splat. Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
|
#
25ffd4b1 |
|
16-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Markup expected timeline locks for i915_active As every i915_active_request should be serialised by a dedicated lock, i915_active consists of a tree of locks; one for each node. Markup up the i915_active_request with what lock is supposed to be guarding it so that we can verify that the serialised updated are indeed serialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-2-chris@chris-wilson.co.uk
|
#
6c69a454 |
|
16-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Mark context->active_count as protected by timeline->mutex We use timeline->mutex to protect modifications to context->active_count, and the associated enable/disable callbacks. Due to complications with engine-pm barrier there is a path where we used a "superlock" to provide serialised protect and so could not unconditionally assert with lockdep that it was always held. However, we can mark the mutex as taken (noting that we may be nested underneath ourselves) which means we can be reassured the right timeline->mutex is always treated as held and let lockdep roam free. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-1-chris@chris-wilson.co.uk
|
#
ccb23d2d |
|
15-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Guard timeline pinning without relying on struct_mutex In preparation for removing struct_mutex from around context retirement, we need to make timeline pinning and unpinning safe. Since multiple engines/contexts can share a single timeline, we cannot rely on borrowing the context mutex (otherwise we could state that the timeline is only pinned/unpinned inside the context pin/unpin and so guarded by it). However, we only perform a sequence of atomic operations inside the timeline pin/unpin and the sequence of those operations is safe for a concurrent unpin / pin, so we can relax the struct_mutex requirement. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-3-chris@chris-wilson.co.uk
|
#
338aade9 |
|
15-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Convert timeline tracking to spinlock Convert the active_list manipulation of timelines to use spinlocks so that we can perform the updates from underneath a quick interrupt callback, if need be. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-2-chris@chris-wilson.co.uk
|
#
531958f6 |
|
15-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Track timeline activeness in enter/exit Lift moving the timeline to/from the active_list on enter/exit in order to shorten the active tracking span in comparison to the existing pin/unpin. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-1-chris@chris-wilson.co.uk
|
#
f0ca820c |
|
25-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Always call kref_init for the timeline Always initialise the refcount, even for the embedded timelines inside mock devices. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190625233349.32371-2-chris@chris-wilson.co.uk
|
#
b38565fa |
|
25-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Drop stale commentary for timeline density We no longer allocate a contiguous set of timeline ids for all engines upon creation, so we no longer should assume that the timelines are densely allocated within a context. Hopefully, the set of fences used within a workload are still dense enough for us to take advantage of the compressed radix tree used for the syncmap. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190625233349.32371-1-chris@chris-wilson.co.uk
|
#
12c255b5 |
|
21-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Provide an i915_active.acquire callback If we introduce a callback for i915_active that is only called the first time we use the i915_active and is symmetrically paired with the i915_active.retire callback, we can replace the open-coded and non-atomic implementations -- which will be very fragile (i.e. broken) upon removing the struct_mutex serialisation. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk
|
#
9e953980 |
|
21-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove waiting & retiring from shrinker paths i915_gem_wait_for_idle() and i915_retire_requests() introduce a dependency on the timeline->mutex. This is problematic as we want to later perform allocations underneath i915_active.mutex, forming a link between the shrinker, the timeline and active mutexes. Nip this cycle in the bud by removing the acquisition of the timeline mutex (i.e. retiring) from inside the shrinker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-1-chris@chris-wilson.co.uk
|
#
c6fe28b0 |
|
21-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Rename i915_gt_timelines Since the anonymous i915_gt became struct intel_gt and encloses struct i915_gt_timelines, rename i915_gt_timelines to intel_gt_timelines to match its parentage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621131640.28864-1-chris@chris-wilson.co.uk
|
#
f0c02c1b |
|
21-Jun-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Rename i915_timeline to intel_timeline and move under gt Move all timeline code under gt and rename to intel_gt prefix. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com
|