#
152191e5 |
|
29-Mar-2024 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix the fix for reset lock confusion The previous fix for the circlular lock splat about the busyness worker wasn't quite complete. Even though the reset-in-progress flag is cleared at the start of intel_uc_reset_finish, the entire function is still inside the reset mutex lock. Not sure why the patch appeared to fix the issue both locally and in CI. However, it is now back again. There is a further complication that the wedge code path within intel_gt_reset() jumps around so much that it results in nested reset_prepare/_finish calls. That is, the call sequence is: intel_gt_reset | reset_prepare | __intel_gt_set_wedged | | reset_prepare | | reset_finish | reset_finish The nested finish means that even if the clear of the in-progress flag was moved to the end of _finish, it would still be clear for the entire second call. Surprisingly, this does not seem to be causing any other problems at present. As an aside, a wedge on fini does not call the finish functions at all. The reset_in_progress flag is left set (twice). So instead of trying to cancel the worker anywhere at all in the reset path, just add a cancel to intel_guc_submission_fini instead. Note that it is not a problem if the worker is still active during a reset. Either it will run before the reset path starts locking things and will simply block the reset code for a tiny amount of time. Or it will run after the locks have been acquired and will early exit due to the try-lock. Also, do not use the reset-in-progress flag to decide whether a synchronous cancel is safe (from a lockdep perspective) or not. Instead, use the actual reset mutex state (both the genuine one and the custom rolled BACKOFF one). Fixes: 0e00a8814eec ("drm/i915/guc: Avoid circular locking issue on busyness flush") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Cc: Zhanjun Dong <zhanjun.dong@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240329235306.1559639-1-John.C.Harrison@Intel.com (cherry picked from commit 3563d855312acedcd445a3767f0cb07906f1c26f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0e00a881 |
|
19-Dec-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Avoid circular locking issue on busyness flush Avoid the following lockdep complaint: <4> [298.856498] ====================================================== <4> [298.856500] WARNING: possible circular locking dependency detected <4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G N <4> [298.856505] ------------------------------------------------------ <4> [298.856507] kworker/4:1H/190 is trying to acquire lock: <4> [298.856509] ffff8881103e9978 (>->reset.backoff_srcu){++++}-{0:0}, at: _intel_gt_reset_lock+0x35/0x380 [i915] <4> [298.856661] but task is already holding lock: <4> [298.856663] ffffc900013f7e58 ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x264/0x530 <4> [298.856671] which lock already depends on the new lock. The complaint is not actually valid. The busyness worker thread does indeed hold the worker lock and then attempt to acquire the reset lock (which may have happened in reverse order elsewhere). However, it does so with a trylock that exits if the reset lock is not available (specifically to prevent this and other similar deadlocks). Unfortunately, lockdep does not understand the trylock semantics (the lock is an i915 specific custom implementation for resets). Not doing a synchronous flush of the worker thread when a reset is in progress resolves the lockdep splat by never even attempting to grab the lock in this particular scenario. There are situatons where a synchronous cancel is required, however. So, always do the synchronous cancel if not in reset. And add an extra synchronous cancel to the end of the reset flow to account for when a reset is occurring at driver shutdown and the cancel is no longer synchronous but could lead to unallocated memory accesses if the worker is not stopped. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20231219195957.212600-1-John.C.Harrison@Intel.com
|
#
2f2cc53b |
|
29-Dec-2023 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Close deregister-context race against CT-loss If we are at the end of suspend or very early in resume its possible an async fence signal (via rcu_call) is triggered to free_engines which could lead us to the execution of the context destruction worker (after a prior worker flush). Thus, when suspending, insert rcu_barriers at the start of i915_gem_suspend (part of driver's suspend prepare) and again in i915_gem_suspend_late so that all such cases have completed and context destruction list isn't missing anything. In destroyed_worker_func, close the race against CT-loss by checking that CT is enabled before calling into deregister_destroyed_contexts. Based on testing, guc_lrc_desc_unpin may still race and fail as we traverse the GuC's context-destroy list because the CT could be disabled right before calling GuC's CT send function. We've witnessed this race condition once every ~6000-8000 suspend-resume cycles while ensuring workloads that render something onscreen is continuously started just before we suspend (and the workload is small enough to complete and trigger the queued engine/context free-up either very late in suspend or very early in resume). In such a case, we need to unroll the entire process because guc-lrc-unpin takes a gt wakeref which only gets released in the G2H IRQ reply that never comes through in this corner case. Without the unroll, the taken wakeref is leaked and will cascade into a kernel hang later at the tail end of suspend in this function: intel_wakeref_wait_for_idle(>->wakeref) (called by) - intel_gt_pm_wait_for_idle (called by) - wait_for_suspend Thus, do an unroll in guc_lrc_desc_unpin and deregister_destroyed_- contexts if guc_lrc_desc_unpin fails due to CT send falure. When unrolling, keep the context in the GuC's destroy-list so it can get picked up on the next destroy worker invocation (if suspend aborted) or get fully purged as part of a GuC sanitization (end of suspend) or a reset flow. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Tested-by: Mousumi Jana <mousumi.jana@intel.com> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231229215143.581619-1-alan.previn.teres.alexis@intel.com
|
#
5e83c060 |
|
27-Dec-2023 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Flush context destruction worker at suspend When suspending, flush the context-guc-id deregistration worker at the final stages of intel_gt_suspend_late when we finally call gt_sanitize that eventually leads down to __uc_sanitize so that the deregistration worker doesn't fire off later as we reset the GuC microcontroller. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Tested-by: Mousumi Jana <mousumi.jana@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231228045558.536585-2-alan.previn.teres.alexis@intel.com
|
#
be5bcc4b |
|
06-Dec-2023 |
Andi Shyti <andi.shyti@linux.intel.com> |
drm/i915/guc: Create the guc_to_i915() wrapper Given a reference to "guc", the guc_to_i915() returns the pointer to "i915" private data. Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231206184322.57111-1-andi.shyti@linux.intel.com
|
#
cf9cb028 |
|
30-Nov-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Use internal class when counting engine resets Commit 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") made the GSC0 engine not have a valid uabi class and so broke the engine reset counting, which in turn was made class based in cb823ed9915b ("drm/i915/gt: Use intel_gt as the primary object for handling resets"). Despite the title and commit text of the latter is not mentioning it (and has left the storage array incorrectly sized), tracking by class, despite it adding aliasing in hypthotetical multi-tile systems, is handy for virtual engines which for instance do not have a valid engine->id. Therefore we keep that but just change it to use the internal class which is always valid. We also add a helper to increment the count, which aligns with the existing getter. What was broken without this fix were out of bounds reads every time a reset would happen on the GSC0 engine, or during selftests when storing and cross-checking the counts in igt_live_test_begin and igt_live_test_end. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: dfed6b58d54f ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") [tursulin: fixed Fixes tag] Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-2-tvrtko.ursulin@linux.intel.com
|
#
5e4e06e4 |
|
30-Oct-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: Track gt pm wakerefs Track every intel_gt_pm_get() until its corresponding release in intel_gt_pm_put() by returning a cookie to the caller for acquire that must be passed by on released. When there is an imbalance, we can see who either tried to free a stale wakeref, or who forgot to free theirs. v2: track recently added calls in gen8_ggtt_bind_get_ce and destroyed_worker_func Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231030-ref_tracker_i915-v1-2-006fe6b96421@intel.com
|
#
1f721a93 |
|
30-Nov-2023 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Use internal class when counting engine resets Commit 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") made the GSC0 engine not have a valid uabi class and so broke the engine reset counting, which in turn was made class based in cb823ed9915b ("drm/i915/gt: Use intel_gt as the primary object for handling resets"). Despite the title and commit text of the latter is not mentioning it (and has left the storage array incorrectly sized), tracking by class, despite it adding aliasing in hypthotetical multi-tile systems, is handy for virtual engines which for instance do not have a valid engine->id. Therefore we keep that but just change it to use the internal class which is always valid. We also add a helper to increment the count, which aligns with the existing getter. What was broken without this fix were out of bounds reads every time a reset would happen on the GSC0 engine, or during selftests when storing and cross-checking the counts in igt_live_test_begin and igt_live_test_end. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class") [tursulin: fixed Fixes tag] Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-2-tvrtko.ursulin@linux.intel.com (cherry picked from commit cf9cb028ac56696ff879af1154c4b2f0b12701fd) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
55ac6ea7 |
|
17-Oct-2023 |
Jonathan Cavitt <jonathan.cavitt@intel.com> |
drm/i915: No TLB invalidation on wedged GT It is not an error for GuC TLB invalidations to fail when the GT is wedged or disabled, so do not process a wait failure as one in guc_send_invalidate_tlb. Signed-off-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: John Harrison <john.c.harrison@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-6-jonathan.cavitt@intel.com
|
#
2202eca0 |
|
17-Oct-2023 |
Jonathan Cavitt <jonathan.cavitt@intel.com> |
drm/i915: No TLB invalidation on suspended GT In case of GT is suspended, don't allow submission of new TLB invalidation request and cancel all pending requests. The TLB entries will be invalidated either during GuC reload or on system resume. Signed-off-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: John Harrison <john.c.harrison@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-5-jonathan.cavitt@intel.com
|
#
af58ee22 |
|
17-Oct-2023 |
Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> |
drm/i915: Define and use GuC and CTB TLB invalidation routines The GuC firmware had defined the interface for Translation Look-Aside Buffer (TLB) invalidation. We should use this interface when invalidating the engine and GuC TLBs. Add additional functionality to intel_gt_invalidate_tlb, invalidating the GuC TLBs and falling back to GT invalidation when the GuC is disabled. The invalidation is done by sending a request directly to the GuC tlb_lookup that invalidates the table. The invalidation is submitted as a wait request and is performed in the CT event handler. This means we cannot perform this TLB invalidation path if the CT is not enabled. If the request isn't fulfilled in two seconds, this would constitute an error in the invalidation as that would constitute either a lost request or a severe GuC overload. With this new invalidation routine, we can perform GuC-based GGTT invalidations. GuC-based GGTT invalidation is incompatible with MMIO invalidation so we should not perform MMIO invalidation when GuC-based GGTT invalidation is expected. The additional complexity incurred in this patch will be necessary for range-based tlb invalidations, which will be platformed in the future. Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> CC: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017180806.3054290-4-jonathan.cavitt@intel.com
|
#
6a3ecfd4 |
|
21-Sep-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Suppress 'ignoring reset notification' message If an active context has been banned (e.g. Ctrl+C killed) then it is likely to be reset as part of evicting it from the hardware. That results in a 'ignoring context reset notification: banned = 1' message at info level. This confuses/concerns people and makes them think something has gone wrong when it hasn't. There is already a debug level message with essentially the same information. So drop the 'ignore' info level one and just add the 'ignore' flag to the debug level one instead (which will therefore not appear by default but will still show up in CI runs). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230921182033.135448-1-John.C.Harrison@Intel.com
|
#
e2f99b79 |
|
25-Sep-2023 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
i915/guc: Get runtime pm in busyness worker only if already active Ideally the busyness worker should take a gt pm wakeref because the worker only needs to be active while gt is awake. However, the gt_park path cancels the worker synchronously and this complicates the flow if the worker is also running at the same time. The cancel waits for the worker and when the worker releases the wakeref, that would call gt_park and would lead to a deadlock. The resolution is to take the global pm wakeref if runtime pm is already active. If not, we don't need to update the busyness stats as the stats would already be updated when the gt was parked. Note: - We do not requeue the worker if we cannot take a reference to runtime pm since intel_guc_busyness_unpark would requeue the worker in the resume path. - If the gt was parked longer than time taken for GT timestamp to roll over, we ignore those rollovers since we don't care about tracking the exact GT time. We only care about roll overs when the gt is active and running workloads. - There is a window of time between gt_park and runtime suspend, where the worker may run. This is acceptable since the worker will not find any new data to update busyness. v2: (Daniele) - Edit commit message and code comment - Use runtime pm in the worker - Put runtime pm after enabling the worker - Use Link tag and add Fixes tag v3: (Daniele) - Reword commit and comments and add details Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925192117.2497058-1-umesh.nerlige.ramappa@intel.com
|
#
28041067 |
|
21-Aug-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: mark requests for GuC virtual engines to avoid use-after-free References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines. Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com
|
#
5a213086 |
|
21-Aug-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915: Eliminate IS_MTL_GRAPHICS_STEP Several workarounds are guarded by IS_MTL_GRAPHICS_STEP. However none of these workarounds are actually tied to MTL as a platform; they only relate to the Xe_LPG graphics IP, regardless of what platform it appears in. At the moment MTL is the only platform that uses Xe_LPG with IP versions 12.70 and 12.71, but we can't count on this being true in the future. Switch these to use a new IS_GFX_GT_IP_STEP() macro instead that is purely based on IP version. IS_GFX_GT_IP_STEP() is also GT-based rather than device-based, which will help prevent mistakes where we accidentally try to apply Xe_LPG graphics workarounds to the Xe_LPM+ media GT and vice-versa. v2: - Switch to a more generic and shorter IS_GT_IP_STEP macro that can be used for both graphics and media IP (and any other kind of GTs that show up in the future). v3: - Switch back to long-form IS_GFX_GT_IP_STEP macro. (Jani) - Move macro to intel_gt.h. (Andi) v4: - Build IS_GFX_GT_IP_STEP on top of IS_GFX_GT_IP_RANGE and IS_GRAPHICS_STEP building blocks and name the parameters from/until rather than begin/fixed. (Jani) - Fix usage examples in comment. v5: - Tweak comment on macro. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-15-matthew.d.roper@intel.com
|
#
28c46fee |
|
21-Aug-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915: Consolidate condition for Wa_22011802037 The workaround bounds for Wa_22011802037 are somewhat complex and are replicated in several places throughout the code. Pull the condition out to a helper function to prevent mistakes if this condition needs to change again in the future. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821180619.650007-12-matthew.d.roper@intel.com
|
#
907ef039 |
|
25-Sep-2023 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
i915/guc: Get runtime pm in busyness worker only if already active Ideally the busyness worker should take a gt pm wakeref because the worker only needs to be active while gt is awake. However, the gt_park path cancels the worker synchronously and this complicates the flow if the worker is also running at the same time. The cancel waits for the worker and when the worker releases the wakeref, that would call gt_park and would lead to a deadlock. The resolution is to take the global pm wakeref if runtime pm is already active. If not, we don't need to update the busyness stats as the stats would already be updated when the gt was parked. Note: - We do not requeue the worker if we cannot take a reference to runtime pm since intel_guc_busyness_unpark would requeue the worker in the resume path. - If the gt was parked longer than time taken for GT timestamp to roll over, we ignore those rollovers since we don't care about tracking the exact GT time. We only care about roll overs when the gt is active and running workloads. - There is a window of time between gt_park and runtime suspend, where the worker may run. This is acceptable since the worker will not find any new data to update busyness. v2: (Daniele) - Edit commit message and code comment - Use runtime pm in the worker - Put runtime pm after enabling the worker - Use Link tag and add Fixes tag v3: (Daniele) - Reword commit and comments and add details Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925192117.2497058-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit e2f99b79d4c594cdf7ab449e338d4947f5ea8903) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
5eefc530 |
|
21-Aug-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: mark requests for GuC virtual engines to avoid use-after-free References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines. Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com (cherry picked from commit 280410677af763f3871b93e794a199cfcf6fb580) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
e4730ae4 |
|
28-Apr-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix error capture for virtual engines GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D [drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388] [IGT] gem_exec_balancer: exiting, ret=0 The test causes a hang on both engines of a virtual engine context. The engine instance zero hang gets a valid error capture but the non-instance-zero hang does not. Fix that by scanning through the list of pending register captures when a hang notification for a virtual engine is received. That way, the hang can be assigned to the correct physical engine prior to starting the error capture process. So later on, when the error capture handler tries to find the engine register list, it looks for one on the correct engine. Also, sneak in a missing blank line before a comment in the node search code. v2: Fix null pointer deref on non-GuC platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-5-John.C.Harrison@Intel.com
|
#
5aa857db |
|
27-Apr-2023 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
i915/pmu: Add support for total context runtime for GuC back-end GPU accumulates the context runtime in a 32 bit counter - CTX_TIMESTAMP in the context image. This value is saved/restored on context switches. KMD accumulates these values into a 64 bit counter taking care of any overflows as needed. This count provides the basis for client specific busyness in the fdinfo interface. KMD accumulation happens just before the context is unpinned and when context switches out. This works for execlist back-end since execlist scheduling has visibility into context switches. With GuC mode, KMD does not have visibility into context switches and this counter is accumulated only when context is unpinned. Context is unpinned once the context scheduling is successfully disabled. Disabling context scheduling is an asynchronous operation. Also if a context is servicing frequent requests, scheduling may never be disabled on it. For GuC mode, since updates to the context runtime may be delayed, add hooks to update the context runtime in a worker thread as well as when a user queries for it. Limitation: - If a context is never switched out or runs for a long period of time, the runtime value of CTX_TIMESTAMP may never be updated, so the counter value may be unreliable. This patch does not support such cases. Such support must be available from the GuC FW and it is WIP. This patch is an extract from previous work authored by John/Umesh here - https://patchwork.freedesktop.org/patch/496441/?series=105085&rev=4 v2: (Ashutosh) - Drop COPS_RUNTIME_ACTIVE_TOTAL - s/guc_context_update_clks/__guc_context_update_stats - Pin context before accessing in guc_timestamp_ping - In guc_context_unpin, use spinlock to serialize access to runtime stats Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Co-developed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230427224705.2785566-2-umesh.nerlige.ramappa@intel.com
|
#
514b8a79 |
|
18-Apr-2023 |
Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com> |
drm/i915/mtl: Extend Wa_22011802037 to MTL A-step Wa_22011802037 was being applied to all graphics_ver 11 & 12. This patch updates the if statement to apply the W/A to right platforms and extends it to MTL-M:A step. v1.1: Fix checkpatch warning. v2: Change the check to reflect the wa at other places(Lucas) Bspec: 66622 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230418220446.2205509-4-radhakrishna.sripada@intel.com
|
#
cd414f4f |
|
17-Feb-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix missing return code checks in submission init The CI results for the 'fast request' patch set (enables error return codes for fire-and-forget H2G messages) hit an issue with the KMD sending context submission requests on an invalid context. That was caused by a fault injection probe failing the context creation of a kernel context. However, there was no return code checking on any of the kernel context registration paths. So the driver kept going and tried to use the kernel context for the record defaults process. This would not cause any actual problems. The invalid requests would be rejected by GuC and ultimately the start up sequence would correctly wedge due to the context creation failure. But fixing the issue correctly rather ignoring it means we won't get CI complaining when the fast request patch lands and enables the extra error checking. So fix it by checking for errors and aborting as appropriate when creating kernel contexts. While at it, clean up some other submission init related failure cleanup paths. Also, rename guc_init_lrc_mapping to guc_init_submission as the former name hasn't been valid in a long time. v2: Add another wrapper to keep the flow balanced (Daniele) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230217223308.3449737-3-John.C.Harrison@Intel.com
|
#
fcb03489 |
|
17-Feb-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Improve clean up of busyness stats worker The stats worker thread management was mis-matched between enable/disable call sites. Fix those up. Also, abstract the cancel/enable code into a helper function rather than replicating in multiple places. v2: Rename the helpers and wrap the enable as well as the cancel (review feedback from Daniele). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230217223308.3449737-2-John.C.Harrison@Intel.com
|
#
5dfb29d4 |
|
31-Jan-2023 |
Michal Wajdeczko <michal.wajdeczko@intel.com> |
drm/i915/guc: Improve debug message on context reset notification Just recently we switched over to new GuC oriented log macros but in the meantime yet another message was added that we missed to update. While around improve that new message by adding engine name and use existing helpers to check for context state. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230131214413.1879-1-michal.wajdeczko@intel.com
|
#
f0c4fc41 |
|
28-Jan-2023 |
Michal Wajdeczko <michal.wajdeczko@intel.com> |
drm/i915/guc: Update GuC messages in intel_guc_submission.c Use new macros to have common prefix that also include GT#. v2: improve few existing messages Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-8-michal.wajdeczko@intel.com
|
#
e9823f0f |
|
26-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Add a debug print on GuC triggered reset For understanding bug reports, it can be useful to have an explicit dmesg print when a reset notification is received from GuC. As opposed to simply inferring that this happened from other messages. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-8-John.C.Harrison@Intel.com
|
#
d907852d |
|
26-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Look for a guilty context when an engine reset fails Engine resets are supposed to never fail. But in the case when one does (due to unknown reasons that normally come down to a missing w/a), it is useful to get as much information out of the system as possible. Given that the GuC intentionally dies on such a situation, it is not possible to get a guilty context notification back. So do a manual search instead. Given that GuC is dead, this is safe because GuC won't be changing the engine state asynchronously. v2: Change comment to be less alarming (Tvrtko) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-7-John.C.Harrison@Intel.com
|
#
3700e353 |
|
26-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Fix request ref counting during error capture & debugfs dump When GuC support was added to error capture, the reference counting around the request object was broken. Fix it up. The context based search manages the spinlocking around the search internally. So it needs to grab the reference count internally as well. The execlist only request based search relies on external locking, so it needs an external reference count but within the spinlock not outside it. The only other caller of the context based search is the code for dumping engine state to debugfs. That code wasn't previously getting an explicit reference at all as it does everything while holding the execlist specific spinlock. So, that needs updaing as well as that spinlock doesn't help when using GuC submission. Rather than trying to conditionally get/put depending on submission model, just change it to always do the get/put. v2: Explicitly document adding an extra blank line in some dense code (Andy Shevchenko). Fix multiple potential null pointer derefs in case of no request found (some spotted by Tvrtko, but there was more!). Also fix a leaked request in case of !started and another in __guc_reset_context now that intel_context_find_active_request is actually reference counting the returned request. v3: Add a _get suffix to intel_context_find_active_request now that it grabs a reference (Daniele). v4: Split the intel_guc_find_hung_context change to a separate patch and rename intel_context_find_active_request_get to intel_context_get_active_request (Tvrtko). v5: s/locking/reference counting/ in commit message (Tvrtko) Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC") Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com
|
#
d1c37175 |
|
26-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix locking when searching for a hung request intel_guc_find_hung_context() was not acquiring the correct spinlock before searching the request list. So fix that up. While at it, add some extra whitespace padding for readability. Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-2-John.C.Harrison@Intel.com
|
#
41bb543f |
|
05-Jan-2023 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/mtl: Add initial gt workarounds This patch introduces initial gt workarounds for the MTL platform. v2: drop redundant/stale comments specifying wa platforms affected (Lucas). v3: drop additional redundant stale comments (MattR) Bspec: 66622 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230105234408.277750-1-matthew.s.atwood@intel.com
|
#
d830e0dc |
|
21-Dec-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix a static analysis warning A static analyser was complaining about not checking for null pointers. However, the location of the complaint can only be reached in the first place if said pointer is non-null. Basically, if we are using a v69 GuC then the descriptor pool is guaranteed to be alocated at start of day or submission will be disabled with an ENOMEM error. And if we are using a later GuC that does not use a descriptor pool then the v69 submission function would not be called. So, not a possible null at that point in the code. Hence adding a GEM_BUG_ON(!ptr) to keep the tool happy. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221221193031.687266-3-John.C.Harrison@Intel.com
|
#
9bbba066 |
|
29-Nov-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Use GuC submission API version number The GuC firmware includes an extra version number to specify the submission API level. So use that rather than the main firmware version number for submission related checks. Also, while it is guaranteed that GuC version number components are only 8-bits in size, other firmwares do not have that restriction. So stop making assumptions about them generically fitting in a u16 individually, or in a u32 as a combined 8.8.8. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221129232031.3401386-4-John.C.Harrison@Intel.com
|
#
86d8ddc7 |
|
26-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Fix request ref counting during error capture & debugfs dump When GuC support was added to error capture, the reference counting around the request object was broken. Fix it up. The context based search manages the spinlocking around the search internally. So it needs to grab the reference count internally as well. The execlist only request based search relies on external locking, so it needs an external reference count but within the spinlock not outside it. The only other caller of the context based search is the code for dumping engine state to debugfs. That code wasn't previously getting an explicit reference at all as it does everything while holding the execlist specific spinlock. So, that needs updaing as well as that spinlock doesn't help when using GuC submission. Rather than trying to conditionally get/put depending on submission model, just change it to always do the get/put. v2: Explicitly document adding an extra blank line in some dense code (Andy Shevchenko). Fix multiple potential null pointer derefs in case of no request found (some spotted by Tvrtko, but there was more!). Also fix a leaked request in case of !started and another in __guc_reset_context now that intel_context_find_active_request is actually reference counting the returned request. v3: Add a _get suffix to intel_context_find_active_request now that it grabs a reference (Daniele). v4: Split the intel_guc_find_hung_context change to a separate patch and rename intel_context_find_active_request_get to intel_context_get_active_request (Tvrtko). v5: s/locking/reference counting/ in commit message (Tvrtko) Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC") Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com (cherry picked from commit 3700e353781e27f1bc7222f51f2cc36cbeb9b4ec) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
87b04e53 |
|
26-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix locking when searching for a hung request intel_guc_find_hung_context() was not acquiring the correct spinlock before searching the request list. So fix that up. While at it, add some extra whitespace padding for readability. Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-2-John.C.Harrison@Intel.com (cherry picked from commit d1c3717501bcf56536e8b8c1bdaf5cd5357f6bb2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
801543b2 |
|
09-Nov-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: stop including i915_irq.h from i915_trace.h Turns out many of the files that need i915_reg.h get it implicitly via {display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h -> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h, makes sense to drop it, but that requires adding quite a few new includes all over the place. Prefer including i915_reg.h where needed instead of adding another implicit include, because eventually we'll want to split up i915_reg.h and only include the specific registers at each place. Also some places actually needed i915_irq.h too. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
|
#
178b8a36 |
|
02-Nov-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Don't deadlock busyness stats vs reset The engine busyness stats has a worker function to do things like 64bit extend the 32bit hardware counters. The GuC's reset prepare function flushes out this worker function to ensure no corruption happens during the reset. Unforunately, the worker function has an infinite wait for active resets to finish before doing its work. Thus a deadlock would occur if the worker function had actually started just as the reset starts. The function being used to lock the reset-in-progress mutex is called intel_gt_reset_trylock(). However, as noted it does not follow standard 'trylock' conventions and exit if already locked. So rename the current _trylock function to intel_gt_reset_lock_interruptible(), which is the behaviour it actually provides. In addition, add a new implementation of _trylock and call that from the busyness stats worker instead. v2: Rename existing trylock to interruptible rather than trying to preserve the existing (confusing) naming scheme (review comments from Tvrtko). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-3-John.C.Harrison@Intel.com
|
#
de51de96 |
|
02-Nov-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Properly initialise kernel contexts If a context has already been registered prior to first submission then context init code was not being called. The noticeable effect of that was the scheduling priority was left at zero (meaning super high priority) instead of being set to normal. This would occur with kernel contexts at start of day as they are manually pinned up front rather than on first submission. So add a call to initialise those when they are pinned. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-2-John.C.Harrison@Intel.com
|
#
cc2e0cf0 |
|
31-Oct-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Remove excessive line feeds in state dumps Some of the GuC state dump messages were adding extra line feeds. When printing via a DRM printer to dmesg, for example, that messes up the log formatting as it loses any prefixing from the printer. Given that the extra line feeds are just in the middle of random bits of GuC state, there isn't any real need for them. So just remove them completely. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221031220007.4176835-1-John.C.Harrison@Intel.com
|
#
a7ac9d84 |
|
06-Oct-2022 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Remove intel_context:number_committed_requests counter With the introduction of the delayed disable-sched behavior, we use the GuC's xarray of valid guc-id's as a way to identify if new requests had been added to a context when the said context is being checked for closure. Additionally that prior change also closes the race for when a new incoming request fails to cancel the pending delayed disable-sched worker. With these two complementary checks, we see no more use for intel_context:guc_state:number_committed_requests. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-3-alan.previn.teres.alexis@intel.com
|
#
83321094 |
|
06-Oct-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis Add a delay, configurable via debugfs (default 34ms), to disable scheduling of a context after the pin count goes to zero. Disable scheduling is a costly operation as it requires synchronizing with the GuC. So the idea is that a delay allows the user to resubmit something before doing this operation. This delay is only done if the context isn't closed and less than a given threshold (default is 3/4) of the guc_ids are in use. Alan Previn: Matt Brost first introduced this patch back in Oct 2021. However no real world workload with measured performance impact was available to prove the intended results. Today, this series is being republished in response to a real world workload that benefited greatly from it along with measured performance improvement. Workload description: 36 containers were created on a DG2 device where each container was performing a combination of 720p 3d game rendering and 30fps video encoding. The workload density was configured in a way that guaranteed each container to ALWAYS be able to render and encode no less than 30fps with a predefined maximum render + encode latency time. That means the totality of all 36 containers and their workloads were not saturating the engines to their max (in order to maintain just enough headrooom to meet the min fps and max latencies of incoming container submissions). Problem statement: It was observed that the CPU core processing the i915 soft IRQ work was experiencing severe load. Using tracelogs and an instrumentation patch to count specific i915 IRQ events, it was confirmed that the majority of the CPU cycles were caused by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast majority of the cycles was determined to be processing a specific G2H IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by GuC in response to i915 KMD sending H2G requests: INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent whenever a context goes idle so that we can unpin the context from GuC. The high CPU utilization % symptom was limiting density scaling. Root Cause Analysis: Because the incoming execution buffers were spread across 36 different containers (each with multiple contexts) but the system in totality was NOT saturated to the max, it was assumed that each context was constantly idling between submissions. This was causing a thrashing of unpinning contexts from GuC at one moment, followed quickly by repinning them due to incoming workload the very next moment. These event-pairs were being triggered across multiple contexts per container, across all containers at the rate of > 30 times per sec per context. Metrics: When running this workload without this patch, we measured an average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The improvement observed is ~99% for the average counts per 10 seconds. Design awareness: Selftest impact. As temporary WA disable this feature for the selftests. Selftests are very timing sensitive and any change in timing can cause failure. A follow up patch will fixup the selftests to understand this delay. Design awareness: Race between guc_request_alloc and guc_context_close. If a context close is issued while there is a request submission in flight and a delayed schedule disable is pending, guc_context_close and guc_request_alloc will race to cancel the delayed disable. To close the race, make sure that guc_request_alloc waits for guc_context_close to finish running before checking any state. Design awareness: GT Reset event. If a gt reset is triggered, as preparation steps, add an additional step to ensure all contexts that have a pending delay-disable-schedule task be flushed of it. Move them directly into the closed state after cancelling the worker. This is okay because the existing flow flushes all yet-to-arrive G2H's dropping them anyway. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
|
#
568944af |
|
06-Oct-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Limit scheduling properties to avoid overflow GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. v2: Add helper functions for clamping (review feedback from Tvrtko). v3: Add a bunch of BUG_ON range checks in addition to the checks already in the clamping functions (Tvrtko) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-2-John.C.Harrison@Intel.com
|
#
0add082c |
|
03-Oct-2022 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915/guc: Fix revocation of non-persistent contexts Patch which added graceful exit for non-persistent contexts missed the fact it is not enough to set the exiting flag on a context and let the backend handle it from there. GuC backend cannot handle it because it runs independently in the firmware and driver might not see the requests ever again. Patch also missed the fact some usages of intel_context_is_banned in the GuC backend needed replacing with newly introduced intel_context_is_schedulable. Fix the first issue by calling into backend revoke when we know this is the last chance to do it. Fix the second issue by replacing intel_context_is_banned with intel_context_is_schedulable, which should always be safe since latter is a superset of the former. v2: * Just call ce->ops->revoke unconditionally. (Andrzej) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 45c64ecf97ee ("drm/i915: Improve user experience and driver robustness under SIGINT or similar") Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> # v6.0+ Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221003121630.694249-1-tvrtko.ursulin@linux.intel.com
|
#
d263545e |
|
29-Sep-2022 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915/gt: Fix platform prefix Different handling for XeHP and later platforms should be using the xehp prefix, not gen125. Rename them. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220930050903.3479619-4-lucas.demarchi@intel.com
|
#
1e88da4f |
|
22-Sep-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Enable compute scheduling on DG2 DG2 has issues. To work around one of these the GuC must schedule apps in an exclusive manner across both RCS and CCS. That is, if a context from app X is running on RCS then all CCS engines must sit idle even if there are contexts from apps Y, Z, ... waiting to run. A certain OS favours RCS to the total starvation of CCS. Linux does not. Hence the GuC now has a scheduling policy setting to control this abitration. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220922201209.1446343-2-John.C.Harrison@Intel.com
|
#
d09aa852 |
|
14-Sep-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: move i915_coherent_map_type() to i915_gem_pages.c and un-inline The inline function has no place in i915_drv.h. Move it away, un-inline, and untangle some header dependencies while at it. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220914163514.1837467-1-jani.nikula@intel.com
|
#
70234728 |
|
03-Oct-2022 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915/guc: Fix revocation of non-persistent contexts Patch which added graceful exit for non-persistent contexts missed the fact it is not enough to set the exiting flag on a context and let the backend handle it from there. GuC backend cannot handle it because it runs independently in the firmware and driver might not see the requests ever again. Patch also missed the fact some usages of intel_context_is_banned in the GuC backend needed replacing with newly introduced intel_context_is_schedulable. Fix the first issue by calling into backend revoke when we know this is the last chance to do it. Fix the second issue by replacing intel_context_is_banned with intel_context_is_schedulable, which should always be safe since latter is a superset of the former. v2: * Just call ce->ops->revoke unconditionally. (Andrzej) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 45c64ecf97ee ("drm/i915: Improve user experience and driver robustness under SIGINT or similar") Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> # v6.0+ Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221003121630.694249-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit 0add082cebac8555ee3972ba768ae5c01db7a498) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
03d2c54d |
|
06-Sep-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/mtl: Use primary GT's irq lock for media GT When we hook up interrupts (in the next patch), interrupts for the media GT are still processed as part of the primary GT's interrupt flow. As such, we should share the same IRQ lock with the primary GT. Let's convert gt->irq_lock into a pointer and just point the media GT's instance at the same lock the primary GT is using. v2: - Point media's gt->irq_lock at the primary GT lock properly. (Daniele) - Fix jump target for intel_root_gt_init_early errors. (Daniele) Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220906234934.3655440-14-matthew.d.roper@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
31335aa8 |
|
26-Aug-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/guc: Cancel GuC engine busyness worker synchronously The worker is canceled in gt_park path, but earlier it was assumed that gt_park path cannot sleep and the cancel is asynchronous. This caused a race with suspend flow where the worker runs after suspend and causes an unclaimed register access warning. Cancel the worker synchronously since the gt_park is indeed allowed to sleep. v2: Fix author name and sign-off mismatch Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4419 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220827002135.139349-1-umesh.nerlige.ramappa@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
65332a5b |
|
06-Sep-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/uc: Add patch level version number support With the move to un-versioned filenames, it becomes more difficult to know exactly what version of a given firmware is being used. So add the patch level version number to the debugfs output. Also, support matching by patch level when selecting code paths for firmware compatibility. While a patch level change cannot be backwards breaking, it is potentially possible that a new feature only works from a given patch level onwards (even though it was theoretically added in an earlier version that bumped the major or minor version). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220906230147.479945-2-daniele.ceraolospurio@intel.com
|
#
665ae9c9 |
|
06-Sep-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/uc: Support for version reduced and multiple firmware files There was a misunderstanding in how firmware file compatibility should be managed within i915. This has been clarified as: i915 must support all existing firmware releases forever new minor firmware releases should replace prior versions only backwards compatibility breaking releases should be a new file This patch cleans up the single fallback file support that was added as a quick fix emergency effort. That is now removed in preference to supporting arbitrary numbers of firmware files per platform. The patch also adds support for having GuC firmware files that are named by major version only (because the major version indicates backwards breaking changes that affect the KMD) and for having HuC firmware files with no version number at all (because the KMD has no interface requirements with the HuC). For GuC, the driver will report via dmesg if the found file is older than expected. For HuC, the KMD will no longer require updating for any new HuC release so will not be able to report what the latest expected version is. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220906230147.479945-1-daniele.ceraolospurio@intel.com
|
#
54c204c5 |
|
19-Aug-2022 |
Matthew Auld <matthew.auld@intel.com> |
Revert "drm/i915/guc: Add delay to disable scheduling after pin count goes to zero" This reverts commit 6a079903847cce1dd06345127d2a32f26d2cd9c6. Everything in CI using GuC is now timing out[1], and killing the machine with this change (perhaps a deadlock?). CI was recently on fire due to some changes coming in from -rc1, so likely the pre-merge CI results for this series were invalid? For now just revert, unless GuC experts already have a fix in mind. [1] https://intel-gfx-ci.01.org/tree/drm-tip/index.html? Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220819123904.913750-1-matthew.auld@intel.com
|
#
6a079903 |
|
16-Aug-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Add a delay, configurable via debugfs (default 34ms), to disable scheduling of a context after the pin count goes to zero. Disable scheduling is a costly operation as it requires synchronizing with the GuC. So the idea is that a delay allows the user to resubmit something before doing this operation. This delay is only done if the context isn't closed and less than a given threshold (default is 3/4) of the guc_ids are in use. As temporary WA disable this feature for the selftests. Selftests are very timing sensitive and any change in timing can cause failure. A follow up patch will fixup the selftests to understand this delay. Alan Previn: Matt Brost first introduced this series back in Oct 2021. However no real world workload with measured performance impact was available to prove the intended results. Today, this series is being republished in response to a real world workload that benefited greatly from it along with measured performance improvement. Workload description: 36 containers were created on a DG2 device where each container was performing a combination of 720p 3d game rendering and 30fps video encoding. The workload density was configured in a way that guaranteed each container to ALWAYS be able to render and encode no less than 30fps with a predefined maximum render + encode latency time. That means the totality of all 36 containers and their workloads were not saturating the engines to their max (in order to maintain just enough headrooom to meet the min fps and max latencies of incoming container submissions). Problem statement: It was observed that the CPU core processing the i915 soft IRQ work was experiencing severe load. Using tracelogs and an instrumentation patch to count specific i915 IRQ events, it was confirmed that the majority of the CPU cycles were caused by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast majority of the cycles was determined to be processing a specific G2H IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by GuC in response to i915 KMD sending H2G requests: INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent whenever a context goes idle so that we can unpin the context from GuC. The high CPU utilization % symptom was limiting density scaling. Root Cause Analysis: Because the incoming execution buffers were spread across 36 different containers (each with multiple contexts) but the system in totality was NOT saturated to the max, it was assumed that each context was constantly idling between submissions. This was causing a thrashing of unpinning contexts from GuC at one moment, followed quickly by repinning them due to incoming workload the very next moment. These event-pairs were being triggered across multiple contexts per container, across all containers at the rate of > 30 times per sec per context. Metrics: When running this workload without this patch, we measured an average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The improvement observed is ~99% for the average counts per 10 seconds. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
|
#
f922fbb0 |
|
11-Aug-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: clear stalled request after a reset If the GuC CTs are full and we need to stall the request submission while waiting for space, we save the stalled request and where the stall occurred; when the CTs have space again we pick up the request submission from where we left off. If a full GT reset occurs, the state of all contexts is cleared and all non-guilty requests are unsubmitted, therefore we need to restart the stalled request submission from scratch. To make sure that we do so, clear the saved request after a reset. Fixes note: the patch that introduced the bug is in 5.15, but no officially supported platform had GuC submission enabled by default in that kernel, so the backport to that particular version (and only that one) can potentially be skipped. Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: <stable@vger.kernel.org> # v5.15+ Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com
|
#
6c82c752 |
|
27-Jul-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Don't send policy update for child contexts. The GuC FW applies the parent context policy to all the children, so individual updates to the children are not supported and we should not send them. Note that sending the message did not have any functional consequences, because the GuC just drops it and logs an error; since we were trying to set the child policy to match the parent anyway the message being dropped was not a problem. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728003339.2361010-1-daniele.ceraolospurio@intel.com
|
#
69142c0a |
|
28-Jul-2022 |
Rahul Kumar Singh <rahul.kumar.singh@intel.com> |
drm/i915/guc: Add selftest for a hung GuC Add a test to check that the hangcheck will recover from a submission hang in the GuC. Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728182616.2417491-1-John.C.Harrison@Intel.com
|
#
9fb34737 |
|
27-Jul-2022 |
Michał Winiarski <michal.winiarski@intel.com> |
drm/i915/guc: Route semaphores to GuC for Gen12+ In GuC submission mode, there is an option to use auto-switch out semaphores and have GuC auto-switch in a waiting context. This requires routing the semaphore interrupt to GuC. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-2-John.C.Harrison@Intel.com
|
#
774ce151 |
|
18-Jul-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: support v69 in parallel to v70 This patch re-introduces support for GuC v69 in parallel to v70. As this is a quick fix, v69 has been re-introduced as the single "fallback" guc version in case v70 is not available on disk and only for platforms that are out of force_probe and require the GuC by default. All v69 specific code has been labeled as such for easy identification, and the same was done for all v70 functions for which there is a separate v69 version, to avoid accidentally calling the wrong version via the unlabeled name. When the fallback mode kicks in, a drm_notice message is printed in dmesg to inform the user of the required update. The existing logging of the fetch function has also been updated so that we no longer complain immediately if we can't find a fw and we only throw an error if the fetch of both the base and fallback blobs fails. The plan is to follow this up with a more complex rework to allow for multiple different GuC versions to be supported at the same time. v2: reduce the fallback to platform that require it, switch to firmware_request_nowarn(), improve logs. Fixes: 2584b3549f4c ("drm/i915/guc: Update to GuC version 70.1.1") Link: https://lists.freedesktop.org/archives/intel-gfx/2022-July/301640.html Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Dave Airlie <airlied@gmail.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220718230732.1409641-1-daniele.ceraolospurio@intel.com
|
#
aee5ae7c |
|
26-Aug-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/guc: Cancel GuC engine busyness worker synchronously The worker is canceled in gt_park path, but earlier it was assumed that gt_park path cannot sleep and the cancel is asynchronous. This caused a race with suspend flow where the worker runs after suspend and causes an unclaimed register access warning. Cancel the worker synchronously since the gt_park is indeed allowed to sleep. v2: Fix author name and sign-off mismatch Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4419 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220827002135.139349-1-umesh.nerlige.ramappa@intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit 31335aa8e08be3fe10c50aecd2f11aba77544a78) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
4595a254 |
|
11-Aug-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: clear stalled request after a reset If the GuC CTs are full and we need to stall the request submission while waiting for space, we save the stalled request and where the stall occurred; when the CTs have space again we pick up the request submission from where we left off. If a full GT reset occurs, the state of all contexts is cleared and all non-guilty requests are unsubmitted, therefore we need to restart the stalled request submission from scratch. To make sure that we do so, clear the saved request after a reset. Fixes note: the patch that introduced the bug is in 5.15, but no officially supported platform had GuC submission enabled by default in that kernel, so the backport to that particular version (and only that one) can potentially be skipped. Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: <stable@vger.kernel.org> # v5.15+ Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com (cherry picked from commit f922fbb0f2ad1fd3e3186f39c46673419e6d9281) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
de2228c0 |
|
11-Aug-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: clear stalled request after a reset If the GuC CTs are full and we need to stall the request submission while waiting for space, we save the stalled request and where the stall occurred; when the CTs have space again we pick up the request submission from where we left off. If a full GT reset occurs, the state of all contexts is cleared and all non-guilty requests are unsubmitted, therefore we need to restart the stalled request submission from scratch. To make sure that we do so, clear the saved request after a reset. Fixes note: the patch that introduced the bug is in 5.15, but no officially supported platform had GuC submission enabled by default in that kernel, so the backport to that particular version (and only that one) can potentially be skipped. Fixes: 925dc1cf58ed ("drm/i915/guc: Implement GuC submission tasklet") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: <stable@vger.kernel.org> # v5.15+ Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com (cherry picked from commit f922fbb0f2ad1fd3e3186f39c46673419e6d9281) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0667429c |
|
21-Jun-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend For execlists backend, current implementation of Wa_22011802037 is to stop the CS before doing a reset of the engine. This WA was further extended to wait for any pending MI FORCE WAKEUPs before issuing a reset. Add the extended steps in the execlist path of reset. In addition, extend the WA to gen11. v2: (Tvrtko) - Clarify comments, commit message, fix typos - Use IS_GRAPHICS_VER for gen 11/12 checks v3: (Daneile) - Drop changes to intel_ring_submission since WA does not apply to it - Log an error if MSG IDLE is not defined for an engine Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: f6aa0d713c88 ("drm/i915: Add Wa_22011802037 force cs halt") Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220621192105.2100585-1-umesh.nerlige.ramappa@intel.com
|
#
59bcdb56 |
|
22-Jun-2022 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Don't update engine busyness stats too frequently Using two different types of workoads, it was observed that guc_update_engine_gt_clks was being called too frequently and/or causing a CPU-to-lmem bandwidth hit over PCIE. Details on the workloads and numbers are in the notes below. Background: At the moment, guc_update_engine_gt_clks can be invoked via one of 3 ways. #1 and #2 are infrequent under normal operating conditions: 1.When a predefined "ping_delay" timer expires so that GuC- busyness can sample the GTPM clock counter to ensure it doesn't miss a wrap-around of the 32-bits of the HW counter. (The ping_delay is calculated based on 1/8th the time taken for the counter go from 0x0 to 0xffffffff based on the GT frequency. This comes to about once every 28 seconds at a GT frequency of 19.2Mhz). 2.In preparation for a gt reset. 3.In response to __gt_park events (as the gt power management puts the gt into a lower power state when there is no work being done). Root-cause: For both the workloads described farther below, it was observed that when user space calls IOCTLs that unparks the gt momentarily and repeats such calls many times in quick succession, it triggers calling guc_update_engine_gt_clks as many times. However, the primary purpose of guc_update_engine_gt_clks is to ensure we don't miss the wraparound while the counter is ticking. Thus, the solution is to ensure we skip that check if gt_park is calling this function earlier than necessary. Solution: Snapshot jiffies when we do actually update the busyness stats. Then get the new jiffies every time intel_guc_busyness_park is called and bail if we are being called too soon. Use half of the ping_delay as a safe threshold. NOTE1: Workload1: IGTs' gem_create was modified to create a file handle, allocate memory with sizes that range from a min of 4K to the max supported (in power of two step-sizes). Its maps, modifies and reads back the memory. Allocations and modification is repeated until total memory allocation reaches the max. Then the file handle is closed. With this workload, guc_update_engine_gt_clks was called over 188 thousand times in the span of 15 seconds while this test ran three times. With this patch, the number of calls reduced to 14. NOTE2: Workload2: 30 transcode sessions are created in quick succession. While these sessions are created, pcm-iio tool was used to measure I/O read operation bandwidth consumption sampled at 100 milisecond intervals over the course of 20 seconds. The total bandwidth consumed over 20 seconds without this patch was measured at average at 311KBps per sample. With this patch, the number went down to about 175Kbps which is about a 43% savings. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220623023157.211650-2-alan.previn.teres.alexis@intel.com
|
#
45c64ecf |
|
27-May-2022 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Improve user experience and driver robustness under SIGINT or similar We have long standing customer complaints that pressing Ctrl-C (or to the effect of) causes engine resets with otherwise well behaving programs. Not only is logging engine resets during normal operation not desirable since it creates support incidents, but more fundamentally we should avoid going the engine reset path when we can since any engine reset introduces a chance of harming an innocent context. Reason for this undesirable behaviour is that the driver currently does not distinguish between banned contexts and non-persistent contexts which have been closed. To fix this we add the distinction between the two reasons for revoking contexts, which then allows the strict timeout only be applied to banned, while innocent contexts (well behaving) can preempt cleanly and exit without triggering the engine reset path. Note that the added context exiting category applies both to closed non- persistent context, and any exiting context when hangcheck has been disabled by the user. At the same time we rename the backend operation from 'ban' to 'revoke' which more accurately describes the actual semantics. (There is no ban at the backend level since banning is a concept driven by the scheduling frontend. Backends are simply able to revoke a running context so that is the more appropriate name chosen.) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com
|
#
303760aa |
|
25-Apr-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
i915/guc/reset: Make __guc_reset_context aware of guilty engines There are 2 ways an engine can get reset in i915 and the method of reset affects how KMD labels a context as guilty/innocent. (1) GuC initiated engine-reset: GuC resets a hung engine and notifies KMD. The context that hung on the engine is marked guilty and all other contexts are innocent. The innocent contexts are resubmitted. (2) GT based reset: When an engine heartbeat fails to tick, KMD initiates a gt/chip reset. All active contexts are marked as guilty and discarded. In order to correctly mark the contexts as guilty/innocent, pass a mask of engines that were reset to __guc_reset_context. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220426003045.3929439-1-umesh.nerlige.ramappa@intel.com
|
#
a5c89f7c |
|
04-May-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Support programming the EU priority in the GuC descriptor In GuC submission mode the EU priority must be updated by the GuC rather than the driver as the GuC owns the programming of the context descriptor. Given that the GuC code uses the GuC priorities, we can't use a generic function using i915 priorities for both execlists and GuC submission. The existing function has therefore been pushed to the execlists back-end while a new one has been added for GuC. v2: correctly use the GuC prio. Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220504234636.2119794-1-daniele.ceraolospurio@intel.com
|
#
a7a47a5d |
|
21-Jun-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend For execlists backend, current implementation of Wa_22011802037 is to stop the CS before doing a reset of the engine. This WA was further extended to wait for any pending MI FORCE WAKEUPs before issuing a reset. Add the extended steps in the execlist path of reset. In addition, extend the WA to gen11. v2: (Tvrtko) - Clarify comments, commit message, fix typos - Use IS_GRAPHICS_VER for gen 11/12 checks v3: (Daneile) - Drop changes to intel_ring_submission since WA does not apply to it - Log an error if MSG IDLE is not defined for an engine Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: f6aa0d713c88 ("drm/i915: Add Wa_22011802037 force cs halt") Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220621192105.2100585-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 0667429ce68e0b08f9f1fec8fd0b1f57228f605e) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
44314885 |
|
18-Jul-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: support v69 in parallel to v70 This patch re-introduces support for GuC v69 in parallel to v70. As this is a quick fix, v69 has been re-introduced as the single "fallback" guc version in case v70 is not available on disk and only for platforms that are out of force_probe and require the GuC by default. All v69 specific code has been labeled as such for easy identification, and the same was done for all v70 functions for which there is a separate v69 version, to avoid accidentally calling the wrong version via the unlabeled name. When the fallback mode kicks in, a drm_notice message is printed in dmesg to inform the user of the required update. The existing logging of the fetch function has also been updated so that we no longer complain immediately if we can't find a fw and we only throw an error if the fetch of both the base and fallback blobs fails. The plan is to follow this up with a more complex rework to allow for multiple different GuC versions to be supported at the same time. v2: reduce the fallback to platform that require it, switch to firmware_request_nowarn(), improve logs. Fixes: 2584b3549f4c ("drm/i915/guc: Update to GuC version 70.1.1") Link: https://lists.freedesktop.org/archives/intel-gfx/2022-July/301640.html Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Dave Airlie <airlied@gmail.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220718230732.1409641-1-daniele.ceraolospurio@intel.com (cherry picked from commit 774ce1510e6ccb9c0752d4aa7a9ff3624b3db3f3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
e7999fa1 |
|
04-May-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Support programming the EU priority in the GuC descriptor In GuC submission mode the EU priority must be updated by the GuC rather than the driver as the GuC owns the programming of the context descriptor. Given that the GuC code uses the GuC priorities, we can't use a generic function using i915 priorities for both execlists and GuC submission. The existing function has therefore been pushed to the execlists back-end while a new one has been added for GuC. v2: correctly use the GuC prio. Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220504234636.2119794-1-daniele.ceraolospurio@intel.com (cherry picked from commit a5c89f7c43c12c592a882a0ec2a15e9df0011e80) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
166c44e6 |
|
25-Apr-2022 |
Chris Wilson <chris.p.wilson@intel.com> |
drm/i915/gt: Clear SET_PREDICATE_RESULT prior to executing the ring Userspace may leave predication enabled upon return from the batch buffer, which has the consequent of preventing all operation from the ring from being executed, including all the synchronisation, coherency control, arbitration and user signaling. This is more than just a local gpu hang in one client, as the user has the ability to prevent the kernel from applying critical workarounds and can cause a full GT reset. We could simply execute MI_SET_PREDICATE upon return from the user batch, but this has the repercussion of modifying the user's context state. Instead, we opt to execute a fixup batch which by mixing predicated operations can determine the state of the SET_PREDICATE_RESULT register and restore it prior to the next userspace batch. This allows us to protect the kernel's ring without changing the uABI. Suggested-by: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Zbigniew Kempczynski <zbigniew.kempczynski@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220425152317.4275-4-ramalingam.c@intel.com
|
#
ad6ade8e |
|
26-Apr-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Use existing uncore helper to read gpm_timestamp Use intel_uncore_read64_2x32 to read upper and lower fields of the GPM timestamp. v2: Fix compile error Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220427003515.3944267-1-umesh.nerlige.ramappa@intel.com
|
#
717f9bad |
|
15-Apr-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/dg2: Enable Wa_14014475959 - RCS / CCS context exit There is bug in DG2 where if the CCS contexts switches out while the RCS is running it can cause memory corruption. To workaround this add an atomic to a memory address with a value 1 and semaphore wait to the same address for a value of 0. The GuC firmware is responsible for writing 0 to the memory address when it is safe for the context to switch out. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220415224025.3693037-6-umesh.nerlige.ramappa@intel.com
|
#
dac38381 |
|
15-Apr-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/guc: Enable Wa_22011802037 for gen12 GuC based platforms Initiating a reset when the command streamer is not idle or in the middle of executing an MI_FORCE_WAKE can result in a hang. Multiple command streamers can be part of a single reset domain, so resetting one would mean resetting all command streamers in that domain. To workaround this, before initiating a reset, ensure that all command streamers within that reset domain are either IDLE or are not executing a MI_FORCE_WAKE. Enable GuC PRE_PARSER WA bit so that GuC follows the WA sequence when initiating engine-resets. For gt-resets, ensure that i915 applies the WA sequence. Opens to address in future patches: - The part of the WA to wait for pending forcewakes is also applicable to execlists backend. - The WA also needs to be applied for gen11 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220415224025.3693037-3-umesh.nerlige.ramappa@intel.com
|
#
2584b354 |
|
12-Apr-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Update to GuC version 70.1.1 The latest GuC firmware drops the context descriptor pool in favour of passing all creation data in the create H2G. It also greatly simplifies the work queue and removes the process descriptor used for multi-LRC submission. So, remove all mention of LRC and process descriptors and update the registration code accordingly. Unfortunately, the new API also removes the ability to set default values for the scheduling policies at context registration time. Instead, a follow up H2G must be sent. The individual scheduling policy update H2G commands are also dropped in favour of a single KLV based H2G. So, change the update wrappers accordingly and call this during context registration.. Of course, this second H2G per registration might fail due to being backed up. The registration code has a complicated state machine to cope with the actual registration call failing. However, if that works then there is no support for unwinding if a further call should fail. Unwinding would require sending a H2G to de-register - but that can't be done because the CTB is already backed up. So instead, add a new flag to say whether the context has a pending policy update. This is set if the policy H2G fails at registration time. The submission code checks for this flag and retries the policy update if set. If that call fails, the submission path early exists with a retry error. This is something that is already supported for other reasons. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220412225955.1802543-2-John.C.Harrison@Intel.com
|
#
a0f1f7b4 |
|
21-Mar-2022 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Print the GuC error capture output register list. Print the GuC captured error state register list (string names and values) when gpu_coredump_state printout is invoked via the i915 debugfs for flushing the gpu error-state that was captured prior. Since GuC could have reported multiple engine register dumps in a single notification event, parse the captured data (appearing as a stream of structures) to identify each dump as a different 'engine-capture-group-output'. Finally, for each 'engine-capture-group-output' that is found, verify if the engine register dump corresponds to the engine_coredump content that was previously populated by the i915_gpu_coredump function. That function would have copied the context's vma's including the bacth buffer during the G2H-context-reset notification that occurred earlier. Perform this verification check by comparing guc_id, lrca and engine- instance obtained from the 'engine-capture-group-output' vs a copy of that same info taken during i915_gpu_coredump. If they match, then print those vma's as well (such as the batch buffers). NOTE: the output format was verified using the gem_exec_capture IGT test. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-14-alan.previn.teres.alexis@intel.com
|
#
a6f0f9cf |
|
21-Mar-2022 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Plumb GuC-capture into gpu_coredump Add a flags parameter through all of the coredump creation functions. Add a bitmask flag to indicate if the top level gpu_coredump event is triggered in response to a GuC context reset notification. Using that flag, ensure all coredump functions that read or print mmio-register values related to work submission or command-streamer engines are skipped and replaced with a calls guc-capture module equivalent functions to retrieve or print the register dump. While here, split out display related register reading and printing into its own function that is called agnostic to whether GuC had triggered the reset. For now, introduce an empty printing function that can filled in on a subsequent patch just to handle formatting. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-13-alan.previn.teres.alexis@intel.com
|
#
f5718a72 |
|
21-Mar-2022 |
Alan Previn <alan.previn.teres.alexis@intel.com> |
drm/i915/guc: Extract GuC error capture lists on G2H notification. - Upon the G2H Notify-Err-Capture event, parse through the GuC Log Buffer (error-capture-subregion) and generate one or more capture-nodes. A single node represents a single "engine- instance-capture-dump" and contains at least 3 register lists: global, engine-class and engine-instance. An internal link list is maintained to store one or more nodes. - Because the link-list node generation happen before the call to i915_gpu_codedump, duplicate global and engine-class register lists for each engine-instance register dump if we find dependent-engine resets in a engine-capture-group. - When i915_gpu_coredump calls into capture_engine, (in a subsequent patch) we detach the matching node (guc-id, LRCA, etc) from the link list above and attach it to i915_gpu_coredump's intel_engine_coredump structure when have matching LRCA/guc-id/engine-instance. Additional notes to be aware of: - GuC generates the error capture dump into the GuC log buffer but this buffer is one big log buffer with 3 independent subregions within it. Each subregion is populated with different content and used in different ways and timings but all regions operate behave as independent ring buffers. Each guc-log subregion (general-logs, crash-dump and error- capture) has it's own guc_log_buffer_state that contain independent read and write pointers. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-11-alan.previn.teres.alexis@intel.com
|
#
61c5ed94 |
|
21-Mar-2022 |
Michael Cheng <michael.cheng@intel.com> |
drm/i915/gt: replace cache_clflush_range Replace all occurrence of cache_clflush_range with drm_clflush_virt_range. This will prevent compile errors on non-x86 platforms. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321223819.72833-6-michael.cheng@intel.com
|
#
f9576e36 |
|
03-Mar-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/xehp: Support platforms with CCS engines but no RCS In the past we've always assumed that an RCS engine is present on every platform. However now that we have compute engines there may be platforms that have CCS engines but no RCS, or platforms that are designed to have both, but have the RCS engine fused off. Various engine-centric initialization that only needs to be done a single time for the group of RCS+CCS engines can't rely on being setup with the RCS now; instead we add a I915_ENGINE_FIRST_RENDER_COMPUTE flag that will be assigned to a single engine in the group; whichever engine has this flag will be responsible for some of the general setup (RCU_MODE programming, initialization of certain workarounds, etc.). Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220303223435.2793124-1-matthew.d.roper@intel.com
|
#
e1dd8714 |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix potential invalid pointer dereferences when decoding G2Hs Some G2H handlers were reading the context id field from the payload before checking the payload met the minimum length required. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-9-John.C.Harrison@Intel.com
|
#
77dcbffb |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Rename desc_idx to ctx_id The LRC descriptor pool is going away. So, stop naming context ids as descriptor pool indecies. While at it, add a bunch of missing line feeds to some error messages. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-7-John.C.Harrison@Intel.com
|
#
8e2e9c43 |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Move lrc desc setup to where it is needed The LRC descriptor was being initialised early on in the context registration sequence. It could then be determined that the actual registration needs to be delayed and the descriptor would be wiped out. This is inefficient, so move the setup to later in the process after the point of no return. v2: Move some split changes into the split patch (and do them correctly). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-6-John.C.Harrison@Intel.com
|
#
58ea7d62 |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Split guc_lrc_desc_pin apart The LRC descriptor pool is going away. Further, the function that was populating it was also doing a bunch of logic about the context registration sequence. So, split that code apart into separate state setup and try to register functions. Note that some of those 'try to register' code paths actually undo the state setup and leave it to be redone again later (with potentially different values). This is inefficient. The next patch will correct this. Also, move a comment about ignoring return values to the place where the return values are actually ignored. v2: Move some more splitting from a later patch (and do it correctly). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-5-John.C.Harrison@Intel.com
|
#
d1249022 |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Better name for context id limit The LRC descriptor pool is going away. So, stop using it as the limit for how many context ids are available. Instead, size the pool according to the number of contexts allowed. Note that this is just a naming change, the actual limit is identical in value. While at it, also update a kzalloc(sizeof()*count) to be a kcalloc(count,size). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-4-John.C.Harrison@Intel.com
|
#
09570c50 |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Add an explicit 'submission_initialized' flag The LRC descriptor pool is going away. So, stop using it as a check for whether submission has been initialised or not. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-3-John.C.Harrison@Intel.com
|
#
02942b42 |
|
01-Mar-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Do not conflate lrc_desc with GuC id for registration The LRC descriptor pool is going away. So, stop using it as a check for context registration, use the GuC id instead (being the thing that actually gets registered with the GuC). Also, rename the set/clear/query helper functions for context id mappings to better reflect their purpose and to differentiate from other registration related helper functions. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-2-John.C.Harrison@Intel.com
|
#
89e96d82 |
|
25-Apr-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
i915/guc/reset: Make __guc_reset_context aware of guilty engines There are 2 ways an engine can get reset in i915 and the method of reset affects how KMD labels a context as guilty/innocent. (1) GuC initiated engine-reset: GuC resets a hung engine and notifies KMD. The context that hung on the engine is marked guilty and all other contexts are innocent. The innocent contexts are resubmitted. (2) GT based reset: When an engine heartbeat fails to tick, KMD initiates a gt/chip reset. All active contexts are marked as guilty and discarded. In order to correctly mark the contexts as guilty/innocent, pass a mask of engines that were reset to __guc_reset_context. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220426003045.3929439-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 303760aa914b7f5ac9602dbb4b471a2ad52eeb3e) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
87cb6d80 |
|
01-Mar-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE We have to specify in the Render Control Unit Mode register when CCS is enabled. v2: - Move RCU_MODE programming to a helper function. (Tvrtko) - Clean up and clarify comments. (Tvrtko) - Add RCU_MODE to the GuC save/restore list. (Daniele) v3: - Move this patch before the GuC ADS update to enable compute engines; the definition of RCU_MODE and its insertion into the save/restore list moves to this patch. (Daniele) v4: - Call xehp_enable_ccs_engines() directly in guc_resume() and execlists_resume() rather than adding an extra layer of wrapping to the engine->resume() vfunc. (Umesh) Bspec: 46034 Original-author: Michel Thierry Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302001554.1836066-1-matthew.d.roper@intel.com
|
#
c674c5b9 |
|
01-Mar-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/xehp: CCS should use RCS setup functions The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines. The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register. In order to avoid having multiple RCS && CCS checks, add the following engine flag: - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx. BSpec: 46260 Original-author: Michel Thierry Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-6-matthew.d.roper@intel.com
|
#
e2a1e7ab |
|
24-Feb-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Do not complain about stale reset notifications It is possible for reset notifications to arrive for a context that is in the process of being banned. So don't flag these as an error, just report it as informational (because it is still useful to know that resets are happening even if they are being ignored). v2: Better wording for the message (review feedback from Tvrtko). v3: Fix rebase issue (review feedback from Daniele). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220225015232.1939497-1-John.C.Harrison@Intel.com
|
#
e068ef3f |
|
14-Feb-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Initialize GuC submission locks and queues early Move initialization of submission-related spinlock, lists and workers to init_early. This fixes an issue where if the GuC init fails we might still try to get the lock in the context cleanup code. Note that it is safe to call the GuC context cleanup code even if the init failed because all contexts are initialized with an invalid GuC ID, which will cause the GuC side of the cleanup to be skipped, so it is easier to just make sure the variables are initialized than to special case the cleanup to handle the case when they're not. References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220215011123.734572-1-daniele.ceraolospurio@intel.com
|
#
eee5215b |
|
17-Feb-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix flag query helper function to not modify state A flag query helper was actually writing to the flags word rather than just reading. Fix that. Also update the function's comment as it was out of date. NB: No need for a 'Fixes' tag. The test was only ever used inside a BUG_ON during context registration. Rather than asserting that the condition was true, it was making the condition true. So, in theory, there was no consequence because we should never have hit a BUG_ON anyway. Which means the write should always have been a no-op. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220217212942.629922-1-John.C.Harrison@Intel.com
|
#
4801b995 |
|
16-Feb-2022 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915/guc: Convert engine record to iosys_map Use iosys_map to read fields from the dma_blob so access to IO and system memory is abstracted away. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Atwood<matthew.s.atwood@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-9-lucas.demarchi@intel.com
|
#
0d6419e9 |
|
27-Jan-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915: Move GT registers to their own header file This is a huge, chaotic mass of registers copied over as-is without any real cleanup. We'll come back and organize these better, align on consistent coding style, remove dead code, etc. in separate patches later that will be easier to review. v2: - Add missing include in intel_pxp_irq.c v3: - Correct a few indentation errors (Lucas) - Minor conflict resolution Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220127234334.4016964-6-matthew.d.roper@intel.com
|
#
512712a8 |
|
24-Jan-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Fix KMD and GuC race on accessing busyness GuC updates shared memory and KMD reads it. Since this is not synchronized, we run into a race where the value read is inconsistent. Sometimes the inconsistency is in reading the upper MSB bytes of the last_switch_in value. 2 types of cases are seen - upper 8 bits are zero and upper 24 bits are zero. Since these are non-zero values, it is not trivial to determine validity of these values. Instead we read the values multiple times until they are consistent. In test runs, 3 attempts results in consistent values. The upper bound is set to 6 attempts and may need to be tuned as per any new occurences. Since the duration that gt is parked can vary, the patch also updates the gt timestamp on unpark before starting the worker. v2: - Initialize i - Use READ_ONCE to access engine record Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220125020124.788679-2-umesh.nerlige.ramappa@intel.com
|
#
721fd84e |
|
10-Jan-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference All timestamps returned by GuC for GuC PMU busyness are captured from GUC PM TIMESTAMP. Since this timestamp does not tick when GuC goes idle, kmd uses RING_TIMESTAMP to measure busyness of an engine with an active context. In further stress testing, the MMIO read of the RING_TIMESTAMP is seen to cause a rare hang. Resolve the issue by using gt specific timestamp from PM which is in sync with the GuC PM timestamp. Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111015523.225562-1-umesh.nerlige.ramappa@intel.com
|
#
85e97b1d |
|
19-Jan-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Ensure multi-lrc fini breadcrumb math is correct Realized that the GuC multi-lrc fini breadcrumb emit code is very delicate as the math this code does relies on functions it calls to emit a certain number of DWs. Add a few GEM_BUG_ONs to assert the math is correct. v2: - Rebase + resend for CI (Checkpatch) - Fix blank line warning Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220119210639.33053-1-matthew.brost@intel.com
|
#
5fe0fdd2 |
|
20-Jan-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Flush G2H handler during a GT reset Now that the error capture is fully decoupled from fence signalling (request retirement to free memory, which in turn depends on resets) we can safely flush the G2H handler during a GT reset. This eliminates corner cases where GuC generated G2H (e.g. engine resets) race with a GT reset. v2: (John Harrison) - Fix typo in commit message (s/is/in) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220121043118.24886-4-matthew.brost@intel.com
|
#
1f73a367 |
|
20-Jan-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add work queue to trigger a GT reset The G2H handler needs to be flushed during a GT reset but a G2H indicating engine reset failure can trigger a GT reset. Add a worker to trigger the GT rest when an engine reset failure is received to break this circular dependency. v2: (John Harrison) - Store engine reset mask - Fix typo in commit message v3: (John Harrison) - Fix another typo in commit message - s/reset_*/reset_fail_*/ Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220121043118.24886-3-matthew.brost@intel.com
|
#
41f8aa5d |
|
13-Jan-2022 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Remove hacks for reset and schedule disable G2H being received out of order In the i915 there are several hacks in place to make request cancellation work with an old version of the GuC which delivered the G2H indicating schedule disable is done before G2H indicating a context reset. Version 69 fixes this, so we can remove these hacks. v2: (Checkpatch) - s/cancelation/cancellation Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220113181351.21296-3-matthew.brost@intel.com
|
#
202b1f4c |
|
10-Jan-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/gt: Move engine registers to their own header Let's continue breaking up and cleaning up the massive i915_reg.h file by moving all registers that are defined in relation to an engine base to their own header. There are probably a bunch of other "engine registers" that we haven't moved yet (especially those that belong to the render engine in the 0x2??? range), but this is a relatively straightforward first step. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com
|
#
77b6f79d |
|
06-Jan-2022 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Update to GuC version 69.0.3 Update to the latest GuC release. The latest GuC firmware introduces a number of interface changes: GuC may return NO_RESPONSE_RETRY message for requests sent over CTB. Add support for this reply and try resending the request again as a new CTB message. A KLV (key-length-value) mechanism is now used for passing configuration data such as CTB management. With the new KLV scheme, the old CTB management actions are no longer used and are removed. Register capture on hang is now supported by GuC. Full i915 support for this will be added by a later patch. A minimum support of providing capture memory and register lists is required though, so add that in. The device id of the current platform needs to be provided at init time. The 'poll CS' w/a (Wa_22012773006) was blanket enabled by previous versions of GuC. It must now be explicitly requested by the KMD. So, add in the code to turn it on when relevant. The GuC log entry format has changed. This requires adding a new field to the log header structure to mark the wrap point at the end of the buffer (as the buffer size is no longer a multiple of the log entry size). New CTB notification messages are now sent for some things that were previously only sent via MMIO notifications. Of these, the crash dump notification was not really being handled by i915. It called the log flush code but that only flushed the regular debug log and then only if relay logging was enabled. So just report an error message instead. The 'exception' notification was just being ignored completely. So add an error message for that as well. Note that in either the crash dump or the exception case, the GuC is basically dead. The KMD will detect this via the heartbeat and trigger both an error log (which will include the crash dump as part of the GuC log) and a GT reset. So no other processing is really required. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220107000622.292081-3-John.C.Harrison@Intel.com
|
#
c3c2ac4c |
|
21-Dec-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Check for wedged before doing stuff A fault injection probe test hit a BUG_ON in a GuC error path. It showed that the GuC code could potentially attempt to do many things when the device is actually wedged. So, add a check in to prevent that. v2: Use intel_gt_is_wedged instead of testing bits directly in the GuC submission code (review feedback from Tvrtko). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211221210212.1438670-1-John.C.Harrison@Intel.com
|
#
a88afcfa |
|
22-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/execlists: Weak parallel submission support for execlists A weak implementation of parallel submission (multi-bb execbuf IOCTL) for execlists. Doing as little as possible to support this interface for execlists - basically just passing submit fences between each request generated and virtual engines are not allowed. This is on par with what is there for the existing (hopefully soon deprecated) bonding interface. We perma-pin these execlists contexts to align with GuC implementation. v2: (John Harrison) - Drop siblings array as num_siblings must be 1 v3: (John Harrison) - Drop single submission v4: (John Harrison) - Actually drop single submission - Use IS_ERR check on return value from intel_context_create - Set last request to NULL on unpin Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211222223532.28698-1-matthew.brost@intel.com
|
#
249af724 |
|
22-Dec-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Report error on invalid reset notification Don't silently drop reset notifications from the GuC. It might not be safe to do an error capture but we still want some kind of report that the reset happened. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211223013128.1739792-1-John.C.Harrison@Intel.com
|
#
7d73c602 |
|
24-Jan-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Fix KMD and GuC race on accessing busyness GuC updates shared memory and KMD reads it. Since this is not synchronized, we run into a race where the value read is inconsistent. Sometimes the inconsistency is in reading the upper MSB bytes of the last_switch_in value. 2 types of cases are seen - upper 8 bits are zero and upper 24 bits are zero. Since these are non-zero values, it is not trivial to determine validity of these values. Instead we read the values multiple times until they are consistent. In test runs, 3 attempts results in consistent values. The upper bound is set to 6 attempts and may need to be tuned as per any new occurences. Since the duration that gt is parked can vary, the patch also updates the gt timestamp on unpark before starting the worker. v2: - Initialize i - Use READ_ONCE to access engine record Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220125020124.788679-2-umesh.nerlige.ramappa@intel.com (cherry picked from commit 512712a824de9b856a4e61343e3e4390eba2c391) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
b3f74938 |
|
10-Jan-2022 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference All timestamps returned by GuC for GuC PMU busyness are captured from GUC PM TIMESTAMP. Since this timestamp does not tick when GuC goes idle, kmd uses RING_TIMESTAMP to measure busyness of an engine with an active context. In further stress testing, the MMIO read of the RING_TIMESTAMP is seen to cause a rare hang. Resolve the issue by using gt specific timestamp from PM which is in sync with the GuC PM timestamp. Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111015523.225562-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 721fd84ea1fe957453587efad5fdc44dfba58e04) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
fb3965f9 |
|
10-Dec-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Flag an error if an engine reset fails If GuC encounters an error during engine reset, the i915 driver promotes to full GT reset. This includes an info message about why the reset is happening. However, that is not treated as a failure by any of the CI systems because resets are an expected occurrance during testing. This kind of failure is a major problem and should never happen. So, complain more loudly and make sure CI notices. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211211065859.2248188-4-John.C.Harrison@Intel.com
|
#
0013f5f5 |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Selftest for stealing of guc ids Testing the stealing of guc ids is hard from user space as we have 64k guc_ids. Add a selftest, which artificially reduces the number of guc ids, and forces a steal. The test creates a spinner which is used to block all subsequent submissions until it completes. Next, a loop creates a context and a NOP request each iteration until the guc_ids are exhausted (request creation returns -EAGAIN). The spinner is ended, unblocking all requests created in the loop. At this point all guc_ids are exhausted but are available to steal. Try to create another request which should successfully steal a guc_id. Wait on last request to complete, idle GPU, verify a guc_id was stolen via a counter, and exit the test. Test also artificially reduces the number of guc_ids so the test runs in a timely manner. v2: (John Harrison) - s/stole/stolen - Fix some wording in test description - Rework indexing into context array - Add test description to commit message - Fix typo in commit message (Checkpatch) - s/guc/(guc) in NUMBER_MULTI_LRC_GUC_ID v3: (John Harrison) - Set array value to NULL after extracting error - Fix a few typos in comments / error messages - Delete redundant comment in commit message Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-8-matthew.brost@intel.com
|
#
2406846e |
|
14-Dec-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Don't hog IRQs when destroying contexts While attempting to debug a CT deadlock issue in various CI failures (most easily reproduced with gem_ctx_create/basic-files), I was seeing CPU deadlock errors being reported. This were because the context destroy loop was blocking waiting on H2G space from inside an IRQ spinlock. There no was deadlock as such, it's just that the H2G queue was full of context destroy commands and GuC was taking a long time to process them. However, the kernel was seeing the large amount of time spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would then happen (heartbeat failures, CT deadlock errors, outstanding H2G WARNs, etc.). Re-working the loop to only acquire the spinlock around the list management (which is all it is meant to protect) rather than the entire destroy operation seems to fix all the above issues. v2: (John Harrison) - Fix typo in comment message Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-5-matthew.brost@intel.com
|
#
7aa6d5fe |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Remove racey GEM_BUG_ON A full GT reset can race with the last context put resulting in the context ref count being zero but the destroyed bit not yet being set. Remove GEM_BUG_ON in scrub_guc_desc_for_outstanding_g2h that asserts the destroyed bit must be set in ref count is zero. Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-4-matthew.brost@intel.com
|
#
939d8e9c |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Only assign guc_id.id when stealing guc_id Previously assigned whole guc_id structure (list, spin lock) which is incorrect, only assign the guc_id.id. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-3-matthew.brost@intel.com
|
#
b25db8c7 |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Use correct context lock when callig clr_context_registered s/ce/cn/ when grabbing guc_state.lock before calling clr_context_registered. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-2-matthew.brost@intel.com
|
#
1ff9fc70 |
|
06-Dec-2021 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Fix wakeref leak in PMU busyness during reset GuC PMU busyness gets gt wakeref if awake, but fails to release the wakeref if a reset is in progress. Release the wakeref if it was acquried successfully. v2: Simplify the fix (Ashutosh) Fixes: 2a67b18e67f3 ("drm/i915/pmu: Fix synchronization of PMU callback with reset") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211207020239.43402-1-umesh.nerlige.ramappa@intel.com
|
#
2a67b18e |
|
08-Nov-2021 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Fix synchronization of PMU callback with reset Since the PMU callback runs in irq context, it synchronizes with gt reset using the reset count. We could run into a case where the PMU callback could read the reset count before it is updated. This has a potential of corrupting the busyness stats. In addition to the reset count, check if the reset bit is set before capturing busyness. In addition save the previous stats only if you intend to update them. v2: - The 2 reset counts captured in the PMU callback can end up being the same if they were captured right after the count is incremented in the reset flow. This can lead to a bad busyness state. Ensure that reset is not in progress when the initial reset count is captured. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108211057.68783-1-umesh.nerlige.ramappa@intel.com
|
#
865fbc0f |
|
19-Nov-2021 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Avoid with_intel_runtime_pm within spinlock When guc timestamp ping worker runs it takes the spinlock and calls with_intel_runtime_pm. Since with_intel_runtime_pm may sleep, move the spinlock inside __update_guc_busyness_stats. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211120014201.26480-1-umesh.nerlige.ramappa@intel.com
|
#
fe69a2dd |
|
16-Nov-2021 |
Dan Carpenter <dan.carpenter@oracle.com> |
drm/i915/guc: fix NULL vs IS_ERR() checking The intel_engine_create_virtual() function does not return NULL. It returns error pointers. Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116114916.GB11936@kili
|
#
fc12b70d |
|
16-Nov-2021 |
Dan Carpenter <dan.carpenter@oracle.com> |
drm/i915/guc: fix NULL vs IS_ERR() checking The intel_engine_create_virtual() function does not return NULL. It returns error pointers. Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116114916.GB11936@kili
|
#
08d1ecd9 |
|
08-Nov-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Refcount context during error capture When i915 receives a context reset notification from GuC, it triggers an error capture before resetting any outstanding requsts of that context. Unfortunately, the error capture is not a time bound operation. In certain situations it can take a long time, particularly when multiple large LMEM buffers must be read back and eoncoded. If this delay is longer than other timeouts (heartbeat, test recovery, etc.) then a full GT reset can be triggered in the middle. That can result in the context being reset by GuC actually being destroyed before the error capture completes and the GuC submission code resumes. Thus, the GuC side can start dereferencing stale pointers and Bad Things ensue. So add a refcount get of the context during the entire reset operation. That way, the context can't be destroyed part way through no matter what other resets or user interactions occur. v2: (Matthew Brost) - Update patch to work with async error capture v3: (Matthew Brost) - Drop async capture support as that hasn't landed yet Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108164054.23588-1-matthew.brost@intel.com
|
#
77cdd054 |
|
26-Oct-2021 |
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> |
drm/i915/pmu: Connect engine busyness stats from GuC to pmu With GuC handling scheduling, i915 is not aware of the time that a context is scheduled in and out of the engine. Since i915 pmu relies on this info to provide engine busyness to the user, GuC shares this info with i915 for all engines using shared memory. For each engine, this info contains: - total busyness: total time that the context was running (total) - id: id of the running context (id) - start timestamp: timestamp when the context started running (start) At the time (now) of sampling the engine busyness, if the id is valid (!= ~0), and start is non-zero, then the context is considered to be active and the engine busyness is calculated using the below equation engine busyness = total + (now - start) All times are obtained from the gt clock base. For inactive contexts, engine busyness is just equal to the total. The start and total values provided by GuC are 32 bits and wrap around in a few minutes. Since perf pmu provides busyness as 64 bit monotonically increasing values, there is a need for this implementation to account for overflows and extend the time to 64 bits before returning busyness to the user. In order to do that, a worker runs periodically at frequency = 1/8th the time it takes for the timestamp to wrap. As an example, that would be once in 27 seconds for a gt clock frequency of 19.2 MHz. Note: There might be an over-accounting of busyness due to the fact that GuC may be updating the total and start values while kmd is reading them. (i.e kmd may read the updated total and the stale start). In such a case, user may see higher busyness value followed by smaller ones which would eventually catch up to the higher value. v2: (Tvrtko) - Include details in commit message - Move intel engine busyness function into execlist code - Use union inside engine->stats - Use natural type for ping delay jiffies - Drop active_work condition checks - Use for_each_engine if iterating all engines - Drop seq locking, use spinlock at GuC level to update engine stats - Document worker specific details v3: (Tvrtko/Umesh) - Demarcate GuC and execlist stat objects with comments - Document known over-accounting issue in commit - Provide a consistent view of GuC state - Add hooks to gt park/unpark for GuC busyness - Stop/start worker in gt park/unpark path - Drop inline - Move spinlock and worker inits to GuC initialization - Drop helpers that are called only once v4: (Tvrtko/Matt/Umesh) - Drop addressed opens from commit message - Get runtime pm in ping, remove from the park path - Use cancel_delayed_work_sync in disable_submission path - Update stats during reset prepare - Skip ping if reset in progress - Explicitly name execlists and GuC stats objects - Since disable_submission is called from many places, move resetting stats to intel_guc_submission_reset_prepare v5: (Tvrtko) - Add a trylock helper that does not sleep and synchronize PMU event callbacks and worker with gt reset v6: (CI BAT failures) - DUTs using execlist submission failed to boot since __gt_unpark is called during i915 load. This ends up calling the GuC busyness unpark hook and results in kick-starting an uninitialized worker. Let park/unpark hooks check if GuC submission has been initialized. - drop cant_sleep() from trylock helper since rcu_read_lock takes care of that. v7: (CI) Fix igt@i915_selftest@live@gt_engines - For GuC mode of submission the engine busyness is derived from gt time domain. Use gt time elapsed as reference in the selftest. - Increase busyness calculation to 10ms duration to ensure batch runs longer and falls within the busyness tolerances in selftest. v8: - Use ktime_get in selftest as before - intel_reset_trylock_no_wait results in a lockdep splat that is not trivial to fix since the PMU callback runs in irq context and the reset paths are tightly knit into the driver. The test that uncovers this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait, instead use the reset_count to synchronize with gt reset during pmu callback. For the ping, continue to use intel_reset_trylock since ping is not run in irq context. - GuC PM timestamp does not tick when GuC is idle. This can potentially result in wrong busyness values when a context is active on the engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to process the GuC busyness stats. This works since both GuC timestamp and RING timestamp are synced with the same clock. - The busyness stats may get updated after the batch starts running. This delay causes the busyness reported for 100us duration to fall below 95% in the selftest. The only option at this time is to wait for GuC busyness to change from idle to active before we sample busyness over a 100us period. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
|
#
12a9917e |
|
20-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Fix recursive lock in GuC submission Use __release_guc_id (lock held) rather than release_guc_id (acquires lock), add lockdep annotations. 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr [ 213.283459] ============================================ [ 213.283462] WARNING: possible recursive locking detected {{[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W }} [ 213.283470] -------------------------------------------- [ 213.283472] kworker/u24:0/8 is trying to acquire lock: [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915] {{[ 213.283618] }} {{ but task is already holding lock:}} [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283720] }} {{ other info that might help us debug this:}} [ 213.283724] Possible unsafe locking scenario:[ 213.283727] CPU0 [ 213.283728] ---- [ 213.283730] lock(&guc->submission_state.lock); [ 213.283734] lock(&guc->submission_state.lock); {{[ 213.283737] }} {{ *** DEADLOCK ***}}[ 213.283740] May be due to missing lock nesting notation[ 213.283744] 3 locks held by kworker/u24:0/8: [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283860] }} {{ stack backtrace:}} [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18 [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915] [ 213.283957] Call Trace: [ 213.283960] dump_stack_lvl+0x57/0x72 [ 213.283966] __lock_acquire.cold+0x191/0x2d3 [ 213.283972] lock_acquire+0xb5/0x2b0 [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915] [ 213.284139] ? lock_release+0xb9/0x280 [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60 [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915] [ 213.284310] process_one_work+0x270/0x550 [ 213.284315] worker_thread+0x52/0x3b0 [ 213.284319] ? process_one_work+0x550/0x550 [ 213.284322] kthread+0x135/0x160 [ 213.284326] ? set_kthread_struct+0x40/0x40 [ 213.284331] ret_from_fork+0x1f/0x30 and a bit later in the trace: {{ 227.499864] do_raw_spin_lock+0x94/0xa0}} [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60 [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915] [ 227.500209] ? mark_held_locks+0x49/0x70 [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915] [ 227.500320] reset_prepare+0x78/0x90 [i915] [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915] [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915] [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915] [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915] [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915] [ 227.500767] i915_pci_probe+0x84/0x170 [i915] [ 227.500838] local_pci_probe+0x42/0x80 [ 227.500842] pci_device_probe+0xd9/0x190 [ 227.500844] really_probe+0x1f2/0x3f0 [ 227.500847] __driver_probe_device+0xfe/0x180 [ 227.500848] driver_probe_device+0x1e/0x90 [ 227.500850] __driver_attach+0xc4/0x1d0 [ 227.500851] ? __device_attach_driver+0xe0/0xe0 [ 227.500853] ? __device_attach_driver+0xe0/0xe0 [ 227.500854] bus_for_each_dev+0x64/0x90 [ 227.500856] bus_add_driver+0x12e/0x1f0 [ 227.500857] driver_register+0x8f/0xe0 [ 227.500859] i915_init+0x1d/0x8f [i915] [ 227.500934] ? 0xffffffffc144a000 [ 227.500936] do_one_initcall+0x58/0x2d0 [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80 [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0 [ 227.500944] do_init_module+0x5c/0x270 [ 227.500946] __do_sys_finit_module+0x95/0xe0 [ 227.500949] do_syscall_64+0x38/0x90 [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 227.500953] RIP: 0033:0x7ffa59d2ae0d [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48 [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004 [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070 [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90 [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0 v2: (CI build) - Fix build error Fixes: 1a52faed31311 ("drm/i915/guc: Take GT PM ref when deregistering context") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
|
#
7807bf28 |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Only assign guc_id.id when stealing guc_id Previously assigned whole guc_id structure (list, spin lock) which is incorrect, only assign the guc_id.id. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-3-matthew.brost@intel.com (cherry picked from commit 939d8e9c87e704fd5437e2c8b80929591fe540eb) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
64d16aca |
|
14-Dec-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Use correct context lock when callig clr_context_registered s/ce/cn/ when grabbing guc_state.lock before calling clr_context_registered. Fixes: 0f7976506de61 ("drm/i915/guc: Rework and simplify locking") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211214170500.28569-2-matthew.brost@intel.com (cherry picked from commit b25db8c782ad7ae80d4cea2a09c222f4f8980bb9) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
8b2abf77 |
|
16-Nov-2021 |
Dan Carpenter <dan.carpenter@oracle.com> |
drm/i915/guc: fix NULL vs IS_ERR() checking The intel_engine_create_virtual() function does not return NULL. It returns error pointers. Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116114916.GB11936@kili (cherry picked from commit fc12b70d12d07598cde27cc17dbfafc2a2a33ff8) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
9ca8bb7a |
|
20-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Fix recursive lock in GuC submission Use __release_guc_id (lock held) rather than release_guc_id (acquires lock), add lockdep annotations. 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr [ 213.283459] ============================================ [ 213.283462] WARNING: possible recursive locking detected {{[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W }} [ 213.283470] -------------------------------------------- [ 213.283472] kworker/u24:0/8 is trying to acquire lock: [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915] {{[ 213.283618] }} {{ but task is already holding lock:}} [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283720] }} {{ other info that might help us debug this:}} [ 213.283724] Possible unsafe locking scenario:[ 213.283727] CPU0 [ 213.283728] ---- [ 213.283730] lock(&guc->submission_state.lock); [ 213.283734] lock(&guc->submission_state.lock); {{[ 213.283737] }} {{ *** DEADLOCK ***}}[ 213.283740] May be due to missing lock nesting notation[ 213.283744] 3 locks held by kworker/u24:0/8: [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283860] }} {{ stack backtrace:}} [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18 [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915] [ 213.283957] Call Trace: [ 213.283960] dump_stack_lvl+0x57/0x72 [ 213.283966] __lock_acquire.cold+0x191/0x2d3 [ 213.283972] lock_acquire+0xb5/0x2b0 [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915] [ 213.284139] ? lock_release+0xb9/0x280 [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60 [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915] [ 213.284310] process_one_work+0x270/0x550 [ 213.284315] worker_thread+0x52/0x3b0 [ 213.284319] ? process_one_work+0x550/0x550 [ 213.284322] kthread+0x135/0x160 [ 213.284326] ? set_kthread_struct+0x40/0x40 [ 213.284331] ret_from_fork+0x1f/0x30 and a bit later in the trace: {{ 227.499864] do_raw_spin_lock+0x94/0xa0}} [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60 [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915] [ 227.500209] ? mark_held_locks+0x49/0x70 [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915] [ 227.500320] reset_prepare+0x78/0x90 [i915] [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915] [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915] [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915] [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915] [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915] [ 227.500767] i915_pci_probe+0x84/0x170 [i915] [ 227.500838] local_pci_probe+0x42/0x80 [ 227.500842] pci_device_probe+0xd9/0x190 [ 227.500844] really_probe+0x1f2/0x3f0 [ 227.500847] __driver_probe_device+0xfe/0x180 [ 227.500848] driver_probe_device+0x1e/0x90 [ 227.500850] __driver_attach+0xc4/0x1d0 [ 227.500851] ? __device_attach_driver+0xe0/0xe0 [ 227.500853] ? __device_attach_driver+0xe0/0xe0 [ 227.500854] bus_for_each_dev+0x64/0x90 [ 227.500856] bus_add_driver+0x12e/0x1f0 [ 227.500857] driver_register+0x8f/0xe0 [ 227.500859] i915_init+0x1d/0x8f [i915] [ 227.500934] ? 0xffffffffc144a000 [ 227.500936] do_one_initcall+0x58/0x2d0 [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80 [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0 [ 227.500944] do_init_module+0x5c/0x270 [ 227.500946] __do_sys_finit_module+0x95/0xe0 [ 227.500949] do_syscall_64+0x38/0x90 [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 227.500953] RIP: 0033:0x7ffa59d2ae0d [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48 [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004 [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070 [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90 [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0 v2: (CI build) - Fix build error Fixes: 1a52faed31311 ("drm/i915/guc: Take GT PM ref when deregistering context") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com (cherry picked from commit 12a9917e9e84fef4efa73c09b32870df0b1ed795) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
28c70233 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Handle errors in multi-lrc requests If an error occurs in the front end when multi-lrc requests are getting generated we need to skip these in the backend but we still need to emit the breadcrumbs seqno. An issues arises because with multi-lrc breadcrumbs there is a handshake between the parent and children to make forward progress. If all the requests are not present this handshake doesn't work. To work around this, if multi-lrc request has an error we skip the handshake but still emit the breadcrumbs seqno. v2: (John Harrison) - Add comment explaining the skipping of the handshake logic - Fix typos in the commit message v3: (John Harrison) - Fix up some comments about the math to NOP the ring Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com
|
#
544460c3 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Multi-BB execbuf Allow multiple batch buffers to be submitted in a single execbuf IOCTL after a context has been configured with the 'set_parallel' extension. The number batches is implicit based on the contexts configuration. This is implemented with a series of loops. First a loop is used to find all the batches, a loop to pin all the HW contexts, a loop to create all the requests, a loop to submit (emit BB start, etc...) all the requests, a loop to tie the requests to the VMAs they touch, and finally a loop to commit the requests to the backend. A composite fence is also created for the generated requests to return to the user and to stick in dma resv slots. No behavior from the existing IOCTL should be changed aside from when throttling because the ring for a context is full. In this situation, i915 will now wait while holding the object locks. This change was done because the code is much simpler to wait while holding the locks and we believe there isn't a huge benefit of dropping these locks. If this proves false we can restructure the code to drop the locks during the wait. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Matthew Brost) - Return proper error value if i915_request_create fails v3: (John Harrison) - Add comment explaining create / add order loops + locking - Update commit message explaining different in IOCTL behavior - Line wrap some comments - eb_add_request returns void - Return -EINVAL rather triggering BUG_ON if cmd parser used (Checkpatch) - Check eb->batch_len[*current_batch] v4: (CI) - Set batch len if passed if via execbuf args - Call __i915_request_skip after __i915_request_commit (Kernel test robot) - Initialize rq to NULL in eb_pin_timeline v5: (John Harrison) - Fix typo in comments near bb order loops Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
|
#
5851387a |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement no mid batch preemption for multi-lrc For some users of multi-lrc, e.g. split frame, it isn't safe to preempt mid BB. To safely enable preemption at the BB boundary, a handshake between parent and child is needed, syncing the set of BBs at the beginning and end of each batch. This is implemented via custom emit_bb_start & emit_fini_breadcrumb functions and enabled by default if a context is configured by set parallel extension. Lastly, this patch updates the process descriptor to the correct size as the memory used in the handshake is directly after the process descriptor. v2: (John Harrison) - Fix a few comments wording - Add struture for parent page layout v3: (John Harrison) - A structure for sync semaphore - Use offsetof to calc address - Update commit message v4: (John Harrison) - Fix typos in comment explaining memory map of scratch page Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-20-matthew.brost@intel.com
|
#
f9d72092 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add basic GuC multi-lrc selftest Add very basic (single submission) multi-lrc selftest. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-19-matthew.brost@intel.com
|
#
e5e32171 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Connect UAPI to GuC multi-lrc interface Introduce 'set parallel submit' extension to connect UAPI to GuC multi-lrc interface. Kernel doc in new uAPI should explain it all. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Daniel Vetter) - Add IGT link and placeholder for media UMD link v3: (Kernel test robot) - Fix warning in unpin engines call (John Harrison) - Reword a bunch of the kernel doc v4: (John Harrison) - Add comment why perma-pin is done after setting gem context - Update some comments / docs for proto contexts v5: (John Harrison) - Rework perma-pin comment - Add BUG_IN if context is pinned when setting gem context Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-17-matthew.brost@intel.com
|
#
d38a9294 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Update debugfs for GuC multi-lrc Display the workqueue status in debugfs for GuC contexts that are in parent-child relationship. v2: (John Harrison) - Output number children in debugfs Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-16-matthew.brost@intel.com
|
#
872758db |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement multi-lrc reset Update context and full GPU reset to work with multi-lrc. The idea is parent context tracks all the active requests inflight for itself and its children. The parent context owns the reset replaying / canceling requests as needed. v2: (John Harrison) - Simply loop in find active request - Add comments to find ative request / reset loop v3: (John Harrison) - s/its'/its/g - Fix comment when searching for active request - Reorder if state in __guc_reset_context v4: (Kernel test robot) - Delete unused is_multi_lrc function Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-15-matthew.brost@intel.com
|
#
bc955204 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Insert submit fences between requests in parent-child relationship The GuC must receive requests in the order submitted for contexts in a parent-child relationship to function correctly. To ensure this, insert a submit fence between the current request and last request submitted for requests / contexts in a parent child relationship. This is conceptually similar to a single timeline. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com
|
#
6b540bf6 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement multi-lrc submission Implement multi-lrc submission via a single workqueue entry and single H2G. The workqueue entry contains an updated tail value for each request, of all the contexts in the multi-lrc submission, and updates these values simultaneously. As such, the tasklet and bypass path have been updated to coalesce requests into a single submission. v2: (John Harrison) - s/wqe/wqi - Use FIELD_PREP macros - Add GEM_BUG_ONs ensures length fits within field - Add comment / white space to intel_guc_write_barrier (Kernel test robot) - Make need_tasklet a static function v3: (Docs) - A comment for submission_stall_reason v4: (Kernel test robot) - Initialize return value in bypass tasklt submit function (John Harrison) - Add comment near work queue defs - Add BUILD_BUG_ON to ensure WQ_SIZE is a power of 2 - Update write_barrier comment to talk about work queue v5: (John Harrison) - Fix typo in work queue comment Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-13-matthew.brost@intel.com
|
#
99b47aad |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement parallel context pin / unpin functions Parallel contexts are perma-pinned by the upper layers which makes the backend implementation rather simple. The parent pins the guc_id and children increment the parent's pin count on pin to ensure all the contexts are unpinned before we disable scheduling with the GuC / or deregister the context. v2: (Daniel Vetter) - Perma-pin parallel contexts Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-12-matthew.brost@intel.com
|
#
09c5e3a5 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids Assign contexts in parent-child relationship consecutive guc_ids. This is accomplished by partitioning guc_id space between ones that need to be consecutive (1/16 available guc_ids) and ones that do not (15/16 of available guc_ids). The consecutive search is implemented via the bitmap API. This is a precursor to the full GuC multi-lrc implementation but aligns to how GuC mutli-lrc interface is defined - guc_ids must be consecutive when using the GuC multi-lrc interface. v2: (Daniel Vetter) - Explicitly state why we assign consecutive guc_ids v3: (John Harrison) - Bring back in spin lock Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-11-matthew.brost@intel.com
|
#
44d25fec |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts In GuC parent-child contexts the parent context controls the scheduling, ensure only the parent does the scheduling operations. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-10-matthew.brost@intel.com
|
#
c2aa552f |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add multi-lrc context registration Add multi-lrc context registration H2G. In addition a workqueue and process descriptor are setup during multi-lrc context registration as these data structures are needed for multi-lrc submission. v2: (John Harrison) - Move GuC specific fields into sub-struct - Clean up WQ defines - Add comment explaining math to derive WQ / PD address v3: (John Harrison) - Add PARENT_SCRATCH_SIZE define - Update comment explaining multi-lrc register v4: (John Harrison) - Move PARENT_SCRATCH_SIZE to common file Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-9-matthew.brost@intel.com
|
#
4f3059dc |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Add logical engine mapping Add logical engine mapping. This is required for split-frame, as workloads need to be placed on engines in a logically contiguous manner. v2: (Daniel Vetter) - Add kernel doc for new fields v3: (Tvrtko) - Update comment for new logical_mask field v4: (John Harrison) - Update comment for new logical_mask field Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-6-matthew.brost@intel.com
|
#
f61eae18 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Take engine PM when a context is pinned with GuC submission Taking a PM reference to prevent intel_gt_wait_for_idle from short circuiting while any user context has scheduling enabled. Returning GT idle when it is not can cause all sorts of issues throughout the stack. v2: (Daniel Vetter) - Add might_lock annotations to pin / unpin function v3: (CI) - Drop intel_engine_pm_might_put from unpin path as an async put is used v4: (John Harrison) - Make intel_engine_pm_might_get/put work with GuC virtual engines - Update commit message v5: - Update commit message again Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-4-matthew.brost@intel.com
|
#
1a52faed |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Take GT PM ref when deregistering context Taking a PM reference to prevent intel_gt_wait_for_idle from short circuiting while a deregister context H2G is in flight. To do this must issue the deregister H2G from a worker as context can be destroyed from an atomic context and taking GT PM ref blows up. Previously we took a runtime PM from this atomic context which worked but will stop working once runtime pm autosuspend in enabled. So this patch is two fold, stop intel_gt_wait_for_idle from short circuting and fix runtime pm autosuspend. v2: (John Harrison) - Split structure changes out in different patch (Tvrtko) - Don't drop lock in deregister_destroyed_contexts v3: (John Harrison) - Flush destroyed contexts before destroying context reg pool Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-3-matthew.brost@intel.com
|
#
0ea92ace |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct Move guc_id allocation under submission state sub-struct as a future patch will reuse the spin lock as a global submission state lock. Moving this into sub-struct makes ownership of fields / lock clear. v2: (Docs) - Add comment for submission_state sub-structure v3: (John Harrison) - Fixup a few comments v4: (John Harrison) - Fix typo Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-2-matthew.brost@intel.com
|
#
3e42cc61 |
|
22-Sep-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915/gt: Register the migrate contexts with their engines Pinned contexts, like the migrate contexts need reset after resume since their context image may have been lost. Also the GuC needs to register pinned contexts. Add a list to struct intel_engine_cs where we add all pinned contexts on creation, and traverse that list at resume time to reset the pinned contexts. This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now, but proper LMEM backup / restore is needed for full suspend functionality. However, note that even with full LMEM backup / restore it may be desirable to keep the reset since backing up the migrate context images must happen using memcpy() after the migrate context has become inactive, and for performance- and other reasons we want to avoid memcpy() from LMEM. Also traverse the list at guc_init_lrc_mapping() calling guc_kernel_context_pin() for the pinned contexts, like is already done for the kernel context. v2: - Don't reset the contexts on each __engine_unpark() but rather at resume time (Chris Wilson). v3: - Reset contexts in the engine sanitize callback. (Chris Wilson) Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Brost Matthew <matthew.brost@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-6-thomas.hellstrom@linux.intel.com
|
#
4f41ddc7 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add GuC kernel doc Add GuC kernel doc for all structures added thus far for GuC submission and update the main GuC submission section with the new interface details. v2: - Drop guc_active.lock DOC v3: - Fixup a few kernel doc comments (Daniele) v4 (Daniele): - Implement doc suggestions from John - Add kerneldoc for all members of the GuC structure and pull the file in i915.rst v5 (Daniele): - Implement new doc suggestions from John Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-24-matthew.brost@intel.com
|
#
af5bc9f2 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Drop guc_active move everything into guc_state Now that we have locking hierarchy of sched_engine->lock -> ce->guc_state everything from guc_active can be moved into guc_state and protected the guc_state.lock. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-23-matthew.brost@intel.com
|
#
3cb3e343 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Move fields protected by guc->contexts_lock into sub structure To make ownership of locking clear move fields (guc_id, guc_id_ref, guc_id_link) to sub structure guc_id in intel_context. Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-22-matthew.brost@intel.com
|
#
9798b172 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Move GuC priority fields in context under guc_active Move GuC management fields in context under guc_active struct as this is where the lock that protects theses fields lives. Also only set guc_prio field once during context init. v2: (Daniele) - set CONTEXT_SET_INIT Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-21-matthew.brost@intel.com
|
#
5b116c17 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Drop pin count check trick between sched_disable and re-pin Drop pin count check trick between a sched_disable and re-pin, now rely on the lock and counter of the number of committed requests to determine if scheduling should be disabled on the context. Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-20-matthew.brost@intel.com
|
#
1424ba81 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Proper xarray usage for contexts_lookup Lock the xarray and take ref to the context if needed. v2: (Checkpatch) - Add new line after declaration (Daniel Vetter) - Correct put / get accounting in xa_for_loops v3: (Checkpatch) - Extra new line Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-19-matthew.brost@intel.com
|
#
0f797650 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Rework and simplify locking Rework and simplify the locking with GuC subission. Drop sched_state_no_lock and move all fields under the guc_state.sched_state and protect all these fields with guc_state.lock . This requires changing the locking hierarchy from guc_state.lock -> sched_engine.lock to sched_engine.lock -> guc_state.lock. v2: (Daniele) - Don't check fields outside of lock during sched disable, check less fields within lock as some of the outside are no longer needed Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-18-matthew.brost@intel.com
|
#
52d66c06 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Move guc_blocked fence to struct guc_state Move guc_blocked fence to struct guc_state as the lock which protects the fence lives there. s/ce->guc_blocked/ce->guc_state.blocked/g v2: (Daniele) - s/blocked_fence/blocked/g Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-17-matthew.brost@intel.com
|
#
b0d83888 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Release submit fence from an irq_work A subsequent patch will flip the locking hierarchy from ce->guc_state.lock -> sched_engine->lock to sched_engine->lock -> ce->guc_state.lock. As such we need to release the submit fence for a request from an IRQ to break a lock inversion - i.e. the fence must be release went holding ce->guc_state.lock and the releasing of the can acquire sched_engine->lock. v2: (Daniele) - Delete request from list before calling irq_work_queue Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-16-matthew.brost@intel.com
|
#
ae36b629 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Reset LRC descriptor if register returns -ENODEV Reset LRC descriptor if a context register returns -ENODEV as this means we are mid-reset. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-15-matthew.brost@intel.com
|
#
f16d5cb9 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Don't touch guc_state.sched_state without a lock Before we did some clever tricks to not use the a lock when touching guc_state.sched_state in certain cases. Don't do that, enforce the use of the lock. v2: (kernel test robo ) - Add __maybe_unused to sched_state_is_init() v3: rebase after the unused code path removal has been moved to an earlier patch. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-14-matthew.brost@intel.com
|
#
422cda4f |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Take context ref when cancelling request A context can get destroyed after cancelling a request, if a context or GT reset occurs, so take a reference to context when cancelling a request. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-13-matthew.brost@intel.com
|
#
d2420c2e |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H While debugging an issue with full GT resets I went down a rabbit hole thinking the scrubbing of lost G2H wasn't working correctly. This proved to be incorrect as this was working just fine but this chase inspired me to write a selftest to prove that this works. This simple selftest injects errors dropping various G2H and then issues a full GT reset proving that the scrubbing of these G2H doesn't blow up. v2: (Daniel Vetter) - Use ifdef instead of macros for selftests v3: (Checkpatch) - A space after 'switch' statement v4: (Daniele) - A comment saying GT won't idle if G2H are lost Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-12-matthew.brost@intel.com
|
#
9888beaa |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered When unblocking a context, do not enable scheduling if the context is banned, guc_id invalid, or not registered. v2: (Daniele) - Add helper for unblock Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-10-matthew.brost@intel.com
|
#
cf37e5c8 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Kick tasklet after queuing a request Kick tasklet after queuing a request so it submitted in a timely manner. Fixes: 3a4cdf1982f0 ("drm/i915/guc: Implement GuC context operations for new inteface") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-9-matthew.brost@intel.com
|
#
1ca36cff |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Workaround reset G2H is received after schedule done G2H If the context is reset as a result of the request cancellation the context reset G2H is received after schedule disable done G2H which is the wrong order. The schedule disable done G2H release the waiting request cancellation code which resubmits the context. This races with the context reset G2H which also wants to resubmit the context but in this case it really should be a NOP as request cancellation code owns the resubmit. Use some clever tricks of checking the context state to seal this race until the GuC firmware is fixed. v2: (Checkpatch) - Fix typos v3: (Daniele) - State that is a bug in the GuC firmware Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-7-matthew.brost@intel.com
|
#
88209a8e |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context Don't drop ce->guc_active.lock when unwinding a context after reset. At one point we had to drop this because of a lock inversion but that is no longer the case. It is much safer to hold the lock so let's do that. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-5-matthew.brost@intel.com
|
#
c39f51cc |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Unwind context requests in reverse order When unwinding requests on a reset context, if other requests in the context are in the priority list the requests could be resubmitted out of seqno order. Traverse the list of active requests in reverse and append to the head of the priority list to fix this. Fixes: eb5e7da736f3 ("drm/i915/guc: Reset implementation for new GuC interface") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-4-matthew.brost@intel.com
|
#
669b949c |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Fix outstanding G2H accounting A small race that could result in incorrect accounting of the number of outstanding G2H. Basically prior to this patch we did not increment the number of outstanding G2H if we encoutered a GT reset while sending a H2G. This was incorrect as the context state had already been updated to anticipate a G2H response thus the counter should be incremented. As part of this change we remove a legacy (now unused) path that was the last caller requiring a G2H response that was not guaranteed to loop. This allows us to simplify the accounting as we don't need to handle the case where the send fails due to the channel being busy. Also always use helper when decrementing this value. v2 (Daniele): update GEM_BUG_ON check, pull in dead code removal from later patch, remove loop param from context_deregister. Fixes: f4eb1f3fe946 ("drm/i915/guc: Ensure G2H response has space in buffer") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-3-matthew.brost@intel.com
|
#
fc30a676 |
|
09-Sep-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Fix blocked context accounting Prior to this patch the blocked context counter was cleared on init_sched_state (used during registering a context & resets) which is incorrect. This state needs to be persistent or the counter can read the incorrect value resulting in scheduling never getting enabled again. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-2-matthew.brost@intel.com
|
#
db301cff |
|
30-Jul-2021 |
Vinay Belgaumkar <vinay.belgaumkar@intel.com> |
drm/i915/guc/slpc: Remove BUG_ON in guc_submission_disable The assumption when it was added was that GT would not be holding any gt_pm references. However, uc_init is called from gt_init_hw, which holds a forcewake ref. If SLPC enable fails, we will still be holding this ref, which will result in the BUG_ON. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210730202119.23810-7-vinay.belgaumkar@intel.com
|
#
e754dccb |
|
26-Jul-2021 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Unblock GuC submission on Gen11+ Unblock GuC submission on Gen11+ platforms. v2: (Martin Peres / John H) - Delete debug message when GuC is disabled by default on certain platforms Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-34-matthew.brost@intel.com
|
#
ee242ca7 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement GuC priority management Implement a simple static mapping algorithm of the i915 priority levels (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as follows: i915 level < 0 -> GuC low level (3) i915 level == 0 -> GuC normal level (2) i915 level < INT_MAX -> GuC high level (1) i915 level == INT_MAX -> GuC highest level (0) We believe this mapping should cover the UMD use cases (3 distinct user levels + 1 kernel level). In addition to static mapping, a simple counter system is attached to each context tracking the number of requests inflight on the context at each level. This is needed as the GuC levels are per context while in the i915 levels are per request. v2: (Daniele) - Add BUILD_BUG_ON to enforce ordering of priority levels - Add missing lockdep to guc_prio_fini - Check for return before setting context registered flag - Map DISPLAY priority or higher to highest guc prio - Update comment for guc_prio Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-33-matthew.brost@intel.com
|
#
3a4bfa09 |
|
26-Jul-2021 |
Rahul Kumar Singh <rahul.kumar.singh@intel.com> |
drm/i915/selftest: Fix workarounds selftest for GuC submission When GuC submission is enabled, the GuC controls engine resets. Rather than explicitly triggering a reset, the driver must submit a hanging context to GuC and wait for the reset to occur. Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-28-matthew.brost@intel.com
|
#
62eaf0ae |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Support request cancellation This adds GuC backend support for i915_request_cancel(), which in turn makes CONFIG_DRM_I915_REQUEST_TIMEOUT work. This implementation makes use of fence while there are likely simplier options. A fence was chosen because of another feature coming soon which requires a user to block on a context until scheduling is disabled. In that case we return the fence to the user and the user can wait on that fence. v2: (Daniele) - A comment about locking the blocked incr / decr - A comments about the use of the fence - Update commit message explaining why fence - Delete redundant check blocked count in unblock function - Ring buffer implementation - Comment about blocked in submission path - Shorter rpm path v3: (Checkpatch) - Fix typos in commit message (Daniel) - Rework to simplier locking structure in guc_context_block / unblock Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-26-matthew.brost@intel.com
|
#
ae8ac10d |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement banned contexts for GuC submission When using GuC submission, if a context gets banned disable scheduling and mark all inflight requests as complete. Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-25-matthew.brost@intel.com
|
#
79357852 |
|
26-Jul-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Hook GuC scheduling policies up Use the official driver default scheduling policies for configuring the GuC scheduler rather than a bunch of hardcoded values. v2: (Matthew Brost) - Move I915_ENGINE_WANT_FORCED_PREEMPTION to later patch Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Jose Souza <jose.souza@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-21-matthew.brost@intel.com
|
#
dc0dad36 |
|
26-Jul-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix for error capture after full GPU reset with GuC In the case of a full GPU reset (e.g. because GuC has died or because GuC's hang detection has been disabled), the driver can't rely on GuC reporting the guilty context. Instead, the driver needs to scan all active contexts and find one that is currently executing, as per the execlist mode behaviour. In GuC mode, this scan is different to execlist mode as the active request list is handled very differently. Similarly, the request state dump in debugfs needs to be handled differently when in GuC submission mode. Also refactured some of the request scanning code to avoid duplication across the multiple code paths that are now replicating it. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-20-matthew.brost@intel.com
|
#
573ba126 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Capture error state on context reset We receive notification of an engine reset from GuC at its completion. Meaning GuC has potentially cleared any HW state we may have been interested in capturing. GuC resumes scheduling on the engine post-reset, as the resets are meant to be transparent, further muddling our error state. There is ongoing work to define an API for a GuC debug state dump. The suggestion for now is to manually disable FW initiated resets in cases where debug state is needed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-19-matthew.brost@intel.com
|
#
f7957e60 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Handle engine reset failure notification GuC will notify the driver, via G2H, if it fails to reset an engine. We recover by resorting to a full GPU reset. v2: (John Harrison): - s/drm_dbg/drm_err Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-14-matthew.brost@intel.com
|
#
1e0fd2b5 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Handle context reset notification GuC will issue a reset on detecting an engine hang and will notify the driver via a G2H message. The driver will service the notification by resetting the guilty context to a simple state or banning it completely. v2: (John Harrison) - Move msg[0] lookup after length check v3: (John Harrison) - s/drm_dbg/drm_err Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-13-matthew.brost@intel.com
|
#
cad46a33 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Suspend/resume implementation for new interface The new GuC interface introduces an MMIO H2G command, INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This MMIO tears down any active contexts generating a context reset G2H CTB for each. Once that step completes the GuC tears down the CTB channels. It is safe to suspend once this MMIO H2G command completes and all G2H CTBs have been processed. In practice the i915 will likely never receive a G2H as suspend should only be called after the GPU is idle. Resume is implemented in the same manner as before - simply reload the GuC firmware and reinitialize everything (e.g. CTB channels, contexts, etc..). v2: (Michel / John H) - INTEL_GUC_ACTION_RESET_CLIENT 0x5B01 -> 0x5507 Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-12-matthew.brost@intel.com
|
#
c41ee287 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Reset GPU immediately if submission is disabled If submission is disabled by the backend for any reason, reset the GPU immediately in the heartbeat code as the backend can't be reenabled until the GPU is reset. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-10-matthew.brost@intel.com
|
#
eb5e7da7 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Reset implementation for new GuC interface Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC but we will continue to maintain this legacy path as a fallback and this code path is also required if the GuC dies. With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs. v2: (Michal) - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check) v3: (John H) - Split into a series of smaller patches v4: (John H) - Fix typo - Add braces around if statements in reset code v5: (Checkpatch) - Fix warnings Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-9-matthew.brost@intel.com
|
#
d1cee2d3 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move active request tracking to a vfunc Move active request tracking to a backend vfunc rather than assuming all backends want to do this in the manner. In the of case execlists / ring submission the tracking is on the physical engine while with GuC submission it is on the context. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-8-matthew.brost@intel.com
|
#
a95d1160 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs With GuC virtual engines the physical engine which a request executes and completes on isn't known to the i915. Therefore we can't attach a request to a physical engines breadcrumbs. To work around this we create a single breadcrumbs per engine class when using GuC submission and direct all physical engine interrupts to this breadcrumbs. v2: (John H) - Rework header file structure so intel_engine_mask_t can be in intel_engine_types.h Signed-off-by: Matthew Brost <matthew.brost@intel.com> CC: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-6-matthew.brost@intel.com
|
#
96d3e0e1 |
|
26-Jul-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Make hangcheck work with GuC virtual engines The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted to their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heart beat code thinks the physical engines are idle due to the serial number not incrementing. Which in turns means hangcheck does not work for GuC virtual engines. This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions. Downside of this is that all physical engines constituting a GuC virtual engine will be periodically unparked (even during just a single context executing) in order to be pinged with a heartbeat request. However the power and performance cost of this is not expected to be measurable (due low frequency of heartbeat pulses) and it is considered an easier option than trying to make changes to GuC firmware. v2: (Tvrtko) - Update commit message - Have default behavior if no vfunc present Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-3-matthew.brost@intel.com
|
#
55612025 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: GuC virtual engines Implement GuC virtual engines. Rather simple implementation, basically just allocate an engine, setup context enter / exit function to virtual engine specific functions, set all other variables / functions to guc versions, and set the engine mask to that of all the siblings. v2: Update to work with proto-ctx v3: (Daniele) - Drop include, add comment to intel_virtual_engine_has_heartbeat Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-2-matthew.brost@intel.com
|
#
e03b5906 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Add intel_context tracing Add intel_context tracing. These trace points are particular helpful when debugging the GuC firmware and can be enabled via CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option. Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-19-matthew.brost@intel.com
|
#
dbf9da8d |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add trace point for GuC submit Add trace point for GuC submit. Extended existing request trace points to include submit fence value,, guc_id, and ring tail value. v2: Fix white space alignment in i915_request_add trace point v3: Delete dep_from , dep_to (Tvrtko) Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-18-matthew.brost@intel.com
|
#
28ff6520 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Update GuC debugfs to support new GuC Update GuC debugfs to support the new GuC structures. v2: (John Harrison) - Remove intel_lrc_reg.h include from i915_debugfs.c (Michal) - Rename GuC debugfs functions Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-17-matthew.brost@intel.com
|
#
b97060a9 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero. v2: rtimeout -> remaining_timeout v3: Drop unnecessary includes, guc_submission_busy_loop -> guc_submission_send_busy_loop, drop negatie timeout trick, move a refactor of guc_context_unpin to earlier path (John H) v4: Add stddef.h back into intel_gt_requests.h, sort circuit idle function if not in GuC submission mode Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-16-matthew.brost@intel.com
|
#
f4eb1f3f |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Ensure G2H response has space in buffer Ensure G2H response has space in the buffer before sending H2G CTB as the GuC can't handle any backpressure on the G2H interface. v2: (Matthew) - s/INTEL_GUC_SEND/INTEL_GUC_CT_SEND v3: (Matthew) - Add G2H credit accounting to blocking path, add g2h_release_space helper (John H) - CTB_G2H_BUFFER_SIZE / 4 == G2H_ROOM_BUFFER_SIZE Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-15-matthew.brost@intel.com
|
#
1f5cdb06 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Extend deregistration fence to schedule disable Extend the deregistration context fence to fence whne a GuC context has scheduling disable pending. v2: (John H) - Update comment why we check the pin count within spin lock Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-11-matthew.brost@intel.com
|
#
e0717063 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Defer context unpin until scheduling is disabled With GuC scheduling, it isn't safe to unpin a context while scheduling is enabled for that context as the GuC may touch some of the pinned state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is done, a call back is added to intel_context_unpin when pin count == 1 to disable scheduling for that context. When the response CTB is received it is safe to do the final unpin. Future patches may add a heuristic / delay to schedule the disable call back to avoid thrashing on schedule enable / disable. v2: (John H) - s/drm_dbg/drm_err (Daneiel) - Clean up sched state function Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-9-matthew.brost@intel.com
|
#
b208f2d5 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Insert fence on context when deregistering Sometimes during context pinning a context with the same guc_id is registered with the GuC. In this a case deregister must be done before the context can be registered. A fence is inserted on all requests while the deregister is in flight. Once the G2H is received indicating the deregistration is complete the context is registered and the fence is released. v2: (John H) - Fix commit message Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-8-matthew.brost@intel.com
|
#
3a4cdf19 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement GuC context operations for new inteface Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy. v2: (Daniel Vetter) - Use msleep_interruptible rather than cond_resched in busy loop (Michal) - Remove C++ style comment v3: (Matthew Brost) - Drop GUC_ID_START (John Harrison) - Fix a bunch of typos - Use drm_err rather than drm_dbg for G2H errors (Daniele) - Fix ;; typo - Clean up sched state functions - Add lockdep for guc_id functions - Don't call __release_guc_id when guc_id is invalid - Use MISSING_CASE - Add comment in guc_context_pin - Use shorter path to rpm (Daniele / CI) - Don't call release_guc_id on an invalid guc_id in destroy v4: (Daniel Vetter) - Add FIXME comment Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com
|
#
2330923e |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add bypass tasklet submission path to GuC Add bypass tasklet submission path to GuC. The tasklet is only used if H2G channel has backpresure. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-6-matthew.brost@intel.com
|
#
925dc1cf |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement GuC submission tasklet Implement GuC submission tasklet for new interface. The new GuC interface uses H2G to submit contexts to the GuC. Since H2G use a single channel, a single tasklet is used for the submission path. Also the per engine interrupt handler has been updated to disable the rescheduling of the physical engine tasklet, when using GuC scheduling, as the physical engine tasklet is no longer used. In this patch the field, guc_id, has been added to intel_context and is not assigned. Patches later in the series will assign this value. v2: (John Harrison) - Clean up some comments v3: (John Harrison) - More comment cleanups Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-5-matthew.brost@intel.com
|
#
27213d79 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Add LRC descriptor context lookup array Add LRC descriptor context lookup array which can resolve the intel_context from the LRC descriptor index. In addition to lookup, it can determine if the LRC descriptor context is currently registered with the GuC by checking if an entry for a descriptor index is present. Future patches in the series will make use of this array. v2: (Michal) - "linux/xarray.h" -> <linux/xarray.h> - s/lrc/LRC (John H) - Fix commit message Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-4-matthew.brost@intel.com
|
#
7518d9b6 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Remove GuC stage descriptor, add LRC descriptor Remove old GuC stage descriptor, add LRC descriptor which will be used by the new GuC interface implemented in this patch series. v2: (John Harrison) - s/lrc/LRC/g Cc: John Harrison <john.c.harrison@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-3-matthew.brost@intel.com
|
#
22916bad |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move submission tasklet to i915_sched_engine The submission tasklet operates on i915_sched_engine, thus it is the correct place for it. v3: (Jason Ekstrand) Change sched_engine->engine to a void* private data pointer Add kernel doc v4: (Daniele) Update private_data comment Set queue_priority_hint in kick_execlists v5: (CI) Rebase and fix build error Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-9-matthew.brost@intel.com
|
#
d2a31d02 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Update i915_scheduler to operate on i915_sched_engine Rather passing around an intel_engine_cs in the scheduling code, pass around a i915_sched_engine. v3: (Jason Ekstrand) Add READ_ONCE around rq->engine in lock_sched_engine Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-8-matthew.brost@intel.com
|
#
3f623e06 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move engine->schedule to i915_sched_engine The schedule function should be in the schedule object. v3: (Jason Ekstrand) Add kernel doc Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-6-matthew.brost@intel.com
|
#
349a2bc5 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move active tracking to i915_sched_engine Move active request tracking and its lock to i915_sched_engine. This lock is also the submission lock so having it in the i915_sched_engine is the correct place. v3: (Jason Ekstrand) Add kernel doc v6: Rebase Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.comk> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-5-matthew.brost@intel.com
|
#
c4fd7d8c |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Reset sched_engine.no_priolist immediately after dequeue Rather than touching schedule state in the generic PM code, reset the priolist allocation when empty in the submission code. Add a wrapper function to do this and update the backends to call it in the correct place. v3: (Jason Ekstrand) Update patch commit message with a better description Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-4-matthew.brost@intel.com
|
#
074bb195 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Add i915_sched_engine_is_empty function Add wrapper function around RB tree to determine if i915_sched_engine is empty. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-3-matthew.brost@intel.com
|
#
3e28d371 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move priolist to new i915_sched_engine object Introduce i915_sched_engine object which is lower level data structure that i915_scheduler / generic code can operate on without touching execlist specific structures. This allows additional submission backends to be added without breaking the layering. Currently the execlists backend uses 1 of these object per each engine (physical or virtual) but future backends like the GuC will point to less instances utilizing the reference counting. This is a bit of detour to integrating the i915 with the DRM scheduler but this object will still exist when the DRM scheduler lands in the i915. It will however look a bit different. It will encapsulate the drm_gpu_scheduler object plus and common variables (to the backends) related to scheduling. Regardless this is a step in the right direction. This patch starts the aforementioned transition by moving the priolist into the i915_sched_engine object. v3: (Jason Ekstrand) Update comment next to intel_engine_cs.virtual Add kernel doc (Checkpatch) Fix double the in commit message v4: (Daniele) Update comment message. Add comment about subclass field Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-2-matthew.brost@intel.com
|
#
c816723b |
|
05-Jun-2021 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915/gt: replace IS_GEN and friends with GRAPHICS_VER This was done by the following semantic patch: @@ expression i915; @@ - INTEL_GEN(i915) + GRAPHICS_VER(i915) @@ expression i915; expression E; @@ - INTEL_GEN(i915) >= E + GRAPHICS_VER(i915) >= E @@ expression dev_priv; expression E; @@ - !IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) != E @@ expression dev_priv; expression E; @@ - IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) == E @@ expression dev_priv; expression from, until; @@ - IS_GEN_RANGE(dev_priv, from, until) + IS_GRAPHICS_VER(dev_priv, from, until) @def@ expression E; identifier id =~ "^gen$"; @@ - id = GRAPHICS_VER(E) + ver = GRAPHICS_VER(E) @@ identifier def.id; @@ - id + ver It also takes care of renaming the variable we assign to GRAPHICS_VER() so to use "ver" rather than "gen". Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-2-lucas.demarchi@intel.com
|
#
8bb9fbc1 |
|
02-Jun-2021 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: enable only the user interrupt when using GuC submission In GuC submission mode the CS is owned by the GuC FW, so all CS status interrupts are handled by it. We only need the user interrupt as that signals request completion. Since we're now starting the engines directly in GuC submission mode when selected, we can stop switching back and forth between the execlists and the GuC programming and select directly the correct interrupt mask. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210603051630.2635-4-matthew.brost@intel.com
|
#
0669a6e1 |
|
21-May-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move CS interrupt handler to the backend The different submission backends each have their own preferred behaviour and interrupt setup. Let each handle their own interrupts. This becomes more useful later as we to extract the use of auxiliary state in the interrupt handler that is backend specific. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210521183215.65451-4-matthew.brost@intel.com
|
#
c92c36ed |
|
21-May-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move submission_method into intel_gt Since we setup the submission method for the engines once, it is easy to assign an enum and use that instead of probing into the backends. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210521183215.65451-3-matthew.brost@intel.com
|
#
0db3633f |
|
21-May-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move engine setup out of set_default_submission Now that we no longer switch back and forth between guc and execlists, we no longer need to restore the backend's vfunc and can leave them set after initialisation. The only catch is that we lose the submission on wedging and still need to reset the submit_request vfunc on unwedging. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210521183215.65451-2-matthew.brost@intel.com
|
#
2913fa4d |
|
26-Jan-2021 |
Emil Renner Berthing <kernel@esmil.dk> |
drm/i915/gt: use new tasklet API for execution list This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") v2: Fix up selftests/execlists. Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210126150155.1617-1-kernel@esmil.dk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
2867ff6c |
|
19-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Strip out internal priorities Since we are not using any internal priority levels, and in the next few patches will introduce a new index for which the optimisation is not so lear cut, discard the small table within the priolist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210120121439.17600-1-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
007c4578 |
|
12-Jan-2021 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: stop calling execlists_set_default_submission Initialize all required entries from guc_set_default_submission, instead of calling the execlists function. The previously inherited setup has been copied over from the execlist code and simplified by removing the execlists submission-specific parts. v2: move setting of relative_mmio flag to engine_setup_common (Chris) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210113021236.8164-5-daniele.ceraolospurio@intel.com
|
#
43aaadc6 |
|
12-Jan-2021 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: init engine directly in GuC submission mode Instead of starting the engine in execlists submission mode and then switching to GuC, start directly in GuC submission mode. The initial setup functions have been copied over from the execlists code and simplified by removing the execlists submission-specific parts. v2: remove unneeded unexpected starting state check (Chris) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210113021236.8164-4-daniele.ceraolospurio@intel.com
|
#
7e5299ce |
|
12-Jan-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Delete GuC code unused in future patches Delete GuC code unused in future patches that rewrite the GuC interface to work with the new firmware. Most of the code deleted relates to workqueues or execlist port. The code is safe to remove because we still don't allow GuC submission to be enabled, even when overriding the modparam, so it currently can't be reached. The defines + structs for the process descriptor and workqueue remain. Although the new GuC interface does not require either of these for the normal submission path multi-lrc submission does. The usage of the process descriptor and workqueue for multi-lrc will be quite different from the code that is deleted in this patch. A future patch will implement multi-lrc submission. v2: add a code in the commit message about the code being safe to remove (Chris) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210113021236.8164-2-daniele.ceraolospurio@intel.com
|
#
a0d3fdb6 |
|
18-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Split logical ring contexts from execlist submission Split the definition, construction and updating of the Logical Ring Context from the execlist submission interface. The LRC is used by the HW, irrespective of our different submission backends. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201219020343.22681-1-chris@chris-wilson.co.uk
|
#
70a2b431 |
|
09-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Rename lrc.c to execlists_submission.c We want to separate the utility functions for controlling the logical ring context from the execlists submission mechanism (which is an overgrown scheduler). This is similar to Daniele's work to split up the files, but being selfish I wanted to base it after my own changes to intel_lrc.c petered out. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201209233618.4287-2-chris@chris-wilson.co.uk
|
#
8a25c4be |
|
18-Jun-2020 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915/params: switch to device specific parameters Start using device specific parameters instead of module parameters for most things. The module parameters become the immutable initial values for i915 parameters. The device specific parameters in i915->params start life as a copy of i915_modparams. Any later changes are only reflected in the debugfs. The stragglers are: * i915.force_probe and i915.modeset. Needed before dev_priv is available. This is fine because the parameters are read-only and never modified. * i915.verbose_state_checks. Passing dev_priv to I915_STATE_WARN and I915_STATE_WARN_ON would result in massive and ugly churn. This is handled by not exposing the parameter via debugfs, and leaving the parameter writable in sysfs. This may be fixed up in follow-up work. * i915.inject_probe_failure. Only makes sense in terms of the module, not the device. This is handled by not exposing the parameter via debugfs. v2: Fix uc i915 lookup code (Michał Winiarski) Cc: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20200618150402.14022-1-jani.nikula@intel.com
|
#
eec39e44 |
|
07-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove wait priority boosting Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams. However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as epxected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek. Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
|
#
53b2622e |
|
28-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/execlists: Avoid reusing the same logical CCID The bspec is confusing on the nature of the upper 32bits of the LRC descriptor. Once upon a time, it said that it uses the upper 32b to decide if it should perform a lite-restore, and so we must ensure that each unique context submitted to HW is given a unique CCID [for the duration of it being on the HW]. Currently, this is achieved by using a small circular tag, and assigning every context submitted to HW a new id. However, this tag is being cleared on repinning an inflight context such that we end up re-using the 0 tag for multiple contexts. To avoid accidentally clearing the CCID in the upper 32bits of the LRC descriptor, split the descriptor into two dwords so we can update the GGTT address separately from the CCID. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796 Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk (cherry picked from commit 2632f174a2e1a5fd40a70404fa8ccfd0b1f79ebd) (cherry picked from commit a4b70fcc587860f4b972f68217d8ebebe295ec15) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
2632f174 |
|
28-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/execlists: Avoid reusing the same logical CCID The bspec is confusing on the nature of the upper 32bits of the LRC descriptor. Once upon a time, it said that it uses the upper 32b to decide if it should perform a lite-restore, and so we must ensure that each unique context submitted to HW is given a unique CCID [for the duration of it being on the HW]. Currently, this is achieved by using a small circular tag, and assigning every context submitted to HW a new id. However, this tag is being cleared on repinning an inflight context such that we end up re-using the 0 tag for multiple contexts. To avoid accidentally clearing the CCID in the upper 32bits of the LRC descriptor, split the descriptor into two dwords so we can update the GGTT address separately from the CCID. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796 Fixes: 2935ed5339c4 ("drm/i915: Remove logical HW ID") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
|
#
36e191f0 |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Apply i915_request_skip() on submission Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk
|
#
202c98e7 |
|
18-Feb-2020 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Apply new uC status tracking to GuC submission as well To be able to differentiate the before and after of our commitment to GuC submission, which will be used in follow-up patches to early set-up the submission structures. v2: move functions to guc_submission.h (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200218223327.11058-7-daniele.ceraolospurio@intel.com
|
#
e26b6d43 |
|
21-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Pull GT initialisation under intel_gt_init() Begin pulling the GT setup underneath a single GT umbrella; let intel_gt take ownership of its engines! As hinted, the complication is the lifetime of the probed engine versus the active lifetime of the GT backends. We need to detect the engine layout early and keep it until the end so that we can sanitize state on takeover and release. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222120752.1368352-1-chris@chris-wilson.co.uk
|
#
9f3ccd40 |
|
20-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
|
#
8c69bd74 |
|
16-Dec-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Remove function pointers for send/receive calls Since we started using CT buffers on all gens, the function pointers can only be set to either the _nop() or the _ct() functions. Since the _nop() case applies to when the CT are disabled, we can just handle that case in the _ct() functions and call them directly. v2: keep intel_guc_send() and make the CT send/receive functions work on intel_guc_ct. (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217012316.13271-5-daniele.ceraolospurio@intel.com
|
#
639f2f24 |
|
13-Dec-2019 |
Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> |
drm/i915: Introduce new macros for tracing New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and GT_TRACE() are introduce to tag device name and engine name with contexts and requests tracing in i915. Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com
|
#
3c9abe88 |
|
05-Dec-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: kill the GuC client We now only use 1 client without any plan to add more. The client is also only holding information about the WQ and the process desc, so we can just move those in the intel_guc structure and always use stage_id 0. v2: fix comment (John) v3: fix the comment for real, fix kerneldoc Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-4-daniele.ceraolospurio@intel.com
|
#
e9362e13 |
|
05-Dec-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: kill doorbell code and selftests Instead of relying on the workqueue, the upcoming reworked GuC submission flow will offer the host driver indipendent control over the execution status of each context submitted to GuC. As part of this, the doorbell usage model has been reworked, with each doorbell being paired to a single lrc and a doorbell ring representing new work available for that specific context. This mechanism, however, limits the number of contexts that can be registered with GuC to the number of doorbells, which is an undesired limitation. To avoid this limitation, we requested the GuC team to also provide a H2G that will allow the host to notify the GuC of work available for a specified lrc, so we can use that mechanism instead of relying on the doorbells. We can therefore drop the doorbell code we currently have, also given the fact that in the unlikely case we'd want to switch back to using doorbells we'd have to heavily rework it. The workqueue will still have a use in the new interface to pass special commands, so that code has been retained for now. With the doorbells gone and the GuC client becoming even simpler, the existing GuC selftests don't give us any meaningful coverage so we can remove them as well. Some selftests might come with the new code, but they will look different from what we have now so if doesn't seem worth it to keep the file around in the meantime. v2: fix comments and commit message (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-3-daniele.ceraolospurio@intel.com
|
#
18c094b3 |
|
05-Dec-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: add a helper to allocate and map guc vma We already have a couple of use-cases in the code and another one will come in one of the later patches in the series. v2: use the new function for the CT object as well Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1 Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-2-daniele.ceraolospurio@intel.com
|
#
d54dc6ee |
|
05-Dec-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Drop leftover preemption code Remove unused enums and ctx_save_restore_disabled() function, leftover from the legacy preemption removal. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-1-daniele.ceraolospurio@intel.com
|
#
a22198a9 |
|
06-Dec-2019 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Update uncore access path in flush_ggtt_writes The preferred way to access the uncore is through the GT structure. Update the GuC function, flush_ggtt_writes, to use this path. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191207010033.24667-1-John.C.Harrison@Intel.com
|
#
93b0e8fe |
|
21-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark intel_wakeref_get() as a sleeper Assume that intel_wakeref_get() may take the mutex, and perform other sleeping actions in the course of its callbacks and so use might_sleep() to ensure that all callers abide. Anything that cannot sleep has to use e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the non-atomic paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121130528.309474-1-chris@chris-wilson.co.uk
|
#
e18417b4 |
|
20-Nov-2019 |
Stuart Summers <stuart.summers@intel.com> |
drm/i915: Use intel_gt_pm_put_async in GuC submission path GuC submission path can be called from an interrupt context and so should use a worker to avoid holding a mutex. References: 07779a76ee1f ("drm/i915: Mark up the calling context for intel_wakeref_put()") Signed-off-by: Stuart Summers <stuart.summers@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191120211321.88021-1-stuart.summers@intel.com
|
#
1cdc2330 |
|
05-Nov-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Properly capture & release GuC interrupts on Gen11+ With the new interrupt re-partitioning in Gen11, GuC controls by itself the interrupts it receives, so steering bits and registers have been defeatured. Being this the case, when the GuC is in control of submissions we won't know what to do with the ctx switch interrupt in the driver, so disable it. v2 (Daniele): replace the gen9 paths instead of keeping gen9 and gen11 functions since we won't support guc submission on any pre-gen11 platform. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191105225321.26642-1-daniele.ceraolospurio@intel.com
|
#
dd6e38df |
|
29-Oct-2019 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915: Fix i915_inject_load_error() name to read *_probe_* Commit 50d84418f586 ("drm/i915: Add i915 to i915_inject_probe_failure") introduced new functions unfortunately named incompatibly with rules established by commit f2db53f14d3d ("drm/i915: Replace "_load" with "_probe" consequently"). Fix it for consistency. Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191029102036.6326-2-janusz.krzysztofik@linux.intel.com
|
#
3e7abf81 |
|
24-Oct-2019 |
Andi Shyti <andi@etezian.org> |
drm/i915: Extract GT render power state management i915_irq.c is large. One reason for this is that has a large chunk of the GT render power management stashed away in it. Extract that logic out of i915_irq.c and intel_pm.c and put it under one roof. Based on a patch by Chris Wilson. Signed-off-by: Andi Shyti <andi.shyti@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191024211642.7688-1-chris@chris-wilson.co.uk
|
#
2871ea85 |
|
24-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
|
#
5d904e3c |
|
17-Oct-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Pass in intel_gt at some for_each_engine sites Where the function, or code segment, operates on intel_gt, we need to start passing it instead of i915 to for_each_engine(_masked). This is another partial step in migration of i915->engines[] to gt->engines[]. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017094500.21831-2-tvrtko.ursulin@linux.intel.com
|
#
218151e9 |
|
14-Oct-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: improve documentation Add a short description of what we expect from GuC and some minor improvements to existing documentation. Also remove a comment about a difference between GuC and HuC that is not true anymore. v2: add that the GuC is not mandatory (Martin) v3: add extra newline for better text organization (Martin) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Acked-by: Anna Karas <anna.karas@intel.com> Reviewed-by: Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014183602.3643-2-daniele.ceraolospurio@intel.com
|
#
0b08ae03 |
|
13-Aug-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: Remove client->submissions The engine->guc_id is GuC FW defined and it is not guaranteed to be below I915_NUM_ENGINES, so we shouldn't use it with the i915-defined client->submissions, as we might overflow. Instead of fixing it, just get rid of client->submissions, because the information we get from it is not interesting anymore now that we only have 1 client. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190814002145.29056-1-daniele.ceraolospurio@intel.com
|
#
5f15c1e6 |
|
12-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/guc: Use a local cancel_port_requests Since execlists and the guc have diverged in their port tracking, we cannot simply reuse the execlists cancellation code as it leads to unbalanced reference counting. Use a local, simpler routine for the guc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190812203626.3948-1-chris@chris-wilson.co.uk
|
#
ee94e0c4 |
|
12-Aug-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/guc: keep breadcrumb irq always enabled We rely on the tasklet to update the GT PM refcount, so we can't disable it even if we've processed all the requests for the engine because we might have detected the request completion before the interrupt arrived. Since on all platforms on which we plan to support guc submission we don't allow disabling the breadcrumb interrupts, we can further siplify the park/unpark flow by removing the interrupt pin/unpin. A BUG_ON has been added to catch changes to this flow that would require us to restore some kind of pinning. v2: split removal of engine_pin/unpin_breadcrumbs_irq to its own patch (chris) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190812233152.2172-1-daniele.ceraolospurio@intel.com
|
#
3ea58029 |
|
12-Aug-2019 |
Michal Wajdeczko <michal.wajdeczko@intel.com> |
drm/i915/uc: Update copyright and license Include 2019 in copyright years and start using SPDX tag. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190812092935.21048-1-michal.wajdeczko@intel.com
|
#
c7302f20 |
|
08-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Defer final intel_wakeref_put to process context As we need to acquire a mutex to serialise the final intel_wakeref_put, we need to ensure that we are in process context at that time. However, we want to allow operation on the intel_wakeref from inside timer and other hardirq context, which means that need to defer that final put to a workqueue. Inside the final wakeref puts, we are safe to operate in any context, as we are simply marking up the HW and state tracking for the potential sleep. It's only the serialisation with the potential sleeping getting that requires careful wait avoidance. This allows us to retain the immediate processing as before (we only need to sleep over the same races as the current mutex_lock). v2: Add a selftest to ensure we exercise the code while lockdep watches. v3: That test was extremely loud and complained about many things! v4: Not a whale! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295 References: https://bugs.freedesktop.org/show_bug.cgi?id=111245 References: https://bugs.freedesktop.org/show_bug.cgi?id=111256 Fixes: 18398904ca9e ("drm/i915: Only recover active engines") Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk
|
#
a09d9a80 |
|
06-Aug-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: avoid including intel_drv.h via i915_drv.h->i915_trace.h Disentangle i915_drv.h from intel_drv.h, which gets included via i915_trace.h. This necessitates including i915_trace.h wherever it's needed. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ed82bf259d3b725a1a1a3c3e9d6fb5c08bc4d489.1565085691.git.jani.nikula@intel.com
|
#
750e76b4 |
|
06-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move the [class][inst] lookup for engines onto the GT To maintain a fast lookup from a GT centric irq handler, we want the engine lookup tables on the intel_gt. To avoid having multiple copies of the same multi-dimension lookup table, move the generic user engine lookup into an rbtree (for fast and flexible indexing). v2: Split uabi_instance cf uabi_class v3: Set uabi_class/uabi_instance after collating all engines to provide a stable uabi across parallel unordered construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk
|
#
5d1ef2b4 |
|
02-Aug-2019 |
Michal Wajdeczko <michal.wajdeczko@intel.com> |
drm/i915/uc: Inject probe errors into intel_uc_init_hw Inject probe errors into intel_uc_init_hw to make sure we correctly handle any uC initialization failure. To avoid complains from CI about injected errors use i915_probe_error to lower message level. v4: rebased after moving hot fixes moved to separate patches Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190802184055.31988-6-michal.wajdeczko@intel.com
|
#
724df646 |
|
31-Jul-2019 |
Michal Wajdeczko <michal.wajdeczko@intel.com> |
drm/i915/guc: Use dedicated flag to track submission mode Instead of relying on enable_guc modparam to represent actual GuC submission mode, use dedicated flag and look at modparam only to check if submission was explicitly disabled by the user. v2: rebased, simplified condition (Chris) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190731223321.36436-4-michal.wajdeczko@intel.com
|
#
91e55e54 |
|
24-Jul-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/uc: Unify uc_fw status tracking We currently track fetch and load status separately, but the 2 are actually sequential in the uc lifetime (fetch must complete before we can attempt the load!). Unifying the 2 variables we can better follow the sequential states and improve our trackng of the uC state. Also, sprinkle some GEM_BUG_ON to make sure we transition correctly between states. v2: rename states, add the running state (Michal), drop some logs in the fetch path (Michal, Chris) v3: re-rename states, extend early status check to all helpers (Michal) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190725001813.4740-5-daniele.ceraolospurio@intel.com
|
#
84b1ca2f |
|
13-Jul-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/uc: prefer intel_gt over i915 in GuC/HuC paths With our HW interface logic moving from i915 to gt and with GuC and HuC being part of the gt HW, it makes sense to use the intel_gt structure instead of i915 as our reference object in GuC/HuC paths. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-9-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
8b5689d7 |
|
13-Jul-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/uc: move GuC/HuC inside intel_gt under a new intel_uc Being part of the GT HW, it make sense to keep the guc/huc structures inside the GT structure. To help with the encapsulation work done by the following patches, both structures are placed inside a new intel_uc container. Although this results in code with ugly nested dereferences (i915->gt.uc.guc...), it saves us the extra work required in moving the structures twice (i915 -> gt -> uc). The following patches will reduce the number of places where we try to access the guc/huc structures directly from i915 and reduce the ugliness. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-7-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|
#
0f261b24 |
|
13-Jul-2019 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/uc: move GuC and HuC files under gt/uc/ Both microcontrollers are part of the GT HW and are closely related to GT operations. To keep all the files cleanly together, they've been placed in their own subdir inside the gt/ folder Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-6-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
|