#
95f4e97f |
|
15-Feb-2024 |
Jiri Slaby (SUSE) <jirislaby@kernel.org> |
drm/i915: remove execute_cb::signal execute_cb::signal is not used since commit 5ac545b8b014 (drm/i915/request: Remove the hook from await_execution). Drop it. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240216065326.6910-20-jirislaby@kernel.org
|
#
28041067 |
|
21-Aug-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: mark requests for GuC virtual engines to avoid use-after-free References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines. Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com
|
#
5eefc530 |
|
21-Aug-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: mark requests for GuC virtual engines to avoid use-after-free References to i915_requests may be trapped by userspace inside a sync_file or dmabuf (dma-resv) and held indefinitely across different proceses. To counter-act the memory leaks, we try to not to keep references from the request past their completion. On the other side on fence release we need to know if rq->engine is valid and points to hw engine (true for non-virtual requests). To make it possible extra bit has been added to rq->execution_mask, for marking virtual engines. Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"") Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com (cherry picked from commit 280410677af763f3871b93e794a199cfcf6fb580) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
946e047a |
|
20-Jul-2023 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915: Fix premature release of request's reusable memory Infinite waits for completion of GPU activity have been observed in CI, mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps generated with a lot of extra trace_printk() calls added to the code, revealed loops of request dependencies being accidentally built, preventing the requests from being processed, each waiting for completion of another one's activity. After we substitute a new request for a last active one tracked on a timeline, we set up a dependency of our new request to wait on completion of current activity of that previous one. While doing that, we must take care of keeping the old request still in memory until we use its attributes for setting up that await dependency, or we can happen to set up the await dependency on an unrelated request that already reuses the memory previously allocated to the old one, already released. Combined with perf adding consecutive kernel context remote requests to different user context timelines, unresolvable loops of await dependencies can be built, leading do infinite waits. We obtain a pointer to the previous request to wait upon when we substitute it with a pointer to our new request in an active tracker, e.g. in intel_timeline.last_request. In some processing paths we protect that old request from being freed before we use it by getting a reference to it under RCU protection, but in others, e.g. __i915_request_commit() -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(), we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU, that RCU protection is not sufficient against reuse of memory. We could protect i915_request's memory from being prematurely reused by calling its release function via call_rcu() and using rcu_read_lock() consequently, as proposed in v1. However, that approach leads to significant (up to 10 times) increase of SLAB utilization by i915_request SLAB cache. Another potential approach is to take a reference to the previous active fence. When updating an active fence tracker, we first lock the new fence, substitute a pointer of the current active fence with the new one, then we lock the substituted fence. With this approach, there is a time window after the substitution and before the lock when the request can be concurrently released by an interrupt handler and its memory reused, then we may happen to lock and return a new, unrelated request. Always get a reference to the current active fence first, before replacing it with a new one. Having it protected from premature release and reuse, lock it and then replace with the new one but only if not yet signalled via a potential concurrent interrupt nor replaced with another one by a potential concurrent thread, otherwise retry, starting from getting a reference to the new current one. Adjust users to not get a reference to the previous active fence themselves and always put the reference got by __i915_active_fence_set() when no longer needed. v3: Fix lockdep splat reports and other issues caused by incorrect use of try_cmpxchg() (use (cmpxchg() != prev) instead) v2: Protect request's memory by getting a reference to it in favor of delegating its release to call_rcu() (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211 Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
|
#
9bba6b19 |
|
17-Jul-2023 |
Geert Uytterhoeven <geert+renesas@glider.be> |
drm: Spelling s/sempahore/semaphore/ Fix misspellings of "semaphore". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/8b0542c12a2427f34a792c41ac2d2a2922874bfa.1689600102.git.geert+renesas@glider.be
|
#
d3f23ab9 |
|
20-Jul-2023 |
Andrzej Hajda <andrzej.hajda@intel.com> |
drm/i915: use direct alias for i915 in requests i915_request contains direct alias to i915, there is no point to go via rq->engine->i915. v2: added missing rq.i915 initialization in measure_breadcrumb_dw. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720113002.1541572-1-andrzej.hajda@intel.com
|
#
a337b64f |
|
20-Jul-2023 |
Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> |
drm/i915: Fix premature release of request's reusable memory Infinite waits for completion of GPU activity have been observed in CI, mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps generated with a lot of extra trace_printk() calls added to the code, revealed loops of request dependencies being accidentally built, preventing the requests from being processed, each waiting for completion of another one's activity. After we substitute a new request for a last active one tracked on a timeline, we set up a dependency of our new request to wait on completion of current activity of that previous one. While doing that, we must take care of keeping the old request still in memory until we use its attributes for setting up that await dependency, or we can happen to set up the await dependency on an unrelated request that already reuses the memory previously allocated to the old one, already released. Combined with perf adding consecutive kernel context remote requests to different user context timelines, unresolvable loops of await dependencies can be built, leading do infinite waits. We obtain a pointer to the previous request to wait upon when we substitute it with a pointer to our new request in an active tracker, e.g. in intel_timeline.last_request. In some processing paths we protect that old request from being freed before we use it by getting a reference to it under RCU protection, but in others, e.g. __i915_request_commit() -> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(), we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU, that RCU protection is not sufficient against reuse of memory. We could protect i915_request's memory from being prematurely reused by calling its release function via call_rcu() and using rcu_read_lock() consequently, as proposed in v1. However, that approach leads to significant (up to 10 times) increase of SLAB utilization by i915_request SLAB cache. Another potential approach is to take a reference to the previous active fence. When updating an active fence tracker, we first lock the new fence, substitute a pointer of the current active fence with the new one, then we lock the substituted fence. With this approach, there is a time window after the substitution and before the lock when the request can be concurrently released by an interrupt handler and its memory reused, then we may happen to lock and return a new, unrelated request. Always get a reference to the current active fence first, before replacing it with a new one. Having it protected from premature release and reuse, lock it and then replace with the new one but only if not yet signalled via a potential concurrent interrupt nor replaced with another one by a potential concurrent thread, otherwise retry, starting from getting a reference to the new current one. Adjust users to not get a reference to the previous active fence themselves and always put the reference got by __i915_active_fence_set() when no longer needed. v3: Fix lockdep splat reports and other issues caused by incorrect use of try_cmpxchg() (use (cmpxchg() != prev) instead) v2: Protect request's memory by getting a reference to it in favor of delegating its release to call_rcu() (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211 Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself") Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
848a4e5c |
|
08-Jun-2023 |
Luca Coelho <luciano.coelho@intel.com> |
drm/i915: add a dedicated workqueue inside drm_i915_private In order to avoid flush_scheduled_work() usage, add a dedicated workqueue in the drm_i915_private structure. In this way, we don't need to use the system queue anymore. This change is mostly mechanical and based on Tetsuo's original patch[1]. v6 by Jani: - Also create unordered_wq for mock device Link: https://patchwork.freedesktop.org/series/114608/ [1] Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c816ebe17ef08d363981942a096a586a7658a65e.1686231190.git.jani.nikula@intel.com
|
#
95ccb25e |
|
01-Mar-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: remove unnecessary intel_pm.h includes As intel_pm.[ch] used to contain much more, intel_pm.h was included in a lot of places. Many of them are now unnecessary. Remove. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ab9a7147b0cd63d95b9f27ed40615b9c9be18f84.1677678803.git.jani.nikula@intel.com
|
#
ff1e93e9 |
|
18-Jan-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: add i915_config.h and move relevant declarations there We already have i915_config.c. Add the i915_config.h counterpart, and declutter i915_drv.h in the process. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230118131538.3558599-1-jani.nikula@intel.com
|
#
e6177ec5 |
|
27-Sep-2022 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915/huc: stall media submission until HuC is loaded Wait on the fence to be signalled to avoid the submissions finding HuC not yet loaded. v2: use dedicaded wait_queue_entry for waiting in HuC load, as submitq can't be re-used for it. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tony Ye <tony.ye@intel.com> Acked-by: Tony Ye <tony.ye@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220928004145.745803-13-daniele.ceraolospurio@intel.com
|
#
bcb9aa45 |
|
14-Jun-2022 |
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> |
Revert "drm/i915: Hold reference to intel_context over life of i915_request" This reverts commit 1e98d8c52ed5dfbaf273c4423c636525c2ce59e7. The problem with this patch is that it makes i915_request to hold a reference to intel_context, which in turn holds a reference on the VM. This strong back referencing can lead to reference loops which leads to resource leak. An example is the upcoming VM_BIND work which requires VM to hold a reference to some shared VM specific BO. But this BO's dma-resv fences holds reference to the i915_request thus leading to reference loop. v2: Do not use reserved requests for virtual engines Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Mathew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220614184348.23746-3-ramalingam.c@intel.com
|
#
7307e91b |
|
14-Jun-2022 |
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> |
drm/i915: Do not access rq->engine without a reference In i915_fence_get_driver_name(), user may not hold a reference to rq->engine. Hence do not access it. Instead, store required device private pointer in 'rq->i915' and use it. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220614184348.23746-2-ramalingam.c@intel.com
|
#
45c64ecf |
|
27-May-2022 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Improve user experience and driver robustness under SIGINT or similar We have long standing customer complaints that pressing Ctrl-C (or to the effect of) causes engine resets with otherwise well behaving programs. Not only is logging engine resets during normal operation not desirable since it creates support incidents, but more fundamentally we should avoid going the engine reset path when we can since any engine reset introduces a chance of harming an innocent context. Reason for this undesirable behaviour is that the driver currently does not distinguish between banned contexts and non-persistent contexts which have been closed. To fix this we add the distinction between the two reasons for revoking contexts, which then allows the strict timeout only be applied to banned, while innocent contexts (well behaving) can preempt cleanly and exit without triggering the engine reset path. Note that the added context exiting category applies both to closed non- persistent context, and any exiting context when hangcheck has been disabled by the user. At the same time we rename the backend operation from 'ban' to 'revoke' which more accurately describes the actual semantics. (There is no ban at the backend level since banning is a concept driven by the scheduling frontend. Backends are simply able to revoke a running context so that is the more appropriate name chosen.) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com
|
#
7bc80a54 |
|
09-Nov-2021 |
Christian König <christian.koenig@amd.com> |
dma-buf: add enum dma_resv_usage v4 This change adds the dma_resv_usage enum and allows us to specify why a dma_resv object is queried for its containing fences. Additional to that a dma_resv_usage_rw() helper function is added to aid retrieving the fences for a read or write userspace submission. This is then deployed to the different query functions of the dma_resv object and all of their users. When the write paratermer was previously true we now use DMA_RESV_USAGE_WRITE and DMA_RESV_USAGE_READ otherwise. v2: add KERNEL/OTHER in separate patch v3: some kerneldoc suggestions by Daniel v4: some more kerneldoc suggestions by Daniel, fix missing cases lost in the rebase pointed out by Bas. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-2-christian.koenig@amd.com
|
#
24524e3f |
|
09-Feb-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: move the DRIVER_* macros to i915_driver.[ch] The macros are more at home in i915_driver.[ch]. v2: Rebase Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220209123121.3337496-1-jani.nikula@intel.com
|
#
202b1f4c |
|
10-Jan-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/gt: Move engine registers to their own header Let's continue breaking up and cleaning up the massive i915_reg.h file by moving all registers that are defined in relation to an engine base to their own header. There are probably a bunch of other "engine registers" that we haven't moved yet (especially those that belong to the render engine in the 0x2??? range), but this is a relatively straightforward first step. Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com
|
#
60dc43d1 |
|
10-Jan-2022 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Use struct vma_resource instead of struct vma_snapshot There is always a struct vma_resource guaranteed to be alive when we access a corresponding struct vma_snapshot. So ditch the latter and instead of allocating vma_snapshots, reference the already existning vma_resource. This requires a couple of extra members in struct vma_resource but that's a small price to pay for the simplification. v2: - Fix a missing include and declaration (kernel test robot <lkp@intel.com>) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-7-thomas.hellstrom@linux.intel.com
|
#
63cf4cad |
|
21-Dec-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Break out the i915_deps utility Since it's starting to be used outside the i915 TTM move code, move it to a separate set of files. v2: - Update the documentation. v4: - Rebase. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-4-thomas.hellstrom@linux.intel.com
|
#
11930817 |
|
21-Dec-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Avoid using the i915_fence_array when collecting dependencies Since the gt migration code was using only a single fence for dependencies, these were collected in a dma_fence_array. However, it turns out that it's illegal to use some dma_fences in a dma_fence_array, in particular other dma_fence_arrays and dma_fence_chains, and this causes trouble for us moving forward. Have the gt migration code instead take a const struct i915_deps for dependencies. This means we can skip the dma_fence_array creation and instead pass the struct i915_deps instead to circumvent the problem. v2: - Make the prev_deps() function static. (kernel test robot <lkp@intel.com>) - Update the struct i915_deps kerneldoc. v4: - Rebase. Fixes: 5652df829b3c ("drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-2-thomas.hellstrom@linux.intel.com
|
#
40aa583e |
|
09-Dec-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Don't leak the capture list items When we recently converted the capture code to use vma snapshots, we forgot to free the struct i915_capture_list list items after use. Fix that by bringing back a kfree. Fixes: ff20afc4cee7 ("drm/i915: Update error capture code to avoid using the current vma state") Cc: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211209141304.393479-1-thomas.hellstrom@linux.intel.com
|
#
ff20afc4 |
|
29-Nov-2021 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
drm/i915: Update error capture code to avoid using the current vma state With asynchronous migrations, the vma state may be several migrations ahead of the state that matches the request we're capturing. Address that by introducing an i915_vma_snapshot structure that can be used to snapshot relevant state at request submission. In order to make sure we access the correct memory, the snapshots take references on relevant sg-tables and memory regions. Also move the capture list allocation out of the fence signaling critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to avoid compiling in members and functions used for error capture when they're not used. Finally, Introduce lockdep annotation. v4: - Break out the capture allocation mode change to a separate patch. v5: - Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case (kernel test robot) v6: - Use #if IS_ENABLED() instead of #ifdef to match driver style. - Move yet another change of allocation mode to the separate patch. - Commit message rework due to patch reordering. v7: - Adjust for removal of region refcounting. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com
|
#
44505168 |
|
16-Nov-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Drop stealing of bits from i915_sw_fence function pointer Rather than stealing bits from i915_sw_fence function pointer use separate fields for function pointer and flags. If using two different fields, the 4 byte alignment for the i915_sw_fence function pointer can also be dropped. v2: (CI) - Set new function field rather than flags in __i915_sw_fence_init v3: (Tvrtko) - Remove BUG_ON(!fence->flags) in reinit as that will now blow up - Only define fence->flags if CONFIG_DRM_I915_SW_FENCE_CHECK_DAG is defined v4: - Rebase, resend for CI Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116194929.10211-1-matthew.brost@intel.com
|
#
7e2e69ed |
|
20-Oct-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Fix i915_request fence wait semantics The i915_request fence wait behaves differently for timeout = 0 compared to expected dma-fence behavior. i915 behavior: - Unsignaled: -ETIME - Signaled: 0 (= timeout) Expected: - Unsignaled: 0 - Signaled: 1 Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116102431.198905-6-christian.koenig@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
|
#
493043fe |
|
01-Nov-2021 |
Vinay Belgaumkar <vinay.belgaumkar@intel.com> |
drm/i915/guc/slpc: Add waitboost functionality for SLPC Add helper in RPS code for handling SLPC and non-SLPC paths. When boost is requested in the SLPC path, we can ask GuC to ramp up the frequency req by setting the minimum frequency to boost freq. Reset freq back to the min softlimit when there are no more waiters. v2: Schedule a worker thread which can boost freq from within an interrupt context as well. v3: No need to check against requested freq before scheduling boost work (Ashutosh) Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-3-vinay.belgaumkar@intel.com
|
#
8581fd40 |
|
02-Dec-2021 |
Jakub Kicinski <kuba@kernel.org> |
treewide: Add missing includes masked by cgroup -> bpf dependency cgroup.h (therefore swap.h, therefore half of the universe) includes bpf.h which in turn includes module.h and slab.h. Since we're about to get rid of that dependency we need to clean things up. v2: drop the cpu.h include from cacheinfo.h, it's not necessary and it makes riscv sensitive to ordering of include files. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Krzysztof Wilczyński <kw@linux.com> Acked-by: Peter Chen <peter.chen@kernel.org> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/ # v1 Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
|
#
a585070f |
|
12-Sep-2021 |
Christian König <christian.koenig@amd.com> |
drm/i915: use the new iterator in i915_request_await_object v2 Simplifying the code a bit. v2: add missing rcu_read_lock()/rcu_read_unlock() v3: use dma_resv_for_each_fence instead Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211005113742.1101-20-christian.koenig@amd.com
|
#
afc76f30 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Make request conflict tracking understand parallel submits If an object in the excl or shared slot is a composite fence from a parallel submit and the current request in the conflict tracking is from the same parallel context there is no need to enforce ordering as the ordering is already implicit. Make the request conflict tracking understand this by comparing a parallel submit's parent context and skipping conflict insertion if the values match. v2: (John Harrison) - Reword commit message Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com
|
#
bc955204 |
|
14-Oct-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Insert submit fences between requests in parent-child relationship The GuC must receive requests in the order submitted for contexts in a parent-child relationship to function correctly. To ensure this, insert a submit fence between the current request and last request submitted for requests / contexts in a parent child relationship. This is conceptually similar to a single timeline. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com
|
#
1a839e01 |
|
05-Oct-2021 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915: remove IS_ACTIVE When trying to bring IS_ACTIVE to linux/kconfig.h I thought it wouldn't provide much value just encapsulating it in a boolean context. So I also added the support for handling undefined macros as the IS_ENABLED() counterpart. However the feedback received from Masahiro Yamada was that it is too ugly, not providing much value. And just wrapping in a boolean context is too dumb - we could simply open code it. As detailed in commit babaab2f4738 ("drm/i915: Encapsulate kconfig constant values inside boolean predicates"), the IS_ACTIVE macro was added to workaround a compilation warning. However after checking again our current uses of IS_ACTIVE it turned out there is only 1 case in which it triggers a warning in clang (due -Wconstant-logical-operand) and 2 in smatch. All the others can simply use the shorter version, without wrapping it in any macro. So here I'm dialing all the way back to simply removing the macro. That single case hit by clang can be changed to make the constant come first, so it doesn't think it's mask: - if (context && CONFIG_DRM_I915_FENCE_TIMEOUT) + if (CONFIG_DRM_I915_FENCE_TIMEOUT && context) As talked with Dan Carpenter, that logic will be added in smatch as well, so it will also stop warning about it. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211005171728.3147094-1-lucas.demarchi@intel.com
|
#
07f82a47 |
|
04-Oct-2021 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx. Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because semaphore target location in HWSP is neither shared between the two, nor mapped in both GGTT spaces. To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when relevant objects are not in the same GGTT space. v2: * Avoid adding rq->i915. (Chris) v3: * Use GGTT which describes the limit more precisely. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211005113135.768295-1-tvrtko.ursulin@linux.intel.com
|
#
be988eae |
|
21-Sep-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/request: fix early tracepoints Currently we blow up in trace_dma_fence_init, when calling into get_driver_name or get_timeline_name, since both the engine and context might be NULL(or contain some garbage address) in the case of newly allocated slab objects via the request ctor. Note that we also use SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately freed, but delay freeing the underlying page by an RCU grace period. With this scheme requests can be re-allocated, at the same time as they are also being read by some lockless RCU lookup mechanism. In the ctor case, which is only called for new slab objects(i.e allocate new page and call the ctor for each object) it's safe to reset the context/engine prior to calling into dma_fence_init, since we can be certain that no one is doing an RCU lookup which might depend on peeking at the engine/context, like in active_engine(), since the object can't yet be externally visible. In the recycled case(which might also be externally visible) the request refcount always transitions from 0->1 after we set the context/engine etc, which should ensure it's valid to dereference the engine for example, when doing an RCU list-walk, so long as we can also increment the refcount first. If the refcount is already zero, then the request is considered complete/released. If it's non-zero, then the request might be in the process of being re-allocated, or potentially still in flight, however after successfully incrementing the refcount, it's possible to carefully inspect the request state, to determine if the request is still what we were looking for. Note that all externally visible requests returned to the cache must have zero refcount. One possible fix then is to move dma_fence_init out from the request ctor. Originally this was how it was done, but it was moved in: commit 855e39e65cfc33a73724f1cc644ffc5754864a20 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 3 09:41:48 2020 +0000 drm/i915: Initialise basic fence before acquiring seqno where it looks like intel_timeline_get_seqno() relied on some of the rq->fence state, but that is no longer the case since: commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Tue Mar 23 16:49:50 2021 +0100 drm/i915: Do not share hwsp across contexts any more, v8. intel_timeline_get_seqno() could also be cleaned up slightly by dropping the request argument. Moving dma_fence_init back out of the ctor, should ensure we have enough of the request initialised in case of trace_dma_fence_init. Functionally this should be the same, and is effectively what we were already open coding before, except now we also assign the fence->lock and fence->ops, but since these are invariant for recycled requests(which might be externally visible), and will therefore already hold the same value, it shouldn't matter. An alternative fix, since we don't yet have a fully initialised request when in the ctor, is just setting the context/engine as NULL, but this does require adding some extra handling in get_driver_name etc. v2(Daniel): - Try to make the commit message less confusing Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Michael Mason <michael.w.mason@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210921134202.3803151-1-matthew.auld@intel.com
|
#
c83ff018 |
|
21-Sep-2021 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915/request: fix early tracepoints Currently we blow up in trace_dma_fence_init, when calling into get_driver_name or get_timeline_name, since both the engine and context might be NULL(or contain some garbage address) in the case of newly allocated slab objects via the request ctor. Note that we also use SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately freed, but delay freeing the underlying page by an RCU grace period. With this scheme requests can be re-allocated, at the same time as they are also being read by some lockless RCU lookup mechanism. In the ctor case, which is only called for new slab objects(i.e allocate new page and call the ctor for each object) it's safe to reset the context/engine prior to calling into dma_fence_init, since we can be certain that no one is doing an RCU lookup which might depend on peeking at the engine/context, like in active_engine(), since the object can't yet be externally visible. In the recycled case(which might also be externally visible) the request refcount always transitions from 0->1 after we set the context/engine etc, which should ensure it's valid to dereference the engine for example, when doing an RCU list-walk, so long as we can also increment the refcount first. If the refcount is already zero, then the request is considered complete/released. If it's non-zero, then the request might be in the process of being re-allocated, or potentially still in flight, however after successfully incrementing the refcount, it's possible to carefully inspect the request state, to determine if the request is still what we were looking for. Note that all externally visible requests returned to the cache must have zero refcount. One possible fix then is to move dma_fence_init out from the request ctor. Originally this was how it was done, but it was moved in: commit 855e39e65cfc33a73724f1cc644ffc5754864a20 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 3 09:41:48 2020 +0000 drm/i915: Initialise basic fence before acquiring seqno where it looks like intel_timeline_get_seqno() relied on some of the rq->fence state, but that is no longer the case since: commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Tue Mar 23 16:49:50 2021 +0100 drm/i915: Do not share hwsp across contexts any more, v8. intel_timeline_get_seqno() could also be cleaned up slightly by dropping the request argument. Moving dma_fence_init back out of the ctor, should ensure we have enough of the request initialised in case of trace_dma_fence_init. Functionally this should be the same, and is effectively what we were already open coding before, except now we also assign the fence->lock and fence->ops, but since these are invariant for recycled requests(which might be externally visible), and will therefore already hold the same value, it shouldn't matter. An alternative fix, since we don't yet have a fully initialised request when in the ctor, is just setting the context/engine as NULL, but this does require adding some extra handling in get_driver_name etc. v2(Daniel): - Try to make the commit message less confusing Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Michael Mason <michael.w.mason@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210921134202.3803151-1-matthew.auld@intel.com (cherry picked from commit be988eaee1cb208c4445db46bc3ceaf75f586f0b) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
47514ac7 |
|
27-Jul-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: move request slabs to direct module init/exit With the global kmem_cache shrink infrastructure gone there's nothing special and we can convert them over. I'm doing this split up into each patch because there's quite a bit of noise with removing the static global.slab_requests|execute_cbs to just a slab_requests|execute_cbs. v2: Make slab static (Jason, 0day) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-7-daniel.vetter@ffwll.ch
|
#
ee242ca7 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement GuC priority management Implement a simple static mapping algorithm of the i915 priority levels (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as follows: i915 level < 0 -> GuC low level (3) i915 level == 0 -> GuC normal level (2) i915 level < INT_MAX -> GuC high level (1) i915 level == INT_MAX -> GuC highest level (0) We believe this mapping should cover the UMD use cases (3 distinct user levels + 1 kernel level). In addition to static mapping, a simple counter system is attached to each context tracking the number of requests inflight on the context at each level. This is needed as the GuC levels are per context while in the i915 levels are per request. v2: (Daniele) - Add BUILD_BUG_ON to enforce ordering of priority levels - Add missing lockdep to guc_prio_fini - Check for return before setting context registered flag - Map DISPLAY priority or higher to highest guc prio - Update comment for guc_prio Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-33-matthew.brost@intel.com
|
#
62eaf0ae |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Support request cancellation This adds GuC backend support for i915_request_cancel(), which in turn makes CONFIG_DRM_I915_REQUEST_TIMEOUT work. This implementation makes use of fence while there are likely simplier options. A fence was chosen because of another feature coming soon which requires a user to block on a context until scheduling is disabled. In that case we return the fence to the user and the user can wait on that fence. v2: (Daniele) - A comment about locking the blocked incr / decr - A comments about the use of the fence - Update commit message explaining why fence - Delete redundant check blocked count in unblock function - Ring buffer implementation - Comment about blocked in submission path - Shorter rpm path v3: (Checkpatch) - Fix typos in commit message (Daniel) - Rework to simplier locking structure in guc_context_block / unblock Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-26-matthew.brost@intel.com
|
#
dc0dad36 |
|
26-Jul-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix for error capture after full GPU reset with GuC In the case of a full GPU reset (e.g. because GuC has died or because GuC's hang detection has been disabled), the driver can't rely on GuC reporting the guilty context. Instead, the driver needs to scan all active contexts and find one that is currently executing, as per the execlist mode behaviour. In GuC mode, this scan is different to execlist mode as the active request list is handled very differently. Similarly, the request state dump in debugfs needs to be handled differently when in GuC submission mode. Also refactured some of the request scanning code to avoid duplication across the multiple code paths that are now replicating it. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-20-matthew.brost@intel.com
|
#
d1cee2d3 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move active request tracking to a vfunc Move active request tracking to a backend vfunc rather than assuming all backends want to do this in the manner. In the of case execlists / ring submission the tracking is on the physical engine while with GuC submission it is on the context. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-8-matthew.brost@intel.com
|
#
1e98d8c5 |
|
26-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Hold reference to intel_context over life of i915_request Hold a reference to the intel_context over life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed which can trigger bad pointer dref in fence ops (e.g. i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine but let's just be safe and hold the intel_context reference. v2: (John Harrison) - Update comment explaining how GuC mode and execlists mode deal with virtual engines differently Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-4-matthew.brost@intel.com
|
#
96d3e0e1 |
|
26-Jul-2021 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Make hangcheck work with GuC virtual engines The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted to their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heart beat code thinks the physical engines are idle due to the serial number not incrementing. Which in turns means hangcheck does not work for GuC virtual engines. This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions. Downside of this is that all physical engines constituting a GuC virtual engine will be periodically unparked (even during just a single context executing) in order to be pinged with a heartbeat request. However the power and performance cost of this is not expected to be measurable (due low frequency of heartbeat pulses) and it is considered an easier option than trying to make changes to GuC firmware. v2: (Tvrtko) - Update commit message - Have default behavior if no vfunc present Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-3-matthew.brost@intel.com
|
#
38d5ec43 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Ensure request ordering via completion fences If two requests are on the same ring, they are explicitly ordered by the HW. So, a submission fence is sufficient to ensure ordering when using the new GuC submission interface. Conversely, if two requests share a timeline and are on the same physical engine but different context this doesn't ensure ordering on the new GuC submission interface. So, a completion fence needs to be used to ensure ordering. v2: (Daniele) - Don't delete spin lock v3: (Daniele) - Delete forward dec Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-13-matthew.brost@intel.com
|
#
3a4cdf19 |
|
21-Jul-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915/guc: Implement GuC context operations for new inteface Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy. v2: (Daniel Vetter) - Use msleep_interruptible rather than cond_resched in busy loop (Michal) - Remove C++ style comment v3: (Matthew Brost) - Drop GUC_ID_START (John Harrison) - Fix a bunch of typos - Use drm_err rather than drm_dbg for G2H errors (Daniele) - Fix ;; typo - Clean up sched state functions - Add lockdep for guc_id functions - Don't call __release_guc_id when guc_id is invalid - Use MISSING_CASE - Add comment in guc_context_pin - Use shorter path to rpm (Daniele / CI) - Don't call release_guc_id on an invalid guc_id in destroy v4: (Daniel Vetter) - Add FIXME comment Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com
|
#
4f62a7e0 |
|
21-Jul-2021 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
drm/i915: Ditch i915 globals shrink infrastructure This essentially reverts commit 84a1074920523430f9dc30ff907f4801b4820072 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jan 24 11:36:08 2018 +0000 drm/i915: Shrink the GEM kmem_caches upon idling mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it then we need to fix that there, not hand-roll our own slab shrinking code in i915. Also when this was added there was only one other caller of kmem_cache_shrink (added 2005 to the acpi code). Now there's a 2nd one outside of i915 code in a kunit test, which seems legit since that wants to very carefully control what's in the kmem_cache. This out of a total of over 500 calls to kmem_cache_create. This alone should have been warning sign enough that we're doing something silly. Noticed while reviewing a patch set from Jason to fix up some issues in our i915_init() and i915_exit() module load/cleanup code. Now that i915_globals.c isn't any different than normal init/exit functions, we should convert them over to one unified table and remove i915_globals.[hc] entirely. v2: Improve commit message (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: David Airlie <airlied@linux.ie> Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721183229.4136488-1-daniel.vetter@ffwll.ch
|
#
93a2711c |
|
14-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
Revert "drm/i915: Propagate errors on awaiting already signaled fences" This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever since that commit, we've been having issues where a hang in one client can propagate to another. In particular, a hang in an app can propagate to the X server which causes the whole desktop to lock up. Error propagation along fences sound like a good idea, but as your bug shows, surprising consequences, since propagating errors across security boundaries is not a good thing. What we do have is track the hangs on the ctx, and report information to userspace using RESET_STATS. That's how arb_robustness works. Also, if my understanding is still correct, the EIO from execbuf is when your context is banned (because not recoverable or too many hangs). And in all these cases it's up to userspace to figure out what is all impacted and should be reported to the application, that's not on the kernel to guess and automatically propagate. What's more, we're also building more features on top of ctx error reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the userspace fence wait also relies on that mechanism. So it is the path going forward for reporting gpu hangs and resets to userspace. So all together that's why I think we should just bury this idea again as not quite the direction we want to go to, hence why I think the revert is the right option here. For backporters: Please note that you _must_ have a backport of https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.net/ for otherwise backporting just this patch opens up a security bug. v2: Augment commit message. Also restore Jason's sob that I accidentally lost. v3: Add a note for backporters Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Marcin Slusarz <marcin.slusarz@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080 Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences") Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@jlekstrand.net
|
#
5ac545b8 |
|
08-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
drm/i915/request: Remove the hook from await_execution This was only ever used for FENCE_SUBMIT automatic engine selection which was removed in the previous commit. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-12-jason@jlekstrand.net
|
#
3f623e06 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move engine->schedule to i915_sched_engine The schedule function should be in the schedule object. v3: (Jason Ekstrand) Add kernel doc Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-6-matthew.brost@intel.com
|
#
349a2bc5 |
|
17-Jun-2021 |
Matthew Brost <matthew.brost@intel.com> |
drm/i915: Move active tracking to i915_sched_engine Move active request tracking and its lock to i915_sched_engine. This lock is also the submission lock so having it in the i915_sched_engine is the correct place. v3: (Jason Ekstrand) Add kernel doc v6: Rebase Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.comk> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-5-matthew.brost@intel.com
|
#
3761baae |
|
14-Jul-2021 |
Jason Ekstrand <jason@jlekstrand.net> |
Revert "drm/i915: Propagate errors on awaiting already signaled fences" This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever since that commit, we've been having issues where a hang in one client can propagate to another. In particular, a hang in an app can propagate to the X server which causes the whole desktop to lock up. Error propagation along fences sound like a good idea, but as your bug shows, surprising consequences, since propagating errors across security boundaries is not a good thing. What we do have is track the hangs on the ctx, and report information to userspace using RESET_STATS. That's how arb_robustness works. Also, if my understanding is still correct, the EIO from execbuf is when your context is banned (because not recoverable or too many hangs). And in all these cases it's up to userspace to figure out what is all impacted and should be reported to the application, that's not on the kernel to guess and automatically propagate. What's more, we're also building more features on top of ctx error reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the userspace fence wait also relies on that mechanism. So it is the path going forward for reporting gpu hangs and resets to userspace. So all together that's why I think we should just bury this idea again as not quite the direction we want to go to, hence why I think the revert is the right option here. For backporters: Please note that you _must_ have a backport of https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.net/ for otherwise backporting just this patch opens up a security bug. v2: Augment commit message. Also restore Jason's sob that I accidentally lost. v3: Add a note for backporters Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Marcin Slusarz <marcin.slusarz@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080 Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences") Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@jlekstrand.net (cherry picked from commit 93a2711cddd5760e2f0f901817d71c93183c3b87) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
651e7d48 |
|
05-Jun-2021 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915: replace IS_GEN and friends with GRAPHICS_VER This was done by the following semantic patch: @@ expression i915; @@ - INTEL_GEN(i915) + GRAPHICS_VER(i915) @@ expression i915; expression E; @@ - INTEL_GEN(i915) >= E + GRAPHICS_VER(i915) >= E @@ expression dev_priv; expression E; @@ - !IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) != E @@ expression dev_priv; expression E; @@ - IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) == E @@ expression dev_priv; expression from, until; @@ - IS_GEN_RANGE(dev_priv, from, until) + IS_GRAPHICS_VER(dev_priv, from, until) @def@ expression E; identifier id =~ "^gen$"; @@ - id = GRAPHICS_VER(E) + ver = GRAPHICS_VER(E) @@ identifier def.id; @@ - id + ver It also takes care of renaming the variable we assign to GRAPHICS_VER() so to use "ver" rather than "gen". Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com
|
#
d3fae3b3 |
|
02-Jun-2021 |
Christian König <christian.koenig@amd.com> |
dma-buf: drop the _rcu postfix on function names v3 The functions can be called both in _rcu context as well as while holding the lock. v2: add some kerneldoc as suggested by Daniel v3: fix indentation Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210602111714.212426-7-christian.koenig@amd.com
|
#
6b41323a |
|
01-Jun-2021 |
Christian König <christian.koenig@amd.com> |
dma-buf: rename dma_resv_get_excl_rcu to _unlocked That describes much better what the function is doing here. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210602111714.212426-6-christian.koenig@amd.com
|
#
0333ec88 |
|
28-Apr-2021 |
Bernard Zhao <bernard@vivo.com> |
drm/i915: Use might_alloc() This maybe uses lockdep through the fs_reclaim annotations. Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210429021327.57944-1-bernard@vivo.com
|
#
f7c37977 |
|
26-Mar-2021 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Take request reference before arming the watchdog timer Reference needs to be taken before arming the timer. Luckily, given the default timer period of 20s, the potential to hit the race is extremely unlikely. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 9b4d0598ee94 ("drm/i915: Request watchdog infrastructure") Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210326105759.2387104-1-tvrtko.ursulin@linux.intel.com
|
#
eef24f11 |
|
26-Mar-2021 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Take request reference before arming the watchdog timer Reference needs to be taken before arming the timer. Luckily, given the default timer period of 20s, the potential to hit the race is extremely unlikely. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 9b4d0598ee94 ("drm/i915: Request watchdog infrastructure") Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210326105759.2387104-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit f7c379779161d364eb30338529490eac7dc377b7) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
9b4d0598 |
|
23-Mar-2021 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Request watchdog infrastructure Prepares the plumbing for setting request/fence expiration time. All code is put in place but is never activated due yet missing ability to actually configure the timer. Outline of the basic operation: A timer is started when request is ready for execution. If the request completes (retires) before the timer fires, timer is cancelled and nothing further happens. If the timer fires request is added to a lockless list and worker queued. Purpose of this is twofold: a) It allows request cancellation from a more friendly context and b) coalesces multiple expirations into a single event of consuming the list. Worker locklessly consumes the list of expired requests and cancels them all using previous added i915_request_cancel(). Associated timeout value is stored in rq->context.watchdog.timeout_us. v2: * Log expiration. v3: * Include more information about user timeline in the log message. v4: * Remove obsolete comment and fix formatting. (Matt) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-6-tvrtko.ursulin@linux.intel.com
|
#
38b237ea |
|
23-Mar-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Individual request cancellation Currently, we cancel outstanding requests within a context when the context is closed. We may also want to cancel individual requests using the same graceful preemption mechanism. v2 (Tvrtko): * Cancel waiters carefully considering no timeline lock and RCU. * Fixed selftests. v3 (Tvrtko): * Remove error propagation to waiters for now. v4 (Tvrtko): * Rebase for extracted i915_request_active_engine. (Matt) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Resolve conflict because intel_engine_flush_scheduler is still called intel_engine_flush_submission] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-3-tvrtko.ursulin@linux.intel.com
|
#
7dbc19da |
|
23-Mar-2021 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Extract active lookup engine to a helper Move active engine lookup to exported i915_request_active_engine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Slight rebase, engine->sched.lock is still called engine->active.lock.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-2-tvrtko.ursulin@linux.intel.com
|
#
c10e4a79 |
|
01-Feb-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect against request freeing during cancellation on wedging As soon as we mark a request as completed, it may be retired. So when cancelling a request and marking it complete, make sure we first keep a reference to the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210201085715.27435-4-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
#
12ca695d |
|
23-Mar-2021 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
drm/i915: Do not share hwsp across contexts any more, v8. Instead of sharing pages with breadcrumbs, give each timeline a single page. This allows unrelated timelines not to share locks any more during command submission. As an additional benefit, seqno wraparound no longer requires i915_vma_pin, which means we no longer need to worry about a potential -EDEADLK at a point where we are ready to submit. Changes since v1: - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle). - Extra check for completion in intel_read_hwsp(). Changes since v2: - Fix inconsistent indent in hwsp_alloc() (kbuild) - memset entire cacheline to 0. Changes since v3: - Do same in intel_timeline_reset_seqno(), and clflush for good measure. Changes since v4: - Use refcounting on timeline, instead of relying on i915_active. - Fix waiting on kernel requests. Changes since v5: - Bump amount of slots to maximum (256), for best wraparounds. - Add hwsp_offset to i915_request to fix potential wraparound hang. - Ensure timeline wrap test works with the changes. - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to fix a hang. Changes since v6: - Rename i915_request_active_offset to i915_request_active_seqno(), and elaborate the function. (tvrtko) Changes since v7: - Move hunk to where it belongs. (jekstrand) - Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com
|
#
9736387a |
|
14-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reduce test_and_set_bit to set_bit in i915_request_submit() Avoid the full blown memory barrier of test_and_set_bit() by noting the completed request and removing it from the lists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-5-chris@chris-wilson.co.uk
|
#
b2fe00bb |
|
14-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop i915_request.lock serialisation around await_start Originally, we used the signal->lock as a means of following the previous link in its timeline and peeking at the previous fence. However, we have replaced the explicit serialisation with a series of very careful probes that anticipate the links being deleted and the fences recycled before we are able to acquire a strong reference to it. We do not need the signal->lock crutch anymore, nor want the contention. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-2-chris@chris-wilson.co.uk
|
#
163433e5 |
|
14-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark up protected uses of 'i915_request_completed' When we know that we are inside the timeline mutex, or inside the submission flow (under active.lock or the holder's rcu lock), we know that the rq->hwsp is stable and we can use the simpler direct version. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-1-chris@chris-wilson.co.uk
|
#
baa7c2cd |
|
09-Jan-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Refactor marking a request as EIO When wedging the device, we cancel all outstanding requests and mark them as EIO. Rather than duplicate the small function to do so between each submission backend, export one. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210109163455.28466-3-chris@chris-wilson.co.uk
|
#
4e5c8a99 |
|
31-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop i915_request.lock requirement for intel_rps_boost() Since we use a flag within i915_request.flags to indicate when we have boosted the request (so that we only apply the boost) once, this can be used as the serialisation with i915_request_retire() to avoid having to explicitly take the i915_request.lock which is more heavily contended. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201231093149.19086-1-chris@chris-wilson.co.uk
|
#
9c080b0f |
|
31-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Pull context closure check from request submit to schedule-in We only need to evaluate the current status of the context when it is scheduled in, we will force a reschedule when the context is closed propagating the change to inflight contexts. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201231093946.11649-1-chris@chris-wilson.co.uk
|
#
7904e081 |
|
30-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Cancel submitted requests upon context reset Since we process schedule-in of a context after submitting the request, if we decide to reset the context at that time, we also have to cancel the requets we have marked for submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201230220028.17089-1-chris@chris-wilson.co.uk
|
#
16f2941a |
|
24-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Replace direct submit with direct call to tasklet Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"). v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
|
#
5ec17c76 |
|
20-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Another tweak for flushing the tasklets tasklet_kill() ensures that we _yield_ the processor until a remote tasklet is completed. However, this leads to a starvation condition as being at the bottom of the scheduler's runqueue means that anything else is able to run, including all hogs keeping the tasklet occupied. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201220134858.10510-1-chris@chris-wilson.co.uk
|
#
45233ab2 |
|
16-Dec-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move gen8 CS emitters into gen8_engine_cs.h Reduce the pollution of intel_engine.h by moving gen8_emit_pipe_control and friends to gen8_engine_cs.h Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201216135452.6063-1-chris@chris-wilson.co.uk
|
#
b8e2bd98 |
|
26-Nov-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Decouple completed requests on unwind Since the introduction of preempt-to-busy, requests can complete in the background, even while they are not on the engine->active.requests list. As such, the engine->active.request list itself is not in strict retirement order, and we have to scan the entire list while unwinding to not miss any. However, if the request is completed we currently leave it on the list [until retirement], but we could just as simply remove it and stop treating it as active. We would only have to then traverse it once while unwinding in quick succession. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-1-chris@chris-wilson.co.uk
|
#
562675d0 |
|
19-Nov-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Update request status flags for debug pretty-printer We plan to expand upon the number of available statuses for when we pretty-print the requests along the timelines, and so need a new set of flags. We have settled upon: Unready [U] - initial status after being submitted, the request is not ready for execution as it is waiting for external fences Ready [R] - all fences the request was waiting on have been signaled, and the request is now ready for execution and will be in a backend queue - a ready request may still need to wait on semaphores [internal fences] Ready/virtual [V] - same as ready, but queued over multiple backends Executing [E] - the request has been transferred from the backend queue and submitted for execution on HW - a completed request may still be regarded as executing, its status may not be updated until it is retired and removed from the lists Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk
|
#
1f0e785a |
|
19-Nov-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Lift i915_request_show() Extract i915_request_show for reuse in other request chain pretty printers. For a bonus point, quietly change the seqno format from %llx to %lld to match everywhere else. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk
|
#
7a9f50a0 |
|
15-Jun-2020 |
Peter Zijlstra <peterz@infradead.org> |
irq_work: Cleanup Get rid of the __call_single_node union and clean up the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
|
#
476c5818 |
|
29-Aug-2020 |
Peter Zijlstra <peterz@infradead.org> |
llist: Add nonatomic __llist_add() and __llist_dell_all() We'll use these in the new, lockless kretprobes code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/159870619318.1229682.13027387548510028721.stgit@devnote2
|
#
7d442ea7 |
|
28-Sep-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Cancel outstanding work after disabling heartbeats on an engine We only allow persistent requests to remain on the GPU past the closure of their containing context (and process) so long as they are continuously checked for hangs or allow other requests to preempt them, as we need to ensure forward progress of the system. If we allow persistent contexts to remain on the system after the the hangcheck mechanism is disabled, the system may grind to a halt. On disabling the mechanism, we sent a pulse along the engine to remove all executing contexts from the engine which would check for hung contexts -- but we did not prevent those contexts from being resubmitted if they survived the final hangcheck. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-stop Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk (cherry picked from commit 7a991cd3e3da9a56d5616b62d425db000a3242f2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
5701a66e |
|
25-Sep-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks" The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") caused it to revert an earlier correction. Let us restore commit 99f0a640d464 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk (cherry picked from commit 35faeb7de9ef83da510a048f2016061f1e31d5fc) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
7a991cd3 |
|
28-Sep-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Cancel outstanding work after disabling heartbeats on an engine We only allow persistent requests to remain on the GPU past the closure of their containing context (and process) so long as they are continuously checked for hangs or allow other requests to preempt them, as we need to ensure forward progress of the system. If we allow persistent contexts to remain on the system after the the hangcheck mechanism is disabled, the system may grind to a halt. On disabling the mechanism, we sent a pulse along the engine to remove all executing contexts from the engine which would check for hung contexts -- but we did not prevent those contexts from being resubmitted if they survived the final hangcheck. Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs") Testcase: igt/gem_ctx_persistence/heartbeat-stop Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
|
#
35faeb7d |
|
25-Sep-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks" The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") caused it to revert an earlier correction. Let us restore commit 99f0a640d464 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk
|
#
b82a8b93 |
|
16-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Be wary of data races when reading the active execlists To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. The latest occurence of this was spotted by KCSAN: [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915] [ 1413.563221] [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1: [ 1413.563548] __await_execution+0x217/0x370 [i915] [ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915] [ 1413.564235] i915_request_await_object+0x421/0x490 [i915] [ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915] [ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915] [ 1413.564998] drm_ioctl_kernel+0x156/0x1b0 [ 1413.565022] drm_ioctl+0x2ff/0x480 [ 1413.565046] __x64_sys_ioctl+0x87/0xd0 [ 1413.565069] do_syscall_64+0x4d/0x80 [ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To complicate matters, we have to both avoid the read tearing of *active and avoid any write tearing as perform the pending[] -> inflight[] promotion of the execlists. This is because we cannot rely on the memcpy doing u64 aligned copies on all kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE annotations to satisfy KCSAN. v2: When in doubt, write the same comment again. v3: Expanded commit message. Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit b4d9145b0154f8c71dafc2db5fd445f1f3db9426) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
2e4c6c1a |
|
16-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove i915_request.lock requirement for execution callbacks To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. Previous fix 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") however did not correctly serialize request retirement with the execution callbacks. We were using the i915_request.lock to serialise adding an execution callback with __i915_request_submit. However, if we use an atomic llist_add to serialise multiple waiters and then check to see if the request is already executing, we can remove the irq-spinlock and fix serialization between retirement and execution callbacks in one go. v2: Avoid using the irq_work when outside of the irq-spinlocks, where we can execute the callbacks immediately. v3: Pay close attention to the order of setting ACTIVE on retirement, we need to ensure the request is signaled and breadcrumbs detached before we finish removing the request from the engine. v4: Expanded commit message. Fixes: 1d9221e9d395 ("drm/i915: Skip signaling a signaled request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
b4d9145b |
|
16-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Be wary of data races when reading the active execlists To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. The latest occurence of this was spotted by KCSAN: [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915] [ 1413.563221] [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1: [ 1413.563548] __await_execution+0x217/0x370 [i915] [ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915] [ 1413.564235] i915_request_await_object+0x421/0x490 [i915] [ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915] [ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915] [ 1413.564998] drm_ioctl_kernel+0x156/0x1b0 [ 1413.565022] drm_ioctl+0x2ff/0x480 [ 1413.565046] __x64_sys_ioctl+0x87/0xd0 [ 1413.565069] do_syscall_64+0x4d/0x80 [ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To complicate matters, we have to both avoid the read tearing of *active and avoid any write tearing as perform the pending[] -> inflight[] promotion of the execlists. This is because we cannot rely on the memcpy doing u64 aligned copies on all kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE annotations to satisfy KCSAN. v2: When in doubt, write the same comment again. v3: Expanded commit message. Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
e2300560 |
|
01-Aug-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Hold context/request reference while breadcrumbs are active Currently we hold no actual reference to the request nor context while they are attached to a breadcrumb. To avoid freeing the request/context too early, we serialise with cancel-breadcrumbs by taking the irq spinlock in i915_request_retire(). The alternative is to take a reference for a new breadcrumb and release it upon signaling; removing the more frequently hit contention point in i915_request_retire(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
b3786b29 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs On the virtual engines, we only use the intel_breadcrumbs for tracking signaling of stale breadcrumbs from the irq_workers. They do not have any associated interrupt handling, active requests are passed to a physical engine and associated breadcrumb interrupt handler. This causes issues for us as we need to ensure that we do not actually try and enable interrupts and the powermanagement required for them on the virtual engine, as they will never be disabled. Instead, let's specify the physical engine used for interrupt handler on a particular breadcrumb. v2: Drop b->irq_armed = true mocking for no interrupt HW Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
2854d866 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs After staring at the breadcrumb enabling/cancellation and coming to the conclusion that the cause of the mysterious stale breadcrumbs must the act of submitting a completed requests, we can then redirect those completed requests onto a dedicated signaled_list at the time of construction and so eliminate intel_engine_transfer_stale_breadcrumbs(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
c18636f7 |
|
31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs Since the breadcrumb enabling/cancelling itself is serialised by the breadcrumbs.irq_lock, with a bit of care we can remove the outer serialisation with i915_request.lock for concurrent dma_fence_enable_signaling(). This has the important side-effect of eliminating the nested i915_request.lock within request submission. The challenge in serialisation is around the unsubmission where we take an active request that wants a breadcrumb on the signaling engine and put it to sleep. We do not want a concurrent dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so we must mark the request as no longer active before serialising with the concurrent enable-signaling. On retire, we serialise with the concurrent enable-signaling, but instead of clearing ACTIVE, we mark it as SIGNALED. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Joonas: Rebased and reordered into drm-intel-gt-next branch] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
27a5dcfe |
|
28-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Remove disordered per-file request list for throttling I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline and request list globally. That request list was in execution/retirement order, and so walking it to find a particular aged request made sense and could be split per file. That is no more. We now have many timelines with a file, as many as the user wants to construct (essentially per-engine, per-context). Each of those run independently and so make the single list futile. Remove the disordered list, and iterate over all the timelines to find a request to wait on in each to satisfy the criteria that the CPU is no more than 20ms ahead of its oldest request. It should go without saying that the I915_GEM_THROTTLE ioctl is no longer used as the primary means of throttling, so it makes sense to push the complication into the ioctl where it only impacts upon its few irregular users, rather than the execbuf/retire where everybody has to pay the cost. Fortunately, the few users do not create vast amount of contexts, so the loops over contexts/engines should be concise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
3adee4ac |
|
14-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Soften the tasklet flush frequency before waits We include a tasklet flush before waiting on a request as a precaution against the HW being lax in event signaling. We now have a precautionary flush in the engine's heartbeat and so do not need to be quite so zealous on every request wait. If we focus on the request, the only tasklet flush that matters is if there is a delay in submitting this request to HW, so if the request is not ready to be executed, no advantage in reducing this wait can be gained by running the tasklet. And there is little point in doing busy work for no result. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200715115147.11866-10-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
3f6a6f34 |
|
16-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reduce i915_request.lock contention for i915_request_wait Currently, we use i915_request_completed() directly in i915_request_wait() and follow up with a manual invocation of dma_fence_signal(). This appears to cause a large number of contentions on i915_request.lock as when the process is woken up after the fence is signaled by an interrupt, we will then try and call dma_fence_signal() ourselves while the signaler is still holding the lock. dma_fence_is_signaled() has the benefit of checking the DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so avoids most of that contention. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716100754.5670-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
1840d40a |
|
28-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove gen check before calling intel_rps_boost It's been a while since gen6_rps_boost() [that only worked on gen6+] was replaced by intel_rps_boost() that understood itself when rps was active. Since the intel_rps_boost() is gen-agnostic, just call it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152219.1387-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
1d9221e9 |
|
13-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Skip signaling a signaled request Preempt-to-busy introduces various fascinating complications in that the requests may complete as we are unsubmitting them from HW. As they may then signal after unsubmission, we may find ourselves having to cleanup the signaling request from within the signaling callback. This causes us to recurse onto the same i915_request.lock. However, if the request is already signaled (as it will be before we enter the signal callbacks), we know we can skip the signaling of that request during submission, neatly evading the spinlock recursion. unsubmit(ve.rq0) # timeslice expiration or other preemption -> virtual_submit_request(ve.rq0) dma_fence_signal(ve.rq0) # request completed before preemption ack -> submit_notify(ve.rq1) -> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0 -> __i915_request_submit(ve.rq0) [ 264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093 [ 264.210150] lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2 [ 264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G U [ 264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019 [ 264.210160] Call Trace: [ 264.210167] dump_stack+0x98/0xda [ 264.210174] spin_dump.cold+0x24/0x3c [ 264.210178] do_raw_spin_lock+0x9a/0xd0 [ 264.210184] _raw_spin_lock_nested+0x6a/0x70 [ 264.210314] __i915_request_submit+0x10a/0x3c0 [i915] [ 264.210415] virtual_submit_request+0x9b/0x380 [i915] [ 264.210516] submit_notify+0xaf/0x14c [i915] [ 264.210602] __i915_sw_fence_complete+0x8a/0x230 [i915] [ 264.210692] i915_sw_fence_complete+0x2d/0x40 [i915] [ 264.210762] __dma_i915_sw_fence_wake+0x19/0x30 [i915] [ 264.210767] dma_fence_signal_locked+0xb1/0x1c0 [ 264.210772] dma_fence_signal+0x29/0x50 [ 264.210871] i915_request_wait+0x5cb/0x830 [i915] [ 264.210876] ? dma_resv_get_fences_rcu+0x294/0x5d0 [ 264.210974] i915_gem_object_wait_fence+0x2f/0x40 [i915] [ 264.211084] i915_gem_object_wait+0xce/0x400 [i915] [ 264.211178] i915_gem_wait_ioctl+0xff/0x290 [i915] Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200713141636.29326-1-chris@chris-wilson.co.uk
|
#
5a833995 |
|
02-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop i915_request.i915 backpointer We infrequently use the direct i915 backpointer from the i915_request, so do we really need to waste the space in the struct for it? 8 bytes from the most frequently allocated struct vs an 3 bytes and pointer chasing in using rq->engine->i915? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk
|
#
0e386959 |
|
29-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Check for awaits on still currently executing requests With the advent of preempt-to-busy, a request may still be on the GPU as we unwind. And in the case of a unpreemptible [due to HW] request, that request will remain indefinitely on the GPU even though we have returned it back to our submission queue, and cleared the active bit. We only run the execution callbacks on transferring the request from our submission queue to the execution queue, but if this is a bonded request that the HW is waiting for, we will not submit it (as we wait for a fresh execution) even though it is still being executed. As we know that there are always preemption points between requests, we know that only the currently executing request may be still active even though we have cleared the flag. However, we do not precisely know which request is in ELSP[0] due to a delay in processing events, and furthermore we only store the last request in a context in our state tracker. Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Testcase: igt/gem_exec_balancer/bonded-dual Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk (cherry picked from commit b55230e5e800868961fc271b26d9ce53ae1f691e) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
631a6582 |
|
26-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Do not schedule normal requests immediately along virtual When we push a virtual request onto the HW, we update the rq->engine to point to the physical engine. A request that is then submitted by the user that waits upon the virtual engine, but along the physical engine in use, will then see that it is due to be submitted to the same engine and take a shortcut (and be queued without waiting for the completion fence). However, the virtual request may be preempted (either by higher priority users, or by timeslicing) and removed from the physical engine to be migrated over to one of its siblings. The dependent normal request however is oblivious to the removal of the virtual request and remains queued to execute on HW, believing that once it reaches the head of its queue all of its predecessors will have completed executing! v2: Beware restriction of signal->execution_mask prior to submission. Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Testcase: igt/gem_exec_balancer/sliced Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.3+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk (cherry picked from commit 511b6d9aed417739b6aa49d0b6b4354ad21020f1) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
dd873dd5 |
|
26-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reorder await_execution before await_request Reorder the code so that we can reuse the await_execution from a special case in await_request in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-1-chris@chris-wilson.co.uk (cherry picked from commit ffb0c600c240103f6f34e07892a7e0a75502b243) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
b55230e5 |
|
29-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Check for awaits on still currently executing requests With the advent of preempt-to-busy, a request may still be on the GPU as we unwind. And in the case of a unpreemptible [due to HW] request, that request will remain indefinitely on the GPU even though we have returned it back to our submission queue, and cleared the active bit. We only run the execution callbacks on transferring the request from our submission queue to the execution queue, but if this is a bonded request that the HW is waiting for, we will not submit it (as we wait for a fresh execution) even though it is still being executed. As we know that there are always preemption points between requests, we know that only the currently executing request may be still active even though we have cleared the flag. However, we do not precisely know which request is in ELSP[0] due to a delay in processing events, and furthermore we only store the last request in a context in our state tracker. Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Testcase: igt/gem_exec_balancer/bonded-dual Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
|
#
98b7067a |
|
29-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Add a few asserts around handling of i915_request_is_active() Let's assert that we only call the execute callbacks on making the request active, and that we do not execute the request without calling the callbacks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200529085809.23691-1-chris@chris-wilson.co.uk
|
#
511b6d9a |
|
26-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Do not schedule normal requests immediately along virtual When we push a virtual request onto the HW, we update the rq->engine to point to the physical engine. A request that is then submitted by the user that waits upon the virtual engine, but along the physical engine in use, will then see that it is due to be submitted to the same engine and take a shortcut (and be queued without waiting for the completion fence). However, the virtual request may be preempted (either by higher priority users, or by timeslicing) and removed from the physical engine to be migrated over to one of its siblings. The dependent normal request however is oblivious to the removal of the virtual request and remains queued to execute on HW, believing that once it reaches the head of its queue all of its predecessors will have completed executing! v2: Beware restriction of signal->execution_mask prior to submission. Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine") Testcase: igt/gem_exec_balancer/sliced Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.3+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk
|
#
ffb0c600 |
|
26-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reorder await_execution before await_request Reorder the code so that we can reuse the await_execution from a special case in await_request in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-1-chris@chris-wilson.co.uk
|
#
fc0e1270 |
|
25-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Improve execute_cb struct packing Reduce the irq_work llist for attaching the callbacks to the signal for both smaller structs (two fewer pointers!) and simpler [debug] code: Function old new delta irq_execute_cb 35 34 -1 __igt_breadcrumbs_smoketest 1684 1682 -2 i915_request_retire 2003 1996 -7 __i915_request_create 1047 1040 -7 __notify_execute_cb 135 126 -9 __i915_request_ctor 188 178 -10 __await_execution.part.constprop 451 440 -11 igt_wait_request 924 714 -210 One minor artifact is that the order of cb exection is reversed. No current use cases are affected by that change. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526112051.10229-1-chris@chris-wilson.co.uk
|
#
ef29440b |
|
21-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Avoid using rq->engine after free during i915_fence_release In order to be valid to dereference during the i915_fence_release, after retiring the fence and releasing its refererences, we assume that rq->engine can only be a real engine (that stay intact until the device is shutdown after all fences have been flushed). However, due to a quirk of preempt-to-busy, we may retire a request that still belongs to a virtual engine and so eventually free it with rq->engine being invalid. To avoid dereferencing that invalid engine, we look at the execution_mask which if it indicates it may be executed on more than one engine, we know it originated on a virtual engine and may still be on one. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906 Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-2-chris@chris-wilson.co.uk (cherry picked from commit 32a4605b38c30689a6a18f3f4c7d3133ac9d3277) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
32a4605b |
|
21-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Avoid using rq->engine after free during i915_fence_release In order to be valid to dereference during the i915_fence_release, after retiring the fence and releasing its refererences, we assume that rq->engine can only be a real engine (that stay intact until the device is shutdown after all fences have been flushed). However, due to a quirk of preempt-to-busy, we may retire a request that still belongs to a virtual engine and so eventually free it with rq->engine being invalid. To avoid dereferencing that invalid engine, we look at the execution_mask which if it indicates it may be executed on more than one engine, we know it originated on a virtual engine and may still be on one. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906 Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-2-chris@chris-wilson.co.uk
|
#
18e4af04 |
|
13-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop no-semaphore boosting Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk
|
#
795d4d7f |
|
13-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark the addition of the initial-breadcrumb in the request The initial-breadcrumb is used to mark the end of the awaiting and the beginning of the user payload. We verify that we do not start the user payload before all signaler are completed, checking our semaphore setup by looking for the initial breadcrumb being written too early. We also want to ensure that we do not add semaphore waits after we have already closed the semaphore section, an issue for later deferred waits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513165937.9508-2-chris@chris-wilson.co.uk
|
#
a9d094dc |
|
07-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark concurrent submissions with a weak-dependency We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could correctly perform priority inheritance from the parallel branches to the common trunk. However, for the purpose of timeslicing and reset handling, the dependency is weak -- as we the pair of requests are allowed to run in parallel and not in strict succession. The real significance though is that this allows us to rearrange groups of WAIT_FOR_SUBMIT linked requests along the single engine, and so can resolve user level inter-batch scheduling dependencies from user semaphores. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Testcase: igt/gem_exec_fence/submit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk (cherry picked from commit 6b6cd2ebd8d071e55998e32b648bb8081f7f02bb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
bc850943 |
|
06-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Propagate error from completed fences We need to preserve fatal errors from fences that are being terminated as we hook them up. Fixes: ef4688497512 ("drm/i915: Propagate fence errors") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200506162136.3325-1-chris@chris-wilson.co.uk (cherry picked from commit 24fe5f2ab2478053d50a3bc629ada895903a5cbc) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
16dc224f |
|
09-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT Expose the hardcoded timeout for unsignaled foreign fences as a Kconfig option, primarily to allow brave systems to disable the timeout and solely rely on correct signaling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200509105021.12542-1-chris@chris-wilson.co.uk
|
#
fcae4961 |
|
08-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Prevent using semaphores to chain up to external fences The downside of using semaphores is that we lose metadata passing along the signaling chain. This is particularly nasty when we need to pass along a fatal error such as EFAULT or EDEADLK. For fatal errors we want to scrub the request before it is executed, which means that we cannot preload the request onto HW and have it wait upon a semaphore. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-3-chris@chris-wilson.co.uk
|
#
3136deb7 |
|
08-May-2020 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915: Peel dma-fence-chains for await To allow faster engine to engine synchronization, peel the layer of dma-fence-chain to expose potential i915 fences so that the i915_request code can emit HW semaphore wait/signal operations in the ring which is faster than waking up the host to submit unblocked workloads after interrupt notification. This is similar to the peeling we do for e.g. dma_fence_array. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200508185448.29709-1-chris@chris-wilson.co.uk
|
#
ac938052 |
|
08-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull waiting on an external dma-fence into its routine As a means for a small code consolidation, but primarily to start thinking more carefully about internal-vs-external linkage, pull the pair of i915_sw_fence_await_dma_fence() calls into a common routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-2-chris@chris-wilson.co.uk
|
#
2045d666 |
|
08-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Ignore submit-fences on the same timeline While we ordinarily do not skip submit-fences due to the accompanying hook that we want to callback on execution, a submit-fence on the same timeline is meaningless. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-1-chris@chris-wilson.co.uk
|
#
eec39e44 |
|
07-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove wait priority boosting Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams. However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as epxected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek. Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
|
#
6b6cd2eb |
|
07-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark concurrent submissions with a weak-dependency We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could correctly perform priority inheritance from the parallel branches to the common trunk. However, for the purpose of timeslicing and reset handling, the dependency is weak -- as we the pair of requests are allowed to run in parallel and not in strict succession. The real significance though is that this allows us to rearrange groups of WAIT_FOR_SUBMIT linked requests along the single engine, and so can resolve user level inter-batch scheduling dependencies from user semaphores. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Testcase: igt/gem_exec_fence/submit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
|
#
24fe5f2a |
|
06-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Propagate error from completed fences We need to preserve fatal errors from fences that are being terminated as we hook them up. Fixes: ef4688497512 ("drm/i915: Propagate fence errors") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200506162136.3325-1-chris@chris-wilson.co.uk
|
#
43acd651 |
|
02-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep a per-engine request pool Add a tiny per-engine request mempool so that we should always have a request available for powermanagement allocations from tricky contexts. This reserve is expected to be only used for kernel contexts when barriers must be emitted [almost] without fail. The main consumer for this reserved request is expected to be engine-pm, for which we know that there will always be at least the previous pm request that we can reuse under mempressure (so there should always be a spare request for engine_park()). This is an alternative to using a comparatively bulky mempool, which requires custom handling for both our reserved allocation requirement and to protect our TYPESAFE_BY_RCU slab cache. The advantage of mempool would be that it would allow us to keep a larger per-engine request pool. However, converting over to mempool is straightforward should the need arise. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-and-tested-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200402184037.21630-1-chris@chris-wilson.co.uk
|
#
41e4065a |
|
23-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rely on direct submission to the queue Drop the pretense of kicking the tasklet (used only for the defunct guc submission backend, it should just take ownership of the submit!) and so remove the bh-kicking from around submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-5-chris@chris-wilson.co.uk
|
#
14a0d527 |
|
10-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Defer semaphore priority bumping to a workqueue Since the semaphore fence may be signaled from inside an interrupt handler from inside a request holding its request->lock, we cannot then enter into the engine->active.lock for processing the semaphore priority bump as we may traverse our call tree and end up on another held request. CPU 0: [ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0 [ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915] [ 2243.218869] semaphore_notify+0x6d/0x98 [i915] [ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2243.218874] ? kmem_cache_free+0x211/0x290 [ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915] [ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915] [ 2243.218881] signal_irq_work+0x571/0x690 [i915] [ 2243.218883] irq_work_run_list+0xd7/0x120 [ 2243.218885] irq_work_run+0x1d/0x50 [ 2243.218887] smp_irq_work_interrupt+0x21/0x30 [ 2243.218889] irq_work_interrupt+0xf/0x20 CPU 1: [ 2242.173107] _raw_spin_lock+0x8f/0xa0 [ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915] [ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915] [ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915] [ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915] [ 2242.173119] submit_notify+0x8f/0xc0 [i915] [ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40 [ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915] [ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915] Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318 Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-chris@chris-wilson.co.uk (cherry picked from commit 209df10bb4536c81c2540df96c02cd079435357f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
c951b0af |
|
05-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Return early for await_start on same timeline Requests within a timeline are ordered by that timeline, so awaiting for the start of a request within the timeline is a no-op. This used to work by falling out of the mutex_trylock() as the signaler and waiter had the same timeline and not returning an error. Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200305134822.2750496-1-chris@chris-wilson.co.uk (cherry picked from commit ab7a69020fb5d5c7ba19fba60f62fd6f9ca9f779) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
c67b35d9 |
|
05-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Actually emit the await_start Fix the inverted test to emit the wait on the end of the previous request if we /haven't/ already. Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200305104210.2619967-1-chris@chris-wilson.co.uk (cherry picked from commit 07e9c59d63df6a1c44c1975c01827ba18b69270a) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
326611dd |
|
10-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark up racy read of active rq->engine As a virtual engine may change the rq->engine to point to the active request in flight, we need to warn the compiler that an active request's engine is volatile. [ 95.017686] write (marked) to 0xffff8881e8386b10 of 8 bytes by interrupt on cpu 2: [ 95.018123] execlists_dequeue+0x762/0x2150 [i915] [ 95.018539] __execlists_submission_tasklet+0x48/0x60 [i915] [ 95.018955] execlists_submission_tasklet+0xd3/0x170 [i915] [ 95.018986] tasklet_action_common.isra.0+0x42/0xa0 [ 95.019016] __do_softirq+0xd7/0x2cd [ 95.019043] irq_exit+0xbe/0xe0 [ 95.019068] irq_work_interrupt+0xf/0x20 [ 95.019491] i915_request_retire+0x2c5/0x670 [i915] [ 95.019937] retire_requests+0xa1/0xf0 [i915] [ 95.020348] engine_retire+0xa1/0xe0 [i915] [ 95.020376] process_one_work+0x3b1/0x690 [ 95.020403] worker_thread+0x80/0x670 [ 95.020429] kthread+0x19a/0x1e0 [ 95.020454] ret_from_fork+0x1f/0x30 [ 95.020476] [ 95.020498] read to 0xffff8881e8386b10 of 8 bytes by task 8909 on cpu 3: [ 95.020918] __i915_request_commit+0x177/0x220 [i915] [ 95.021329] i915_gem_do_execbuffer+0x38c4/0x4e50 [i915] [ 95.021750] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 95.021784] drm_ioctl_kernel+0xe4/0x120 [ 95.021809] drm_ioctl+0x297/0x4c7 [ 95.021832] ksys_ioctl+0x89/0xb0 [ 95.021865] __x64_sys_ioctl+0x42/0x60 [ 95.021901] do_syscall_64+0x6e/0x2c0 [ 95.021927] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310142403.5953-1-chris@chris-wilson.co.uk
|
#
209df10b |
|
10-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Defer semaphore priority bumping to a workqueue Since the semaphore fence may be signaled from inside an interrupt handler from inside a request holding its request->lock, we cannot then enter into the engine->active.lock for processing the semaphore priority bump as we may traverse our call tree and end up on another held request. CPU 0: [ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0 [ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915] [ 2243.218869] semaphore_notify+0x6d/0x98 [i915] [ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2243.218874] ? kmem_cache_free+0x211/0x290 [ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915] [ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915] [ 2243.218881] signal_irq_work+0x571/0x690 [i915] [ 2243.218883] irq_work_run_list+0xd7/0x120 [ 2243.218885] irq_work_run+0x1d/0x50 [ 2243.218887] smp_irq_work_interrupt+0x21/0x30 [ 2243.218889] irq_work_interrupt+0xf/0x20 CPU 1: [ 2242.173107] _raw_spin_lock+0x8f/0xa0 [ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915] [ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915] [ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915] [ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915] [ 2242.173119] submit_notify+0x8f/0xc0 [i915] [ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915] [ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40 [ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915] [ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915] Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318 Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-chris@chris-wilson.co.uk
|
#
798fa870 |
|
06-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Improve the start alignment of bonded pairs Always wait on the start of the signaler request to reduce the problem of dequeueing the bonded pair too early -- we want both payloads to start at the same time, with no latency, and yet still allow others to make full use of the slack in the system. This reduce the amount of time we spend waiting on the semaphore used to synchronise the start of the bonded payload. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306133852.3420322-3-chris@chris-wilson.co.uk
|
#
60900add |
|
09-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark racy read of intel_engine_cs.saturated [ 3783.276728] BUG: KCSAN: data-race in __i915_request_submit [i915] / i915_request_await_dma_fence [i915] [ 3783.276766] [ 3783.276787] write to 0xffff8881f1bc60a0 of 1 bytes by interrupt on cpu 2: [ 3783.277187] __i915_request_submit+0x47e/0x4a0 [i915] [ 3783.277580] __execlists_submission_tasklet+0x997/0x2780 [i915] [ 3783.277973] execlists_submission_tasklet+0xd3/0x170 [i915] [ 3783.278006] tasklet_action_common.isra.0+0x42/0xa0 [ 3783.278035] __do_softirq+0xd7/0x2cd [ 3783.278063] irq_exit+0xbe/0xe0 [ 3783.278089] do_IRQ+0x51/0x100 [ 3783.278114] ret_from_intr+0x0/0x1c [ 3783.278140] finish_task_switch+0x72/0x260 [ 3783.278170] __schedule+0x1e5/0x510 [ 3783.278198] schedule+0x45/0xb0 [ 3783.278226] smpboot_thread_fn+0x23e/0x300 [ 3783.278256] kthread+0x19a/0x1e0 [ 3783.278283] ret_from_fork+0x1f/0x30 [ 3783.278305] [ 3783.278327] read to 0xffff8881f1bc60a0 of 1 bytes by task 19440 on cpu 3: [ 3783.278724] i915_request_await_dma_fence+0x2a6/0x530 [i915] [ 3783.279130] i915_request_await_object+0x2fe/0x470 [i915] [ 3783.279524] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 3783.279908] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 3783.279940] drm_ioctl_kernel+0xe4/0x120 [ 3783.279968] drm_ioctl+0x297/0x4c7 [ 3783.279996] ksys_ioctl+0x89/0xb0 [ 3783.280021] __x64_sys_ioctl+0x42/0x60 [ 3783.280047] do_syscall_64+0x6e/0x2c0 [ 3783.280074] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309132726.28358-1-chris@chris-wilson.co.uk
|
#
dff2a11b |
|
06-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Do not poison i915_request.link on removal Do not poison the timeline link on the i915_request to allow both forward/backward list traversal under RCU. [ 9759.139229] RIP: 0010:active_request+0x2a/0x90 [i915] [ 9759.139240] Code: 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 c5 60 e8 49 de dc e0 48 8b 83 e8 01 00 00 48 39 c5 74 12 48 8d 90 20 fe ff ff <48> 8b 80 50 fe ff ff a8 01 74 11 e8 66 20 dd e0 48 89 d8 5b 5d 41 [ 9759.139251] RSP: 0018:ffffc9000014ce80 EFLAGS: 00010012 [ 9759.139260] RAX: dead000000000122 RBX: ffff888817cac040 RCX: 0000000000022000 [ 9759.139267] RDX: deacffffffffff42 RSI: ffff888817cac040 RDI: ffff888851fee900 [ 9759.139275] RBP: ffff888851fee960 R08: 000000000000001a R09: ffffffffa04702e0 [ 9759.139282] R10: ffffffff82187ea0 R11: 0000000000000002 R12: 0000000000000004 [ 9759.139289] R13: ffffffffa04d5179 R14: ffff8887f994ae40 R15: ffff888857b9a068 [ 9759.139296] FS: 0000000000000000(0000) GS:ffff88885ed80000(0000) knlGS:0000000000000000 [ 9759.139304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9759.139311] CR2: 00007fff5bdec000 CR3: 00000008534fe001 CR4: 00000000001606e0 [ 9759.139318] Call Trace: [ 9759.139325] <IRQ> [ 9759.139389] execlists_reset+0x14d/0x310 [i915] [ 9759.139400] ? _raw_spin_unlock_irqrestore+0xf/0x30 [ 9759.139445] ? fwtable_read32+0x90/0x230 [i915] [ 9759.139499] execlists_submission_tasklet+0xf6/0x150 [i915] [ 9759.139508] tasklet_action_common.isra.17+0x32/0xa0 [ 9759.139516] __do_softirq+0x114/0x3dc [ 9759.139525] ? handle_irq_event_percpu+0x59/0x70 [ 9759.139533] irq_exit+0xa1/0xc0 [ 9759.139540] do_IRQ+0x76/0x150 [ 9759.139547] common_interrupt+0xf/0xf [ 9759.139554] </IRQ> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306140115.3495686-1-chris@chris-wilson.co.uk
|
#
1eaa251b |
|
06-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Assert requests within a context are submitted in order Check the flow of requests into the hardware to verify that are submitted in order along their timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-1-chris@chris-wilson.co.uk
|
#
ab7a6902 |
|
05-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Return early for await_start on same timeline Requests within a timeline are ordered by that timeline, so awaiting for the start of a request within the timeline is a no-op. This used to work by falling out of the mutex_trylock() as the signaler and waiter had the same timeline and not returning an error. Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200305134822.2750496-1-chris@chris-wilson.co.uk
|
#
07e9c59d |
|
05-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Actually emit the await_start Fix the inverted test to emit the wait on the end of the previous request if we /haven't/ already. Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200305104210.2619967-1-chris@chris-wilson.co.uk
|
#
36e191f0 |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Apply i915_request_skip() on submission Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk
|
#
61231f6b |
|
03-Mar-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Check that the context wasn't closed during setup As setup takes a long time, the user may close the context during the construction of the execbuf. In order to make sure we correctly track all outstanding work with non-persistent contexts, we need to serialise the submission with the context closure and mop up any leaks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303080546.1140508-3-chris@chris-wilson.co.uk
|
#
0b1570b7 |
|
27-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect i915_request_await_start from early waits We need to be extremely careful inside i915_request_await_start() as it needs to walk the list of requests in the foreign timeline with very little protection. As we hold our own timeline mutex, we can not nest inside the signaler's timeline mutex, so all that remains is our RCU protection. However, to be safe we need to tell the compiler that we may be traversing the list only under RCU protection, and furthermore we need to start declaring requests as elements of the timeline from their construction. Fixes: 9ddc8ec027a3 ("drm/i915: Eliminate the trylock for awaiting an earlier request") Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-11-chris@chris-wilson.co.uk (cherry picked from commit d22d2d073ef859b346bc32cb25299262e3973769) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
062444bb |
|
28-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Expose busywait duration to sysfs We busywait on an inflight request (one that is currently executing on HW, and so might complete quickly) prior to setting up an interrupt and sleeping. The trade off is that we keep an expensive CPU core busy in order to avoid wake up latency: where that trade off should lie is best left to the sysadmin. The busywait mechanism can be compiled out with ./scripts/config --set-val DRM_I915_SPIN_REQUEST 0 The maximum busywait duration can be adjusted per-engine using, /sys/class/drm/card?/engine/*/ms_busywait_duration_ns Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-4-chris@chris-wilson.co.uk
|
#
d22d2d07 |
|
27-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect i915_request_await_start from early waits We need to be extremely careful inside i915_request_await_start() as it needs to walk the list of requests in the foreign timeline with very little protection. As we hold our own timeline mutex, we can not nest inside the signaler's timeline mutex, so all that remains is our RCU protection. However, to be safe we need to tell the compiler that we may be traversing the list only under RCU protection, and furthermore we need to start declaring requests as elements of the timeline from their construction. Fixes: 9ddc8ec027a3 ("drm/i915: Eliminate the trylock for awaiting an earlier request") Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-11-chris@chris-wilson.co.uk
|
#
df6b1f3d |
|
20-Feb-2020 |
Matthew Auld <matthew.auld@intel.com> |
drm/i915: remove the other slab_dependencies The real one can be found in i915_scheduler.c. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200220105707.344522-1-matthew.auld@intel.com
|
#
c01e8da2 |
|
03-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Initialise basic fence before acquiring seqno Inside the intel_timeline_get_seqno(), we currently track the retirement of the old cachelines by listening to the new request. This requires that the new request is ready to be used and so requires a minimum bit of initialisation prior to getting the new seqno. Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk (cherry picked from commit 855e39e65cfc33a73724f1cc644ffc5754864a20) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
2aaaa5ee |
|
22-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark the removal of the i915_request from the sched.link Keep the rq->fence.flags consistent with the status of the rq->sched.link, and clear the associated bits when decoupling the link on retirement (as we may wish to inspect those flags independent of other state). Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests") References: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk (cherry picked from commit b4a9a149f91ea345da76bcfe3f8a39715ac346a6) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
9e2750fc |
|
16-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep track of request among the scheduling lists If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk (cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
89dd019a |
|
11-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Poison rings after use On retiring the request, we should not re-use these elements in the ring (at least not until we fill the ringbuffer and knowingly reuse the space). Leave behind some poison to (hopefully) trap ourselves if we make a mistake. Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200211205615.1190127-1-chris@chris-wilson.co.uk
|
#
f16ccb64 |
|
10-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Disable use of hwsp_cacheline for kernel_context Currently on execlists, we use a local hwsp for the kernel_context, rather than the engine's HWSP, as this is the default for execlists. However, seqno wrap requires allocating a new HWSP cacheline, and may require pinning a new HWSP page in the GGTT. This operation requiring pinning in the GGTT is not allowed within the kernel_context timeline, as doing so may require re-entering the kernel_context in order to evict from the GGTT. As we want to avoid requiring a new HWSP for the kernel_context, we can use the permanently pinned engine's HWSP instead. However to do so we must prevent the use of semaphores reading the kernel_context's HWSP, as the use of semaphores do not support rollover onto the same cacheline. Fortunately, the kernel_context is mostly isolated, so unlikely to give benefit to semaphores. Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200210205722.794180-5-chris@chris-wilson.co.uk
|
#
602ddb41 |
|
05-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush execution tasklets before checking request status Rather than flushing the submission tasklets just before we sleep, flush before we check the request status. Ideally this gives us a moment to process the tasklets after sleeping just before we timeout. v2: Compromise by pushing the flush prior to the timeout, but after the check on completion so that we do not further delay the ready client. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200205095441.1769599-1-chris@chris-wilson.co.uk
|
#
855e39e6 |
|
03-Feb-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Initialise basic fence before acquiring seqno Inside the intel_timeline_get_seqno(), we currently track the retirement of the old cachelines by listening to the new request. This requires that the new request is ready to be used and so requires a minimum bit of initialisation prior to getting the new seqno. Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk
|
#
b4a9a149 |
|
22-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark the removal of the i915_request from the sched.link Keep the rq->fence.flags consistent with the status of the rq->sched.link, and clear the associated bits when decoupling the link on retirement (as we may wish to inspect those flags independent of other state). Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests") References: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk
|
#
672c368f |
|
16-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep track of request among the scheduling lists If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
|
#
e1c31fb5 |
|
06-Jan-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Merge i915_request.flags with i915_request.fence.flags As we already have a flags field buried within i915_request, reuse it! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk
|
#
6a8679c0 |
|
22-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark the GEM context link as RCU protected The only protection for intel_context.gem_cotext is granted by RCU, so annotate it as a rcu protected pointer and carefully dereference it in the few occasions we need to use it. Fixes: 9f3ccd40acf4 ("drm/i915: Drop GEM context as a direct link from i915_request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191222233558.2201901-1-chris@chris-wilson.co.uk
|
#
e6ba7648 |
|
21-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove i915->kernel_context Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk
|
#
0f100b70 |
|
20-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Push the use-semaphore marker onto the intel_context Instead of rummaging through the intel_context to peek at the GEM context in the middle of request submission to decide whether to use semaphores, store that information on the intel_context itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-2-chris@chris-wilson.co.uk
|
#
9f3ccd40 |
|
20-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
|
#
54400257 |
|
17-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Remove direct invocation of breadcrumb signaling Only signal the breadcrumbs from inside the irq_work, simplifying our interface and calling conventions. The micro-optimisation here is that by always using the irq_work interface, we know we are always inside an irq-off critical section for the breadcrumb signaling and can ellide save/restore of the irq flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217095642.3124521-7-chris@chris-wilson.co.uk
|
#
85bedbf1 |
|
16-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp As we stash a pointer to the HWSP cacheline on the request, when reading it we only need confirm that the cacheline is still valid by checking that the request and timeline are still intact. v2: Protect hwsp_cachline with RCU Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217011659.3092130-1-chris@chris-wilson.co.uk
|
#
9ddc8ec0 |
|
16-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Eliminate the trylock for awaiting an earlier request We currently use an error-prone mutex_trylock to grab another timeline to find an earlier request along it. However, with a bit of a sleight-of-hand, we can reduce the mutex_trylock to a spin_lock on the immediate request and careful pointer chasing to acquire a reference on the previous request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191216165317.2742896-1-chris@chris-wilson.co.uk
|
#
99de9536 |
|
10-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Copy across scheduler behaviour flags across submit fences We want the bonded request to have the same scheduler properties as its master so that it is placed at the same depth in the queue. For example, consider we have requests A, B and B', where B & B' are a bonded pair to run in parallel on two engines. A -> B \- B' B will run after A and so may be scheduled on an idle engine and wait on A using a semaphore. B' sees B being executed and so enters the queue on the same engine as A. As B' did not inherit the semaphore-chain from B, it may have higher precedence than A and so preempts execution. However, B' then sits on a semaphore waiting for B, who is waiting for A, who is blocked by B. Ergo B' needs to inherit the scheduler properties from B (i.e. the semaphore chain) so that it is scheduled with the same priority as B and will not be executed ahead of Bs dependencies. Furthermore, to prevent the priorities changing via the expose fence on B', we need to couple in the dependencies for PI. This requires us to relax our sanity-checks that dependencies are strictly in order. v2: Synchronise (B, B') execution on all platforms, regardless of using a scheduler, any no-op syncs should be elided. Fixes: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") Closes: https://gitlab.freedesktop.org/drm/intel/issues/464 Testcase: igt/gem_exec_balancer/bonded-chain Testcase: igt/gem_exec_balancer/bonded-semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk (cherry picked from commit c81471f5e95c79c55687282ff6800f112b5d560b) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
f1925f33 |
|
13-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use EAGAIN for trylock failures While not good behaviour, it is, however, established behaviour that we can punt EAGAIN to userspace if we need to retry the ioctl. When trying to acquire a mutex, prefer to use EAGAIN to propagate losing the race so that if it does end up back in userspace, we try again. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Closes: https://gitlab.freedesktop.org/drm/intel/issues/800 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191213160347.1789004-1-chris@chris-wilson.co.uk
|
#
639f2f24 |
|
13-Dec-2019 |
Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> |
drm/i915: Introduce new macros for tracing New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and GT_TRACE() are introduce to tag device name and engine name with contexts and requests tracing in i915. Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com
|
#
65c29dbb |
|
11-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use the i915_device name for identifying our request fences Use the dev_name(i915) to identify the requests for debugging, so we can tell different device timelines apart. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191211150204.133471-1-chris@chris-wilson.co.uk
|
#
c81471f5 |
|
10-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Copy across scheduler behaviour flags across submit fences We want the bonded request to have the same scheduler properties as its master so that it is placed at the same depth in the queue. For example, consider we have requests A, B and B', where B & B' are a bonded pair to run in parallel on two engines. A -> B \- B' B will run after A and so may be scheduled on an idle engine and wait on A using a semaphore. B' sees B being executed and so enters the queue on the same engine as A. As B' did not inherit the semaphore-chain from B, it may have higher precedence than A and so preempts execution. However, B' then sits on a semaphore waiting for B, who is waiting for A, who is blocked by B. Ergo B' needs to inherit the scheduler properties from B (i.e. the semaphore chain) so that it is scheduled with the same priority as B and will not be executed ahead of Bs dependencies. Furthermore, to prevent the priorities changing via the expose fence on B', we need to couple in the dependencies for PI. This requires us to relax our sanity-checks that dependencies are strictly in order. v2: Synchronise (B, B') execution on all platforms, regardless of using a scheduler, any no-op syncs should be elided. Fixes: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") Closes: https://gitlab.freedesktop.org/drm/intel/issues/464 Testcase: igt/gem_exec_balancer/bonded-chain Testcase: igt/gem_exec_balancer/bonded-semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
|
#
9e31c1fe |
|
06-Dec-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Propagate errors on awaiting already signaled fences If we see an already signaled fence that we want to await on, we skip adding to the i915_sw_fence. However, we should pay attention to whether there was an error on that fence and if so propagate it for our future request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191206160428.1503343-2-chris@chris-wilson.co.uk
|
#
67a3acaa |
|
22-Nov-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request As we start peeking into requests for longer and longer, e.g. incorporating use of spinlocks when only protected by an rcu_read_lock(), we need to be careful in how we reset the request when recycling and need to preserve any barriers that may still be in use as the request is reset for reuse. Quoting Linus Torvalds: > If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU? .. because the object can be accessed (by RCU) after the refcount has gone down to zero, and the thing has been released. That's the whole and only point of SLAB_TYPESAFE_BY_RCU. That flag basically says: "I may end up accessing this object *after* it has been free'd, because there may be RCU lookups in flight" This has nothing to do with constructors. It's ok if the object gets reused as an object of the same type and does *not* get re-initialized, because we're perfectly fine seeing old stale data. What it guarantees is that the slab isn't shared with any other kind of object, _and_ that the underlying pages are free'd after an RCU quiescent period (so the pages aren't shared with another kind of object either during an RCU walk). And it doesn't necessarily have to have a constructor, because the thing that a RCU walk will care about is (a) guaranteed to be an object that *has* been on some RCU list (so it's not a "new" object) (b) the RCU walk needs to have logic to verify that it's still the *same* object and hasn't been re-used as something else. In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used immediately, but because it gets reused as the same kind of object, the RCU walker can "know" what parts have meaning for re-use, in a way it couidn't if the re-use was random. That said, it *is* subtle, and people should be careful. > So the re-use might initialize the fields lazily, not necessarily using a ctor. If you have a well-defined refcount, and use "atomic_inc_not_zero()" to guard the speculative RCU access section, and use "atomic_dec_and_test()" in the freeing section, then you should be safe wrt new allocations. If you have a completely new allocation that has "random stale content", you know that it cannot be on the RCU list, so there is no speculative access that can ever see that random content. So the only case you need to worry about is a re-use allocation, and you know that the refcount will start out as zero even if you don't have a constructor. So you can think of the refcount itself as always having a zero constructor, *BUT* you need to be careful with ordering. In particular, whoever does the allocation needs to then set the refcount to a non-zero value *after* it has initialized all the other fields. And in particular, it needs to make sure that it uses the proper memory ordering to do so. NOTE! One thing to be very worried about is that re-initializing whatever RCU lists means that now the RCU walker may be walking on the wrong list so the walker may do the right thing for this particular entry, but it may miss walking *other* entries. So then you can get spurious lookup failures, because the RCU walker never walked all the way to the end of the right list. That ends up being a much more subtle bug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
|
#
3e7abf81 |
|
24-Oct-2019 |
Andi Shyti <andi@etezian.org> |
drm/i915: Extract GT render power state management i915_irq.c is large. One reason for this is that has a large chunk of the GT render power management stashed away in it. Extract that logic out of i915_irq.c and intel_pm.c and put it under one roof. Based on a patch by Chris Wilson. Signed-off-by: Andi Shyti <andi.shyti@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191024211642.7688-1-chris@chris-wilson.co.uk
|
#
babaab2f |
|
25-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Encapsulate kconfig constant values inside boolean predicates Avoid angering clang and smatch by using a constant value in a '&&' test, by forcing that constant value into a boolean. E.g., drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c:159:13: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] if (!delay && CONFIG_DRM_I915_PREEMPT_TIMEOUT) { ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025135943.12524-1-chris@chris-wilson.co.uk
|
#
2871ea85 |
|
24-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk
|
#
1dfffa00 |
|
17-Oct-2019 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
drm/i915: Don't disable interrupts independently of the lock The locks (active.lock and rq->lock) need to be taken with disabled interrupts. This is done in i915_request_retire() by disabling the interrupts independently of the locks itself. While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla it does not on PREEMPT_RT. Chris Wilson confirmed that local_irq_disable() was just introduced as an optimisation to avoid enabling/disabling interrupts during lock/unlock combo. Enable/disable interrupts as part of the locking instruction. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191017161352.e5z3ugse7gxl5ari@linutronix.de
|
#
19306502 |
|
15-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush tasklet submission before sleeping on i915_request_wait If the system is being slow and userspace is racing ahead of the GPU and finds itself waiting for the GPU to catch up, before the process sleeps give the tasklet a kick, bypassing ksoftirqd. If the system is overloaded, then ksoftirqd may be delayed incurring additional latency to our user. This should not be a frequent problem, but in the past we have observed several hundred millisecond delays before ksoftirqd services an interrupt, so burn a few cycles to lend a helping hand. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191015132606.14349-1-chris@chris-wilson.co.uk
|
#
89b6d183 |
|
13-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/execlists: Tweak virtual unsubmission Since commit e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption") we have restricted requests to run on their chosen engine across preemption events. We can take this restriction into account to know that we will want to resubmit those requests onto the same physical engine, and so can shortcircuit the virtual engine selection process and keep the request on the same engine during unwind. References: e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ramlingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk
|
#
a8385f0c |
|
22-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only enqueue already completed requests If we are asked to submit a completed request, just move it onto the active-list without modifying it's payload. If we try to emit the modified payload of a completed request, we risk racing with the ring->head update during retirement which may advance the head past our breadcrumb and so we generate a warning for the emission being behind the RING_HEAD. v2: Commentary for the sneaky, shared responsibility between functions. v3: Spelling mistakes and bonus assertion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk (cherry picked from commit c0bb487dc19fc45dbeede7dcf8f513df51a3cd33) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
5facae4f |
|
18-Sep-2019 |
Qian Cai <cai@lca.pw> |
locking/lockdep: Remove unused @nested argument from lock_release() Since the following commit: b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release") @nested is no longer used in lock_release(), so remove it from all lock_release() calls and friends. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: alexander.levin@microsoft.com Cc: daniel@iogearbox.net Cc: davem@davemloft.net Cc: dri-devel@lists.freedesktop.org Cc: duyuyang@gmail.com Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: intel-gfx@lists.freedesktop.org Cc: jack@suse.com Cc: jlbec@evilplan.or Cc: joonas.lahtinen@linux.intel.com Cc: joseph.qi@linux.alibaba.com Cc: jslaby@suse.com Cc: juri.lelli@redhat.com Cc: maarten.lankhorst@linux.intel.com Cc: mark@fasheh.com Cc: mhocko@kernel.org Cc: mripard@kernel.org Cc: ocfs2-devel@oss.oracle.com Cc: rodrigo.vivi@intel.com Cc: sean@poorly.run Cc: st@kernel.org Cc: tj@kernel.org Cc: tytso@mit.edu Cc: vdavydov.dev@gmail.com Cc: vincent.guittot@linaro.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
abf5cdcf |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Verify the engine after acquiring the active.lock When using virtual engines, the rq->engine is not stable until we hold the engine->active.lock (as the virtual engine may be exchanged with the sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") we may retire a request concurrently with resubmitting it to HW, we need to be extra careful to verify we are holding the correct lock for the request's active list. This is similar to the issue we saw with rescheduling the virtual requests, see sched_lock_engine(). Or else: <4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610). <4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915] <4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G U 5.3.0-CI-CI_DRM_6898+ #1 <4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70 <4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8 <4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082 <4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104 <4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001 <4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10 <4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8 <4> [876.736168] FS: 0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000 <4> [876.736169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0 <4> [876.736171] PKRU: 55555554 <4> [876.736172] Call Trace: <4> [876.736226] __i915_request_submit+0x152/0x370 [i915] <4> [876.736263] __execlists_submission_tasklet+0x6da/0x1f50 [i915] <4> [876.736293] ? execlists_submission_tasklet+0x29/0x50 [i915] <4> [876.736321] execlists_submission_tasklet+0x34/0x50 [i915] <4> [876.736325] tasklet_action_common.isra.5+0x47/0xb0 <4> [876.736328] __do_softirq+0xd8/0x4ae <4> [876.736332] ? smpboot_thread_fn+0x23/0x280 <4> [876.736334] ? smpboot_thread_fn+0x6b/0x280 <4> [876.736336] run_ksoftirqd+0x2b/0x50 <4> [876.736338] smpboot_thread_fn+0x1d3/0x280 <4> [876.736341] ? sort_range+0x20/0x20 <4> [876.736343] kthread+0x119/0x130 <4> [876.736345] ? kthread_park+0xa0/0xa0 <4> [876.736347] ret_from_fork+0x24/0x50 <4> [876.736353] irq event stamp: 2290145 <4> [876.736356] hardirqs last enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500 <4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50 <4> [876.736360] softirqs last enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae <4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50 <4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]--- <4> [876.736406] ------------[ cut here ]------------ <4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730 <4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90 <4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915] <4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G U W 5.3.0-CI-CI_DRM_6898+ #1 <4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90 <4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3 <4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086 <4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002 <4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001 <4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8 <4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001 <4> [876.736447] FS: 00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000 <4> [876.736448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0 <4> [876.736450] PKRU: 55555554 <4> [876.736451] Call Trace: <4> [876.736488] i915_request_retire+0x224/0x8e0 [i915] <4> [876.736521] i915_request_create+0x4b/0x1b0 [i915] <4> [876.736550] nop_virtual_engine+0x230/0x4d0 [i915] Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk (cherry picked from commit 37fa0de3c137d5f54f7e64f53495c9d501d42a4d) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
66101975 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move request runtime management onto gt Requests are run from the gt and are tided into the gt runtime power management, so pull the runtime request management under gt/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk
|
#
f33a8a51 |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Merge wait_for_timelines with retire_request wait_for_timelines is essentially the same loop as retiring requests (with an extra timeout), so merge the two into one routine. v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk
|
#
b1e3177b |
|
04-Oct-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Coordinate i915_active with its own mutex Forgo the struct_mutex serialisation for i915_active, and interpose its own mutex handling for active/retire. This is a multi-layered sleight-of-hand. First, we had to ensure that no active/retire callbacks accidentally inverted the mutex ordering rules, nor assumed that they were themselves serialised by struct_mutex. More challenging though, is the rule over updating elements of the active rbtree. Instead of the whole i915_active now being serialised by struct_mutex, allocations/rotations of the tree are serialised by the i915_active.mutex and individual nodes are serialised by the caller using the i915_timeline.mutex (we need to use nested spinlocks to interact with the dma_fence callback lists). The pain point here is that instead of a single mutex around execbuf, we now have to take a mutex for active tracker (one for each vma, context, etc) and a couple of spinlocks for each fence update. The improvement in fine grained locking allowing for multiple concurrent clients (eventually!) should be worth it in typical loads. v2: Add some comments that barely elucidate anything :( Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
|
#
c0bb487d |
|
22-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only enqueue already completed requests If we are asked to submit a completed request, just move it onto the active-list without modifying it's payload. If we try to emit the modified payload of a completed request, we risk racing with the ring->head update during retirement which may advance the head past our breadcrumb and so we generate a warning for the emission being behind the RING_HEAD. v2: Commentary for the sneaky, shared responsibility between functions. v3: Spelling mistakes and bonus assertion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
|
#
6a79d848 |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Lock signaler timeline while navigating As we need to take a walk back along the signaler timeline to find the fence before upon which we want to wait, we need to lock that timeline to prevent it being modified as we walk. Similarly, we also need to acquire a reference to the earlier fence while it still exists! Though we lack the correct locking today, we are saved by the overarching struct_mutex -- but that protection is being removed. v2: Tvrtko made me realise I was being lax and using annotations to ignore the AB-BA deadlock from the timeline overlap. As it would be possible to construct a second request that was using a semaphore from the same timeline as ourselves, we could quite easily end up in a situation where we deadlocked in our mutex waits. Avoid that by using a trylock and falling back to a normal dma-fence await if contended. v3: Eek, the signal->timeline is volatile and must be carefully dereferenced to ensure it is valid. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-2-chris@chris-wilson.co.uk
|
#
d19d71fc |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark i915_request.timeline as a volatile, rcu pointer The request->timeline is only valid until the request is retired (i.e. before it is completed). Upon retiring the request, the context may be unpinned and freed, and along with it the timeline may be freed. We therefore need to be very careful when chasing rq->timeline that the pointer does not disappear beneath us. The vast majority of users are in a protected context, either during request construction or retirement, where the timeline->mutex is held and the timeline cannot disappear. It is those few off the beaten path (where we access a second timeline) that need extra scrutiny -- to be added in the next patch after first adding the warnings about dangerous access. One complication, where we cannot use the timeline->mutex itself, is during request submission onto hardware (under spinlocks). Here, we want to check on the timeline to finalize the breadcrumb, and so we need to impose a second rule to ensure that the request->timeline is indeed valid. As we are submitting the request, it's context and timeline must be pinned, as it will be used by the hardware. Since it is pinned, we know the request->timeline must still be valid, and we cannot submit the idle barrier until after we release the engine->active.lock, ergo while submitting and holding that spinlock, a second thread cannot release the timeline. v2: Don't be lazy inside selftests; hold the timeline->mutex for as long as we need it, and tidy up acquiring the timeline with a bit of refactoring (i915_active_add_request) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
|
#
37fa0de3 |
|
18-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Verify the engine after acquiring the active.lock When using virtual engines, the rq->engine is not stable until we hold the engine->active.lock (as the virtual engine may be exchanged with the sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") we may retire a request concurrently with resubmitting it to HW, we need to be extra careful to verify we are holding the correct lock for the request's active list. This is similar to the issue we saw with rescheduling the virtual requests, see sched_lock_engine(). Or else: <4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610). <4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915] <4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G U 5.3.0-CI-CI_DRM_6898+ #1 <4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70 <4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8 <4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082 <4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104 <4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001 <4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10 <4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8 <4> [876.736168] FS: 0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000 <4> [876.736169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0 <4> [876.736171] PKRU: 55555554 <4> [876.736172] Call Trace: <4> [876.736226] __i915_request_submit+0x152/0x370 [i915] <4> [876.736263] __execlists_submission_tasklet+0x6da/0x1f50 [i915] <4> [876.736293] ? execlists_submission_tasklet+0x29/0x50 [i915] <4> [876.736321] execlists_submission_tasklet+0x34/0x50 [i915] <4> [876.736325] tasklet_action_common.isra.5+0x47/0xb0 <4> [876.736328] __do_softirq+0xd8/0x4ae <4> [876.736332] ? smpboot_thread_fn+0x23/0x280 <4> [876.736334] ? smpboot_thread_fn+0x6b/0x280 <4> [876.736336] run_ksoftirqd+0x2b/0x50 <4> [876.736338] smpboot_thread_fn+0x1d3/0x280 <4> [876.736341] ? sort_range+0x20/0x20 <4> [876.736343] kthread+0x119/0x130 <4> [876.736345] ? kthread_park+0xa0/0xa0 <4> [876.736347] ret_from_fork+0x24/0x50 <4> [876.736353] irq event stamp: 2290145 <4> [876.736356] hardirqs last enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500 <4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50 <4> [876.736360] softirqs last enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae <4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50 <4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]--- <4> [876.736406] ------------[ cut here ]------------ <4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730 <4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90 <4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915] <4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G U W 5.3.0-CI-CI_DRM_6898+ #1 <4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019 <4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90 <4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3 <4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086 <4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002 <4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001 <4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8 <4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001 <4> [876.736447] FS: 00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000 <4> [876.736448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0 <4> [876.736450] PKRU: 55555554 <4> [876.736451] Call Trace: <4> [876.736488] i915_request_retire+0x224/0x8e0 [i915] <4> [876.736521] i915_request_create+0x4b/0x1b0 [i915] <4> [876.736550] nop_virtual_engine+0x230/0x4d0 [i915] Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk
|
#
c210e85b |
|
17-Sep-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/tgl: Extend MI_SEMAPHORE_WAIT On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to update the length field and emit that extra parameter and any padding noop as required. v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT v3: Use int instead of bool in the addition so that readers are not left wondering about the intricacies of the C spec. Now they just have to worry what the integer value of a boolean operation is... Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk
|
#
ff36c5c4 |
|
23-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold irq-off for the entire fake lock period Sadly lockdep records when the irqs are re-enabled and then marks up the fake lock as being irq-unsafe. Our hand is forced and so we must mark up the entire fake lock critical section as irq-off. Hopefully this is the last tweak required! v2: Not quite, we need to mark the timeline spinlock as irqsafe. That was a genuine bug being hidden by the earlier lockdep splat. Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk (cherry picked from commit 6dcb85a0ad990455ae7c596e3fc966ad9c1ba9c5) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
0f7dc620 |
|
26-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect our local workers against I915_FENCE_TIMEOUT Trust our own workers to not cause unnecessary delays and disable the automatic timeout on their asynchronous fence waits. (Along the same lines that we trust our own requests to complete eventually, if necessary by force.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190826072149.9447-6-chris@chris-wilson.co.uk
|
#
77715906 |
|
23-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep drm_i915_file_private around under RCU Ensure that the drm_i915_file_private continues to exist as we attempt to remove a request from its list, which may race with the destruction of the file. <6> [38.380714] [IGT] gem_ctx_create: starting subtest basic-files <0> [42.201329] BUG: spinlock bad magic on CPU#0, kworker/u16:0/7 <4> [42.201356] general protection fault: 0000 [#1] PREEMPT SMP PTI <4> [42.201371] CPU: 0 PID: 7 Comm: kworker/u16:0 Tainted: G U 5.3.0-rc5-CI-Patchwork_14169+ #1 <4> [42.201391] Hardware name: Dell Inc. OptiPlex 745 /0GW726, BIOS 2.3.1 05/21/2007 <4> [42.201594] Workqueue: i915 retire_work_handler [i915] <4> [42.201614] RIP: 0010:spin_dump+0x5a/0x90 <4> [42.201625] Code: 00 48 8d 88 c0 06 00 00 48 c7 c7 00 71 09 82 e8 35 ef 00 00 48 85 db 44 8b 4d 08 41 b8 ff ff ff ff 48 c7 c1 0b cd 0f 82 74 0e <44> 8b 83 e0 04 00 00 48 8d 8b c0 06 00 00 8b 55 04 48 89 ee 48 c7 <4> [42.201660] RSP: 0018:ffffc9000004bd80 EFLAGS: 00010202 <4> [42.201673] RAX: 0000000000000031 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff820fcd0b <4> [42.201688] RDX: 0000000000000000 RSI: ffff88803de266f8 RDI: 00000000ffffffff <4> [42.201703] RBP: ffff888038381ff8 R08: 00000000ffffffff R09: 000000006b6b6b6b <4> [42.201718] R10: 0000000041cb0b89 R11: 646162206b636f6c R12: ffff88802a618500 <4> [42.201733] R13: ffff88802b32c288 R14: ffff888038381ff8 R15: ffff88802b32c250 <4> [42.201748] FS: 0000000000000000(0000) GS:ffff88803de00000(0000) knlGS:0000000000000000 <4> [42.201765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [42.201778] CR2: 00007f2cefc6d180 CR3: 00000000381ee000 CR4: 00000000000006f0 <4> [42.201793] Call Trace: <4> [42.201805] do_raw_spin_lock+0x66/0xb0 <4> [42.201898] i915_request_retire+0x548/0x7c0 [i915] <4> [42.201989] retire_requests+0x4d/0x60 [i915] <4> [42.202078] i915_retire_requests+0x144/0x2e0 [i915] <4> [42.202169] retire_work_handler+0x10/0x40 [i915] Recently, in commit 44c22f3f1a0a ("drm/i915: Serialize insertion into the file->mm.request_list"), we fixed a race on insertion. Now, it appears we also have a race with destruction! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823181455.31910-1-chris@chris-wilson.co.uk
|
#
6dcb85a0 |
|
23-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold irq-off for the entire fake lock period Sadly lockdep records when the irqs are re-enabled and then marks up the fake lock as being irq-unsafe. Our hand is forced and so we must mark up the entire fake lock critical section as irq-off. Hopefully this is the last tweak required! v2: Not quite, we need to mark the timeline spinlock as irqsafe. That was a genuine bug being hidden by the earlier lockdep splat. Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
|
#
44c22f3f |
|
20-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Serialize insertion into the file->mm.request_list Currently, we remove the from per-file request list for throttling and retirement under a dedicated spinlock, but insertion is governed by struct_mutex. This needs to be the same lock so that the retirement/insertion of neighbouring requests (at the tail) doesn't break the list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190820080907.4665-1-chris@chris-wilson.co.uk
|
#
cc337560 |
|
19-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use 0 for the unordered context Since commit 078dec3326e2 ("dma-buf: add dma_fence_get_stub") the 0 fence context became an impossible match as it is used for an always signaled fence. We can simplify our timeline tracking by knowing that 0 always means no match. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190819184404.24200-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20190819175109.5241-1-chris@chris-wilson.co.uk
|
#
ef468849 |
|
17-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Propagate fence errors Errors spread like wildfire, and must eventually be returned to the user. They need to be captured and passed along the flow of fences, infecting each in turn with the existing error, until finally they fall out of a user visible result. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190817232511.11391-1-chris@chris-wilson.co.uk
|
#
6c69a454 |
|
16-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Mark context->active_count as protected by timeline->mutex We use timeline->mutex to protect modifications to context->active_count, and the associated enable/disable callbacks. Due to complications with engine-pm barrier there is a path where we used a "superlock" to provide serialised protect and so could not unconditionally assert with lockdep that it was always held. However, we can mark the mutex as taken (noting that we may be nested underneath ourselves) which means we can be reassured the right timeline->mutex is always treated as held and let lockdep roam free. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-1-chris@chris-wilson.co.uk
|
#
e5dadff4 |
|
15-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Protect request retirement with timeline->mutex Forgo the struct_mutex requirement for request retirement as we have been transitioning over to only using the timeline->mutex for controlling the lifetime of a request on that timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk
|
#
62520e33 |
|
14-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move tasklet kicking to __i915_request_queue caller Since __i915_request_queue() may be called from hardirq (timer) context, we cannot use local_bh_disable/enable at the lower level. As we do want to kick the tasklet to speed up initial submission or preemption for normal client submission, lift it to the normal process context callpath. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190815042031.27750-1-chris@chris-wilson.co.uk
|
#
a79ca656 |
|
13-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Push the wakeref->count deferral to the backend If the backend wishes to defer the wakeref parking, make it responsible for unlocking the wakeref (i.e. bumping the counter). This allows it to time the unlock much more carefully in case it happens to needs the wakeref to be active during its deferral. For instance, during engine parking we may choose to emit an idle barrier (a request). To do so, we borrow the engine->kernel_context timeline and to ensure exclusive access we keep the engine->wakeref.count as 0. However, to submit that request to HW may require a intel_engine_pm_get() (e.g. to keep the submission tasklet alive) and before we allow that we have to rewake our wakeref to avoid a recursive deadlock. <4> [257.742916] IRQs not enabled as expected <4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100 <4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid <4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U W 5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1 <4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015 <4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100 <4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3 <4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086 <4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302 <4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001 <4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008 <4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430 <4> [257.743074] FS: 0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000 <4> [257.743082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0 <4> [257.743096] Call Trace: <4> [257.743104] <IRQ> <4> [257.743265] __i915_request_commit+0x240/0x5d0 [i915] <4> [257.743427] ? __i915_request_create+0x228/0x4c0 [i915] <4> [257.743584] __engine_park+0x64/0x250 [i915] <4> [257.743730] ____intel_wakeref_put_last+0x1c/0x70 [i915] <4> [257.743878] i915_sample+0x2ee/0x310 [i915] <4> [257.744030] ? i915_pmu_cpu_offline+0xb0/0xb0 [i915] <4> [257.744040] __hrtimer_run_queues+0x11e/0x4b0 <4> [257.744068] hrtimer_interrupt+0xea/0x250 <4> [257.744079] ? lockdep_hardirqs_off+0x79/0xd0 <4> [257.744101] smp_apic_timer_interrupt+0x96/0x280 <4> [257.744114] apic_timer_interrupt+0xf/0x20 <4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae v2: Keep the priority_hint assert v3: That assert was desperately trying to point out my bug. Sorry, little assert. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
|
#
52791eee |
|
11-Aug-2019 |
Christian König <christian.koenig@amd.com> |
dma-buf: rename reservation_object to dma_resv Be more consistent with the naming of the other DMA-buf objects. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/323401/
|
#
75d0a7f3 |
|
09-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Lift timeline into intel_context Move the timeline from being inside the intel_ring to intel_context itself. This saves much pointer dancing and makes the relations of the context to its timeline much clearer. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk
|
#
a09d9a80 |
|
06-Aug-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: avoid including intel_drv.h via i915_drv.h->i915_trace.h Disentangle i915_drv.h from intel_drv.h, which gets included via i915_trace.h. This necessitates including i915_trace.h wherever it's needed. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ed82bf259d3b725a1a1a3c3e9d6fb5c08bc4d489.1565085691.git.jani.nikula@intel.com
|
#
c29579d2 |
|
06-Aug-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Make caps.scheduler static We do not notify userspace when the scheduler capabilities are changed (due to wedging the driver) and as such userspace will expect the caps to be static and unchanging. Make it so, and so we only need to compute our caps once during driver registration. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-1-chris@chris-wilson.co.uk
|
#
cb823ed9 |
|
12-Jul-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Use intel_gt as the primary object for handling resets Having taken the first step in encapsulating the functionality by moving the related files under gt/, the next step is to start encapsulating by passing around the relevant structs rather than the global drm_i915_private. In this step, we pass intel_gt to intel_reset.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk
|
#
2a98f4e6 |
|
09-Jul-2019 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915: add infrastructure to hold off preemption on a request We want to set this flag in the next commit on requests containing perf queries so that the result of the perf query can just be a delta of global counters, rather than doing post processing of the OA buffer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [ickle: add basic selftest for nopreempt] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk
|
#
f0c02c1b |
|
21-Jun-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Rename i915_timeline to intel_timeline and move under gt Move all timeline code under gt and rename to intel_gt prefix. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com
|
#
22b7a426 |
|
20-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/execlists: Preempt-to-busy When using a global seqno, we required a precise stop-the-workd event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. v2: Be a tinge paranoid and ensure we flush the write into the HWS page for the GPU semaphore to pick in a timely fashion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
|
#
b87b6c0d |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush the execution-callbacks on retiring In the unlikely case the request completes while we regard it as not even executing on the GPU (see the next patch!), we have to flush any pending execution callbacks at retirement and ensure that we do not add any more. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-4-chris@chris-wilson.co.uk
|
#
ce94bef9 |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Signal fence completion from i915_request_wait With the upcoming change to automanaged i915_active, the intent is that whenever we wait on the set of active fences, they are signaled and collected. The requirement is that all successful returns from i915_request_wait() signal the fence, so fixup the one remaining path where we may return before the interrupt has been run. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190619112341.9082-3-chris@chris-wilson.co.uk
|
#
2f530945 |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Stop passing I915_WAIT_LOCKED to i915_request_wait() Since commit eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex"), the I915_WAIT_LOCKED flags passed to i915_request_wait() has been defunct. Now go ahead and remove it from all callers. References: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-3-chris@chris-wilson.co.uk
|
#
44d89409 |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make the semaphore saturation mask global The idea behind keeping the saturation mask local to a context backfired spectacularly. The premise with the local mask was that we would be more proactive in attempting to use semaphores after each time the context idled, and that all new contexts would attempt to use semaphores ignoring the current state of the system. This turns out to be horribly optimistic. If the system state is still oversaturated and the existing workloads have all stopped using semaphores, the new workloads would attempt to use semaphores and be deprioritised behind real work. The new contexts would not switch off using semaphores until their initial batch of low priority work had completed. Given sufficient backload load of equal user priority, this would completely starve the new work of any GPU time. To compensate, remove the local tracking in favour of keeping it as global state on the engine -- once the system is saturated and semaphores are disabled, everyone stops attempting to use semaphores until the system is idle again. One of the reason for preferring local context tracking was that it worked with virtual engines, so for switching to global state we could either do a complete check of all the virtual siblings or simply disable semaphores for those requests. This takes the simpler approach of disabling semaphores on virtual engines. The downside is that the decision that the engine is saturated is a local measure -- we are only checking whether or not this context was scheduled in a timely fashion, it may be legitimately delayed due to user priorities. We still have the same dilemma though, that we do not want to employ the semaphore poll unless it will be used. v2: Explain why we need to assume the worst wrt virtual engines. Fixes: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk
|
#
ef78f7b1 |
|
18-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use drm_gem_object.resv Since commit 1ba627148ef5 ("drm: Add reservation_object to drm_gem_object"), struct drm_gem_object grew its own builtin reservation_object rendering our own private one bloat. Remove our redundant reservation_object and point into obj->base.resv instead. References: 1ba627148ef5 ("drm: Add reservation_object to drm_gem_object") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190618125858.7295-1-chris@chris-wilson.co.uk
|
#
422d7df4 |
|
14-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace engine->timeline with a plain list To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
|
#
9db0c5ca |
|
14-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Stop retiring along engine We no longer track the execution order along the engine and so no longer need to enforce ordering of retire along the engine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-2-chris@chris-wilson.co.uk
|
#
ce476c80 |
|
14-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep contexts pinned until after the next kernel context switch We need to keep the context image pinned in memory until after the GPU has finished writing into it. Since it continues to write as we signal the final breadcrumb, we need to keep it pinned until the request after it is complete. Currently we know the order in which requests execute on each engine, and so to remove that presumption we need to identify a request/context-switch we know must occur after our completion. Any request queued after the signal must imply a context switch, for simplicity we use a fresh request from the kernel context. The sequence of operations for keeping the context pinned until saved is: - On context activation, we preallocate a node for each physical engine the context may operate on. This is to avoid allocations during unpinning, which may be from inside FS_RECLAIM context (aka the shrinker) - On context deactivation on retirement of the last active request (which is before we know the context has been saved), we add the preallocated node onto a barrier list on each engine - On engine idling, we emit a switch to kernel context. When this switch completes, we know that all previous contexts must have been saved, and so on retiring this request we can finally unpin all the contexts that were marked as deactivated prior to the switch. We can enhance this in future by flushing all the idle contexts on a regular heartbeat pulse of a switch to kernel context, which will also be used to check for hung engines. v2: intel_context_active_acquire/_release Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
|
#
84383d2e |
|
14-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Refine i915_reset.lock_map We already use a mutex to serialise i915_reset() and wedging, so all we need it to link that into i915_request_wait() and we have our lock cycle detection. v2.5: Take error mutex for selftests Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614071023.17929-3-chris@chris-wilson.co.uk
|
#
6e4e9708 |
|
13-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Execute signal callbacks from no-op i915_request_wait If we enter i915_request_wait() with an already completed request, but unsignaled dma-fence, signal the fence before returning. This allows us to execute any of the signal callbacks at the earliest opportunity. v2: Also signal after busyspin success Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614111053.25615-2-chris@chris-wilson.co.uk
|
#
33df8a76 |
|
12-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Prevent lock-cycles between GPU waits and GPU resets We cannot allow ourselves to wait on the GPU while holding any lock as we may need to reset the GPU. While there is not an explicit lock between the two operations, lockdep cannot detect the dependency. So let's tell lockdep about the wait/reset dependency with an explicit lockmap. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190612085246.16374-1-chris@chris-wilson.co.uk
|
#
f4d57d83 |
|
10-Jun-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Allow interrupts when taking the timeline->mutex Before we commit ourselves to writing commands into the ringbuffer and submitting the request, allow signals to interrupt acquisition of the timeline mutex. We allow ourselves to be interrupted at any time later if we need to block for space in the ring, anyway. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190610103610.19883-1-chris@chris-wilson.co.uk
|
#
10be98a7 |
|
28-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move more GEM objects under gem/ Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
|
#
f71e01a7 |
|
21-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Extend execution fence to support a callback In the next patch, we will want to configure the slave request depending on which physical engine the master request is executed on. For this, we introduce a callback from the execute fence to convey this information. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk
|
#
78e41ddd |
|
21-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Apply an execution_mask to the virtual_engine Allow the user to direct which physical engines of the virtual engine they wish to execute one, as sometimes it is necessary to override the load balancing algorithm. v2: Only kick the virtual engines on context-out if required Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
|
#
a491cc8e |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Truly bump ready tasks ahead of busywaits In commit b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits"), I tried cutting a corner in order to not install a signal for each of our dependencies, and only listened to requests on which we were intending to busywait. The compromise that was made was that instead of then being able to promote the request with a full NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we had cleared the semaphore chain, we settled for only using the NEWCLIENT boost. With an over saturated system with multiple NEWCLIENTS in flight at any time, this was found to be an inadequate promotion and left us with a much poorer scheduling order than prior to using semaphores. The outcome of this patch, is that all requests have NOSEMAPHORE priority when they have no dependencies and are ready to run and not busywait, restoring the pre-semaphore ordering on saturated systems. We can demonstrate the effect of poor scheduling order by oversaturating the system using gem_wsim on a system with multiple vcs engines (i.e running the same workloads across more clients than required for peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context): x v5.1 (normalized) + tip * fix +------------------------------------------------------------------------+ | x | | x | | x | | x | | %x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %#x | | %#x | | %#x | | %#x | | %#x | | + %#xx | | + %#xx | | + %%#xx | | + %%#xx | | + %%#xx | | + %%#xx | | + %%##x | | +++ %%##x | | +++ %%##x | | +++ %%##x | | ++++ %%##x | | ++++ %%##x | | ++++ %%##xx | | ++++ %###xx | | ++++ %###xx | | ++++ %###xx | | ++++ %###xx | | ++++ + %#O#xx | | ++++ + %#O#xx | | ++++++ + %#O#xx | | ++++++++++ %OOOxxx| | ++++++++++ + %#OOO#xx| | + ++++++++++++ ++ +++++ + ++ @@OOOO#xx| | |A_| | ||__________M_______A____________________| | | |A_| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 0.99456 1.00628 0.999985 1.0001545 0.0024387139 + 120 0.873021 1.00037 0.884134 0.90148752 0.039190862 Difference at 99.5% confidence -0.098667 +/- 0.0110762 -9.86517% +/- 1.10745% (Student's t, pooled s = 0.0277657) % 120 0.990207 1.00165 0.9970265 0.99699748 0.0021024 Difference at 99.5% confidence -0.003157 +/- 0.000908245 -0.315651% +/- 0.0908105% (Student's t, pooled s = 0.00227678) Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk (cherry picked from commit 17db337f5098d29415314c4a588b842fc684394b) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
c80274bb |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Downgrade NEWCLIENT to non-preemptive Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the intended consequence of not allowing a sequence of work that merely crossed into a new engine the privilege to be promoted to NEWCLIENT status. It also had the unintended consequence of actually making NEWCLIENT effective on heavily oversubscribed transcode machines and impacting upon their throughput. If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of those clients, using the NEWCLIENT boost that will be scheduled as rcsA x 30, (rcsB, vcs) x 30 where as before it would have been (rcsA, rcsB, vcs) x 30 That is with NEWCLIENT only boosting the first request of each client, we would execute all rcsA requests prior to running on the vcs engines; acruing a lot of dead time as compared to the previous case where the vcs engine would be started in parallel to processing the second client. The previous patch has the effect of delaying submission until it is required by a third party (either the user with an explicit wait, or by another client/engine). We reduce the NEWCLIENT bump to a mere WAIT, which has the effect of removing its preemptive grant and reducing it to the same level as any other user interaction -- that it will not be promoted above the interengine dependencies, and so preventing NEWCLIENTS from starving other engines. This a large nerf to the rrul properties of the current NEWCLIENT, but it still does give prioritised submission to new requests from light workloads. References: b16c765122f9 ("drm/i915: Priority boost for new clients") Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk (cherry picked from commit 68fc728b01fcc93b26d52f6e884e738962a49a66) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
9981927c |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Bump signaler priority on adding a waiter The handling of the no-preemption priority level imposes the restriction that we need to maintain the implied ordering even though preemption is disabled. Otherwise we may end up with an AB-BA deadlock across multiple engine due to a real preemption event reordering the no-preemption WAITs. To resolve this issue we currently promote all requests to WAIT on unsubmission, however this interferes with the timeslicing requirement that we do not apply any implicit promotion that will defeat the round-robin timeslice list. (If we automatically promote the active request it will go back to the head of the queue and not the tail!) So we need implicit promotion to prevent reordering around semaphores where we are not allowed to preempt, and we must avoid implicit promotion on unsubmission. So instead of at unsubmit, if we apply that implicit promotion on adding the dependency, we avoid the semaphore deadlock and we also reduce the gains made by the promotion for user space waiting. Furthermore, by keeping the earlier dependencies at a higher level, we reduce the search space for timeslicing without altering runtime scheduling too badly (no dependencies at all will be assigned a higher priority for rrul). v2: Limit the bump to external edges (as originally intended) i.e. between contexts and out to the user. Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk (cherry picked from commit 6e7eb7a80769e7250e31652b96918cf7f3e0d285) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
68fc728b |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Downgrade NEWCLIENT to non-preemptive Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the intended consequence of not allowing a sequence of work that merely crossed into a new engine the privilege to be promoted to NEWCLIENT status. It also had the unintended consequence of actually making NEWCLIENT effective on heavily oversubscribed transcode machines and impacting upon their throughput. If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of those clients, using the NEWCLIENT boost that will be scheduled as rcsA x 30, (rcsB, vcs) x 30 where as before it would have been (rcsA, rcsB, vcs) x 30 That is with NEWCLIENT only boosting the first request of each client, we would execute all rcsA requests prior to running on the vcs engines; acruing a lot of dead time as compared to the previous case where the vcs engine would be started in parallel to processing the second client. The previous patch has the effect of delaying submission until it is required by a third party (either the user with an explicit wait, or by another client/engine). We reduce the NEWCLIENT bump to a mere WAIT, which has the effect of removing its preemptive grant and reducing it to the same level as any other user interaction -- that it will not be promoted above the interengine dependencies, and so preventing NEWCLIENTS from starving other engines. This a large nerf to the rrul properties of the current NEWCLIENT, but it still does give prioritised submission to new requests from light workloads. References: b16c765122f9 ("drm/i915: Priority boost for new clients") Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
|
#
6e7eb7a8 |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Bump signaler priority on adding a waiter The handling of the no-preemption priority level imposes the restriction that we need to maintain the implied ordering even though preemption is disabled. Otherwise we may end up with an AB-BA deadlock across multiple engine due to a real preemption event reordering the no-preemption WAITs. To resolve this issue we currently promote all requests to WAIT on unsubmission, however this interferes with the timeslicing requirement that we do not apply any implicit promotion that will defeat the round-robin timeslice list. (If we automatically promote the active request it will go back to the head of the queue and not the tail!) So we need implicit promotion to prevent reordering around semaphores where we are not allowed to preempt, and we must avoid implicit promotion on unsubmission. So instead of at unsubmit, if we apply that implicit promotion on adding the dependency, we avoid the semaphore deadlock and we also reduce the gains made by the promotion for user space waiting. Furthermore, by keeping the earlier dependencies at a higher level, we reduce the search space for timeslicing without altering runtime scheduling too badly (no dependencies at all will be assigned a higher priority for rrul). v2: Limit the bump to external edges (as originally intended) i.e. between contexts and out to the user. Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
|
#
17db337f |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Truly bump ready tasks ahead of busywaits In commit b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits"), I tried cutting a corner in order to not install a signal for each of our dependencies, and only listened to requests on which we were intending to busywait. The compromise that was made was that instead of then being able to promote the request with a full NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we had cleared the semaphore chain, we settled for only using the NEWCLIENT boost. With an over saturated system with multiple NEWCLIENTS in flight at any time, this was found to be an inadequate promotion and left us with a much poorer scheduling order than prior to using semaphores. The outcome of this patch, is that all requests have NOSEMAPHORE priority when they have no dependencies and are ready to run and not busywait, restoring the pre-semaphore ordering on saturated systems. We can demonstrate the effect of poor scheduling order by oversaturating the system using gem_wsim on a system with multiple vcs engines (i.e running the same workloads across more clients than required for peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context): x v5.1 (normalized) + tip * fix +------------------------------------------------------------------------+ | x | | x | | x | | x | | %x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %%x | | %#x | | %#x | | %#x | | %#x | | %#x | | + %#xx | | + %#xx | | + %%#xx | | + %%#xx | | + %%#xx | | + %%#xx | | + %%##x | | +++ %%##x | | +++ %%##x | | +++ %%##x | | ++++ %%##x | | ++++ %%##x | | ++++ %%##xx | | ++++ %###xx | | ++++ %###xx | | ++++ %###xx | | ++++ %###xx | | ++++ + %#O#xx | | ++++ + %#O#xx | | ++++++ + %#O#xx | | ++++++++++ %OOOxxx| | ++++++++++ + %#OOO#xx| | + ++++++++++++ ++ +++++ + ++ @@OOOO#xx| | |A_| | ||__________M_______A____________________| | | |A_| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 0.99456 1.00628 0.999985 1.0001545 0.0024387139 + 120 0.873021 1.00037 0.884134 0.90148752 0.039190862 Difference at 99.5% confidence -0.098667 +/- 0.0110762 -9.86517% +/- 1.10745% (Student's t, pooled s = 0.0277657) % 120 0.990207 1.00165 0.9970265 0.99699748 0.0021024 Difference at 99.5% confidence -0.003157 +/- 0.000908245 -0.315651% +/- 0.0908105% (Student's t, pooled s = 0.00227678) Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
|
#
dba5a7f3 |
|
15-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Mark semaphores as complete on unsubmit out if payload was started Avoid charging us for the presumed busywait if the request was preempted after successfully using semaphores to reduce inter-engine latency. v2: Bump the priority to reflect the lack of semaphores now required. References: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-1-chris@chris-wilson.co.uk
|
#
c36beba6 |
|
07-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Seal races between async GPU cancellation, retirement and signaling Currently there is an underlying assumption that i915_request_unsubmit() is synchronous wrt the GPU -- that is the request is no longer in flight as we remove it. In the near future that may change, and this may upset our signaling as we can process an interrupt for that request while it is no longer in flight. CPU0 CPU1 intel_engine_breadcrumbs_irq (queue request completion) i915_request_cancel_signaling ... ... i915_request_enable_signaling dma_fence_signal Hence in the time it took us to drop the lock to signal the request, a preemption event may have occurred and re-queued the request. In the process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and so reused the rq->signal_link that was in use on CPU0, leading to bad pointer chasing in intel_engine_breadcrumbs_irq. A related issue was that if someone started listening for a signal on a completed but no longer in-flight request, we missed the opportunity to immediately signal that request. Furthermore, as intel_contexts may be immediately released during request retirement, in order to be entirely sure that intel_engine_breadcrumbs_irq may no longer dereference the intel_context (ce->signals and ce->signal_link), we must wait for irq spinlock. In order to prevent the race, we use a bit in the fence.flags to signal the transfer onto the signal list inside intel_engine_breadcrumbs_irq. For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then quickly signals to any outside observer that the fence is indeed signaled. v2: Sketch out potential dma-fence API for manual signaling v3: And the test_and_set_bit() Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk (cherry picked from commit 0152b3b3f49b36b0f1a1bf9f0353dc636f41d8f0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
0152b3b3 |
|
07-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Seal races between async GPU cancellation, retirement and signaling Currently there is an underlying assumption that i915_request_unsubmit() is synchronous wrt the GPU -- that is the request is no longer in flight as we remove it. In the near future that may change, and this may upset our signaling as we can process an interrupt for that request while it is no longer in flight. CPU0 CPU1 intel_engine_breadcrumbs_irq (queue request completion) i915_request_cancel_signaling ... ... i915_request_enable_signaling dma_fence_signal Hence in the time it took us to drop the lock to signal the request, a preemption event may have occurred and re-queued the request. In the process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and so reused the rq->signal_link that was in use on CPU0, leading to bad pointer chasing in intel_engine_breadcrumbs_irq. A related issue was that if someone started listening for a signal on a completed but no longer in-flight request, we missed the opportunity to immediately signal that request. Furthermore, as intel_contexts may be immediately released during request retirement, in order to be entirely sure that intel_engine_breadcrumbs_irq may no longer dereference the intel_context (ce->signals and ce->signal_link), we must wait for irq spinlock. In order to prevent the race, we use a bit in the fence.flags to signal the transfer onto the signal list inside intel_engine_breadcrumbs_irq. For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then quickly signals to any outside observer that the fence is indeed signaled. v2: Sketch out potential dma-fence API for manual signaling v3: And the test_and_set_bit() Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
|
#
25d851ad |
|
07-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only reschedule the submission tasklet if preemption is possible If we couple the scheduler more tightly with the execlists policy, we can apply the preemption policy to the question of whether we need to kick the tasklet at all for this priority bump. v2: Rephrase it as a core i915 policy and not an execlists foible. v3: Pull the kick together. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
|
#
c8a0e2ae |
|
03-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Acquire the signaler's timeline HWSP last Acquiring the signaler's timeline takes an active reference to their HWSP that we would like to avoid if possible, so take it after performing all of our allocations required to set up the fencing. The acquisition also provides the final check that the target has not already signaled allowing us to avoid the semaphore at the last moment. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190503140239.32668-1-chris@chris-wilson.co.uk
|
#
2564fe70 |
|
04-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Disable semaphore busywaits on saturated systems Asking the GPU to busywait on a memory address, perhaps not unexpectedly in hindsight for a shared system, leads to bus contention that affects CPU programs trying to concurrently access memory. This can manifest as a drop in transcode throughput on highly over-saturated workloads. The only clue offered by perf, is that the bus-cycles (perf stat -e bus-cycles) jumped by 50% when enabling semaphores. This corresponds with extra CPU active cycles being attributed to intel_idle's mwait. This patch introduces a heuristic to try and detect when more than one client is submitting to the GPU pushing it into an oversaturated state. As we already keep track of when the semaphores are signaled, we can inspect their state on submitting the busywait batch and if we planned to use a semaphore but were too late, conclude that the GPU is overloaded and not try to use semaphores in future requests. In practice, this means we optimistically try to use semaphores for the first frame of a transcode job split over multiple engines, and fail if there are multiple clients active and continue not to use semaphores for the subsequent frames in the sequence. Periodically, we try to optimistically switch semaphores back on whenever the client waits to catch up with the transcode results. With 1 client, on Broxton J3455, with the relative fps normalized by %cpu: x no semaphores + drm-tip * patched +------------------------------------------------------------------------+ | * | | *+ | | **+ | | **+ x | | x * +**+ x | | x x * * +***x xx | | x x * * *+***x *x | | x x* + * * *****x *x x | | + x xx+x* + *** * ********* x * | | + x xx+x* * *** +** ********* xx * | | * + ++++* + x*x****+*+* ***+*************+x* * | |*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *| | |__________A_____M_____| | | |_______________A____M_________| | | |____________A___M________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399 + 120 2.3826 3.57077 3.25101 3.1414161 0.28146407 Difference at 95.0% confidence -0.0729792 +/- 0.0629585 -2.27039% +/- 1.95864% (Student's t, pooled s = 0.248814) * 120 2.35536 3.66713 3.2849 3.2059917 0.24618565 No difference proven at 95.0% confidence With 10 clients over-saturating the pipeline: x no semaphores + drm-tip * patched +------------------------------------------------------------------------+ | ++ ** | | ++ ** | | ++ ** | | ++ ** | | ++ xx *** | | ++ xx *** | | ++ xxx*** | | ++ xxx*** | | +++ xxx*** | | +++ xx**** | | +++ xx**** | | +++ xx**** | | +++ xx**** | | ++++ xx**** | | +++++ xx**** | | +++++ x x****** | | ++++++ xxx******* | | ++++++ xxx******* | | ++++++ xxx******* | | ++++++ xx******** | | ++++++ xxxx******** | | ++++++ xxxx******** | | ++++++++ xxxxx********* | |+ + + + ++++++++ xxx*xx**********x* *| | |__A__| | | |__AM__| | | |__A_| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933 + 120 1.17367 1.77459 1.71977 1.6966782 0.085850697 Difference at 95.0% confidence -1.02266 +/- 0.0203502 -37.607% +/- 0.748352% (Student's t, pooled s = 0.0804246) * 120 2.57868 3.00821 2.80142 2.7923878 0.058646477 Difference at 95.0% confidence 0.0730476 +/- 0.0169791 2.68622% +/- 0.624383% (Student's t, pooled s = 0.0671018) Indicating that we've recovered the regression from enabling semaphores on this saturated setup, with a hint towards an overall improvement. Very similar, but of smaller magnitude, results are observed on both Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big core, in this particular test. One observation to make here is that for a greedy client trying to maximise its own throughput, using semaphores is the right choice. It is only the holistic system-wide view that semaphores of one client impacts another and reduces the overall throughput where we would choose to disable semaphores. The most noticeable negactive impact this has is on the no-op microbenchmarks, which are also very notable for having no cpu bus load. In particular, this increases the runtime and energy consumption of gem_exec_whisper. Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk (cherry picked from commit ca6e56f654e7b241256ffba78cd2abb22aa3bc97) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
e766fde6 |
|
30-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Delay semaphore submission until the start of the signaler Currently we submit the semaphore busywait as soon as the signaler is submitted to HW. However, we may submit the signaler as the tail of a batch of requests, and even not as the first context in the HW list, i.e. the busywait may start spinning far in advance of the signaler even starting. If we wait until the request before the signaler is completed before submitting the busywait, we prevent the busywait from starting too early, if the signaler is not first in submission port. To handle the case where the signaler is at the start of the second (or later) submission port, we will need to delay the execution callback until we know the context is promoted to port0. A challenge for later. Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk (cherry picked from commit 0d90ccb70211cbf55140e91bd39db684aa4c16e9) [Joonas: edited Fixes: tag into single line.] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
ca6e56f6 |
|
04-May-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Disable semaphore busywaits on saturated systems Asking the GPU to busywait on a memory address, perhaps not unexpectedly in hindsight for a shared system, leads to bus contention that affects CPU programs trying to concurrently access memory. This can manifest as a drop in transcode throughput on highly over-saturated workloads. The only clue offered by perf, is that the bus-cycles (perf stat -e bus-cycles) jumped by 50% when enabling semaphores. This corresponds with extra CPU active cycles being attributed to intel_idle's mwait. This patch introduces a heuristic to try and detect when more than one client is submitting to the GPU pushing it into an oversaturated state. As we already keep track of when the semaphores are signaled, we can inspect their state on submitting the busywait batch and if we planned to use a semaphore but were too late, conclude that the GPU is overloaded and not try to use semaphores in future requests. In practice, this means we optimistically try to use semaphores for the first frame of a transcode job split over multiple engines, and fail if there are multiple clients active and continue not to use semaphores for the subsequent frames in the sequence. Periodically, we try to optimistically switch semaphores back on whenever the client waits to catch up with the transcode results. With 1 client, on Broxton J3455, with the relative fps normalized by %cpu: x no semaphores + drm-tip * patched +------------------------------------------------------------------------+ | * | | *+ | | **+ | | **+ x | | x * +**+ x | | x x * * +***x xx | | x x * * *+***x *x | | x x* + * * *****x *x x | | + x xx+x* + *** * ********* x * | | + x xx+x* * *** +** ********* xx * | | * + ++++* + x*x****+*+* ***+*************+x* * | |*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *| | |__________A_____M_____| | | |_______________A____M_________| | | |____________A___M________| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399 + 120 2.3826 3.57077 3.25101 3.1414161 0.28146407 Difference at 95.0% confidence -0.0729792 +/- 0.0629585 -2.27039% +/- 1.95864% (Student's t, pooled s = 0.248814) * 120 2.35536 3.66713 3.2849 3.2059917 0.24618565 No difference proven at 95.0% confidence With 10 clients over-saturating the pipeline: x no semaphores + drm-tip * patched +------------------------------------------------------------------------+ | ++ ** | | ++ ** | | ++ ** | | ++ ** | | ++ xx *** | | ++ xx *** | | ++ xxx*** | | ++ xxx*** | | +++ xxx*** | | +++ xx**** | | +++ xx**** | | +++ xx**** | | +++ xx**** | | ++++ xx**** | | +++++ xx**** | | +++++ x x****** | | ++++++ xxx******* | | ++++++ xxx******* | | ++++++ xxx******* | | ++++++ xx******** | | ++++++ xxxx******** | | ++++++ xxxx******** | | ++++++++ xxxxx********* | |+ + + + ++++++++ xxx*xx**********x* *| | |__A__| | | |__AM__| | | |__A_| | +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933 + 120 1.17367 1.77459 1.71977 1.6966782 0.085850697 Difference at 95.0% confidence -1.02266 +/- 0.0203502 -37.607% +/- 0.748352% (Student's t, pooled s = 0.0804246) * 120 2.57868 3.00821 2.80142 2.7923878 0.058646477 Difference at 95.0% confidence 0.0730476 +/- 0.0169791 2.68622% +/- 0.624383% (Student's t, pooled s = 0.0671018) Indicating that we've recovered the regression from enabling semaphores on this saturated setup, with a hint towards an overall improvement. Very similar, but of smaller magnitude, results are observed on both Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big core, in this particular test. One observation to make here is that for a greedy client trying to maximise its own throughput, using semaphores is the right choice. It is only the holistic system-wide view that semaphores of one client impacts another and reduces the overall throughput where we would choose to disable semaphores. The most noticeable negactive impact this has is on the no-op microbenchmarks, which are also very notable for having no cpu bus load. In particular, this increases the runtime and energy consumption of gem_exec_whisper. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
|
#
0d90ccb7 |
|
30-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Delay semaphore submission until the start of the signaler Currently we submit the semaphore busywait as soon as the signaler is submitted to HW. However, we may submit the signaler as the tail of a batch of requests, and even not as the first context in the HW list, i.e. the busywait may start spinning far in advance of the signaler even starting. If we wait until the request before the signaler is completed before submitting the busywait, we prevent the busywait from starting too early, if the signaler is not first in submission port. To handle the case where the signaler is at the start of the second (or later) submission port, we will need to delay the execution callback until we know the context is promoted to port0. A challenge for later. Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchroni sation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
|
#
46472b3e |
|
26-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move i915_request_alloc into selftests/ Having transitioned GEM over to using intel_context as its primary means of tracking the GEM context and engine combined and using i915_request_create(), we can move the older i915_request_alloc() helper function into selftests/ where the remaining users are confined. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-9-chris@chris-wilson.co.uk
|
#
5e2a0419 |
|
26-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Switch back to an array of logical per-engine HW contexts We switched to a tree of per-engine HW context to accommodate the introduction of virtual engines. However, we plan to also support multiple instances of the same engine within the GEM context, defeating our use of the engine as a key to looking up the HW context. Just allocate a logical per-engine instance and always use an index into the ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user specified map. v2: Add for_each_gem_engine() helper to iterator within the engines lock v3: intel_context_create_request() helper v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough. v5: Push iterator locking to caller Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk
|
#
fa9f6681 |
|
26-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Export intel_context_instance() We want to pass in a intel_context into intel_context_pin() and that requires us to first be able to lookup the intel_context! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk
|
#
8f2a1057 |
|
24-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Explicitly pin the logical context for execbuf In order to separate the reservation phase of building a request from its emission phase, we need to pull some of the request alloc activities from deep inside i915_request to the surface, GEM_EXECBUFFER. v2: Be frivolous, use a local drm_i915_private. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190425050143.811-1-chris@chris-wilson.co.uk
|
#
79ffac85 |
|
24-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Invert the GEM wakeref hierarchy In the current scheme, on submitting a request we take a single global GEM wakeref, which trickles down to wake up all GT power domains. This is undesirable as we would like to be able to localise our power management to the available power domains and to remove the global GEM operations from the heart of the driver. (The intent there is to push global GEM decisions to the boundary as used by the GEM user interface.) Now during request construction, each request is responsible via its logical context to acquire a wakeref on each power domain it intends to utilize. Currently, each request takes a wakeref on the engine(s) and the engines themselves take a chipset wakeref. This gives us a transition on each engine which we can extend if we want to insert more powermangement control (such as soft rc6). The global GEM operations that currently require a struct_mutex are reduced to listening to pm events from the chipset GT wakeref. As we reduce the struct_mutex requirement, these listeners should evaporate. Perhaps the biggest immediate change is that this removes the struct_mutex requirement around GT power management, allowing us greater flexibility in request construction. Another important knock-on effect, is that by tracking engine usage, we can insert a switch back to the kernel context on that engine immediately, avoiding any extra delay or inserting global synchronisation barriers. This makes tracking when an engine and its associated contexts are idle much easier -- important for when we forgo our assumed execution ordering and need idle barriers to unpin used contexts. In the process, it means we remove a large chunk of code whose only purpose was to switch back to the kernel context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
|
#
2ccdf6a1 |
|
24-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pass intel_context to i915_request_create() Start acquiring the logical intel_context and using that as our primary means for request allocation. This is the initial step to allow us to avoid requiring struct_mutex for request allocation along the perma-pinned kernel context, but it also provides a foundation for breaking up the complex request allocation to handle different scenarios inside execbuf. For the purpose of emitting a request from inside retirement (see the next patch for engine power management), we also need to lift control over the timeline mutex to the caller. v2: Note that the request carries the active reference upon construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk
|
#
6eee33e8 |
|
24-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Introduce context->enter() and context->exit() We wish to start segregating the power management into different control domains, both with respect to the hardware and the user interface. The first step is that at the lowest level flow of requests, we want to process a context event (and not a global GEM operation). In this patch, we introduce the context callbacks that in future patches will be redirected to per-engine interfaces leading to global operations as required. The intent is that this will be guarded by the timeline->mutex, except that retiring has not quite finished transitioning over from being guarded by struct_mutex. So at the moment it is protected by struct_mutex with a reminded to switch. v2: Rename default handlers to intel_context_enter_engine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk
|
#
112ed2d3 |
|
24-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move GraphicsTechnology files under gt/ Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
|
#
7ce99d24 |
|
19-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Expose the busyspin durations for i915_wait_request An interesting discussion regarding "hybrid interrupt polling" for NVMe came to the conclusion that the ideal busyspin before sleeping was half of the expected request latency (and better if it was already halfway through that request). This suggested that we too should look again at our tradeoff between spinning and waiting. Currently, our spin simply tries to hide the cost of enabling the interrupt, which is good to avoid penalising nop requests (i.e. test throughput) and not much else. Studying real world workloads suggests that a spin of upto 500us can dramatically boost performance, but the suggestion is that this is not from avoiding interrupt latency per-se, but from secondary effects of sleeping such as allowing the CPU reduce cstate and context switch away. In a truly hybrid interrupt polling scheme, we would aim to sleep until just before the request completed and then wake up in advance of the interrupt and do a quick poll to handle completion. This is tricky for ourselves at the moment as we are not recording request times, and since we allow preemption, our requests are not on as a nicely ordered timeline as IO. However, the idea is interesting, for it will certainly help us decide when busyspinning is worthwhile. v2: Expose the spin setting via Kconfig options for easier adjustment and testing. v3: Don't get caught sneaking in a change to the busyspin parameters. v4: Explain more about the "hybrid interrupt polling" scheme that we want to migrate towards. Suggested-by: Sagar Kamble <sagar.a.kamble@intel.com> References: http://events.linuxfoundation.org/sites/events/files/slides/lemoal-nvme-polling-vault-2017-final_0.pdf Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sagar Kamble <sagar.a.kamble@intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Sagar Kamble <sagar.a.kamble@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190419182625.11186-1-chris@chris-wilson.co.uk
|
#
0c441cb6 |
|
11-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Call i915_sw_fence_fini on request cleanup As i915_requests are put into an RCU-freelist, they may get reused before debugobjects notice them as being freed. On cleanup, explicitly call i915_sw_fence_fini() so that the debugobject is properly tracked. Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190411122445.20060-1-chris@chris-wilson.co.uk
|
#
b7404c7e |
|
09-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Bump ready tasks ahead of busywaits Consider two tasks that are running in parallel on a pair of engines (vcs0, vcs1), but then must complete on a shared engine (rcs0). To maximise throughput, we want to run the first ready task on rcs0 (i.e. the first task that completes on either of vcs0 or vcs1). When using semaphores, however, we will instead queue onto rcs in submission order. To resolve this incorrect ordering, we want to re-evaluate the priority queue when each of the request is ready. Normally this happens because we only insert into the priority queue requests that are ready, but with semaphores we are inserting ahead of their readiness and to compensate we penalize those tasks with reduced priority (so that tasks that do not need to busywait should naturally be run first). However, given a series of tasks that each use semaphores, the queue degrades into submission fifo rather than readiness fifo, and so to counter this we give a small boost to semaphore users as their dependent tasks are completed (and so we no longer require any busywait prior to running the user task as they are then ready themselves). v2: Fixup irqsave for schedule_lock (Tvrtko) Testcase: igt/gem_exec_schedule/semaphore-codependency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk
|
#
de220cc2 |
|
08-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Consolidate the timeline->barrier The timeline is strictly ordered, so by inserting the timeline->barrier request into the timeline->last_request it naturally provides the same barrier. Consolidate the pair of barriers into one as they serve the same purpose. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190408091728.20207-4-chris@chris-wilson.co.uk
|
#
696173b0 |
|
05-Apr-2019 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: extract intel_pm.h from intel_drv.h It used to be handy that we only had a couple of headers, but over time intel_drv.h has become unwieldy. Extract declarations to a separate header file corresponding to the implementation module, clarifying the modularity of the driver. Ensure the new header is self-contained, and do so with minimal further includes, using forward declarations as needed. Include the new header only where needed, and sort the modified include directives while at it and as needed. No functional changes. v2: gen6_rps_reset_ei() is in i915_irq.c not intel_pm.c. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/adc6463b95eef3440fba9826793f7d1c5f3b0b4a.1554461791.git.jani.nikula@intel.com
|
#
b66ea2c2 |
|
03-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use lockdep_pin_lock() over the construction of the request During request construction, we take the timeline->mutex to ensure exclusive access to the ringbuffer (for command emission) and the timeline itself (for command ordering). The timeline->mutex should not be dropped by callers until we release it in i915_request_add(). lockdep provides a pin/unpin lock facility to detect accidental unlocks inside critical sections, so put it to use for request construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190403082132.327-1-chris@chris-wilson.co.uk
|
#
7881e605 |
|
01-Apr-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only emit one semaphore per request Ideally we only need one semaphore per ring to accommodate waiting on multiple engines in parallel. However, since we do not know which fences we will finally be waiting on, we emit a semaphore for every fence. It turns out to be quite easy to trick ourselves into exhausting our ringbuffer causing an error, just by feeding in a batch that depends on several thousand contexts. Since we never can be waiting on more than one semaphore in parallel (other than perhaps the desire to busywait on multiple engines), just pick the first fence for our semaphore. If we pick the wrong fence to busywait on, we just miss an opportunity to reduce latency. An adaption might be to use sched.flags as either a semaphore counter, or to track the first busywait on each engine, converting it back to a single use bit prior to closing the request. v2: Track first semaphore used per-engine (this caters for our basic igt that semaphores are working). Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Testcase: igt/gem_exec_fence/long-history Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
ea593dbb |
|
22-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Allow contexts to share a single timeline across all engines Previously, our view has been always to run the engines independently within a context. (Multiple engines happened before we had contexts and timelines, so they always operated independently and that behaviour persisted into contexts.) However, at the user level the context often represents a single timeline (e.g. GL contexts) and userspace must ensure that the individual engines are serialised to present that ordering to the client (or forgot about this detail entirely and hope no one notices - a fair ploy if the client can only directly control one engine themselves ;) In the next patch, we will want to construct a set of engines that operate as one, that have a single timeline interwoven between them, to present a single virtual engine to the user. (They submit to the virtual engine, then we decide which engine to execute on based.) To that end, we want to be able to create contexts which have a single timeline (fence context) shared between all engines, rather than multiple timelines. v2: Move the specialised timeline ordering to its own function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
|
#
4daffb66 |
|
21-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Stop storing the context name as the timeline name The timeline->name is only used for convenience in pretty printing the i915_request.fence->ops->get_timeline_name() and it is just as convenient to pull it from the gem_context directly. The few instances of its use inside GEM_TRACE() has proven more of a nuisance than helpful, so not worth saving imo. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-4-chris@chris-wilson.co.uk
|
#
65baf0ef |
|
18-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Hold a ref to the ring while retiring As the final request on a ring may hold the reference to this ring (via retiring the last pinned context), we may find ourselves chasing a dangling pointer on completion of the list. A quick solution is to hold a reference to the ring itself as we retire along it so that we only free it after we stop dereferencing it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
|
#
c6eeb479 |
|
08-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reduce presumption of request ordering for barriers Currently we assume that we know the order in which requests run and so can determine if we need to reissue a switch-to-kernel-context prior to idling. That assumption does not hold for the future, so instead of tracking which barriers have been used, simply determine if we have ever switched away from the kernel context by using the engine and before idling ensure that all engines that have been used since the last idle are synchronously switched back to the kernel context for safety (and else of shrinking memory while idle). v2: Use intel_engine_mask_t and ALL_ENGINES Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk
|
#
103b76ee |
|
05-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use i915_global_register() Rather than manually add every new global into each hook, use i915_global_register() function and keep a list of registered globals to invoke instead. However, I haven't found a way for random drivers to add an .init table to avoid having to manually add ourselves to i915_globals_init() each time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
f9e9e9de |
|
01-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Prioritise non-busywait semaphore workloads We don't want to busywait on the GPU if we have other work to do. If we give non-busywaiting workloads higher (initial) priority than workloads that require a busywait, we will prioritise work that is ready to run immediately. We then also have to be careful that we don't give earlier semaphores an accidental boost because later work doesn't wait on other rings, hence we keep a history of semaphore usage of the dependency chain. v2: Stop rolling the bits into a chain and just use a flag in case this request or any of our dependencies use a semaphore. The rolling around was contagious as Tvrtko was heard to fall off his chair. Testcase: igt/gem_exec_schedule/semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
|
#
e8861964 |
|
01-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Having introduced per-context seqno, we now have a means to identity progress across the system without feel of rollback as befell the global_seqno. That is we can program a MI_SEMAPHORE_WAIT operation in advance of submission safe in the knowledge that our target seqno and address is stable. However, since we are telling the GPU to busy-spin on the target address until it matches the signaling seqno, we only want to do so when we are sure that busy-spin will be completed quickly. To achieve this we only submit the request to HW once the signaler is itself executing (modulo preemption causing us to wait longer), and we only do so for default and above priority requests (so that idle priority tasks never themselves hog the GPU waiting for others). As might be reasonably expected, HW semaphores excel in inter-engine synchronisation microbenchmarks (where the 3x reduced latency / increased throughput more than offset the power cost of spinning on a second ring) and have significant improvement (can be up to ~10%, most see no change) for single clients that utilize multiple engines (typically media players and transcoders), without regressing multiple clients that can saturate the system or changing the power envelope dramatically. v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway. v4: Tell the world and include it as part of scheduler caps. Testcase: igt/gem_exec_whisper Testcase: igt/benchmarks/gem_wsim Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
|
#
ebece753 |
|
01-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep timeline HWSP allocated until idle across the system In preparation for enabling HW semaphores, we need to keep in flight timeline HWSP alive until its use across entire system has completed, as any other timeline active on the GPU may still refer back to the already retired timeline. We both have to delay recycling available cachelines and unpinning old HWSP until the next idle point. An easy option would be to simply keep all used HWSP until the system as a whole was idle, i.e. we could release them all at once on parking. However, on a busy system, we may never see a global idle point, essentially meaning the resource will be leaked until we are forced to do a GC pass. We already employ a fine-grained idle detection mechanism for vma, which we can reuse here so that each cacheline can be freed immediately after the last request using it is retired. v3: Keep track of the activity of each cacheline. v4: cacheline_free() on canceling the seqno tracking v5: Finally with a testcase to exercise wraparound v6: Pack cacheline into empty bits of page-aligned vaddr v7: Use i915_utils to hide the pointer casting around bit manipulation Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-2-chris@chris-wilson.co.uk
|
#
3ef71149 |
|
01-Mar-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Introduce i915_timeline.mutex A simple mutex used for guarding the flow of requests in and out of the timeline. In the short-term, it will be used only to guard the addition of requests into the timeline, taken on alloc and released on commit so that only one caller can construct a request into the timeline (important as the seqno and ring pointers must be serialised). This will be used by observers to ensure that the seqno/hwsp is stable. Later, when we have reduced retiring to only operate on a single timeline at a time, we can then use the mutex as the sole guard required for retiring. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301110547.14758-2-chris@chris-wilson.co.uk
|
#
b5773a36 |
|
28-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/execlists: Suppress mere WAIT preemption WAIT is occasionally suppressed by virtue of preempted requests being promoted to NEWCLIENT if they have not all ready received that boost. Make this consistent for all WAIT boosts that they are not allowed to preempt executing contexts and are merely granted the right to be at the front of the queue for the next execution slot. This is in keeping with the desire that the WAIT boost be a minor tweak that does not give excessive promotion to its user and open ourselves to trivial abuse. The problem with the inconsistent WAIT preemption becomes more apparent as the preemption is propagated across the engines, where one engine may preempt and the other not, and we be relying on the exact execution order being consistent across engines (e.g. using HW semaphores to coordinate parallel execution). v2: Also protect GuC submission from false preemption loops. v3: Build bug safeguards and better debug messages for st. v4: Do the priority bumping in unsubmit (i.e. on preemption/reset unwind), applying it earlier during submit causes out-of-order execution combined with execute fences. v5: Call sw_fence_fini for our dummy request (Matthew) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
|
#
32eb6bcf |
|
28-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make request allocation caches global As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potential has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths. v2: Defer shrinking until after a global grace period for futureproofing multiple consumers of the slab caches, similar to the current strategy for avoiding shrinking too early. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
|
#
b300fde8 |
|
26-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove i915_request.global_seqno Having weaned the interrupt handling off using a single global execution queue, we no longer need to emit a global_seqno. Note that we still have a few assumptions about execution order along engine timelines, but this removes the most obvious artefact! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk
|
#
8892f477 |
|
26-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove access to global seqno in the HWSP Stop accessing the HWSP to read the global seqno, and stop tracking the mirror in the engine's execution timeline -- it is unused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk
|
#
c41166f9 |
|
20-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Beware temporary wedging when determining -EIO At a few points in our uABI, we check to see if the driver is wedged and report -EIO back to the user in that case. However, as we perform the check and reset asynchronously (where once before they were both serialised by the struct_mutex), we may instead see the temporary wedging used to cancel inflight rendering to avoid a deadlock during reset (caused by either us timing out in our reset handler, i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare for a stuck modeset). If we suspect this is the case, that is we see a wedged driver *and* reset in progress, then wait until the reset is resolved before reporting upon the wedged status. v2: might_sleep() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk
|
#
7f4127c4 |
|
18-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Use time based guilty context banning Currently, we accumulate each time a context hangs the GPU, offset against the number of requests it submits, and if that score exceeds a certain threshold, we ban that context from submitting any more requests (cancelling any work in flight). In contrast, we use a simple timer on the file, that if we see more than a 9 hangs faster than 60s apart in total across all of its contexts, we will ban the client from creating any more contexts. This leads to a confusing situation where the file may be banned before the context, so lets use a simple timer scheme for each. If the context submits 3 hanging requests within a 120s period, declare it forbidden to ever send more requests. This has the advantage of not being easy to repair by simply sending empty requests, but has the disadvantage that if the context is idle then it is forgiven. However, if the context is idle, it is not disrupting the system, but a hog can evade the request counting and cause much more severe disruption to the system. Updating ban_score from request retirement is dubious as the retirement is purposely not in sync with request submission (i.e. we try and batch retirement to reduce overhead and avoid latency on submission), which leads to surprising situations where we can forgive a hang immediately due to a backlog of requests from before the hang being retired afterwards. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-2-chris@chris-wilson.co.uk
|
#
d9e61b66 |
|
13-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Defer application of request banning to submission As we currently do not check on submission whether the context is banned in a timely manner it is possible for some requests to escape cancellation after their parent context is banned. By moving the ban into the request submission under the engine->timeline.lock, we serialise it with the reset and setting of the context ban. References: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190213182737.12695-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
|
#
62eb3c24 |
|
13-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Apply rps waitboosting for dma_fence_wait_timeout() As time goes by, usage of generic ioctls such as drm_syncobj and sync_file are on the increase bypassing i915-specific ioctls like GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as we track the file/client and account the waitboosting to them. However, since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for signaling threads"), we no longer have been applying the client ratelimiting on waitboosts and so that information has only been used for debug tracking. Push the application of waitboosting down to the common i915_request_wait, and apply it to all foreign fence waits as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190213092504.25709-1-chris@chris-wilson.co.uk
|
#
21950ee7 |
|
05-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull i915_gem_active into the i915_active family Looking forward, we need to break the struct_mutex dependency on i915_gem_active. In the meantime, external use of i915_gem_active is quite beguiling, little do new users suspect that it implies a barrier as each request it tracks must be ordered wrt the previous one. As one of many, it can be used to track activity across multiple timelines, a shared fence, which fits our unordered request submission much better. We need to steer external users away from the singular, exclusive fence imposed by i915_gem_active to i915_active instead. As part of that process, we move i915_gem_active out of i915_request.c into i915_active.c to start separating the two concepts, and rename it to i915_active_request (both to tie it to the concept of tracking just one request, and to give it a longer, less appealing name). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
|
#
78108584 |
|
05-Feb-2019 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915: Add timeline barrier support Timeline barrier allows serialization between different timelines. After calling i915_timeline_set_barrier with a request, all following submissions on this timeline will be set up as depending on this request, or barrier. Once the barrier has been completed it automatically gets cleared and things continue as normal. This facility will be used by the upcoming context SSEU code. v2: * Assert barrier has been retired on timeline_fini. (Chris Wilson) * Fix mock_timeline. v3: * Improved comment language. (Chris Wilson) v4: * Maintain ordering with previous barriers set on the timeline. v5: * Rebase. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-3-tvrtko.ursulin@linux.intel.com
|
#
1413b2bc |
|
04-Feb-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Trim NEWCLIENT boosting Limit the NEWCLIENT boost to only give its small priority boost to fresh clients only that have no dependencies. The idea for using NEWCLIENT boosting, commit b16c765122f9 ("drm/i915: Priority boost for new clients"), is that short-lived streams are often interactive and require lower latency -- and that by executing those ahead of the long running hogs, the short-lived clients do little to interfere with the system throughput by virtue of their short-lived nature. However, we were only considering the client's own timeline for determining whether or not it was a fresh stream. This allowed for compositors to wake up before their vblank and bump all of its client streams. However, in testing with media-bench this results in chaining all cooperating contexts together preventing us from being able to reorder contexts to reduce bubbles (pipeline stalls), overall increasing latency, and reducing system throughput. The exact opposite of our intent. The compromise of applying the NEWCLIENT boost to strictly fresh clients (that do not wait upon anything else) should maintain the "real-time response under load" characteristics of FQ_CODEL, without locking together the long chains of dependencies across the system. References: b16c765122f9 ("drm/i915: Priority boost for new clients") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190204150101.30759-1-chris@chris-wilson.co.uk
|
#
52c0fdb2 |
|
29-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace global breadcrumbs with per-context interrupt tracking A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd"), the issue of handling multiple clients waiting in parallel was brought to our attention. The requirement was that every client should be woken immediately upon its request being signaled, without incurring any cpu overhead. To handle certain fragility of our hw meant that we could not do a simple check inside the irq handler (some generations required almost unbounded delays before we could be sure of seqno coherency) and so request completion checking required delegation. Before commit 688e6c725816, the solution was simple. Every client waiting on a request would be woken on every interrupt and each would do a heavyweight check to see if their request was complete. Commit 688e6c725816 introduced an rbtree so that only the earliest waiter on the global timeline would woken, and would wake the next and so on. (Along with various complications to handle requests being reordered along the global timeline, and also a requirement for kthread to provide a delegate for fence signaling that had no process context.) The global rbtree depends on knowing the execution timeline (and global seqno). Without knowing that order, we must instead check all contexts queued to the HW to see which may have advanced. We trim that list by only checking queued contexts that are being waited on, but still we keep a list of all active contexts and their active signalers that we inspect from inside the irq handler. By moving the waiters onto the fence signal list, we can combine the client wakeup with the dma_fence signaling (a dramatic reduction in complexity, but does require the HW being coherent, the seqno must be visible from the cpu before the interrupt is raised - we keep a timer backup just in case). Having previously fixed all the issues with irq-seqno serialisation (by inserting delays onto the GPU after each request instead of random delays on the CPU after each interrupt), we can rely on the seqno state to perfom direct wakeups from the interrupt handler. This allows us to preserve our single context switch behaviour of the current routine, with the only downside that we lose the RT priority sorting of wakeups. In general, direct wakeup latency of multiple clients is about the same (about 10% better in most cases) with a reduction in total CPU time spent in the waiter (about 20-50% depending on gen). Average herd behaviour is improved, but at the cost of not delegating wakeups on task_prio. v2: Capture fence signaling state for error state and add comments to warm even the most cold of hearts. v3: Check if the request is still active before busywaiting v4: Reduce the amount of pointer misdirection with list_for_each_safe and using a local i915_request variable inside the loops v5: Add a missing pluralisation to a purely informative selftest message. References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
|
#
85474441 |
|
29-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Identify active requests To allow requests to forgo a common execution timeline, one question we need to be able to answer is "is this request running?". To track whether a request has started on HW, we can emit a breadcrumb at the beginning of the request and check its timeline's HWSP to see if the breadcrumb has advanced past the start of this request. (This is in contrast to the global timeline where we need only ask if we are on the global timeline and if the timeline has advanced past the end of the previous request.) There is still confusion from a preempted request, which has already started but relinquished the HW to a high priority request. For the common case, this discrepancy should be negligible. However, for identification of hung requests, knowing which one was running at the time of the hang will be much more important. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
|
#
5013eb8c |
|
28-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Track the context's seqno in its own timeline HWSP Now that we have allocated ourselves a cacheline to store a breadcrumb, we can emit a write from the GPU into the timeline's HWSP of the per-context seqno as we complete each request. This drops the mirroring of the per-engine HWSP and allows each context to operate independently. We do not need to unwind the per-context timeline, and so requests are always consistent with the timeline breadcrumb, greatly simplifying the completion checks as we no longer need to be concerned about the global_seqno changing mid check. One complication though is that we have to be wary that the request may outlive the HWSP and so avoid touching the potentially danging pointer after we have retired the fence. We also have to guard our access of the HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe. At this point, we are emitting both per-context and global seqno and still using the single per-engine execution timeline for resolving interrupts. v2: s/fake_complete/mark_complete/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
|
#
3adac468 |
|
28-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Introduce concept of per-timeline (context) HWSP Supplement the per-engine HWSP with a per-timeline HWSP. That is a per-request pointer through which we can check a local seqno, abstracting away the presumption of a global seqno. In this first step, we point each request back into the engine's HWSP so everything continues to work with the global timeline. v2: s/i915_request_hwsp/hwsp_seqno/ to emphasis that this is the current HW value and that we are accessing it via i915_request merely as a convenience. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-1-chris@chris-wilson.co.uk
|
#
eb8d0f5a |
|
25-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove GPU reset dependence on struct_mutex Now that the submission backends are controlled via their own spinlocks, with a wave of a magic wand we can lift the struct_mutex requirement around GPU reset. That is we allow the submission frontend (userspace) to keep on submitting while we process the GPU reset as we can suspend the backend independently. The major change is around the backoff/handoff strategy for performing the reset. With no mutex deadlock, we no longer have to coordinate with any waiter, and just perform the reset immediately. Testcase: igt/gem_mmap_gtt/hang # regresses Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk
|
#
9fa4973e |
|
24-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove manual breadcumb counting Now that we know we measure the size of the engine->emit_breadcrumb() correctly, we can remove the previous manual counting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190125120005.25191-1-chris@chris-wilson.co.uk
|
#
0e21834e |
|
21-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Tidy common test_bit probing of i915_request->fence.flags A repeated pattern is to test the signaled bit of our request->fence.flags. Make this an inline to shorten a few lines and remove unnecessary line continuations. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190121222117.23305-20-chris@chris-wilson.co.uk
|
#
f1e9c909 |
|
19-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Prevent use of global_seqno=0 We are not allowed to assign rq->global_seqno=0 as it has a special meaning of "inactive" (not executing on HW). Fixes: 6faf5916e6be ("drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190119143024.26971-1-chris@chris-wilson.co.uk
|
#
9f58892e |
|
16-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull all the reset functionality together into i915_reset.c Currently the code to reset the GPU and our state is spread widely across a few files. Pull the logic together into a common file. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190116153304.787-1-chris@chris-wilson.co.uk
|
#
d22ba0cb |
|
09-Jan-2019 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reduce i915_request_alloc retirement to local context In the continual quest to reduce the amount of global work required when submitting requests, replace i915_retire_requests() after allocation failure to retiring just our ring. v2: Don't forget the list iteration included an early break, so we would never throttle on the last request in the ring/timeline. v3: Use the common ring_retire_requests() References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190109215932.26454-1-chris@chris-wilson.co.uk
|
#
1216e3c3 |
|
28-Dec-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop unused engine->irq_seqno_barrier w/a Now that we have eliminated the CPU-side irq_seqno_barrier by moving the delays on the GPU before emitting the MI_USER_INTERRUPT, we can remove the engine->irq_seqno_barrier infrastructure. Though intentionally slowing down the GPU is nasty, so is the code we can now remove! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-6-chris@chris-wilson.co.uk
|
#
ed2922c0 |
|
28-Dec-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove redundant trailing request flush Now that we perform the request flushing inline with emitting the breadcrumb, we can remove the now redundant manual flush. And we can also remove the infrastructure that remained only for its purpose. v2: emit_breadcrumb_sz is in dwords, but rq->reserved_space is in bytes Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-1-chris@chris-wilson.co.uk
|
#
6faf5916 |
|
28-Dec-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation The writing is on the wall for the existence of a single execution queue along each engine, and as a consequence we will not be able to track dependencies along the HW queue itself, i.e. we will not be able to use HW semaphores on gen7 as they use a global set of registers (and unlike gen8+ we can not effectively target memory to keep per-context seqno and dependencies). On the positive side, when we implement request reordering for gen7 we also can not presume a simple execution queue and would also require removing the current semaphore generation code. So this bring us another step closer to request reordering for ringbuffer submission! The negative side is that using interrupts to drive inter-engine synchronisation is much slower (4us -> 15us to do a nop on each of the 3 engines on ivb). This is much better than it was at the time of introducing the HW semaphores and equally important userspace weaned itself off intermixing dependent BLT/RENDER operations (the prime culprit was glyph rendering in UXA). So while we regress the microbenchmarks, it should not impact the user. References: https://bugs.freedesktop.org/show_bug.cgi?id=108888 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk
|
#
dd847a70 |
|
07-Dec-2018 |
Mika Kuoppala <mika.kuoppala@linux.intel.com> |
drm/i915: Compile fix for 64b dma-fence seqno Many errs of the form: drivers/gpu/drm/i915/selftests/intel_hangcheck.c: In function ‘__igt_reset_evict_vma’: ./include/linux/kern_levels.h:5:18: error: format ‘%x’ expects argument of type ‘unsigned int’, but argum Fixes: b312d8ca3a7c ("dma-buf: make fence sequence numbers 64 bit v2") Cc: Christian König <christian.koenig@amd.com> Cc: Chunming Zhou <david1.zhou@amd.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181207123428.16257-1-mika.kuoppala@linux.intel.com
|
#
5f5800a7 |
|
07-Dec-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Push EMIT_INVALIDATE at request start to backends Move the common engine->emit_flush(EMIT_INVALIDATE) back to the backends (where it was once previously) as we seek to specialise it in future patches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181207090213.14352-1-chris@chris-wilson.co.uk
|
#
39e84937 |
|
26-Nov-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Skip engine serialisation for no-op seqno reset If the engine's seqno is already at our target seqno (most likely it hasn't been used since the last reset), we can skip serialising the engine and leave it as is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181126095610.20962-1-chris@chris-wilson.co.uk
|
#
1e016a86 |
|
24-Oct-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Park signaling thread while wrapping the seqno A danger encountered when resetting the seqno (using debugfs/i915_next_seqno) is that as we change the breadcrumb stored in the HWSP, it may be inspected by the signaler thread leading to confusion in our sanity checks. <0> [136.331342] i915/sig-347 3..s1 136336154us : execlists_submission_tasklet: rcs0 awake?=1, active=5 <0> [136.331373] i915/sig-347 3d.s2 136336155us : process_csb: rcs0 cs-irq head=5, tail=0 <0> [136.331402] i915/sig-347 3d.s2 136336155us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000002, active=0x5 <0> [136.331434] i915/sig-347 3d.s2 136336156us : process_csb: rcs0 out[0]: ctx=2.1, global=219 (fence 46:8455) (current 219), prio=0 <0> [136.331466] i915/sig-347 3d.s2 136336156us : process_csb: rcs0 completed ctx=2 <0> [136.332027] gem_exec-1049 0.... 136336246us : reset_all_global_seqno.part.5: rcs0 seqno 219 (current 219) -> -43 <0> [136.332056] gem_exec-1049 0.... 136336251us : reset_all_global_seqno.part.5: bcs0 seqno 183 (current 183) -> -43 <0> [136.332085] gem_exec-1049 0.... 136336255us : reset_all_global_seqno.part.5: vcs0 seqno 191 (current 191) -> -43 <0> [136.332114] gem_exec-1049 0.... 136336259us : reset_all_global_seqno.part.5: vcs1 seqno 180 (current 180) -> -43 <0> [136.332143] gem_exec-1049 0.... 136336262us : reset_all_global_seqno.part.5: vecs0 seqno 212 (current 212) -> -43 <0> [136.332174] i915/sig-347 3.... 136336280us : intel_breadcrumbs_signaler: intel_breadcrumbs_signaler:673 GEM_BUG_ON(!i915_request_completed(rq)) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181024104939.2861-1-chris@chris-wilson.co.uk
|
#
33373258 |
|
05-Oct-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove the global cache shrink & rcu barrier on allocation failure Earlier, we reasoned that having idled the gpu under mempressure, that would be a good time to trim our request slabs in order to perform the next request allocation. We have stopped performing the global operation on the device (no idling) and wish to make the allocation failure handling more local, so out with the global barrier that may take a long time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181005080300.9908-2-chris@chris-wilson.co.uk
|
#
e9eaf82d |
|
01-Oct-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Priority boost for waiting clients Latency is in the eye of the beholder. In the case where a client stops and waits for the gpu, give that request chain a small priority boost (not so that it overtakes higher priority clients, to preserve the external ordering) so that ideally the wait completes earlier. v2: Tvrtko recommends to keep the boost-from-user-stall as small as possible and to allow new client flows to be preferred for interactivity over stalls. Testcase: igt/gem_sync/switch-default Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk
|
#
e2f3496e |
|
01-Oct-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull scheduling under standalone lock Currently, the backend scheduling code abuses struct_mutex into order to have a global lock to manipulate a temporary list (without widespread allocation) and to protect against list modifications. This is an extraneous coupling to struct_mutex and further can not extend beyond the local device. Pull all the code that needs to be under the one true lock into i915_scheduler.c, and make it so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk
|
#
b16c7651 |
|
01-Oct-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Priority boost for new clients Taken from an idea used for FQ_CODEL, we give the first request of a new request flows a small priority boost. These flows are likely to correspond with short, interactive tasks and so be more latency sensitive than the longer free running queues. As soon as the client has more than one request in the queue, further requests are not boosted and it settles down into ordinary steady state behaviour. Such small kicks dramatically help combat the starvation issue, by allowing each client the opportunity to run even when the system is under heavy throughput load (within the constraints of the user selected priority). v2: Mark the preempted request as the start of a new flow, to prevent a single client being continually gazumped by its peers. Testcase: igt/benchmarks/rrul Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-1-chris@chris-wilson.co.uk
|
#
11abf0c5 |
|
14-Sep-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Limit the backpressure for i915_request allocation If we try and fail to allocate a i915_request, we apply some backpressure on the clients to throttle the memory allocations coming from i915.ko. Currently, we wait until completely idle, but this is far too heavy and leads to some situations where the only escape is to declare a client hung and reset the GPU. The intent is to only ratelimit the allocation requests and to allow ourselves to recycle requests and memory from any long queues built up by a client hog. Although the system memory is inherently a global resources, we don't want to overly penalize an unlucky client to pay the price of reaping a hog. To reduce the influence of one client on another, we can instead of waiting for the entire GPU to idle, impose a barrier on the local client. (One end goal for request allocation is for scalability to many concurrent allocators; simultaneous execbufs.) To prevent ourselves from getting caught out by long running requests (requests that may never finish without userspace intervention, whom we are blocking) we need to impose a finite timeout, ideally shorter than hangcheck. A long time ago Paul McKenney suggested that RCU users should ratelimit themselves using judicious use of cond_synchronize_rcu(). This gives us the opportunity to reduce our indefinite wait for the GPU to idle to a wait for the RCU grace period of the previous allocation along this timeline to expire, satisfying both the local and finite properties we desire for our ratelimiting. There are still a few global steps (reclaim not least amongst those!) when we exhaust the immediate slab pool, at least now the wait is itself decoupled from struct_mutex for our glorious highly parallel future! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk
|
#
97f06158 |
|
05-Aug-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pull seqno started checks together We have a few instances of checking seqno-1 to see if the HW has started the request. Pull those together under a helper. v2: Pull the !seqno assertion higher, as given seqno==1 we may indeed check to see if we have started using seqno==0. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180806112605.20725-1-chris@chris-wilson.co.uk
|
#
ec625fb9 |
|
09-Jul-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Provide a timeout to i915_gem_wait_for_idle() Usually we have no idea about the upper bound we need to wait to catch up with userspace when idling the device, but in a few situations we know the system was idle beforehand and can provide a short timeout in order to very quickly catch a failure, long before hangcheck kicks in. In the following patches, we will use the timeout to curtain two overly long waits, where we know we can expect the GPU to complete within a reasonable time or declare it broken. In particular, with a broken GPU we expect it to fail during the initial GPU setup where do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. In this a case where any gpu hang is unacceptable, both from a timeliness and practical standpoint. The other improvement is that in selftests, we do not need to arm an independent timer to inject a wedge, as we can just limit the timeout on the wait directly. v2: Include the timeout parameter in the trace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
|
#
890fd185 |
|
06-Jul-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Replace nested subclassing with explicit subclasses In the next patch, we will want a third distinct class of timeline that may overlap with the current pair of client and engine timeline classes. Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise the different timeline classes with an explicit subclass. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk
|
#
6dd7526f |
|
06-Jul-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Export i915_request_skip() In the next patch, we will want to start skipping requests on failing to complete their payloads. So export the utility function current used to make requests inoperable following a failed gpu reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk
|
#
e3be4079 |
|
27-Jun-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only signal from interrupt when requested Avoid calling dma_fence_signal() from inside the interrupt if we haven't enabled signaling on the request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-4-chris@chris-wilson.co.uk
|
#
78796877 |
|
27-Jun-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move the irq_counter inside the spinlock Rather than have multiple locked instructions inside the notify_ring() irq handler, move them inside the spinlock and reduce their intrinsic locking. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-3-chris@chris-wilson.co.uk
|
#
697b9a87 |
|
12-Jun-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Make closing request flush mandatory For symmetry, simplicity and ensuring the request is always truly idle upon its completion, always emit the closing flush prior to emitting the request breadcrumb. Previously, we would only emit the flush if we had started a user batch, but this just leaves all the other paths open to speculation (do they affect the GPU caches or not?) With mm switching, a key requirement is that the GPU is flushed and invalidated before hand, so for absolute safety, we want that closing flush be mandatory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
|
#
b3ee09a4 |
|
10-Jun-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/ringbuffer: Fix context restore upon reset The discovery with trying to enable full-ppgtt was that we were completely failing to the load both the mm and context following the reset. Although we were performing mmio to set the PP_DIR (per-process GTT) and CCID (context), these were taking no effect (the assumption was that this would trigger reload of the context and restore the page tables). It was not until we performed the LRI + MI_SET_CONTEXT in a following context switch would anything occur. Since we are then required to reset the context image and PP_DIR using CS commands, we place those commands into every batch. The hardware should recognise the no-ops and eliminate the expensive context loads, but we still have to pay the cost of using cross-powerwell register writes. In practice, this has no effect on actual context switch times, and only adds a few hundred nanoseconds to no-op switches. We can improve the latter by eliminating the w/a around known no-op switches, but there is an ulterior motive to keeping them. Always emitting the context switch at the beginning of the request (and relying on HW to skip unneeded switches) does have one key advantage. Should we implement request reordering on Haswell, we will not know in advance what the previous executing context was on the GPU and so we would not be able to elide the MI_SET_CONTEXT commands ourselves and always have to emit them. Having our hand forced now actually prepares us for later. Now since that context and mm follow the request, we no longer (and not for a long time since requests took over!) require a trace point to tell when we write the switch into the ring, since it is always. (This is even more important when you remember that simply writing into the ring bears no relation to the current mm.) v2: Sandybridge has to agree to use LRI as well. Testcase: igt/drv_selftests/live_hangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
|
#
09a4c02e |
|
24-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Look for an active kernel context before switching We were not very carefully checking to see if an older request on the engine was an earlier switch-to-kernel-context before deciding to emit a new switch. The end result would be that we could get into a permanent loop of trying to emit a new request to perform the switch simply to flush the existing switch. What we need is a means of tracking the completion of each timeline versus the kernel context, that is to detect if a more recent request has been submitted that would result in a switch away from the kernel context. To realise this, we need only to look in our syncmap on the kernel context and check that we have synchronized against all active rings. v2: Since all ringbuffer clients currently share the same timeline, we do have to use the gem_context to distinguish clients. As a bonus, include all the tracing used to debug the death inside suspend. v3: Test, test, test. Construct a selftest to exercise and assert the expected behaviour that multiple switch-to-contexts do not emit redundant requests. Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
|
#
1fc44d9b |
|
17-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Store a pointer to intel_context in i915_request To ease the frequent and ugly pointer dance of &request->gem_context->engine[request->engine->id] during request submission, store that pointer as request->hw_context. One major advantage that we will exploit later is that this decouples the logical context state from the engine itself. v2: Set mock_context->ops so we don't crash and burn in selftests. Cleanups from Tvrtko. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
|
#
4e0d64db |
|
17-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move request->ctx aside In the next patch, we want to store the intel_context pointer inside i915_request, as it is frequently access via a convoluted dance when submitting the request to hw. Having two context pointers inside i915_request leads to confusion so first rename the existing i915_gem_context pointer to i915_request.gem_context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
|
#
0adb90d3 |
|
08-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Annotate timeline lock nesting CI noticed <4>[ 23.430701] ============================================ <4>[ 23.430706] WARNING: possible recursive locking detected <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted <4>[ 23.430720] -------------------------------------------- <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] <4>[ 23.430888] but task is already holding lock: <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.430995] other info that might help us debug this: <4>[ 23.431002] Possible unsafe locking scenario: <4>[ 23.431007] CPU0 <4>[ 23.431010] ---- <4>[ 23.431013] lock(&(&timeline->lock)->rlock); <4>[ 23.431021] lock(&(&timeline->lock)->rlock); <4>[ 23.431028] *** DEADLOCK *** <4>[ 23.431036] May be due to missing lock nesting notation <4>[ 23.431044] 5 locks held by systemd-udevd/169: <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.431369] stack backtrace: <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 <4>[ 23.431394] Call Trace: <4>[ 23.431403] dump_stack+0x67/0x9b <4>[ 23.431411] __lock_acquire+0xc67/0x1b50 <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0 <4>[ 23.431429] ? lock_acquire+0xa6/0x210 <4>[ 23.431435] lock_acquire+0xa6/0x210 <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431540] _raw_spin_lock+0x2a/0x40 <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915] <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915] <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915] <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915] <4>[ 23.432120] submit_notify+0x8d/0x124 [i915] <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915] <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915] <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915] <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915] <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0 <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915] <4>[ 23.432574] pci_device_probe+0xa1/0x130 <4>[ 23.432582] driver_probe_device+0x306/0x480 <4>[ 23.432589] __driver_attach+0xb7/0xe0 <4>[ 23.432596] ? driver_probe_device+0x480/0x480 <4>[ 23.432602] ? driver_probe_device+0x480/0x480 <4>[ 23.432609] bus_for_each_dev+0x74/0xc0 <4>[ 23.432616] bus_add_driver+0x15f/0x250 <4>[ 23.432623] ? 0xffffffffa02d7000 <4>[ 23.432629] driver_register+0x52/0xc0 <4>[ 23.432635] ? 0xffffffffa02d7000 <4>[ 23.432642] do_one_initcall+0x58/0x370 <4>[ 23.432653] ? do_init_module+0x1d/0x1ea <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80 <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0 <4>[ 23.432675] do_init_module+0x56/0x1ea <4>[ 23.432682] load_module+0x2435/0x2b20 <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432710] do_syscall_64+0x55/0x190 <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 23.432724] RIP: 0033:0x7fa780782839 <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839 <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780 <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 but did not report it as an issue as it only occurred during the first module on boot. This is due to the removal of the distinct global timeline, and its separate lock class. So instead mark up the expected nesting. An alternative would be to define a separate lock class for the engine, but since we only expect to have a single point of nesting, we can avoid having multiple lock classes for the struct. Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Tested-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180508153514.20251-1-chris@chris-wilson.co.uk
|
#
71ace7ca |
|
07-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Disable tasklet scheduling across initial scheduling During request submission, we call the engine->schedule() function so that we may reorder the active requests as required for inheriting the new request's priority. This may schedule several tasklets to run on the local CPU, but we will need to schedule the tasklets again for the new request. Delay all the local tasklets until the end, so that we only have to process the queue just once. v2: Beware PREEMPT_RCU, as then local_bh_disable() is then not a superset of rcu_read_lock(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-2-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
43c8c441 |
|
04-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove assertion of active_rings must be non-empty if active_requests "An outstanding request must still be on an active ring somewhere" is only true if we haven't just been interrupted by the shrinker in the middle of allocating the request itself. (At the start of i915_request_alloc() we pin the context and prepare the GT for activity, marking it as active, and then try to allocate the request. If this allocation invokes the shrinker, we try to reclaim some space by calling i915_retire_requests() which may then be confused by the pre-reservation of active_requests.) <3>[ 125.472695] i915_retire_requests:1429 GEM_BUG_ON(list_empty(&i915->gt.active_rings)) <2>[ 125.472792] kernel BUG at drivers/gpu/drm/i915/i915_request.c:1429! <4>[ 125.472822] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI <4>[ 125.498764] Modules linked in: snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb btrtl btbcm btintel cdc_ether snd_hda_codec_realtek bluetooth i915 snd_hda_codec_generic usbnet r8152 mii ecdh_generic lpc_ich mei_me snd_hda_intel snd_hda_codec mei snd_hwdep snd_hda_core snd_pcm prime_numbers <4>[ 125.498923] CPU: 0 PID: 1115 Comm: gem_exec_create Tainted: G U 4.17.0-rc3-gc49cbe0d1eb8-kasan_32+ #1 <4>[ 125.498955] Hardware name: GOOGLE Peppy/Peppy, BIOS MrChromebox 02/04/2018 <4>[ 125.499074] RIP: 0010:i915_retire_requests+0x3f2/0x590 [i915] <4>[ 125.499095] RSP: 0018:ffff88004e5dec40 EFLAGS: 00010282 <4>[ 125.499117] RAX: 0000000000000010 RBX: ffff8800458f0000 RCX: 0000000000000000 <4>[ 125.499140] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff880060c2f6f0 <4>[ 125.499164] RBP: ffff88004e5dee30 R08: ffffed000c185ee6 R09: ffffed000c185ee6 <4>[ 125.499187] R10: 0000000000000001 R11: ffffed000c185ee5 R12: ffff8800553da160 <4>[ 125.499210] R13: dffffc0000000000 R14: 0000000000000000 R15: ffff8800458faed0 <4>[ 125.499235] FS: 00007fe18f052980(0000) GS:ffff880065400000(0000) knlGS:0000000000000000 <4>[ 125.499262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 125.499282] CR2: 00007f01df11efb8 CR3: 00000000518d4001 CR4: 00000000000606f0 <4>[ 125.499304] Call Trace: <4>[ 125.499417] i915_gem_shrink+0x576/0xb50 [i915] <4>[ 125.499532] ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915] <4>[ 125.499561] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.499671] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915] <4>[ 125.499782] ? i915_gem_shrinker_scan+0xc4/0x320 [i915] <4>[ 125.499889] i915_gem_shrinker_scan+0xc4/0x320 [i915] <4>[ 125.499997] ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915] <4>[ 125.500021] ? do_raw_spin_unlock+0x4f/0x240 <4>[ 125.500042] ? _raw_spin_unlock+0x29/0x40 <4>[ 125.500149] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915] <4>[ 125.500177] shrink_slab.part.18+0x23e/0x8f0 <4>[ 125.500202] ? unregister_shrinker+0x1f0/0x1f0 <4>[ 125.500226] ? mem_cgroup_iter+0x379/0xcc0 <4>[ 125.500249] shrink_node+0xa7e/0x1180 <4>[ 125.500276] ? shrink_node_memcg+0x11f0/0x11f0 <4>[ 125.500297] ? __delayacct_freepages_start+0x38/0x80 <4>[ 125.500319] ? __is_insn_slot_addr+0xe3/0x1a0 <4>[ 125.500342] ? recalibrate_cpu_khz+0x10/0x10 <4>[ 125.500361] ? ktime_get+0xb2/0x140 <4>[ 125.500382] do_try_to_free_pages+0x2d3/0xe40 <4>[ 125.500407] ? allow_direct_reclaim.part.23+0x1e0/0x1e0 <4>[ 125.500429] ? shrink_node+0x1180/0x1180 <4>[ 125.500450] ? __read_once_size_nocheck.constprop.4+0x10/0x10 <4>[ 125.500476] try_to_free_pages+0x1af/0x560 <4>[ 125.500497] ? do_try_to_free_pages+0xe40/0xe40 <4>[ 125.500525] __alloc_pages_nodemask+0xadc/0x2130 <4>[ 125.500553] ? gfp_pfmemalloc_allowed+0x150/0x150 <4>[ 125.500654] ? i915_gem_do_execbuffer+0x219d/0x32e0 [i915] <4>[ 125.500678] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.500701] ? __debug_object_init+0x322/0xd90 <4>[ 125.500722] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.500827] ? i915_gem_do_execbuffer+0xdc2/0x32e0 [i915] <4>[ 125.500942] ? i915_request_alloc+0x5b5/0x13f0 [i915] <4>[ 125.500964] ? page_frag_free+0x170/0x170 <4>[ 125.500984] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.501008] new_slab+0x21d/0x5c0 <4>[ 125.501029] ___slab_alloc.constprop.35+0x322/0x3e0 <4>[ 125.501052] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501074] ? __ww_mutex_lock.constprop.3+0x1104/0x2cf0 <4>[ 125.501097] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 125.501120] ? fs_reclaim_acquire+0x10/0x10 <4>[ 125.501138] ? lock_acquire+0x138/0x3c0 <4>[ 125.501156] ? lock_acquire+0x3c0/0x3c0 <4>[ 125.501176] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501198] ? __slab_alloc.isra.27.constprop.34+0x3d/0x70 <4>[ 125.501219] __slab_alloc.isra.27.constprop.34+0x3d/0x70 <4>[ 125.501243] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501265] __kmalloc_track_caller+0x313/0x350 <4>[ 125.501287] krealloc+0x62/0xb0 <4>[ 125.501305] reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501411] i915_gem_do_execbuffer+0x2040/0x32e0 [i915] <4>[ 125.501522] ? eb_relocate_slow+0xad0/0xad0 [i915] <4>[ 125.501544] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.501646] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.501755] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.501779] ? drm_dev_get+0x20/0x20 <4>[ 125.501803] ? __might_fault+0xea/0x1a0 <4>[ 125.501902] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.502012] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502116] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502218] i915_gem_execbuffer2_ioctl+0x3c5/0x770 [i915] <4>[ 125.502243] ? drm_dev_enter+0xe0/0xe0 <4>[ 125.502260] ? lock_acquire+0x138/0x3c0 <4>[ 125.502362] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502470] ? i915_gem_object_create.part.28+0x570/0x570 [i915] <4>[ 125.502575] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502680] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502702] drm_ioctl_kernel+0x151/0x200 <4>[ 125.502721] ? drm_ioctl_permit+0x2a0/0x2a0 <4>[ 125.502746] drm_ioctl+0x63a/0x920 <4>[ 125.502844] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502868] ? drm_getstats+0x20/0x20 <4>[ 125.502886] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.502919] do_vfs_ioctl+0x173/0xe90 <4>[ 125.502936] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.502957] ? ioctl_preallocate+0x170/0x170 <4>[ 125.502978] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.503002] ? retint_kernel+0x2d/0x2d <4>[ 125.503024] ksys_ioctl+0x35/0x60 <4>[ 125.503043] __x64_sys_ioctl+0x6a/0xb0 <4>[ 125.503061] do_syscall_64+0x97/0x400 <4>[ 125.503081] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 125.503101] RIP: 0033:0x7fe18e4f65d7 <4>[ 125.503116] RSP: 002b:00007ffe2ffc06a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 <4>[ 125.503145] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe18e4f65d7 <4>[ 125.503168] RDX: 00007ffe2ffc07f0 RSI: 0000000040406469 RDI: 0000000000000003 <4>[ 125.503191] RBP: 00007ffe2ffc07f0 R08: 0000000000000004 R09: 00007ffe2ffcf080 <4>[ 125.503215] R10: 000000000002c7de R11: 0000000000000246 R12: 0000000040406469 <4>[ 125.503238] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 <4>[ 125.503268] Code: e8 18 a0 c9 da 48 8b 35 25 3a 47 00 49 c7 c0 a0 3b 88 c0 b9 95 05 00 00 48 c7 c2 e0 49 88 c0 48 c7 c7 8d 3b 5d c0 e8 ee 7e db da <0f> 0b 48 89 ef e8 a4 26 f5 da e9 51 fe ff ff e8 8a 26 f5 da e9 <1>[ 125.503548] RIP: i915_retire_requests+0x3f2/0x590 [i915] RSP: ffff88004e5dec40 Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180504101147.26286-1-chris@chris-wilson.co.uk
|
#
7c572e1b |
|
03-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Keep one request in our ring_list Don't pre-emptively retire the oldest request in our ring's list if it is the only request. We keep various bits of state alive using the active reference from the request and would rather transfer that state over to a new request rather than the more involved process of retiring and reacquiring it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-2-chris@chris-wilson.co.uk
|
#
ea491b23 |
|
02-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Reset the hangcheck timestamp before repeating a seqno In the unusual circumstance where we reuse a seqno (for example, in igt), make sure that we reset the hangcheck timestamp before it sees the same seqno again. References: https://bugs.freedesktop.org/show_bug.cgi?id=106215 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-1-chris@chris-wilson.co.uk
|
#
a89d1f92 |
|
02-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split i915_gem_timeline into individual timelines We need to move to a more flexible timeline that doesn't assume one fence context per engine, and so allow for a single timeline to be used across a combination of engines. This means that preallocating a fence context per engine is now a hindrance, and so we want to introduce the singular timeline. From the code perspective, this has the notable advantage of clearing up a lot of mirky semantics and some clumsy pointer chasing. By splitting the timeline up into a single entity rather than an array of per-engine timelines, we can realise the goal of the previous patch of tracking the timeline alongside the ring. v2: Tweak wait_for_idle to stop the compiling thinking that ret may be uninitialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
|
#
65fcb806 |
|
02-May-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Move timeline from GTT to ring In the future, we want to move a request between engines. To achieve this, we first realise that we have two timelines in effect here. The first runs through the GTT is required for ordering vma access, which is tracked currently by engine. The second is implied by sequential execution of commands inside the ringbuffer. This timeline is one that maps to userspace's expectations when submitting requests (i.e. given the same context, batch A is executed before batch B). As the rings's timelines map to userspace and the GTT timeline an implementation detail, move the timeline from the GTT into the ring itself (per-context in logical-ring-contexts/execlists, or a global per-engine timeline for the shared ringbuffers in legacy submission. The two timelines are still assumed to be equivalent at the moment (no migrating requests between engines yet) and so we can simply move from one to the other without adding extra ordering. v2: Reinforce that one isn't allowed to mix the engine execution timeline with the client timeline from userspace (on the ring). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
|
#
643b450a |
|
30-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Only track live rings for retiring We don't need to track every ring for its lifetime as they are managed by the contexts/engines. What we do want to track are the live rings so that we can sporadically clean up requests if userspace falls behind. We can simply restrict the gt->rings list to being only gt->live_rings. v2: s/live/active/ for consistency with gt.active_requests Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
|
#
b887d615 |
|
30-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Retire requests along rings In the next patch, rings are the central timeline as requests may jump between engines. Therefore in the future as we retire in order along the engine timeline, we may retire out-of-order within a ring (as the ring now occurs along multiple engines), leading to much hilarity in miscomputing the position of ring->head. As an added bonus, retiring along the ring reduces the penalty of having one execlists client do cleanup for another (old legacy submission shares a ring between all clients). The downside is that slow and irregular (off the critical path) process of cleaning up stale requests after userspace becomes a modicum less efficient. In the long run, it will become apparent that the ordered ring->request_list matches the ring->timeline, a fun challenge for the future will be unifying the two lists to avoid duplication! v2: We need both engine-order and ring-order processing to maintain our knowledge of where individual rings have completed upto as well as knowing what was last executing on any engine. And finally by decoupling retiring the contexts on the engine and the timelines along the rings, we do have to keep a reference to the context on each request (previously it was guaranteed by the context being pinned). v3: Not just a reference to the context, but we need to keep it pinned as we manipulate the rings; i.e. we need a pin for both the manipulation of the engine state during its retirements, and a separate pin for the manipulation of the ring state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
|
#
ab82a063 |
|
30-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap engine->context_pin() and engine->context_unpin() Make life easier in upcoming patches by moving the context_pin and context_unpin vfuncs into inline helpers. v2: Fixup mock_engine to mark the context as pinned on use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
|
#
52d7f16e |
|
30-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Stop tracking timeline->inflight_seqnos In commit 9b6586ae9f6b ("drm/i915: Keep a global seqno per-engine"), we moved from a global inflight counter to per-engine counters in the hope that will be easy to run concurrently in future. However, with the advent of the desire to move requests between engines, we do need a global counter to preserve the semantics that no engine wraps in the middle of a submit. (Although this semantic is now only required for gen7 semaphore support, which only supports greater-then comparisons!) v2: Keep a global counter of all requests ever submitted and force the reset when it wraps. References: 9b6586ae9f6b ("drm/i915: Keep a global seqno per-engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-1-chris@chris-wilson.co.uk
|
#
b7268c5e |
|
18-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pack params to engine->schedule() into a struct Today we only want to pass along the priority to engine->schedule(), but in the future we want to have much more control over the various aspects of the GPU during a context's execution, for example controlling the frequency allowed. As we need an ever growing number of parameters for scheduling, move those into a struct for convenience. v2: Move the anonymous struct into its own function for legibility and ye olde gcc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
|
#
0c7112a0 |
|
18-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rename priotree to sched Having moved the priotree struct into i915_scheduler.h, identify it as the scheduling element and rebrand into i915_sched. This becomes more useful as we start attaching more information we require to propagate through the scheduler. v2: Use i915_sched_node for future distinctiveness Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-2-chris@chris-wilson.co.uk
|
#
0c5c7df3 |
|
06-Apr-2018 |
Tvrtko Ursulin <tvrtko.ursulin@intel.com> |
drm/i915/execlists: Log fence context & seqno throughout GEM_TRACE Include fence context and seqno in low level tracing so it is easier to follow flows of individual requests when things go bad. Also added tracing on the reset side of things. v2: Chris Wilson: * Standardize global_seqno and seqno as global. * Include current hws seqno in execlists_cancel_port_requests. v3: * Fix port printk format for all builds. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v2 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180406123514.5809-1-tvrtko.ursulin@linux.intel.com
|
#
d0667e9c |
|
06-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Pass the set of guilty engines to i915_reset() Currently, we rely on inspecting the hangcheck state from within the i915_reset() routines to determine which engines were guilty of the hang. This is problematic for cases where we want to run i915_handle_error() and call i915_reset() independently of hangcheck. Instead of relying on the indirect parameter passing, turn it into an explicit parameter providing the set of stalled engines which then are treated as guilty until proven innocent. While we are removing the implicit stalled parameter, also make the reason into an explicit parameter to i915_reset(). We still need a back-channel for i915_handle_error() to hand over the task to the locked waiter, but let's keep that its own channel rather than incriminate another. This leaves stalled/seqno as being private to hangcheck, with no more nefarious snooping by reset, be it whole-device or per-engine. \o/ The only real issue now is that this makes it crystal clear that we don't actually do any testing of hangcheck per se in drv_selftest/live_hangcheck, merely of resets! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk
|
#
e4d2006f |
|
06-Apr-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Split out parking from the idle worker for reuse We will want to park GEM before disengaging the drive^W^W^W unwedging. Since we already do the work for idling, expose the guts as a new function that we can then reuse. v2: Just skip if already parked; makes it more forgiving to use by future callers. v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to i915_gem_park so that we can evaluate it for symmetry more easily. Calling GEM from inside i915_request looks to be a bit of a layering violation, for the moment I am imaging them as being notify_cb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
|
#
e7702760 |
|
27-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Include the HW breadcrumb whenever we trace the global_seqno When we include a request's global_seqno in a GEM_TRACE it often helps to know how that relates to the current breadcrumb as seen by the hardware. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180327210157.16896-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
#
4ccfee92 |
|
22-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove local timeline var from submit/unsubmit Both request_submit and request_unsubmit deal with transferring the request from the client's timeline onto the execution timeline and back again. As both functions deal with a pair of timeline's, using a shorthand for just one of them is slightly confusing, especially as the different functions use the shorthand for the alternate timeline. Instead, use the full version of each timeline so it should be easier to keep track of the transfer between the request/client and the engine. v2: Refactor the common lock+list_move v3: Be clear we require the other timeline list to be locked as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180322131034.6036-1-chris@chris-wilson.co.uk
|
#
0e59c209 |
|
22-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Fix tracing of submit seqno We pre-increment the timeline->seqno when handing it to the request, make sure the GEM_TRACE takes this into account. Otherwise, it appears that we go backwards over a preemption point: 1d..1 157681077us : __i915_request_unsubmit: vcs0 fence 75e:3 <- global_seqno 17 0d.s1 157681113us : __i915_request_submit: vcs0 fence 75e:3 -> global_seqno 16 Fixes: d9b13c4dde6c ("drm/i915: Trace GEM steps between submit and wedging") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180322110059.4467-1-chris@chris-wilson.co.uk
|
#
ce800754 |
|
20-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Add control flags to i915_handle_error() Not all callers want the GPU error to handled in the same way, so expose a control parameter. In the first instance, some callers do not want the heavyweight error capture so add a bit to request the state to be captured and saved. v2: Pass msg down to i915_reset/i915_reset_engine so that we include the reason for the reset in the dev_notice(), superseding the earlier option to not print that notice. v3: Stash the reason inside the i915->gpu_error to handover to the direct reset from the blocking waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180320100449.1360-2-chris@chris-wilson.co.uk
|
#
7e9d3a4a |
|
07-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Similar to the staging around handling of engine->submit_request, we need to stop adding to the execlists->queue prior to calling engine->cancel_requests. cancel_requests will move requests from the queue onto the timeline, so if we add a request onto the queue after that point, it will be lost. Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk (cherry picked from commit 47650db02dd52267953df81438c93cf8a0eb0e5e) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
d9b13c4d |
|
15-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Trace GEM steps between submit and wedging We still have an odd race with wedging/unwedging as shown by igt/gem_eio that defies expectations. Add some more trace_printks to try and visualize the flow over the precipice. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180315131451.4060-1-chris@chris-wilson.co.uk
|
#
6f9ec414 |
|
08-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove the impedance mismatch around intel_engine_enable_signaling There is some redundancy between dma_fence->ops->enable_signaling (via i915_fence_enable_signaling) and our backend, intel_engine_enable_signaling() in that both levels recheck the fence status multiple times. If we convert intel_engine_enable_signaling() to return the information desired by dma_fence->ops->enable_signaling, we can reduce i915_fence_enable_signaling to a simple stub and avoid trying to reinterpret the same information. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180308140732.25090-1-chris@chris-wilson.co.uk
|
#
47650db0 |
|
07-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Similar to the staging around handling of engine->submit_request, we need to stop adding to the execlists->queue prior to calling engine->cancel_requests. cancel_requests will move requests from the queue onto the timeline, so if we add a request onto the queue after that point, it will be lost. Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk
|
#
36620032 |
|
07-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Update ring position from request on retiring When wedged, we do not update the ring->tail as we submit the requests causing us to leak the ring->space upon cleaning up the wedged driver. We can just use the value stored in rq->tail, and keep the submission backend details away from set-wedge. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-3-chris@chris-wilson.co.uk
|
#
f41d19be |
|
06-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Flush waiters on seqno wraparound Previously, we would spin waiting for all waiters to wake up and notice their request had completed before we would reset the seqno upon wraparound. However, we can mark their waits as complete and wake them up directly using the existing machinery for handling the flushing of missed wakeups when idling. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-2-chris@chris-wilson.co.uk
|
#
93eef7d6 |
|
06-Mar-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Stop kicking the signaling thread on seqno wraparound Since commit fd10e2ce9905 ("drm/i915/breadcrumbs: Ignore unsubmitted signalers"), we cancel the signaler when retiring the request and so upon wraparound, where we wait for all requests to be retired, we no longer need to spin waiting for the signaling thread to release its references to the in-flight requests, and so we can assert that the signaler is idle. References: fd10e2ce9905 ("drm/i915/breadcrumbs: Ignore unsubmitted signalers") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-1-chris@chris-wilson.co.uk
|
#
e532be89 |
|
22-Feb-2018 |
Michel Thierry <michel.thierry@intel.com> |
drm/i915: Update missing parts after the rename to i915_request Mostly doc/print messages that were not updated after commit e61e0f51ba79 ("drm/i915: Rename drm_i915_gem_request to i915_request"). Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180222172405.11386-1-michel.thierry@intel.com
|
#
e61e0f51 |
|
21-Feb-2018 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Rename drm_i915_gem_request to i915_request We want to de-emphasize the link between the request (dependency, execution and fence tracking) from GEM and so rename the struct from drm_i915_gem_request to i915_request. That is we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.) In short, the spatch: @@ @@ - struct drm_i915_gem_request + struct i915_request A corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space if of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|