History log of /linux-master/drivers/gpu/drm/i915/i915_request.c
Revision Date Author Comments
# 95f4e97f 15-Feb-2024 Jiri Slaby (SUSE) <jirislaby@kernel.org>

drm/i915: remove execute_cb::signal

execute_cb::signal has not been used since commit 5ac545b8b014
("drm/i915/request: Remove the hook from await_execution"). Drop it.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240216065326.6910-20-jirislaby@kernel.org


# 28041067 21-Aug-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: mark requests for GuC virtual engines to avoid use-after-free

References to i915_requests may be trapped by userspace inside a
sync_file or dmabuf (dma-resv) and held indefinitely across different
processes. To counteract the resulting memory leaks, we try not to keep
references from the request past its completion.
On the other hand, on fence release we need to know whether rq->engine
is valid and points to a HW engine (true for non-virtual requests).
To make that possible, an extra bit has been added to rq->execution_mask
for marking virtual engines.
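
As a rough sketch of the idea (the bit and helper names here are
illustrative, not necessarily the patch's exact identifiers):

/* Reserve one spare bit of the execution mask for virtual engines. */
#define RQ_VIRTUAL_ENGINE BIT(BITS_PER_TYPE(intel_engine_mask_t) - 1)

static bool rq_is_on_virtual_engine(const struct i915_request *rq)
{
        return rq->execution_mask & RQ_VIRTUAL_ENGINE;
}

/*
 * On fence release, rq->engine may then only be dereferenced as a HW
 * engine when the request is known not to be virtual.
 */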

Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"")
Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com


# 5eefc530 21-Aug-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: mark requests for GuC virtual engines to avoid use-after-free

References to i915_requests may be trapped by userspace inside a
sync_file or dmabuf (dma-resv) and held indefinitely across different
processes. To counteract the resulting memory leaks, we try not to keep
references from the request past its completion.
On the other hand, on fence release we need to know whether rq->engine
is valid and points to a HW engine (true for non-virtual requests).
To make that possible, an extra bit has been added to rq->execution_mask
for marking virtual engines.

Fixes: bcb9aa45d5a0 ("Revert "drm/i915: Hold reference to intel_context over life of i915_request"")
Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230821153035.3903006-1-andrzej.hajda@intel.com
(cherry picked from commit 280410677af763f3871b93e794a199cfcf6fb580)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 946e047a 20-Jul-2023 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Fix premature release of request's reusable memory

Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based on ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.

After we substitute a new request for the last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of the current activity of that previous one. While doing that, we must
take care to keep the old request in memory until we have used its
attributes for setting up that await dependency, or we may happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old, already released one. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading to infinite waits.

We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.

We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and correspondingly using
rcu_read_lock() around its users, as proposed in v1. However, that
approach leads to a significant (up to 10 times) increase of SLAB
utilization by the i915_request SLAB cache. Another potential approach
is to take a reference to the previous active fence.

When updating an active fence tracker, we first lock the new fence,
replace the pointer to the current active fence with the new one, and
then lock the substituted fence. With this approach, there is a time
window after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, so
we may happen to lock and return a new, unrelated request.

Always get a reference to the current active fence first, before
replacing it with a new one. With it protected from premature release
and reuse, lock it and then replace it with the new one, but only if it
has been neither signalled by a potential concurrent interrupt nor
replaced with another one by a potential concurrent thread; otherwise
retry, starting from getting a reference to the new current one. Adjust
users so they do not take a reference to the previous active fence
themselves, and always put the reference returned by
__i915_active_fence_set() when no longer needed.
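
In rough outline, with the locking and callback wiring omitted and with a
hypothetical __active_fence_slot() accessor standing in for the real
pointer access, the fixed update loop looks something like this:

static struct dma_fence *
active_fence_replace(struct i915_active_fence *active, struct dma_fence *fence)
{
        struct dma_fence *prev;

        do {
                /*
                 * Pin the current fence before inspecting it, so a
                 * concurrent release cannot hand us recycled memory.
                 */
                prev = dma_fence_get_rcu_safe(&active->fence);
                /*
                 * Publish the new fence only if the old one has been
                 * neither signalled nor replaced in the meantime;
                 * otherwise drop our reference and retry from the top.
                 */
                if (cmpxchg(__active_fence_slot(active), prev, fence) == prev)
                        break;
                dma_fence_put(prev);
        } while (1);

        return prev; /* the caller must dma_fence_put() this when done */
}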

v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com


# 9bba6b19 17-Jul-2023 Geert Uytterhoeven <geert+renesas@glider.be>

drm: Spelling s/sempahore/semaphore/

Fix misspellings of "semaphore".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8b0542c12a2427f34a792c41ac2d2a2922874bfa.1689600102.git.geert+renesas@glider.be


# d3f23ab9 20-Jul-2023 Andrzej Hajda <andrzej.hajda@intel.com>

drm/i915: use direct alias for i915 in requests

i915_request contains direct alias to i915, there is no point to go via
rq->engine->i915.

v2: added missing rq.i915 initialization in measure_breadcrumb_dw.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720113002.1541572-1-andrzej.hajda@intel.com


# a337b64f 20-Jul-2023 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Fix premature release of request's reusable memory

Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based on ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.

After we substitute a new request for the last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of the current activity of that previous one. While doing that, we must
take care to keep the old request in memory until we have used its
attributes for setting up that await dependency, or we may happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old, already released one. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading to infinite waits.

We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_TYPESAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.

We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and correspondingly using
rcu_read_lock() around its users, as proposed in v1. However, that
approach leads to a significant (up to 10 times) increase of SLAB
utilization by the i915_request SLAB cache. Another potential approach
is to take a reference to the previous active fence.

When updating an active fence tracker, we first lock the new fence,
replace the pointer to the current active fence with the new one, and
then lock the substituted fence. With this approach, there is a time
window after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, so
we may happen to lock and return a new, unrelated request.

Always get a reference to the current active fence first, before
replacing it with a new one. With it protected from premature release
and reuse, lock it and then replace it with the new one, but only if it
has been neither signalled by a potential concurrent interrupt nor
replaced with another one by a potential concurrent thread; otherwise
retry, starting from getting a reference to the new current one. Adjust
users so they do not take a reference to the previous active fence
themselves, and always put the reference returned by
__i915_active_fence_set() when no longer needed.

v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 848a4e5c 08-Jun-2023 Luca Coelho <luciano.coelho@intel.com>

drm/i915: add a dedicated workqueue inside drm_i915_private

In order to avoid flush_scheduled_work() usage, add a dedicated
workqueue in the drm_i915_private structure. In this way, we don't
need to use the system queue anymore.
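
The pattern is the usual driver-private workqueue one; a minimal sketch
(the queue name and flags here are assumptions):

/* at device creation */
i915->unordered_wq = alloc_workqueue("i915-unordered", 0, 0);
if (!i915->unordered_wq)
        return -ENOMEM;

/* instead of schedule_work(&work), i.e. the system workqueue */
queue_work(i915->unordered_wq, &work);

/* at device destruction, replacing flush_scheduled_work() */
destroy_workqueue(i915->unordered_wq);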

This change is mostly mechanical and based on Tetsuo's original
patch[1].

v6 by Jani:
- Also create unordered_wq for mock device

Link: https://patchwork.freedesktop.org/series/114608/ [1]
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c816ebe17ef08d363981942a096a586a7658a65e.1686231190.git.jani.nikula@intel.com


# 95ccb25e 01-Mar-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: remove unnecessary intel_pm.h includes

As intel_pm.[ch] used to contain much more, intel_pm.h was included in a
lot of places. Many of them are now unnecessary. Remove.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ab9a7147b0cd63d95b9f27ed40615b9c9be18f84.1677678803.git.jani.nikula@intel.com


# ff1e93e9 18-Jan-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: add i915_config.h and move relevant declarations there

We already have i915_config.c. Add the i915_config.h counterpart, and
declutter i915_drv.h in the process.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230118131538.3558599-1-jani.nikula@intel.com


# e6177ec5 27-Sep-2022 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/huc: stall media submission until HuC is loaded

Wait on the fence to be signalled to avoid the submissions finding HuC
not yet loaded.

v2: use a dedicated wait_queue_entry for waiting in HuC load, as submitq
can't be re-used for it.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Acked-by: Tony Ye <tony.ye@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220928004145.745803-13-daniele.ceraolospurio@intel.com


# bcb9aa45 14-Jun-2022 Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Revert "drm/i915: Hold reference to intel_context over life of i915_request"

This reverts commit 1e98d8c52ed5dfbaf273c4423c636525c2ce59e7.

The problem with this patch is that it makes i915_request hold a
reference to intel_context, which in turn holds a reference on the VM.
This strong back-referencing can lead to reference loops, which lead
to resource leaks.

An example is the upcoming VM_BIND work, which requires the VM to hold
a reference to some shared VM-specific BO. But this BO's dma-resv
fences hold a reference to the i915_request, thus leading to a reference
loop.
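
/*
 * Sketch of the cycle (each arrow = "holds a reference to"):
 *
 *   i915_request -> intel_context -> VM -> shared BO -> dma-resv fence
 *        ^                                                    |
 *        +----------------------------------------------------+
 */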

v2:
Do not use reserved requests for virtual engines

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220614184348.23746-3-ramalingam.c@intel.com


# 7307e91b 14-Jun-2022 Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

drm/i915: Do not access rq->engine without a reference

In i915_fence_get_driver_name(), the user may not hold a
reference to rq->engine, hence we must not access it. Instead,
store the required device private pointer in 'rq->i915' and use it.
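
The shape of the change, sketched as a diff (illustrative, quite possibly
not the patch verbatim):

static const char *i915_fence_get_driver_name(struct dma_fence *fence)
{
-       return dev_name(to_request(fence)->engine->i915->drm.dev);
+       return dev_name(to_request(fence)->i915->drm.dev);
}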

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220614184348.23746-2-ramalingam.c@intel.com


# 45c64ecf 27-May-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Improve user experience and driver robustness under SIGINT or similar

We have long-standing customer complaints that pressing Ctrl-C (or
something to that effect) causes engine resets with otherwise
well-behaved programs.

Not only is logging engine resets during normal operation undesirable,
since it creates support incidents, but more fundamentally we should
avoid going down the engine reset path when we can, since any engine
reset introduces a chance of harming an innocent context.

The reason for this undesirable behaviour is that the driver currently
does not distinguish between banned contexts and non-persistent contexts
which have been closed.

To fix this we add a distinction between the two reasons for revoking
contexts, which then allows the strict timeout to be applied only to
banned contexts, while innocent (well-behaved) contexts can preempt
cleanly and exit without triggering the engine reset path.

Note that the added context-exiting category applies both to closed
non-persistent contexts and to any exiting context when hangcheck has
been disabled by the user.

At the same time we rename the backend operation from 'ban' to 'revoke',
which more accurately describes the actual semantics. (There is no ban at
the backend level, since banning is a concept driven by the scheduling
frontend. Backends are simply able to revoke a running context, so that
is the more appropriate name.)
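
An illustrative sketch of the distinction on the revoke path (the helper
names follow the driver's style but are not guaranteed verbatim):

if (intel_context_is_banned(ce))
        strict = true;  /* misbehaved: short preempt timeout, may reset */
else if (intel_context_is_exiting(ce))
        strict = false; /* innocent: preempt cleanly, avoid engine reset */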

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com


# 7bc80a54 09-Nov-2021 Christian König <christian.koenig@amd.com>

dma-buf: add enum dma_resv_usage v4

This change adds the dma_resv_usage enum and allows us to specify why a
dma_resv object is queried for its containing fences.

In addition, a dma_resv_usage_rw() helper function is added to aid
retrieving the fences for a read or write userspace submission.

This is then deployed to the different query functions of the dma_resv
object and all of their users. Where the write parameter was previously
true we now use DMA_RESV_USAGE_WRITE, and DMA_RESV_USAGE_READ otherwise.
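
A sketch of how a caller picks the usage when gathering dependencies for a
new submission (iterator as in current kernels; the dependency helper is
hypothetical):

struct dma_resv_iter cursor;
struct dma_fence *fence;

/*
 * dma_resv_usage_rw() selects the usage class so that a new writer
 * waits for existing readers and writers, while a new reader waits
 * only for existing writers. The dma_resv lock must be held.
 */
dma_resv_for_each_fence(&cursor, resv, dma_resv_usage_rw(write), fence)
        add_dependency(rq, fence); /* hypothetical helper */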

v2: add KERNEL/OTHER in separate patch
v3: some kerneldoc suggestions by Daniel
v4: some more kerneldoc suggestions by Daniel, fix missing cases lost in
the rebase pointed out by Bas.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-2-christian.koenig@amd.com


# 24524e3f 09-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: move the DRIVER_* macros to i915_driver.[ch]

The macros are more at home in i915_driver.[ch].

v2: Rebase

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220209123121.3337496-1-jani.nikula@intel.com


# 202b1f4c 10-Jan-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/gt: Move engine registers to their own header

Let's continue breaking up and cleaning up the massive i915_reg.h file
by moving all registers that are defined in relation to an engine base
to their own header.

There are probably a bunch of other "engine registers" that we haven't
moved yet (especially those that belong to the render engine in the
0x2??? range), but this is a relatively straightforward first step.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220111051600.3429104-8-matthew.d.roper@intel.com


# 60dc43d1 10-Jan-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Use struct vma_resource instead of struct vma_snapshot

There is always a struct vma_resource guaranteed to be alive when we
access a corresponding struct vma_snapshot.

So ditch the latter, and instead of allocating vma_snapshots, reference
the already existing vma_resource.

This requires a couple of extra members in struct vma_resource but that's
a small price to pay for the simplification.

v2:
- Fix a missing include and declaration (kernel test robot <lkp@intel.com>)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-7-thomas.hellstrom@linux.intel.com


# 63cf4cad 21-Dec-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Break out the i915_deps utility

Since it's starting to be used outside the i915 TTM move code, move it
to a separate set of files.

v2:
- Update the documentation.
v4:
- Rebase.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-4-thomas.hellstrom@linux.intel.com


# 11930817 21-Dec-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Avoid using the i915_fence_array when collecting dependencies

Since the gt migration code was using only a single fence for
dependencies, these were collected in a dma_fence_array. However, it
turns out that it's illegal to use some dma_fences in a dma_fence_array,
in particular other dma_fence_arrays and dma_fence_chains, and this
causes trouble for us moving forward.

Have the gt migration code instead take a const struct i915_deps for
dependencies. This means we can skip the dma_fence_array creation
and instead pass the struct i915_deps to circumvent the
problem.
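
For reference, the i915_deps container is essentially a growable array of
fence pointers (a sketch; the exact field layout may differ):

struct i915_deps {
        struct dma_fence **fences; /* collected dependencies */
        unsigned int num_deps;     /* fences currently held */
        unsigned int fences_size;  /* allocated capacity */
        gfp_t gfp;                 /* allocation mode for growing */
};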

v2:
- Make the prev_deps() function static. (kernel test robot <lkp@intel.com>)
- Update the struct i915_deps kerneldoc.
v4:
- Rebase.

Fixes: 5652df829b3c ("drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211221200050.436316-2-thomas.hellstrom@linux.intel.com


# 40aa583e 09-Dec-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Don't leak the capture list items

When we recently converted the capture code to use vma snapshots,
we forgot to free the struct i915_capture_list list items after use.

Fix that by bringing back a kfree.
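
The fix amounts to freeing each node while walking the list, roughly
(field names assumed from context):

while (capture) {
        struct i915_capture_list *next = capture->next;

        kfree(capture);
        capture = next;
}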

Fixes: ff20afc4cee7 ("drm/i915: Update error capture code to avoid using the current vma state")
Cc: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211209141304.393479-1-thomas.hellstrom@linux.intel.com


# ff20afc4 29-Nov-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Update error capture code to avoid using the current vma state

With asynchronous migrations, the vma state may be several migrations
ahead of the state that matches the request we're capturing.
Address that by introducing an i915_vma_snapshot structure that
can be used to snapshot relevant state at request submission.
In order to make sure we access the correct memory, the snapshots take
references on relevant sg-tables and memory regions.

Also move the capture list allocation out of the fence signaling
critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
avoid compiling in members and functions used for error capture
when they're not used.

Finally, introduce lockdep annotation.

v4:
- Break out the capture allocation mode change to a separate patch.
v5:
- Fix compilation error in the !CONFIG_DRM_I915_CAPTURE_ERROR case
(kernel test robot)
v6:
- Use #if IS_ENABLED() instead of #ifdef to match driver style.
- Move yet another change of allocation mode to the separate patch.
- Commit message rework due to patch reordering.
v7:
- Adjust for removal of region refcounting.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211129202245.472043-1-thomas.hellstrom@linux.intel.com


# 44505168 16-Nov-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915: Drop stealing of bits from i915_sw_fence function pointer

Rather than stealing bits from the i915_sw_fence function pointer, use
separate fields for the function pointer and flags. With two different
fields, the 4-byte alignment for the i915_sw_fence function pointer can
also be dropped.
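
Sketched as a struct change (layout illustrative, not guaranteed verbatim):

struct i915_sw_fence {
        wait_queue_head_t wait;
        i915_sw_fence_notify_t fn;      /* no longer packed into flags */
#ifdef CONFIG_DRM_I915_SW_FENCE_CHECK_DAG
        unsigned long flags;
#endif
        atomic_t pending;
        int error;
};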

v2:
(CI)
- Set new function field rather than flags in __i915_sw_fence_init
v3:
(Tvrtko)
- Remove BUG_ON(!fence->flags) in reinit as that will now blow up
- Only define fence->flags if CONFIG_DRM_I915_SW_FENCE_CHECK_DAG is
defined
v4:
- Rebase, resend for CI

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116194929.10211-1-matthew.brost@intel.com


# 7e2e69ed 20-Oct-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Fix i915_request fence wait semantics

The i915_request fence wait behaves differently for timeout = 0
compared to expected dma-fence behavior.

i915 behavior:
- Unsignaled: -ETIME
- Signaled: 0 (= timeout)

Expected:
- Unsignaled: 0
- Signaled: 1
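
A sketch of the expected contract (standard dma-fence semantics; the
blocking helper is hypothetical):

static signed long fence_wait_sketch(struct dma_fence *fence,
                                     bool intr, signed long timeout)
{
        /* timeout == 0 must not block: report only the signaled state */
        if (timeout == 0)
                return dma_fence_is_signaled(fence) ? 1 : 0;

        /* otherwise: remaining jiffies (>= 1) on success, 0 on timeout */
        return block_until_signaled(fence, intr, timeout); /* hypothetical */
}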

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211116102431.198905-6-christian.koenig@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>


# 493043fe 01-Nov-2021 Vinay Belgaumkar <vinay.belgaumkar@intel.com>

drm/i915/guc/slpc: Add waitboost functionality for SLPC

Add a helper in the RPS code for handling the SLPC and non-SLPC paths.
When boost is requested in the SLPC path, we can ask GuC to ramp up
the frequency request by setting the minimum frequency to the boost
frequency. Reset the frequency back to the min softlimit when there are
no more waiters.

v2: Schedule a worker thread which can boost freq from within
an interrupt context as well.

v3: No need to check against requested freq before scheduling boost
work (Ashutosh)

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-3-vinay.belgaumkar@intel.com


# 8581fd40 02-Dec-2021 Jakub Kicinski <kuba@kernel.org>

treewide: Add missing includes masked by cgroup -> bpf dependency

cgroup.h (therefore swap.h, therefore half of the universe)
includes bpf.h which in turn includes module.h and slab.h.
Since we're about to get rid of that dependency we need
to clean things up.

v2: drop the cpu.h include from cacheinfo.h, it's not necessary
and it makes riscv sensitive to ordering of include files.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/ # v1
Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org


# a585070f 12-Sep-2021 Christian König <christian.koenig@amd.com>

drm/i915: use the new iterator in i915_request_await_object v2

Simplifying the code a bit.

v2: add missing rcu_read_lock()/rcu_read_unlock()
v3: use dma_resv_for_each_fence instead

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211005113742.1101-20-christian.koenig@amd.com


# afc76f30 14-Oct-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915: Make request conflict tracking understand parallel submits

If an object in the excl or shared slot is a composite fence from a
parallel submit, and the current request in the conflict tracking is from
the same parallel context, there is no need to enforce ordering as the
ordering is already implicit. Make the request conflict tracking
understand this by comparing a parallel submit's parent context and
skipping conflict insertion if the values match.
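
A sketch of the check (helper names in the driver's style; treat this as
illustrative rather than the patch verbatim):

static bool is_same_parallel_context(struct i915_request *to,
                                     struct i915_request *from)
{
        /*
         * Requests decomposing to the same parent context are already
         * implicitly ordered by the parallel submit itself.
         */
        if (is_parallel_rq(to) && is_parallel_rq(from))
                return request_to_parent(to) == request_to_parent(from);

        return false;
}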

v2:
(John Harrison)
- Reword commit message

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com


# bc955204 14-Oct-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/guc: Insert submit fences between requests in parent-child relationship

The GuC must receive requests in the order submitted for contexts in a
parent-child relationship to function correctly. To ensure this, insert
a submit fence between the current request and the last request submitted
for requests / contexts in a parent-child relationship. This is
conceptually similar to a single timeline.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-14-matthew.brost@intel.com


# 1a839e01 05-Oct-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: remove IS_ACTIVE

When trying to bring IS_ACTIVE to linux/kconfig.h I thought it wouldn't
provide much value just encapsulating it in a boolean context. So I also
added the support for handling undefined macros as the IS_ENABLED()
counterpart. However the feedback received from Masahiro Yamada was that
it is too ugly, not providing much value. And just wrapping in a boolean
context is too dumb - we could simply open code it.

As detailed in commit babaab2f4738 ("drm/i915: Encapsulate kconfig
constant values inside boolean predicates"), the IS_ACTIVE macro was
added to work around a compilation warning. However, after checking
our current uses of IS_ACTIVE again, it turned out there is only
1 case in which it triggers a warning in clang (due to
-Wconstant-logical-operand) and 2 in smatch. All the others
can simply use the shorter version, without wrapping it in any macro.

So here I'm dialing all the way back to simply removing the macro. That
single case hit by clang can be changed to make the constant come first,
so it doesn't think it's a mask:

- if (context && CONFIG_DRM_I915_FENCE_TIMEOUT)
+ if (CONFIG_DRM_I915_FENCE_TIMEOUT && context)

As talked with Dan Carpenter, that logic will be added in smatch as
well, so it will also stop warning about it.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211005171728.3147094-1-lucas.demarchi@intel.com


# 07f82a47 04-Oct-2021 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup

In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa)
when rendering is done on Intel dgfx and scanout/composition on Intel
igfx.

Before this patch the driver was not quite ready for that setup, mainly
because it was able to emit a semaphore wait between the two GPUs, which
results in deadlocks because the semaphore target location in the HWSP is
neither shared between the two nor mapped in both GGTT spaces.

To fix it the patch adds an additional check to a couple of relevant code
paths in order to prevent using semaphores for inter-engine
synchronisation when relevant objects are not in the same GGTT space.
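
The guard boils down to comparing the GGTT backing both requests; a sketch
(the helper name follows the driver's style, not guaranteed verbatim):

static bool can_use_semaphore_wait(struct i915_request *to,
                                   struct i915_request *from)
{
        /*
         * A semaphore wait only works if both engines can see the same
         * HWSP, i.e. they live in the same GGTT address space.
         */
        return to->engine->gt->ggtt == from->engine->gt->ggtt;
}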

v2:
* Avoid adding rq->i915. (Chris)

v3:
* Use GGTT which describes the limit more precisely.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211005113135.768295-1-tvrtko.ursulin@linux.intel.com


# be988eae 21-Sep-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/request: fix early tracepoints

Currently we blow up in trace_dma_fence_init, when calling into
get_driver_name or get_timeline_name, since both the engine and context
might be NULL (or contain some garbage address) in the case of newly
allocated slab objects via the request ctor. Note that we also use
SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately
freed, but delays freeing the underlying page by an RCU grace period.
With this scheme requests can be re-allocated, at the same time as they
are also being read by some lockless RCU lookup mechanism.

In the ctor case, which is only called for new slab objects (i.e. allocate
a new page and call the ctor for each object), it's safe to reset the
context/engine prior to calling into dma_fence_init, since we can be
certain that no one is doing an RCU lookup which might depend on peeking
at the engine/context, like in active_engine(), since the object can't
yet be externally visible.

In the recycled case (which might also be externally visible) the request
refcount always transitions from 0->1 after we set the context/engine
etc, which should ensure it's valid to dereference the engine for
example, when doing an RCU list-walk, so long as we can also increment
the refcount first. If the refcount is already zero, then the request is
considered complete/released. If it's non-zero, then the request might
be in the process of being re-allocated, or potentially still in flight,
however after successfully incrementing the refcount, it's possible to
carefully inspect the request state, to determine if the request is
still what we were looking for. Note that all externally visible
requests returned to the cache must have zero refcount.
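
The lockless lookup pattern described above, sketched (with 'slot'
standing in for some RCU-protected request pointer):

rcu_read_lock();
rq = rcu_dereference(*slot);
if (rq && dma_fence_get_rcu(&rq->fence)) {
        /*
         * With the refcount held the memory can no longer be recycled
         * under us, but the request may already have been reused, so
         * re-check that we got the one we were looking for.
         */
        if (rq != rcu_dereference(*slot)) {
                i915_request_put(rq);
                rq = NULL;
        }
}
rcu_read_unlock();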

One possible fix then is to move dma_fence_init out from the request
ctor. Originally this was how it was done, but it was moved in:

commit 855e39e65cfc33a73724f1cc644ffc5754864a20
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 3 09:41:48 2020 +0000

drm/i915: Initialise basic fence before acquiring seqno

where it looks like intel_timeline_get_seqno() relied on some of the
rq->fence state, but that is no longer the case since:

commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Tue Mar 23 16:49:50 2021 +0100

drm/i915: Do not share hwsp across contexts any more, v8.

intel_timeline_get_seqno() could also be cleaned up slightly by dropping
the request argument.

Moving dma_fence_init back out of the ctor should ensure we have enough
of the request initialised in case of trace_dma_fence_init.
Functionally this should be the same, and is effectively what we were
already open coding before, except now we also assign the fence->lock
and fence->ops; but since these are invariant for recycled
requests (which might be externally visible), and will therefore already
hold the same value, it shouldn't matter.

An alternative fix, since we don't yet have a fully initialised request
when in the ctor, is just setting the context/engine as NULL, but this
does require adding some extra handling in get_driver_name etc.

v2(Daniel):
- Try to make the commit message less confusing

Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Michael Mason <michael.w.mason@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921134202.3803151-1-matthew.auld@intel.com


# c83ff018 21-Sep-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/request: fix early tracepoints

Currently we blow up in trace_dma_fence_init, when calling into
get_driver_name or get_timeline_name, since both the engine and context
might be NULL (or contain some garbage address) in the case of newly
allocated slab objects via the request ctor. Note that we also use
SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately
freed, but delays freeing the underlying page by an RCU grace period.
With this scheme requests can be re-allocated, at the same time as they
are also being read by some lockless RCU lookup mechanism.

In the ctor case, which is only called for new slab objects (i.e. allocate
a new page and call the ctor for each object), it's safe to reset the
context/engine prior to calling into dma_fence_init, since we can be
certain that no one is doing an RCU lookup which might depend on peeking
at the engine/context, like in active_engine(), since the object can't
yet be externally visible.

In the recycled case (which might also be externally visible) the request
refcount always transitions from 0->1 after we set the context/engine
etc, which should ensure it's valid to dereference the engine for
example, when doing an RCU list-walk, so long as we can also increment
the refcount first. If the refcount is already zero, then the request is
considered complete/released. If it's non-zero, then the request might
be in the process of being re-allocated, or potentially still in flight,
however after successfully incrementing the refcount, it's possible to
carefully inspect the request state, to determine if the request is
still what we were looking for. Note that all externally visible
requests returned to the cache must have zero refcount.

One possible fix then is to move dma_fence_init out from the request
ctor. Originally this was how it was done, but it was moved in:

commit 855e39e65cfc33a73724f1cc644ffc5754864a20
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 3 09:41:48 2020 +0000

drm/i915: Initialise basic fence before acquiring seqno

where it looks like intel_timeline_get_seqno() relied on some of the
rq->fence state, but that is no longer the case since:

commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Tue Mar 23 16:49:50 2021 +0100

drm/i915: Do not share hwsp across contexts any more, v8.

intel_timeline_get_seqno() could also be cleaned up slightly by dropping
the request argument.

Moving dma_fence_init back out of the ctor should ensure we have enough
of the request initialised in case of trace_dma_fence_init.
Functionally this should be the same, and is effectively what we were
already open coding before, except now we also assign the fence->lock
and fence->ops; but since these are invariant for recycled
requests (which might be externally visible), and will therefore already
hold the same value, it shouldn't matter.

An alternative fix, since we don't yet have a fully initialised request
when in the ctor, is just setting the context/engine as NULL, but this
does require adding some extra handling in get_driver_name etc.

v2(Daniel):
- Try to make the commit message less confusing

Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Michael Mason <michael.w.mason@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921134202.3803151-1-matthew.auld@intel.com
(cherry picked from commit be988eaee1cb208c4445db46bc3ceaf75f586f0b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 47514ac7 27-Jul-2021 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move request slabs to direct module init/exit

With the global kmem_cache shrink infrastructure gone there's nothing
special left and we can convert them over.

I'm doing this split up into individual patches because there's quite a
bit of noise in renaming the static global.slab_requests|execute_cbs to
just slab_requests|execute_cbs.

v2: Make slab static (Jason, 0day)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-7-daniel.vetter@ffwll.ch


# ee242ca7 26-Jul-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/guc: Implement GuC priority management

Implement a simple static mapping algorithm of the i915 priority levels
(int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
follows:

i915 level < 0 -> GuC low level (3)
i915 level == 0 -> GuC normal level (2)
i915 level < INT_MAX -> GuC high level (1)
i915 level == INT_MAX -> GuC highest level (0)

We believe this mapping should cover the UMD use cases (3 distinct user
levels + 1 kernel level).
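
As a sketch, the mapping is a simple cascade (the GuC level macros follow
the firmware interface naming, but treat this as illustrative):

static inline u8 map_i915_prio_to_guc_prio(int prio)
{
        if (prio == I915_PRIORITY_NORMAL)
                return GUC_CLIENT_PRIORITY_KMD_NORMAL;  /* 2 */
        else if (prio < I915_PRIORITY_NORMAL)
                return GUC_CLIENT_PRIORITY_NORMAL;      /* 3, low */
        else if (prio < I915_PRIORITY_DISPLAY)
                return GUC_CLIENT_PRIORITY_HIGH;        /* 1 */
        else
                return GUC_CLIENT_PRIORITY_KMD_HIGH;    /* 0, highest */
}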

In addition to static mapping, a simple counter system is attached to
each context tracking the number of requests in flight on the context at
each level. This is needed as the GuC levels are per context while the
i915 levels are per request.

v2:
(Daniele)
- Add BUILD_BUG_ON to enforce ordering of priority levels
- Add missing lockdep to guc_prio_fini
- Check for return before setting context registered flag
- Map DISPLAY priority or higher to highest guc prio
- Update comment for guc_prio

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-33-matthew.brost@intel.com


# 62eaf0ae 26-Jul-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/guc: Support request cancellation

This adds GuC backend support for i915_request_cancel(), which in turn
makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.

This implementation makes use of a fence while there are likely simpler
options. A fence was chosen because of another feature coming soon
which requires a user to block on a context until scheduling is
disabled. In that case we return the fence to the user and the user can
wait on that fence.

v2:
(Daniele)
- A comment about locking the blocked incr / decr
- A comments about the use of the fence
- Update commit message explaining why fence
- Delete redundant check blocked count in unblock function
- Ring buffer implementation
- Comment about blocked in submission path
- Shorter rpm path
v3:
(Checkpatch)
- Fix typos in commit message
(Daniel)
- Rework to a simpler locking structure in guc_context_block / unblock

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-26-matthew.brost@intel.com


# dc0dad36 26-Jul-2021 John Harrison <John.C.Harrison@Intel.com>

drm/i915/guc: Fix for error capture after full GPU reset with GuC

In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode, this scan is different to
execlist mode as the active request list is handled very differently.

Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.

Also refactored some of the request scanning code to avoid duplication
across the multiple code paths that are now replicating it.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-20-matthew.brost@intel.com


# d1cee2d3 26-Jul-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915: Move active request tracking to a vfunc

Move active request tracking to a backend vfunc rather than assuming all
backends want to do this in the same manner. In the case of execlists /
ring submission the tracking is on the physical engine, while with GuC
submission it is on the context.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-8-matthew.brost@intel.com


# 1e98d8c5 26-Jul-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915: Hold reference to intel_context over life of i915_request

Hold a reference to the intel_context over the life of an i915_request.
Without this an i915_request can exist after the context has been
destroyed (e.g. request retired, context closed, but user space holds a
reference to the request from an out fence). In the case of GuC
submission + virtual engine, the engine that the request references is
also destroyed, which can trigger a bad pointer deref in fence ops (e.g.
i915_fence_get_driver_name). We could likely change
i915_fence_get_driver_name to avoid touching the engine but let's just
be safe and hold the intel_context reference.

v2:
(John Harrison)
- Update comment explaining how GuC mode and execlists mode deal with
virtual engines differently

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-4-matthew.brost@intel.com


# 96d3e0e1 26-Jul-2021 John Harrison <John.C.Harrison@Intel.com>

drm/i915/guc: Make hangcheck work with GuC virtual engines

The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heartbeat code
thinks the physical engines are idle due to the serial number not
incrementing, which in turn means hangcheck does not work for
GuC virtual engines.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.
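
The decomposition amounts to bumping the serial of every physical engine
in the request's execution mask; roughly:

struct intel_engine_cs *engine;
intel_engine_mask_t tmp;

/* track the request against each physical constituent */
for_each_engine_masked(engine, rq->engine->gt, rq->execution_mask, tmp)
        engine->serial++;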

The downside of this is that all physical engines constituting a GuC
virtual engine will be periodically unparked (even with just a single
context executing) in order to be pinged with a heartbeat request.
However, the power and performance cost of this is not expected to be
measurable (due to the low frequency of heartbeat pulses) and it is
considered an easier option than trying to make changes to the GuC
firmware.

v2:
(Tvrtko)
- Update commit message
- Have default behavior if no vfunc present

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-3-matthew.brost@intel.com


# 38d5ec43 21-Jul-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/guc: Ensure request ordering via completion fences

If two requests are on the same ring, they are explicitly ordered by the
HW. So, a submission fence is sufficient to ensure ordering when using
the new GuC submission interface. Conversely, if two requests share a
timeline and are on the same physical engine but different contexts, a
submission fence doesn't ensure ordering on the new GuC submission
interface. So, a completion fence needs to be used to ensure ordering.

v2:
(Daniele)
- Don't delete spin lock
v3:
(Daniele)
- Delete forward dec

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-13-matthew.brost@intel.com


# 3a4cdf19 21-Jul-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915/guc: Implement GuC context operations for new inteface

Implement GuC context operations, which include the GuC-specific
operations alloc, pin, unpin, and destroy.

v2:
(Daniel Vetter)
- Use msleep_interruptible rather than cond_resched in busy loop
(Michal)
- Remove C++ style comment
v3:
(Matthew Brost)
- Drop GUC_ID_START
(John Harrison)
- Fix a bunch of typos
- Use drm_err rather than drm_dbg for G2H errors
(Daniele)
- Fix ;; typo
- Clean up sched state functions
- Add lockdep for guc_id functions
- Don't call __release_guc_id when guc_id is invalid
- Use MISSING_CASE
- Add comment in guc_context_pin
- Use shorter path to rpm
(Daniele / CI)
- Don't call release_guc_id on an invalid guc_id in destroy
v4:
(Daniel Vetter)
- Add FIXME comment

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com


# 4f62a7e0 21-Jul-2021 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Ditch i915 globals shrink infrastructure

This essentially reverts

commit 84a1074920523430f9dc30ff907f4801b4820072
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 24 11:36:08 2018 +0000

drm/i915: Shrink the GEM kmem_caches upon idling

mm/vmscan.c:do_shrink_slab() is a thing; if there's an issue with it
then we need to fix it there, not hand-roll our own slab shrinking
code in i915.

Also, when this was added there was only one other caller of
kmem_cache_shrink (added in 2005 to the acpi code). Now there's a 2nd one
outside of i915 code, in a kunit test, which seems legit since that
wants to very carefully control what's in the kmem_cache. This is out of
a total of over 500 calls to kmem_cache_create. This alone should have
been warning sign enough that we're doing something silly.

Noticed while reviewing a patch set from Jason to fix up some issues
in our i915_init() and i915_exit() module load/cleanup code. Now that
i915_globals.c isn't any different than normal init/exit functions, we
should convert them over to one unified table and remove
i915_globals.[hc] entirely.

v2: Improve commit message (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721183229.4136488-1-daniel.vetter@ffwll.ch


# 93a2711c 14-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

Revert "drm/i915: Propagate errors on awaiting already signaled fences"

This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever
since that commit, we've been having issues where a hang in one client
can propagate to another. In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.

Error propagation along fences sounds like a good idea, but as this bug
shows, it has surprising consequences, since propagating errors across
security boundaries is not a good thing.

What we do have is tracking of hangs on the ctx, and reporting of that
information to userspace using RESET_STATS. That's how arb_robustness
works. Also, if my understanding is still correct, the EIO from execbuf
is when your context is banned (because not recoverable or too many
hangs). And in all these cases it's up to userspace to figure out what
is impacted and should be reported to the application; that's not on the
kernel to guess and automatically propagate.

What's more, we're also building more features on top of ctx error
reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
userspace fence wait also relies on that mechanism. So it is the path
going forward for reporting gpu hangs and resets to userspace.

So, all together, that's why I think we should just bury this idea again
as not quite the direction we want to go in, hence why I think the revert
is the right option here.

For backporters: Please note that you _must_ have a backport of
https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.net/
for otherwise backporting just this patch opens up a security bug.

v2: Augment commit message. Also restore Jason's sob that I
accidentally lost.

v3: Add a note for backporters

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Marcin Slusarz <marcin.slusarz@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@jlekstrand.net


# 5ac545b8 08-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

drm/i915/request: Remove the hook from await_execution

This was only ever used for FENCE_SUBMIT automatic engine selection,
which was removed in the previous commit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-12-jason@jlekstrand.net


# 3f623e06 17-Jun-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915: Move engine->schedule to i915_sched_engine

The schedule function should be in the schedule object.

v3:
(Jason Ekstrand)
Add kernel doc

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-6-matthew.brost@intel.com


# 349a2bc5 17-Jun-2021 Matthew Brost <matthew.brost@intel.com>

drm/i915: Move active tracking to i915_sched_engine

Move active request tracking and its lock to i915_sched_engine. This
lock is also the submission lock, so the i915_sched_engine is the
correct place for it.

v3:
(Jason Ekstrand)
Add kernel doc
v6:
Rebase

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-5-matthew.brost@intel.com


# 3761baae 14-Jul-2021 Jason Ekstrand <jason@jlekstrand.net>

Revert "drm/i915: Propagate errors on awaiting already signaled fences"

This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever
since that commit, we've been having issues where a hang in one client
can propagate to another. In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.

Error propagation along fences sounds like a good idea, but as this bug
shows, it has surprising consequences, since propagating errors across
security boundaries is not a good thing.

What we do have is tracking of hangs on the ctx, and reporting of that
information to userspace using RESET_STATS. That's how arb_robustness
works. Also, if my understanding is still correct, the EIO from execbuf
is when your context is banned (because not recoverable or too many
hangs). And in all these cases it's up to userspace to figure out what
is impacted and should be reported to the application; that's not on the
kernel to guess and automatically propagate.

What's more, we're also building more features on top of ctx error
reporting with the RESET_STATS ioctl: encrypted buffers use the same
mechanism, and the userspace fence wait also relies on it. So it is the
path going forward for reporting GPU hangs and resets to userspace.

So, all together, that's why I think we should just bury this idea again
as not quite the direction we want to go in, hence the revert is the
right option here.

For backporters: Please note that you _must_ have a backport of
https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.net/
as otherwise backporting just this patch opens up a security bug.

v2: Augment commit message. Also restore Jason's sob that I
accidentally lost.

v3: Add a note for backporters

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Marcin Slusarz <marcin.slusarz@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@jlekstrand.net
(cherry picked from commit 93a2711cddd5760e2f0f901817d71c93183c3b87)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 651e7d48 05-Jun-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: replace IS_GEN and friends with GRAPHICS_VER

This was done by the following semantic patch:

@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)

@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E

@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E

@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E

@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)

@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)

@@
identifier def.id;
@@
- id
+ ver

It also takes care of renaming the variable we assign GRAPHICS_VER()
to, so that it is called "ver" rather than "gen".

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com
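
For illustration, the net effect of the conversion on a hypothetical
caller (the version check and setup_gen11() are made-up examples, not
hunks from the patch):

	/* Before: "gen" terminology, several different macros */
	int gen = INTEL_GEN(i915);
	if (IS_GEN(i915, 11))
		setup_gen11(i915);

	/* After: one macro, "ver" terminology */
	int ver = GRAPHICS_VER(i915);
	if (GRAPHICS_VER(i915) == 11)
		setup_gen11(i915);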


# d3fae3b3 02-Jun-2021 Christian König <christian.koenig@amd.com>

dma-buf: drop the _rcu postfix on function names v3

The functions can be called both in RCU context and while holding the
lock.

v2: add some kerneldoc as suggested by Daniel
v3: fix indentation

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602111714.212426-7-christian.koenig@amd.com


# 6b41323a 01-Jun-2021 Christian König <christian.koenig@amd.com>

dma-buf: rename dma_resv_get_excl_rcu to _unlocked

That describes much better what the function is doing here.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602111714.212426-6-christian.koenig@amd.com


# 0333ec88 28-Apr-2021 Bernard Zhao <bernard@vivo.com>

drm/i915: Use might_alloc()

This may exercise lockdep through the fs_reclaim annotations.

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210429021327.57944-1-bernard@vivo.com


# f7c37977 26-Mar-2021 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Take request reference before arming the watchdog timer

The reference needs to be taken before arming the timer. Luckily, given
the default timer period of 20s, the race is extremely unlikely to be
hit.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 9b4d0598ee94 ("drm/i915: Request watchdog infrastructure")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210326105759.2387104-1-tvrtko.ursulin@linux.intel.com


# eef24f11 26-Mar-2021 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Take request reference before arming the watchdog timer

The reference needs to be taken before arming the timer. Luckily, given
the default timer period of 20s, the race is extremely unlikely to be
hit.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 9b4d0598ee94 ("drm/i915: Request watchdog infrastructure")
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210326105759.2387104-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit f7c379779161d364eb30338529490eac7dc377b7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 9b4d0598 23-Mar-2021 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Request watchdog infrastructure

Prepares the plumbing for setting request/fence expiration time. All code
is put in place but never activated, due to the still-missing ability to
actually configure the timer.

Outline of the basic operation:

A timer is started when a request is ready for execution. If the request
completes (retires) before the timer fires, the timer is cancelled and
nothing further happens.

If the timer fires, the request is added to a lockless list and a worker
is queued. The purpose of this is twofold: a) it allows request
cancellation from a friendlier context and b) it coalesces multiple
expirations into a single event of consuming the list.

The worker locklessly consumes the list of expired requests and cancels
them all using the previously added i915_request_cancel().

The associated timeout value is stored in rq->context.watchdog.timeout_us.

v2:
* Log expiration.

v3:
* Include more information about user timeline in the log message.

v4:
* Remove obsolete comment and fix formatting. (Matt)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-6-tvrtko.ursulin@linux.intel.com
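
A minimal sketch of the fire -> lockless list -> worker flow described
above. The symbol names (watchdog.link, expired_requests, expired_work)
are illustrative assumptions, and the real driver arms an hrtimer rather
than a timer_list:

	static LLIST_HEAD(expired_requests);

	static void expired_work_fn(struct work_struct *work)
	{
		/* Consume every expiration coalesced since the last run. */
		struct llist_node *list = llist_del_all(&expired_requests);
		struct i915_request *rq, *rn;

		/* Process context: a friendly place to cancel requests. */
		llist_for_each_entry_safe(rq, rn, list, watchdog.link)
			i915_request_cancel(rq, -EINTR);
	}
	static DECLARE_WORK(expired_work, expired_work_fn);

	static void watchdog_timer_fired(struct timer_list *t)
	{
		struct i915_request *rq = from_timer(rq, t, watchdog.timer);

		llist_add(&rq->watchdog.link, &expired_requests); /* lockless */
		schedule_work(&expired_work); /* one worker drains the list */
	}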


# 38b237ea 23-Mar-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Individual request cancellation

Currently, we cancel outstanding requests within a context when the
context is closed. We may also want to cancel individual requests using
the same graceful preemption mechanism.

v2 (Tvrtko):
* Cancel waiters carefully considering no timeline lock and RCU.
* Fixed selftests.

v3 (Tvrtko):
* Remove error propagation to waiters for now.

v4 (Tvrtko):
* Rebase for extracted i915_request_active_engine. (Matt)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
[danvet: Resolve conflict because intel_engine_flush_scheduler is
still called intel_engine_flush_submission]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-3-tvrtko.ursulin@linux.intel.com


# 7dbc19da 23-Mar-2021 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Extract active lookup engine to a helper

Move active engine lookup to exported i915_request_active_engine.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
[danvet: Slight rebase, engine->sched.lock is still called
engine->active.lock.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-2-tvrtko.ursulin@linux.intel.com


# c10e4a79 01-Feb-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect against request freeing during cancellation on wedging

As soon as we mark a request as completed, it may be retired. So when
cancelling a request and marking it complete, make sure we first keep a
reference to the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210201085715.27435-4-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 12ca695d 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Do not share hwsp across contexts any more, v8.

Instead of sharing pages with breadcrumbs, give each timeline a
single page. This allows unrelated timelines not to share locks
any more during command submission.

As an additional benefit, seqno wraparound no longer requires
i915_vma_pin, which means we no longer need to worry about a
potential -EDEADLK at a point where we are ready to submit.

Changes since v1:
- Fix erroneous i915_vma_acquire that should be an i915_vma_release (ickle).
- Extra check for completion in intel_read_hwsp().
Changes since v2:
- Fix inconsistent indent in hwsp_alloc() (kbuild)
- memset entire cacheline to 0.
Changes since v3:
- Do same in intel_timeline_reset_seqno(), and clflush for good measure.
Changes since v4:
- Use refcounting on timeline, instead of relying on i915_active.
- Fix waiting on kernel requests.
Changes since v5:
- Bump amount of slots to maximum (256), for best wraparounds.
- Add hwsp_offset to i915_request to fix potential wraparound hang.
- Ensure timeline wrap test works with the changes.
- Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
fix a hang.
Changes since v6:
- Rename i915_request_active_offset to i915_request_active_seqno(),
and elaborate the function. (tvrtko)
Changes since v7:
- Move hunk to where it belongs. (jekstrand)
- Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com


# 9736387a 14-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce test_and_set_bit to set_bit in i915_request_submit()

Avoid the full-blown memory barrier of test_and_set_bit() by noting the
completed request and removing it from the lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-5-chris@chris-wilson.co.uk
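
In pattern form (illustrative, not the exact hunk), the optimisation is
the difference between these two atomics; first_submission() is a
stand-in:

	/* test_and_set_bit() is a full read-modify-write and implies a
	 * full memory barrier, paid even when the bit is already set: */
	if (!test_and_set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
		first_submission(rq);

	/* Once removing the completed request from the lists has told
	 * us that no one else can be racing for the same bit, a plain
	 * barrier-free atomic set suffices: */
	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);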


# b2fe00bb 14-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop i915_request.lock serialisation around await_start

Originally, we used the signal->lock as a means of following the
previous link in its timeline and peeking at the previous fence.
However, we have replaced the explicit serialisation with a series of
very careful probes that anticipate the links being deleted and the
fences recycled before we are able to acquire a strong reference to
them. We do not need the signal->lock crutch anymore, nor do we want
the contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-2-chris@chris-wilson.co.uk


# 163433e5 14-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up protected uses of 'i915_request_completed'

When we know that we are inside the timeline mutex, or inside the
submission flow (under active.lock or the holder's rcu lock), we know
that the rq->hwsp is stable and we can use the simpler direct version.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-1-chris@chris-wilson.co.uk


# baa7c2cd 09-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor marking a request as EIO

When wedging the device, we cancel all outstanding requests and mark
them as EIO. Rather than duplicate the small function to do so between
each submission backend, export one.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210109163455.28466-3-chris@chris-wilson.co.uk


# 4e5c8a99 31-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop i915_request.lock requirement for intel_rps_boost()

Since we use a flag within i915_request.flags to indicate when we have
boosted the request (so that we only apply the boost once), this can be
used as the serialisation with i915_request_retire() to avoid having to
explicitly take the i915_request.lock, which is more heavily contended.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201231093149.19086-1-chris@chris-wilson.co.uk
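
A sketch of that flag-based serialisation (treat it as the shape of the
idea rather than the exact diff; apply_boost() is a stand-in):

	/* The boost is applied at most once per request; the atomic
	 * test-and-set doubles as the serialisation against
	 * i915_request_retire(), so i915_request.lock is not needed. */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags))
		apply_boost(rq);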


# 9c080b0f 31-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull context closure check from request submit to schedule-in

We only need to evaluate the current status of the context when it is
scheduled in; we will force a reschedule when the context is closed,
propagating the change to inflight contexts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201231093946.11649-1-chris@chris-wilson.co.uk


# 7904e081 30-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Cancel submitted requests upon context reset

Since we process the schedule-in of a context after submitting the
request, if we decide to reset the context at that time, we also have to
cancel the requests we have marked for submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201230220028.17089-1-chris@chris-wilson.co.uk


# 16f2941a 24-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Replace direct submit with direct call to tasklet

Rather than having special-case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fast/immediate submissions, without requiring any
duplicated code paths, and, most importantly, greatly simplifies the
control flow by removing reentrancy. This will enable us to close a few
races in the virtual engines in the next few patches.

The trickiest part here is to ensure that paired operations (such as
schedule_in/schedule_out) remain under consistent locking domains,
e.g. when pulled outside of the engine->active.lock.

v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET
softirq synchronous").
v3: Update engine-reset to be tasklet aware

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk


# 5ec17c76 20-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Another tweak for flushing the tasklets

tasklet_kill() ensures that we _yield_ the processor until a remote
tasklet is completed. However, this leads to a starvation condition as
being at the bottom of the scheduler's runqueue means that anything else
is able to run, including all hogs keeping the tasklet occupied.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201220134858.10510-1-chris@chris-wilson.co.uk


# 45233ab2 16-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Move gen8 CS emitters into gen8_engine_cs.h

Reduce the pollution of intel_engine.h by moving gen8_emit_pipe_control
and friends to gen8_engine_cs.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201216135452.6063-1-chris@chris-wilson.co.uk


# b8e2bd98 26-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Decouple completed requests on unwind

Since the introduction of preempt-to-busy, requests can complete in the
background, even while they are not on the engine->active.requests list.
As such, the engine->active.request list itself is not in strict
retirement order, and we have to scan the entire list while unwinding to
not miss any. However, if the request is completed we currently leave it
on the list [until retirement], but we could just as easily remove it
and stop treating it as active. We would then only have to traverse it
once while unwinding in quick succession.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201126140407.31952-1-chris@chris-wilson.co.uk


# 562675d0 19-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Update request status flags for debug pretty-printer

We plan to expand upon the number of available statuses for when we
pretty-print the requests along the timelines, and so need a new set of
flags. We have settled upon:

Unready [U]
- initial status after being submitted, the request is not
ready for execution as it is waiting for external fences

Ready [R]
- all fences the request was waiting on have been signaled,
and the request is now ready for execution and will be
in a backend queue

- a ready request may still need to wait on semaphores
[internal fences]

Ready/virtual [V]
- same as ready, but queued over multiple backends

Executing [E]
- the request has been transferred from the backend queue and
submitted for execution on HW

- a completed request may still be regarded as executing, its
status may not be updated until it is retired and removed
from the lists

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-3-chris@chris-wilson.co.uk
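
A sketch of how a pretty-printer can derive one of the letters above for
a request (the helper names follow the driver's conventions, but the
body is illustrative):

	static char queue_status(const struct i915_request *rq)
	{
		if (i915_request_is_active(rq))
			return 'E'; /* submitted for execution on HW */

		if (i915_request_is_ready(rq))
			/* single engine bit -> ordinary ready [R];
			 * multiple bits -> queued over a virtual engine [V] */
			return is_power_of_2(rq->execution_mask) ? 'R' : 'V';

		return 'U'; /* still waiting on external fences */
	}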


# 1f0e785a 19-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift i915_request_show()

Extract i915_request_show for reuse in other request chain pretty
printers.

For a bonus point, quietly change the seqno format from %llx to %lld to
match everywhere else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-2-chris@chris-wilson.co.uk


# 7a9f50a0 15-Jun-2020 Peter Zijlstra <peterz@infradead.org>

irq_work: Cleanup

Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


# 476c5818 29-Aug-2020 Peter Zijlstra <peterz@infradead.org>

llist: Add nonatomic __llist_add() and __llist_del_all()

We'll use these in the new, lockless kretprobes code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/159870619318.1229682.13027387548510028721.stgit@devnote2
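
Roughly, the nonatomic variants skip the cmpxchg loop for callers that
already exclude all concurrency (sketch; struct item and the pending
list are examples):

	struct item {
		struct llist_node llist;
	};

	static LLIST_HEAD(pending);

	static void add_item(struct item *it, bool concurrent)
	{
		if (concurrent)
			llist_add(&it->llist, &pending); /* cmpxchg loop */
		else
			/* Plain stores: only valid while the caller holds
			 * off every other producer and consumer. */
			__llist_add(&it->llist, &pending);
	}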


# 7d442ea7 28-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cancel outstanding work after disabling heartbeats on an engine

We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
to remain on the system after the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we sent a
pulse along the engine to remove all executing contexts from the engine
which would check for hung contexts -- but we did not prevent those
contexts from being resubmitted if they survived the final hangcheck.

Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
(cherry picked from commit 7a991cd3e3da9a56d5616b62d425db000a3242f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 5701a66e 25-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks"

The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove
i915_request.lock requirement for execution callbacks") caused it to
revert an earlier correction. Let us restore commit 99f0a640d464
("drm/i915: Remove requirement for holding i915_request.lock for
breadcrumbs")

Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk
(cherry picked from commit 35faeb7de9ef83da510a048f2016061f1e31d5fc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 7a991cd3 28-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cancel outstanding work after disabling heartbeats on an engine

We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
to remain on the system after the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we sent a
pulse along the engine to remove all executing contexts from the engine
which would check for hung contexts -- but we did not prevent those
contexts from being resubmitted if they survived the final hangcheck.

Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk


# 35faeb7d 25-Sep-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Redo "Remove i915_request.lock requirement for execution callbacks"

The reordering and rebasing of commit 2e4c6c1a9db5 ("drm/i915: Remove
i915_request.lock requirement for execution callbacks") caused it to
revert an earlier correction. Let us restore commit 99f0a640d464
("drm/i915: Remove requirement for holding i915_request.lock for
breadcrumbs")

Fixes: 2e4c6c1a9db5 ("drm/i915: Remove i915_request.lock requirement for execution callbacks")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200925101107.27869-1-chris@chris-wilson.co.uk


# b82a8b93 16-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be wary of data races when reading the active execlists

To implement preempt-to-busy (and so efficient timeslicing and best
utilization of the hardware submission ports) we let the GPU run
asynchronously with respect to the ELSP submission queue. This created
challenges in keeping and accessing the driver state mirroring the
asynchronous GPU execution.

The latest occurrence of this was spotted by KCSAN:

[ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
[ 1413.563221]
[ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
[ 1413.563548] __await_execution+0x217/0x370 [i915]
[ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
[ 1413.564235] i915_request_await_object+0x421/0x490 [i915]
[ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
[ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
[ 1413.564998] drm_ioctl_kernel+0x156/0x1b0
[ 1413.565022] drm_ioctl+0x2ff/0x480
[ 1413.565046] __x64_sys_ioctl+0x87/0xd0
[ 1413.565069] do_syscall_64+0x4d/0x80
[ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9

To complicate matters, we have to both avoid the read tearing of *active
and avoid any write tearing as we perform the pending[] -> inflight[]
promotion of the execlists.

This is because we cannot rely on the memcpy doing u64 aligned copies on
all kernels/platforms, so we opt to open-code it with explicit WRITE_ONCE
annotations to satisfy KCSAN.

v2: When in doubt, write the same comment again.
v3: Expanded commit message.

Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b4d9145b0154f8c71dafc2db5fd445f1f3db9426)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
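
In sketch form, the open-coded promotion is a loop of pointer-sized
stores that neither the compiler nor KCSAN may split, which memcpy()
does not guarantee (the field names follow the commit message; the
actual hunk differs in detail):

	unsigned int n;

	for (n = 0; n < ARRAY_SIZE(execlists->pending); n++)
		/* Each iteration is one untearable word. */
		WRITE_ONCE(execlists->inflight[n],
			   READ_ONCE(execlists->pending[n]));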


# 2e4c6c1a 16-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915_request.lock requirement for execution callbacks

To implement preempt-to-busy (and so efficient timeslicing and best
utilization of the hardware submission ports) we let the GPU run
asynchronously with respect to the ELSP submission queue. This created
challenges in keeping and accessing the driver state mirroring the
asynchronous GPU execution.

The previous fix, 1d9221e9d395 ("drm/i915: Skip signaling a signaled
request"), however, did not correctly serialize request retirement with
the execution callbacks.

We were using the i915_request.lock to serialise adding an execution callback
with __i915_request_submit. However, if we use an atomic llist_add to serialise
multiple waiters and then check to see if the request is already executing, we
can remove the irq-spinlock and fix serialization between retirement and
execution callbacks in one go.

v2: Avoid using the irq_work when outside of the irq-spinlocks, where we
can execute the callbacks immediately.
v3: Pay close attention to the order of setting ACTIVE on retirement; we
need to ensure the request is signaled and breadcrumbs detached before
we finish removing the request from the engine.
v4: Expanded commit message.

Fixes: 1d9221e9d395 ("drm/i915: Skip signaling a signaled request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# b4d9145b 16-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be wary of data races when reading the active execlists

To implement preempt-to-busy (and so efficient timeslicing and best
utilization of the hardware submission ports) we let the GPU run
asynchronously with respect to the ELSP submission queue. This created
challenges in keeping and accessing the driver state mirroring the
asynchronous GPU execution.

The latest occurrence of this was spotted by KCSAN:

[ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
[ 1413.563221]
[ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
[ 1413.563548] __await_execution+0x217/0x370 [i915]
[ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
[ 1413.564235] i915_request_await_object+0x421/0x490 [i915]
[ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
[ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
[ 1413.564998] drm_ioctl_kernel+0x156/0x1b0
[ 1413.565022] drm_ioctl+0x2ff/0x480
[ 1413.565046] __x64_sys_ioctl+0x87/0xd0
[ 1413.565069] do_syscall_64+0x4d/0x80
[ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9

To complicate matters, we have to both avoid the read tearing of *active
and avoid any write tearing as we perform the pending[] -> inflight[]
promotion of the execlists.

This is because we cannot rely on the memcpy doing u64 aligned copies on
all kernels/platforms, so we opt to open-code it with explicit WRITE_ONCE
annotations to satisfy KCSAN.

v2: When in doubt, write the same comment again.
v3: Expanded commit message.

Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
[Joonas: Added expanded commit message from Tvrtko and Chris]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# e2300560 01-Aug-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Hold context/request reference while breadcrumbs are active

Currently we hold no actual reference to either the request or the
context while they are attached to a breadcrumb. To avoid freeing the
request/context too early, we serialise with cancel-breadcrumbs by
taking the irq spinlock in i915_request_retire(). The alternative is to
take a reference for a new breadcrumb and release it upon signaling;
this removes the more frequently hit contention point in
i915_request_retire().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200801160225.6814-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# b3786b29 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs

On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling; active requests are passed to a
physical engine and its associated breadcrumb interrupt handler. This
causes issues for us as we need to ensure that we do not actually try to
enable interrupts, and the power management required for them, on the
virtual engine, as they will never be disabled. Instead, let's specify
the physical engine used for the interrupt handler on a particular
breadcrumb.

v2: Drop b->irq_armed = true mocking for no interrupt HW

Fixes: 4fe6abb8f513 ("drm/i915/gt: Ignore irq enabling on the virtual engines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-4-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 2854d866 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs

After staring at the breadcrumb enabling/cancellation and coming to the
conclusion that the cause of the mysterious stale breadcrumbs must be the
act of submitting completed requests, we can then redirect those
completed requests onto a dedicated signaled_list at the time of
construction and so eliminate intel_engine_transfer_stale_breadcrumbs().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-2-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# c18636f7 31-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs

Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock within request submission.

The challenge in serialisation is around the unsubmission where we take
an active request that wants a breadcrumb on the signaling engine and
put it to sleep. We do not want a concurrent
dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so
we must mark the request as no longer active before serialising with the
concurrent enable-signaling.

On retire, we serialise with the concurrent enable-signaling, but
instead of clearing ACTIVE, we mark it as SIGNALED.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200731154834.8378-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Joonas: Rebased and reordered into drm-intel-gt-next branch]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 27a5dcfe 28-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Remove disordered per-file request list for throttling

I915_GEM_THROTTLE dates back to the time before contexts, when there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.

That is no more. We now have many timelines within a file, as many as
the user wants to construct (essentially per-engine, per-context). Each
of those runs independently, making the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each, satisfying the criterion that the CPU is no more than
20ms ahead of its oldest request.

It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create vast amounts of
contexts, so the loops over contexts/engines should be concise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3adee4ac 14-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Soften the tasklet flush frequency before waits

We include a tasklet flush before waiting on a request as a precaution
against the HW being lax in event signaling. We now have a precautionary
flush in the engine's heartbeat and so do not need to be quite so
zealous on every request wait. If we focus on the request, the only
tasklet flush that matters is when there is a delay in submitting this
request to HW; if the request is not ready to be executed, no advantage
in reducing this wait can be gained by running the tasklet. And there is
little point in doing busy work for no result.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200715115147.11866-10-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3f6a6f34 16-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce i915_request.lock contention for i915_request_wait

Currently, we use i915_request_completed() directly in
i915_request_wait() and follow up with a manual invocation of
dma_fence_signal(). This appears to cause a large amount of contention
on i915_request.lock, as when the process is woken up after the fence is
signaled by an interrupt, we will then try to call dma_fence_signal()
ourselves while the signaler is still holding the lock.
dma_fence_is_signaled() has the benefit of checking the
DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
avoids most of that contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716100754.5670-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
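
The relevant difference, sketched (the surrounding wait loop is
omitted):

	/* Before: always race to signal the fence ourselves, contending
	 * with the interrupt-driven signaler on i915_request.lock: */
	if (i915_request_completed(rq))
		dma_fence_signal(&rq->fence);

	/* After: dma_fence_is_signaled() tests
	 * DMA_FENCE_FLAG_SIGNALED_BIT first, so an already-signaled
	 * fence costs only a bit test, no lock: */
	if (dma_fence_is_signaled(&rq->fence))
		return 0; /* done waiting (illustrative) */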


# 1840d40a 28-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove gen check before calling intel_rps_boost

It's been a while since gen6_rps_boost() [which only worked on gen6+]
was replaced by intel_rps_boost(), which itself understands when rps is
active. Since intel_rps_boost() is gen-agnostic, just call it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152219.1387-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 1d9221e9 13-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip signaling a signaled request

Preempt-to-busy introduces various fascinating complications in that the
requests may complete as we are unsubmitting them from HW. As they may
then signal after unsubmission, we may find ourselves having to clean up
the signaling request from within the signaling callback. This causes us
to recurse onto the same i915_request.lock.

However, if the request is already signaled (as it will be before we
enter the signal callbacks), we know we can skip the signaling of that
request during submission, neatly evading the spinlock recursion.

unsubmit(ve.rq0) # timeslice expiration or other preemption
-> virtual_submit_request(ve.rq0)
dma_fence_signal(ve.rq0) # request completed before preemption ack
-> submit_notify(ve.rq1)
-> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0
-> __i915_request_submit(ve.rq0)

[ 264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093
[ 264.210150] lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2
[ 264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G U
[ 264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019
[ 264.210160] Call Trace:
[ 264.210167] dump_stack+0x98/0xda
[ 264.210174] spin_dump.cold+0x24/0x3c
[ 264.210178] do_raw_spin_lock+0x9a/0xd0
[ 264.210184] _raw_spin_lock_nested+0x6a/0x70
[ 264.210314] __i915_request_submit+0x10a/0x3c0 [i915]
[ 264.210415] virtual_submit_request+0x9b/0x380 [i915]
[ 264.210516] submit_notify+0xaf/0x14c [i915]
[ 264.210602] __i915_sw_fence_complete+0x8a/0x230 [i915]
[ 264.210692] i915_sw_fence_complete+0x2d/0x40 [i915]
[ 264.210762] __dma_i915_sw_fence_wake+0x19/0x30 [i915]
[ 264.210767] dma_fence_signal_locked+0xb1/0x1c0
[ 264.210772] dma_fence_signal+0x29/0x50
[ 264.210871] i915_request_wait+0x5cb/0x830 [i915]
[ 264.210876] ? dma_resv_get_fences_rcu+0x294/0x5d0
[ 264.210974] i915_gem_object_wait_fence+0x2f/0x40 [i915]
[ 264.211084] i915_gem_object_wait+0xce/0x400 [i915]
[ 264.211178] i915_gem_wait_ioctl+0xff/0x290 [i915]

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200713141636.29326-1-chris@chris-wilson.co.uk
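
The guard reduces, in sketch form, to a completion check at the top of
submission (the xfer label stands for the list-transfer tail of the
function):

	/* A request that completed before the preemption was
	 * acknowledged is already signaled; skip the signaling
	 * machinery rather than recurse onto the same
	 * i915_request.lock from within the signaling callback. */
	if (i915_request_completed(rq))
		goto xfer; /* just move the request between lists */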


# 5a833995 02-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop i915_request.i915 backpointer

We infrequently use the direct i915 backpointer from the i915_request,
so do we really need to waste the space in the struct for it? 8 bytes
from the most frequently allocated struct vs 3 bytes and pointer
chasing in using rq->engine->i915?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk
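
The trade, sketched (rq_to_i915() is a hypothetical helper, not the
patch itself):

	/* 8 bytes saved in every request; one extra pointer chase on
	 * the infrequent occasions the device is actually needed. */
	static inline struct drm_i915_private *
	rq_to_i915(const struct i915_request *rq)
	{
		return rq->engine->i915; /* replaces the removed rq->i915 */
	}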


# 0e386959 29-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check for awaits on still currently executing requests

With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of an unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it to our submission queue, and cleared the active bit.

We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.

As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag. However, we do not precisely know which
request is in ELSP[0] due to a delay in processing events, and
furthermore we only store the last request in a context in our state
tracker.

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Testcase: igt/gem_exec_balancer/bonded-dual
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
(cherry picked from commit b55230e5e800868961fc271b26d9ce53ae1f691e)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 631a6582 26-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Do not schedule normal requests immediately along virtual

When we push a virtual request onto the HW, we update the rq->engine to
point to the physical engine. A request that is then submitted by the
user, and that waits upon the virtual engine but along the physical
engine in use, will see that it is due to be submitted to the same
engine and take a shortcut (and be queued without waiting for the completion
fence). However, the virtual request may be preempted (either by higher
priority users, or by timeslicing) and removed from the physical engine
to be migrated over to one of its siblings. The dependent normal request
however is oblivious to the removal of the virtual request and remains
queued to execute on HW, believing that once it reaches the head of its
queue all of its predecessors will have completed executing!

v2: Beware restriction of signal->execution_mask prior to submission.

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Testcase: igt/gem_exec_balancer/sliced
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk
(cherry picked from commit 511b6d9aed417739b6aa49d0b6b4354ad21020f1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# dd873dd5 26-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reorder await_execution before await_request

Reorder the code so that we can reuse the await_execution from a special
case in await_request in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-1-chris@chris-wilson.co.uk
(cherry picked from commit ffb0c600c240103f6f34e07892a7e0a75502b243)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# b55230e5 29-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check for awaits on still currently executing requests

With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of an unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it to our submission queue, and cleared the active bit.

We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.

As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag. However, we do not precisely know which
request is in ELSP[0] due to a delay in processing events, and
furthermore we only store the last request in a context in our state
tracker.

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Testcase: igt/gem_exec_balancer/bonded-dual
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk


# 98b7067a 29-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a few asserts around handling of i915_request_is_active()

Let's assert that we only call the execute callbacks on making the
request active, and that we do not execute the request without calling
the callbacks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200529085809.23691-1-chris@chris-wilson.co.uk


# 511b6d9a 26-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Do not schedule normal requests immediately along virtual

When we push a virtual request onto the HW, we update the rq->engine to
point to the physical engine. A request that is then submitted by the
user, and that waits upon the virtual engine but along the physical
engine in use, will see that it is due to be submitted to the same
engine and take a shortcut (and be queued without waiting for the completion
fence). However, the virtual request may be preempted (either by higher
priority users, or by timeslicing) and removed from the physical engine
to be migrated over to one of its siblings. The dependent normal request
however is oblivious to the removal of the virtual request and remains
queued to execute on HW, believing that once it reaches the head of its
queue all of its predecessors will have completed executing!

v2: Beware restriction of signal->execution_mask prior to submission.

Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Testcase: igt/gem_exec_balancer/sliced
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk


# ffb0c600 26-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reorder await_execution before await_request

Reorder the code so that we can reuse the await_execution from a special
case in await_request in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-1-chris@chris-wilson.co.uk


# fc0e1270 25-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Improve execute_cb struct packing

Reduce the irq_work llist for attaching the callbacks to the signal for
both smaller structs (two fewer pointers!) and simpler [debug] code:

Function old new delta
irq_execute_cb 35 34 -1
__igt_breadcrumbs_smoketest 1684 1682 -2
i915_request_retire 2003 1996 -7
__i915_request_create 1047 1040 -7
__notify_execute_cb 135 126 -9
__i915_request_ctor 188 178 -10
__await_execution.part.constprop 451 440 -11
igt_wait_request 924 714 -210

One minor artifact is that the order of cb execution is reversed. No
current use cases are affected by that change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526112051.10229-1-chris@chris-wilson.co.uk


# ef29440b 21-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid using rq->engine after free during i915_fence_release

In order to be valid to dereference during i915_fence_release, after
retiring the fence and releasing its references, we assume that
rq->engine can only be a real engine (one that stays intact until the
device is shut down after all fences have been flushed). However, due to
a quirk of preempt-to-busy, we may retire a request that still belongs
to a virtual engine and so eventually free it with rq->engine being
invalid. To avoid dereferencing that invalid engine, we look at the
execution_mask: if it indicates that the request may be executed on more
than one engine, we know it originated on a virtual engine and may still
be on one.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-2-chris@chris-wilson.co.uk
(cherry picked from commit 32a4605b38c30689a6a18f3f4c7d3133ac9d3277)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
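
A sketch of the release-path check; free_to_engine_pool() and
slab_requests are stand-ins for the driver's per-engine request pool and
request slab:

	if (is_power_of_2(rq->execution_mask)) {
		/* Exactly one engine bit: rq->engine is a real engine
		 * that stays intact, so its request pool is safe. */
		free_to_engine_pool(rq);
	} else {
		/* Multiple bits: the request originated on a virtual
		 * engine that may already be gone; do not touch
		 * rq->engine. */
		kmem_cache_free(slab_requests, rq);
	}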


# 32a4605b 21-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid using rq->engine after free during i915_fence_release

In order to be valid to dereference during i915_fence_release, after
retiring the fence and releasing its references, we assume that
rq->engine can only be a real engine (one that stays intact until the
device is shut down after all fences have been flushed). However, due to
a quirk of preempt-to-busy, we may retire a request that still belongs
to a virtual engine and so eventually free it with rq->engine being
invalid. To avoid dereferencing that invalid engine, we look at the
execution_mask: if it indicates that the request may be executed on more
than one engine, we know it originated on a virtual engine and may still
be on one.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200521140617.30015-2-chris@chris-wilson.co.uk


# 18e4af04 13-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop no-semaphore boosting

Now that we have fast timeslicing on semaphores, we no longer need to
prioritise non-semaphore work, as we will yield any work blocked on a
semaphore to the next in the queue. Previously, with no timeslicing,
blocking on the semaphore caused extremely bad scheduling with multiple
clients utilising multiple rings. Now, there is no impact and we can
remove the complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk


# 795d4d7f 13-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the addition of the initial-breadcrumb in the request

The initial-breadcrumb is used to mark the end of the awaiting and the
beginning of the user payload. We verify that we do not start the user
payload before all signalers are completed, checking our semaphore setup
by looking for the initial breadcrumb being written too early. We also
want to ensure that we do not add semaphore waits after we have already
closed the semaphore section, an issue for later deferred waits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513165937.9508-2-chris@chris-wilson.co.uk


# a9d094dc 07-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark concurrent submissions with a weak-dependency

We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as the pair of requests are
allowed to run in parallel and not in strict succession.

The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
(cherry picked from commit 6b6cd2ebd8d071e55998e32b648bb8081f7f02bb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# bc850943 06-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from completed fences

We need to preserve fatal errors from fences that are being terminated
as we hook them up.
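
A sketch of the hook-up path, assuming i915's
i915_sw_fence_set_error_once() and i915_sw_fence_await_dma_fence()
helpers:

static int await_external(struct i915_request *rq, struct dma_fence *fence)
{
        if (dma_fence_is_signaled(fence)) {
                /* skip the wait, but keep any error for the request */
                i915_sw_fence_set_error_once(&rq->submit, fence->error);
                return 0;
        }

        return i915_sw_fence_await_dma_fence(&rq->submit, fence, 0,
                                             GFP_KERNEL);
}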

Fixes: ef4688497512 ("drm/i915: Propagate fence errors")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506162136.3325-1-chris@chris-wilson.co.uk
(cherry picked from commit 24fe5f2ab2478053d50a3bc629ada895903a5cbc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 16dc224f 09-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT

Expose the hardcoded timeout for unsignaled foreign fences as a Kconfig
option, primarily to allow brave systems to disable the timeout and
solely rely on correct signaling.
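
In Kconfig form, the option might look like this (symbol name, default
and wording are assumptions):

config DRM_I915_FENCE_TIMEOUT
        int "Timeout for unsignaled foreign fences (ms, 0 = never)"
        default 10000
        help
          When waiting upon a fence from another driver, install a
          backup timer so that a missed signal does not wedge the
          driver forever. Set to 0 to disable the timeout and rely
          solely on correct signaling.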

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200509105021.12542-1-chris@chris-wilson.co.uk


# fcae4961 08-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent using semaphores to chain up to external fences

The downside of using semaphores is that we lose metadata passing
along the signaling chain. This is particularly nasty when we
need to pass along a fatal error such as EFAULT or EDEADLK. For
fatal errors we want to scrub the request before it is executed,
which means that we cannot preload the request onto HW and have
it wait upon a semaphore.
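
A sketch of the guard (function name assumed; dma_fence_is_i915() is
the driver's own test):

static bool can_use_semaphore_wait(const struct dma_fence *fence)
{
        /*
         * External fences carry no scrub path, so route them through
         * the sw-fence where a fatal error can still cancel the
         * payload before it is preloaded onto HW.
         */
        return dma_fence_is_i915(fence) && !fence->error;
}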

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-3-chris@chris-wilson.co.uk


# 3136deb7 08-May-2020 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915: Peel dma-fence-chains for await

To allow faster engine-to-engine synchronization, peel the layer of
dma-fence-chain to expose potential i915 fences so that the
i915_request code can emit HW semaphore wait/signal operations in the
ring which is faster than waking up the host to submit unblocked
workloads after interrupt notification.

This is similar to the peeling we do for e.g. dma_fence_array.
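
A sketch of the peeling loop, using the dma_fence_chain_for_each()
iterator from <linux/dma-fence-chain.h> (error handling abbreviated):

static int await_fence_chain(struct i915_request *rq, struct dma_fence *fence)
{
        struct dma_fence *iter;
        int err = 0;

        dma_fence_chain_for_each(iter, fence) {
                struct dma_fence_chain *chain = to_dma_fence_chain(iter);

                /* the leaf may be an i915 fence: HW semaphores apply */
                err = i915_request_await_dma_fence(rq,
                                                   chain ? chain->fence : iter);
                if (err) {
                        dma_fence_put(iter); /* iterator holds a ref */
                        break;
                }
        }

        return err;
}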

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508185448.29709-1-chris@chris-wilson.co.uk


# ac938052 08-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull waiting on an external dma-fence into its routine

As a means for a small code consolidation, but primarily to start
thinking more carefully about internal-vs-external linkage, pull the
pair of i915_sw_fence_await_dma_fence() calls into a common routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-2-chris@chris-wilson.co.uk


# 2045d666 08-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Ignore submit-fences on the same timeline

While we ordinarily do not skip submit-fences due to the accompanying
hook that we want to call back on execution, a submit-fence on the same
timeline is meaningless.
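
In sketch form, the skip is a single comparison (function name
illustrative):

static bool submit_fence_is_meaningful(const struct i915_request *rq,
                                       const struct dma_fence *fence)
{
        /* along our own timeline, ordering is already guaranteed */
        return fence->context != rq->fence.context;
}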

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-1-chris@chris-wilson.co.uk


# eec39e44 07-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove wait priority boosting

Upon waiting a request (when asked), we gave that request a small
priority boost, not enough for it to cause preemption, but enough for it
to be scheduled next before all equals. We also used that bit to give
new clients a small priority boost, similar to FQ_CODEL, such that we
favoured short interactive tasks ahead of long running streams.

However, this is causing lots of complications with timeslicing where we
both want to honour the boost and yet ignore it. Those complications
cause unexpected user behaviour (tasks not being timesliced and run
concurrently as expected), and the easiest way to resolve that is to
remove the boost. Hopefully, we can find a compromise again if we need
to, but in theory timeslicing itself and future more advanced schedulers
should give us the interactivity boost we seek.

Testcase: igt/gem_exec_schedule/lateslice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk


# 6b6cd2eb 07-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark concurrent submissions with a weak-dependency

We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- the pair of requests are allowed
to run in parallel and not in strict succession.

The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk


# 24fe5f2a 06-May-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from completed fences

We need to preserve fatal errors from fences that are being terminated
as we hook them up.

Fixes: ef4688497512 ("drm/i915: Propagate fence errors")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506162136.3325-1-chris@chris-wilson.co.uk


# 43acd651 02-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep a per-engine request pool

Add a tiny per-engine request mempool so that we should always have a
request available for power-management allocations from tricky
contexts. This reserve is expected to be used only for kernel
contexts when barriers must be emitted [almost] without fail.

The main consumer for this reserved request is expected to be engine-pm,
for which we know that there will always be at least the previous pm
request that we can reuse under mempressure (so there should always be
a spare request for engine_park()).

This is an alternative to using a comparatively bulky mempool, which
requires custom handling for both our reserved allocation requirement
and to protect our TYPESAFE_BY_RCU slab cache. The advantage of mempool
would be that it would allow us to keep a larger per-engine request
pool. However, converting over to mempool is straightforward should the
need arise.
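
A sketch of the single-slot reserve (field and slab names are
assumptions):

static struct i915_request *request_alloc(struct intel_engine_cs *engine)
{
        struct i915_request *rq;

        rq = kmem_cache_alloc(slab_requests, GFP_KERNEL | __GFP_NOWARN);
        if (!rq) /* under mempressure, fall back to the engine's spare */
                rq = xchg(&engine->request_pool, NULL);

        return rq;
}

static void request_park(struct intel_engine_cs *engine,
                         struct i915_request *rq)
{
        /* keep at most one retired request around for engine_park() */
        rq = xchg(&engine->request_pool, rq);
        if (rq)
                kmem_cache_free(slab_requests, rq);
}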

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-and-tested-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200402184037.21630-1-chris@chris-wilson.co.uk


# 41e4065a 23-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rely on direct submission to the queue

Drop the pretense of kicking the tasklet (used only for the defunct guc
submission backend, it should just take ownership of the submit!) and so
remove the bh-kicking from around submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-5-chris@chris-wilson.co.uk


# 14a0d527 10-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer semaphore priority bumping to a workqueue

Since the semaphore fence may be signaled from inside an interrupt
handler from inside a request holding its request->lock, we cannot then
enter into the engine->active.lock for processing the semaphore priority
bump as we may traverse our call tree and end up on another held
request.

CPU 0:
[ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0
[ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915]
[ 2243.218869] semaphore_notify+0x6d/0x98 [i915]
[ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2243.218874] ? kmem_cache_free+0x211/0x290
[ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915]
[ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915]
[ 2243.218881] signal_irq_work+0x571/0x690 [i915]
[ 2243.218883] irq_work_run_list+0xd7/0x120
[ 2243.218885] irq_work_run+0x1d/0x50
[ 2243.218887] smp_irq_work_interrupt+0x21/0x30
[ 2243.218889] irq_work_interrupt+0xf/0x20

CPU 1:
[ 2242.173107] _raw_spin_lock+0x8f/0xa0
[ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915]
[ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915]
[ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915]
[ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915]
[ 2242.173119] submit_notify+0x8f/0xc0 [i915]
[ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915]
[ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318
Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-chris@chris-wilson.co.uk
(cherry picked from commit 209df10bb4536c81c2540df96c02cd079435357f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c951b0af 05-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Return early for await_start on same timeline

Requests within a timeline are ordered by that timeline, so waiting for
the start of a request within the timeline is a no-op. This used to
work by falling out of the mutex_trylock(), as the signaler and waiter
shared the same timeline, without returning an error.
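
A sketch of the early return (accessor names assumed):

static bool await_start_is_noop(struct i915_request *rq,
                                struct i915_request *signal)
{
        /* requests within one timeline are already ordered by it */
        return i915_request_timeline(rq) ==
               rcu_access_pointer(signal->timeline);
}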

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305134822.2750496-1-chris@chris-wilson.co.uk
(cherry picked from commit ab7a69020fb5d5c7ba19fba60f62fd6f9ca9f779)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c67b35d9 05-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Actually emit the await_start

Fix the inverted test to emit the wait on the end of the previous
request if we /haven't/ already.

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305104210.2619967-1-chris@chris-wilson.co.uk
(cherry picked from commit 07e9c59d63df6a1c44c1975c01827ba18b69270a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 326611dd 10-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up racy read of active rq->engine

As a virtual engine may change the rq->engine to point to the active
request in flight, we need to warn the compiler that an active request's
engine is volatile.

[ 95.017686] write (marked) to 0xffff8881e8386b10 of 8 bytes by interrupt on cpu 2:
[ 95.018123] execlists_dequeue+0x762/0x2150 [i915]
[ 95.018539] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 95.018955] execlists_submission_tasklet+0xd3/0x170 [i915]
[ 95.018986] tasklet_action_common.isra.0+0x42/0xa0
[ 95.019016] __do_softirq+0xd7/0x2cd
[ 95.019043] irq_exit+0xbe/0xe0
[ 95.019068] irq_work_interrupt+0xf/0x20
[ 95.019491] i915_request_retire+0x2c5/0x670 [i915]
[ 95.019937] retire_requests+0xa1/0xf0 [i915]
[ 95.020348] engine_retire+0xa1/0xe0 [i915]
[ 95.020376] process_one_work+0x3b1/0x690
[ 95.020403] worker_thread+0x80/0x670
[ 95.020429] kthread+0x19a/0x1e0
[ 95.020454] ret_from_fork+0x1f/0x30
[ 95.020476]
[ 95.020498] read to 0xffff8881e8386b10 of 8 bytes by task 8909 on cpu 3:
[ 95.020918] __i915_request_commit+0x177/0x220 [i915]
[ 95.021329] i915_gem_do_execbuffer+0x38c4/0x4e50 [i915]
[ 95.021750] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 95.021784] drm_ioctl_kernel+0xe4/0x120
[ 95.021809] drm_ioctl+0x297/0x4c7
[ 95.021832] ksys_ioctl+0x89/0xb0
[ 95.021865] __x64_sys_ioctl+0x42/0x60
[ 95.021901] do_syscall_64+0x6e/0x2c0
[ 95.021927] entry_SYSCALL_64_after_hwframe+0x44/0xa9
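
A sketch of the annotation that resolves the report (pattern only):

/* the virtual engine may rewrite rq->engine while we look at it */
static struct intel_engine_cs *active_engine(struct i915_request *rq)
{
        return READ_ONCE(rq->engine);
}

/* paired with the writer side in the submission tasklet */
static void set_active_engine(struct i915_request *rq,
                              struct intel_engine_cs *engine)
{
        WRITE_ONCE(rq->engine, engine);
}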

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310142403.5953-1-chris@chris-wilson.co.uk


# 209df10b 10-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer semaphore priority bumping to a workqueue

Since the semaphore fence may be signaled from inside an interrupt
handler from inside a request holding its request->lock, we cannot then
enter into the engine->active.lock for processing the semaphore priority
bump as we may traverse our call tree and end up on another held
request.

CPU 0:
[ 2243.218864] _raw_spin_lock_irqsave+0x9a/0xb0
[ 2243.218867] i915_schedule_bump_priority+0x49/0x80 [i915]
[ 2243.218869] semaphore_notify+0x6d/0x98 [i915]
[ 2243.218871] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2243.218874] ? kmem_cache_free+0x211/0x290
[ 2243.218876] i915_sw_fence_complete+0x58/0x80 [i915]
[ 2243.218879] dma_i915_sw_fence_wake+0x3e/0x80 [i915]
[ 2243.218881] signal_irq_work+0x571/0x690 [i915]
[ 2243.218883] irq_work_run_list+0xd7/0x120
[ 2243.218885] irq_work_run+0x1d/0x50
[ 2243.218887] smp_irq_work_interrupt+0x21/0x30
[ 2243.218889] irq_work_interrupt+0xf/0x20

CPU 1:
[ 2242.173107] _raw_spin_lock+0x8f/0xa0
[ 2242.173110] __i915_request_submit+0x64/0x4a0 [i915]
[ 2242.173112] __execlists_submission_tasklet+0x8ee/0x2120 [i915]
[ 2242.173114] ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915]
[ 2242.173117] execlists_submit_request+0x2e8/0x2f0 [i915]
[ 2242.173119] submit_notify+0x8f/0xc0 [i915]
[ 2242.173121] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2242.173124] ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 2242.173137] i915_sw_fence_complete+0x58/0x80 [i915]
[ 2242.173140] i915_sw_fence_commit+0x16/0x20 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318
Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200310101720.9944-1-chris@chris-wilson.co.uk


# 798fa870 06-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Improve the start alignment of bonded pairs

Always wait on the start of the signaler request to reduce the problem
of dequeueing the bonded pair too early -- we want both payloads to
start at the same time, with no latency, and yet still allow others to
make full use of the slack in the system. This reduces the amount of time
we spend waiting on the semaphore used to synchronise the start of the
bonded payload.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306133852.3420322-3-chris@chris-wilson.co.uk


# 60900add 09-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark racy read of intel_engine_cs.saturated

[ 3783.276728] BUG: KCSAN: data-race in __i915_request_submit [i915] / i915_request_await_dma_fence [i915]
[ 3783.276766]
[ 3783.276787] write to 0xffff8881f1bc60a0 of 1 bytes by interrupt on cpu 2:
[ 3783.277187] __i915_request_submit+0x47e/0x4a0 [i915]
[ 3783.277580] __execlists_submission_tasklet+0x997/0x2780 [i915]
[ 3783.277973] execlists_submission_tasklet+0xd3/0x170 [i915]
[ 3783.278006] tasklet_action_common.isra.0+0x42/0xa0
[ 3783.278035] __do_softirq+0xd7/0x2cd
[ 3783.278063] irq_exit+0xbe/0xe0
[ 3783.278089] do_IRQ+0x51/0x100
[ 3783.278114] ret_from_intr+0x0/0x1c
[ 3783.278140] finish_task_switch+0x72/0x260
[ 3783.278170] __schedule+0x1e5/0x510
[ 3783.278198] schedule+0x45/0xb0
[ 3783.278226] smpboot_thread_fn+0x23e/0x300
[ 3783.278256] kthread+0x19a/0x1e0
[ 3783.278283] ret_from_fork+0x1f/0x30
[ 3783.278305]
[ 3783.278327] read to 0xffff8881f1bc60a0 of 1 bytes by task 19440 on cpu 3:
[ 3783.278724] i915_request_await_dma_fence+0x2a6/0x530 [i915]
[ 3783.279130] i915_request_await_object+0x2fe/0x470 [i915]
[ 3783.279524] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
[ 3783.279908] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
[ 3783.279940] drm_ioctl_kernel+0xe4/0x120
[ 3783.279968] drm_ioctl+0x297/0x4c7
[ 3783.279996] ksys_ioctl+0x89/0xb0
[ 3783.280021] __x64_sys_ioctl+0x42/0x60
[ 3783.280047] do_syscall_64+0x6e/0x2c0
[ 3783.280074] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200309132726.28358-1-chris@chris-wilson.co.uk


# dff2a11b 06-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not poison i915_request.link on removal

Do not poison the timeline link on the i915_request to allow both
forward/backward list traversal under RCU.

[ 9759.139229] RIP: 0010:active_request+0x2a/0x90 [i915]
[ 9759.139240] Code: 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 c5 60 e8 49 de dc e0 48 8b 83 e8 01 00 00 48 39 c5 74 12 48 8d 90 20 fe ff ff <48> 8b 80 50 fe ff ff a8 01 74 11 e8 66 20 dd e0 48 89 d8 5b 5d 41
[ 9759.139251] RSP: 0018:ffffc9000014ce80 EFLAGS: 00010012
[ 9759.139260] RAX: dead000000000122 RBX: ffff888817cac040 RCX: 0000000000022000
[ 9759.139267] RDX: deacffffffffff42 RSI: ffff888817cac040 RDI: ffff888851fee900
[ 9759.139275] RBP: ffff888851fee960 R08: 000000000000001a R09: ffffffffa04702e0
[ 9759.139282] R10: ffffffff82187ea0 R11: 0000000000000002 R12: 0000000000000004
[ 9759.139289] R13: ffffffffa04d5179 R14: ffff8887f994ae40 R15: ffff888857b9a068
[ 9759.139296] FS: 0000000000000000(0000) GS:ffff88885ed80000(0000) knlGS:0000000000000000
[ 9759.139304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9759.139311] CR2: 00007fff5bdec000 CR3: 00000008534fe001 CR4: 00000000001606e0
[ 9759.139318] Call Trace:
[ 9759.139325] <IRQ>
[ 9759.139389] execlists_reset+0x14d/0x310 [i915]
[ 9759.139400] ? _raw_spin_unlock_irqrestore+0xf/0x30
[ 9759.139445] ? fwtable_read32+0x90/0x230 [i915]
[ 9759.139499] execlists_submission_tasklet+0xf6/0x150 [i915]
[ 9759.139508] tasklet_action_common.isra.17+0x32/0xa0
[ 9759.139516] __do_softirq+0x114/0x3dc
[ 9759.139525] ? handle_irq_event_percpu+0x59/0x70
[ 9759.139533] irq_exit+0xa1/0xc0
[ 9759.139540] do_IRQ+0x76/0x150
[ 9759.139547] common_interrupt+0xf/0xf
[ 9759.139554] </IRQ>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306140115.3495686-1-chris@chris-wilson.co.uk


# 1eaa251b 06-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert requests within a context are submitted in order

Check the flow of requests into the hardware to verify that they are
submitted in order along their timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-1-chris@chris-wilson.co.uk


# ab7a6902 05-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Return early for await_start on same timeline

Requests within a timeline are ordered by that timeline, so waiting for
the start of a request within the timeline is a no-op. This used to
work by falling out of the mutex_trylock(), as the signaler and waiter
shared the same timeline, without returning an error.

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305134822.2750496-1-chris@chris-wilson.co.uk


# 07e9c59d 05-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Actually emit the await_start

Fix the inverted test to emit the wait on the end of the previous
request if we /haven't/ already.

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200305104210.2619967-1-chris@chris-wilson.co.uk


# 36e191f0 03-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply i915_request_skip() on submission

Trying to use i915_request_skip() prior to i915_request_add() causes us
to try and fill the ring upto request->postfix, which has not yet been
set, and so may cause us to memset() past the end of the ring.

Instead of skipping the request immediately, just flag the error on the
request (only accepting the first fatal error we see) and then clear the
request upon submission.
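
A sketch of recording only the first fatal error, close to what the
text describes (the fatal_error() classification is an assumption):

static bool fatal_error(int error)
{
        switch (error) {
        case 0:       /* not an error */
        case -EAGAIN: /* innocent victim of a GT reset */
                return false;
        default:
                return true;
        }
}

void i915_request_set_error_once(struct i915_request *rq, int error)
{
        int old = READ_ONCE(rq->fence.error);

        do {
                if (fatal_error(old)) /* the first fatal error wins */
                        return;
        } while (!try_cmpxchg(&rq->fence.error, &old, error));
}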

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk


# 61231f6b 03-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Check that the context wasn't closed during setup

As setup takes a long time, the user may close the context during the
construction of the execbuf. In order to make sure we correctly track
all outstanding work with non-persistent contexts, we need to serialise
the submission with the context closure and mop up any leaks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200303080546.1140508-3-chris@chris-wilson.co.uk


# 0b1570b7 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect i915_request_await_start from early waits

We need to be extremely careful inside i915_request_await_start() as it
needs to walk the list of requests in the foreign timeline with very
little protection. As we hold our own timeline mutex, we can not nest
inside the signaler's timeline mutex, so all that remains is our RCU
protection. However, to be safe we need to tell the compiler that we may
be traversing the list only under RCU protection, and furthermore we
need to start declaring requests as elements of the timeline from their
construction.
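
A sketch of the guarded lookup (helper names assumed; the reference
must be validated because requests are SLAB_TYPESAFE_BY_RCU):

static struct i915_request *
get_prev_request(struct intel_timeline *tl, struct i915_request *rq)
{
        struct i915_request *prev = NULL;

        rcu_read_lock();
        if (!list_is_first(&rq->link, &tl->requests))
                /* may return NULL if prev is being recycled under us */
                prev = i915_request_get_rcu(list_prev_entry(rq, link));
        rcu_read_unlock();

        return prev;
}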

Fixes: 9ddc8ec027a3 ("drm/i915: Eliminate the trylock for awaiting an earlier request")
Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-11-chris@chris-wilson.co.uk
(cherry picked from commit d22d2d073ef859b346bc32cb25299262e3973769)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 062444bb 28-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Expose busywait duration to sysfs

We busywait on an inflight request (one that is currently executing on
HW, and so might complete quickly) prior to setting up an interrupt and
sleeping. The trade off is that we keep an expensive CPU core busy in
order to avoid wake up latency: where that trade off should lie is best
left to the sysadmin.

The busywait mechanism can be compiled out with

./scripts/config --set-val DRM_I915_SPIN_REQUEST 0

The maximum busywait duration can be adjusted per-engine using,

/sys/class/drm/card?/engine/*/max_busywait_duration_ns
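
For example, to allow up to 500us of busywaiting on the render engine
(card and engine path illustrative):

echo 500000 > /sys/class/drm/card0/engine/rcs0/max_busywait_duration_ns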

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
Tested-by: Steve Carbonari <steven.carbonari@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-4-chris@chris-wilson.co.uk


# d22d2d07 27-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect i915_request_await_start from early waits

We need to be extremely careful inside i915_request_await_start() as it
needs to walk the list of requests in the foreign timeline with very
little protection. As we hold our own timeline mutex, we can not nest
inside the signaler's timeline mutex, so all that remains is our RCU
protection. However, to be safe we need to tell the compiler that we may
be traversing the list only under RCU protection, and furthermore we
need to start declaring requests as elements of the timeline from their
construction.

Fixes: 9ddc8ec027a3 ("drm/i915: Eliminate the trylock for awaiting an earlier request")
Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-11-chris@chris-wilson.co.uk


# df6b1f3d 20-Feb-2020 Matthew Auld <matthew.auld@intel.com>

drm/i915: remove the other slab_dependencies

The real one can be found in i915_scheduler.c.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200220105707.344522-1-matthew.auld@intel.com


# c01e8da2 03-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialise basic fence before acquiring seqno

Inside the intel_timeline_get_seqno(), we currently track the retirement
of the old cachelines by listening to the new request. This requires
that the new request is ready to be used and so requires a minimum bit
of initialisation prior to getting the new seqno.

Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk
(cherry picked from commit 855e39e65cfc33a73724f1cc644ffc5754864a20)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 2aaaa5ee 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the removal of the i915_request from the sched.link

Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).

Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk
(cherry picked from commit b4a9a149f91ea345da76bcfe3f8a39715ac346a6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 9e2750fc 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep track of request among the scheduling lists

If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list nor in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 89dd019a 11-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Poison rings after use

On retiring the request, we should not re-use these elements in the ring
(at least not until we fill the ringbuffer and knowingly reuse the space).
Leave behind some poison to (hopefully) trap ourselves if we make a
mistake.

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211205615.1190127-1-chris@chris-wilson.co.uk


# f16ccb64 10-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable use of hwsp_cacheline for kernel_context

Currently on execlists, we use a local hwsp for the kernel_context,
rather than the engine's HWSP, as this is the default for execlists.
However, seqno wrap requires allocating a new HWSP cacheline, and may
require pinning a new HWSP page in the GGTT. Such pinning is not
allowed within the kernel_context timeline, as doing so may require
re-entering the kernel_context in order to evict
from the GGTT. As we want to avoid requiring a new HWSP for the
kernel_context, we can use the permanently pinned engine's HWSP instead.
However to do so we must prevent the use of semaphores reading the
kernel_context's HWSP, as the use of semaphores do not support rollover
onto the same cacheline. Fortunately, the kernel_context is mostly
isolated, and so unlikely to benefit from semaphores.

Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200210205722.794180-5-chris@chris-wilson.co.uk


# 602ddb41 05-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush execution tasklets before checking request status

Rather than flushing the submission tasklets just before we sleep, flush
before we check the request status. Ideally this gives us a moment to
process the tasklets after sleeping just before we timeout.

v2: Compromise by pushing the flush prior to the timeout, but after the
check on completion so that we do not further delay the ready client.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200205095441.1769599-1-chris@chris-wilson.co.uk


# 855e39e6 03-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialise basic fence before acquiring seqno

Inside the intel_timeline_get_seqno(), we currently track the retirement
of the old cachelines by listening to the new request. This requires
that the new request is ready to be used and so requires a minimum bit
of initialisation prior to getting the new seqno.

Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203094152.4150550-2-chris@chris-wilson.co.uk


# b4a9a149 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the removal of the i915_request from the sched.link

Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).

Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk


# 672c368f 16-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep track of request among the scheduling lists

If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list nor in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk


# e1c31fb5 06-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Merge i915_request.flags with i915_request.fence.flags

As we already have a flags field buried within i915_request, reuse it!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200106114234.2529613-3-chris@chris-wilson.co.uk


# 6a8679c0 22-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the GEM context link as RCU protected

The only protection for intel_context.gem_context is granted by RCU, so
annotate it as an RCU-protected pointer and carefully dereference it in
the few occasions we need to use it.

Fixes: 9f3ccd40acf4 ("drm/i915: Drop GEM context as a direct link from i915_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222233558.2201901-1-chris@chris-wilson.co.uk


# e6ba7648 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915->kernel_context

Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).

Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.

GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk


# 0f100b70 20-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Push the use-semaphore marker onto the intel_context

Instead of rummaging through the intel_context to peek at the GEM
context in the middle of request submission to decide whether to use
semaphores, store that information on the intel_context itself.
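
In sketch form (bit name assumed), the marker becomes a simple test on
the intel_context:

static bool intel_context_use_semaphores(const struct intel_context *ce)
{
        /* decided once at context setup, not per-request submission */
        return test_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
}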

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-2-chris@chris-wilson.co.uk


# 9f3ccd40 20-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop GEM context as a direct link from i915_request

Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk


# 54400257 17-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Remove direct invocation of breadcrumb signaling

Only signal the breadcrumbs from inside the irq_work, simplifying our
interface and calling conventions. The micro-optimisation here is that
by always using the irq_work interface, we know we are always inside an
irq-off critical section for the breadcrumb signaling and can elide
save/restore of the irq flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191217095642.3124521-7-chris@chris-wilson.co.uk


# 85bedbf1 16-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp

As we stash a pointer to the HWSP cacheline on the request, when reading
it we only need confirm that the cacheline is still valid by checking
that the request and timeline are still intact.

v2: Protect hwsp_cachline with RCU

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191217011659.3092130-1-chris@chris-wilson.co.uk


# 9ddc8ec0 16-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Eliminate the trylock for awaiting an earlier request

We currently use an error-prone mutex_trylock to grab another timeline
to find an earlier request along it. However, with a bit of a
sleight-of-hand, we can reduce the mutex_trylock to a spin_lock on the
immediate request and careful pointer chasing to acquire a reference on
the previous request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191216165317.2742896-1-chris@chris-wilson.co.uk


# 99de9536 10-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Copy across scheduler behaviour flags across submit fences

We want the bonded request to have the same scheduler properties as its
master so that it is placed at the same depth in the queue. For example,
consider we have requests A, B and B', where B & B' are a bonded pair to
run in parallel on two engines.

A -> B
\- B'

B will run after A and so may be scheduled on an idle engine and wait on
A using a semaphore. B' sees B being executed and so enters the queue on
the same engine as A. As B' did not inherit the semaphore-chain from B,
it may have higher precedence than A and so preempts execution. However,
B' then sits on a semaphore waiting for B, who is waiting for A, who is
blocked by B.

Ergo B' needs to inherit the scheduler properties from B (i.e. the
semaphore chain) so that it is scheduled with the same priority as B and
will not be executed ahead of B's dependencies.

Furthermore, to prevent the priorities changing via the expose fence on
B', we need to couple in the dependencies for PI. This requires us to
relax our sanity-checks that dependencies are strictly in order.

v2: Synchronise (B, B') execution on all platforms, regardless of using
a scheduler; any no-op syncs should be elided.

Fixes: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
Testcase: igt/gem_exec_balancer/bonded-chain
Testcase: igt/gem_exec_balancer/bonded-semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
(cherry picked from commit c81471f5e95c79c55687282ff6800f112b5d560b)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# f1925f33 13-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use EAGAIN for trylock failures

While not good behaviour, it is, however, established behaviour that we
can punt EAGAIN to userspace if we need to retry the ioctl. When trying
to acquire a mutex, prefer to use EAGAIN to propagate losing the race
so that if it does end up back in userspace, we try again.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/800
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213160347.1789004-1-chris@chris-wilson.co.uk


# 639f2f24 13-Dec-2019 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915: Introduce new macros for tracing

New macros ENGINE_TRACE(), CE_TRACE(), RQ_TRACE() and GT_TRACE() are
introduced to tag the device name and engine name in context and
request tracing in i915.
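
Illustrative use (format strings are examples only):

/* each message is prefixed with the device and engine names */
ENGINE_TRACE(engine, "reset requested\n");
RQ_TRACE(rq, "submitted, prio=%d\n", rq->sched.attr.priority);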

Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-2-venkata.s.dhanalakota@intel.com


# 65c29dbb 11-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the i915_device name for identifying our request fences

Use the dev_name(i915) to identify the requests for debugging, so we can
tell different device timelines apart.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191211150204.133471-1-chris@chris-wilson.co.uk


# c81471f5 10-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Copy across scheduler behaviour flags across submit fences

We want the bonded request to have the same scheduler properties as its
master so that it is placed at the same depth in the queue. For example,
consider we have requests A, B and B', where B & B' are a bonded pair to
run in parallel on two engines.

A -> B
\- B'

B will run after A and so may be scheduled on an idle engine and wait on
A using a semaphore. B' sees B being executed and so enters the queue on
the same engine as A. As B' did not inherit the semaphore-chain from B,
it may have higher precedence than A and so preempts execution. However,
B' then sits on a semaphore waiting for B, who is waiting for A, who is
blocked by B.

Ergo B' needs to inherit the scheduler properties from B (i.e. the
semaphore chain) so that it is scheduled with the same priority as B and
will not be executed ahead of B's dependencies.

Furthermore, to prevent the priorities changing via the expose fence on
B', we need to couple in the dependencies for PI. This requires us to
relax our sanity-checks that dependencies are strictly in order.

v2: Synchronise (B, B') execution on all platforms, regardless of using
a scheduler; any no-op syncs should be elided.

Fixes: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
Testcase: igt/gem_exec_balancer/bonded-chain
Testcase: igt/gem_exec_balancer/bonded-semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk


# 9e31c1fe 06-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate errors on awaiting already signaled fences

If we see an already signaled fence that we want to await on, we skip
adding to the i915_sw_fence. However, we should pay attention to whether
there was an error on that fence and if so propagate it for our future
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206160428.1503343-2-chris@chris-wilson.co.uk


# 67a3acaa 22-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request

As we start peeking into requests for longer and longer, e.g.
incorporating use of spinlocks when only protected by an
rcu_read_lock(), we need to be careful in how we reset the request when
recycling and need to preserve any barriers that may still be in use as
the request is reset for reuse.

Quoting Linus Torvalds:

> If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?

.. because the object can be accessed (by RCU) after the refcount has
gone down to zero, and the thing has been released.

That's the whole and only point of SLAB_TYPESAFE_BY_RCU.

That flag basically says:

"I may end up accessing this object *after* it has been free'd,
because there may be RCU lookups in flight"

This has nothing to do with constructors. It's ok if the object gets
reused as an object of the same type and does *not* get
re-initialized, because we're perfectly fine seeing old stale data.

What it guarantees is that the slab isn't shared with any other kind
of object, _and_ that the underlying pages are free'd after an RCU
quiescent period (so the pages aren't shared with another kind of
object either during an RCU walk).

And it doesn't necessarily have to have a constructor, because the
thing that a RCU walk will care about is

(a) guaranteed to be an object that *has* been on some RCU list (so
it's not a "new" object)

(b) the RCU walk needs to have logic to verify that it's still the
*same* object and hasn't been re-used as something else.

In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
immediately, but because it gets reused as the same kind of object,
the RCU walker can "know" what parts have meaning for re-use, in a way
it couldn't if the re-use was random.

That said, it *is* subtle, and people should be careful.

> So the re-use might initialize the fields lazily, not necessarily using a ctor.

If you have a well-defined refcount, and use "atomic_inc_not_zero()"
to guard the speculative RCU access section, and use
"atomic_dec_and_test()" in the freeing section, then you should be
safe wrt new allocations.

If you have a completely new allocation that has "random stale
content", you know that it cannot be on the RCU list, so there is no
speculative access that can ever see that random content.

So the only case you need to worry about is a re-use allocation, and
you know that the refcount will start out as zero even if you don't
have a constructor.

So you can think of the refcount itself as always having a zero
constructor, *BUT* you need to be careful with ordering.

In particular, whoever does the allocation needs to then set the
refcount to a non-zero value *after* it has initialized all the other
fields. And in particular, it needs to make sure that it uses the
proper memory ordering to do so.

NOTE! One thing to be very worried about is that re-initializing
whatever RCU lists means that now the RCU walker may be walking on the
wrong list so the walker may do the right thing for this particular
entry, but it may miss walking *other* entries. So then you can get
spurious lookup failures, because the RCU walker never walked all the
way to the end of the right list. That ends up being a much more
subtle bug.
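
A sketch of the constructor pairing (names assumed): only the fields
that RCU walkers rely on are initialised in the ctor, which runs when
the backing page is allocated, not on every recycle.

static void __i915_request_ctor(void *arg)
{
        struct i915_request *rq = arg;

        /* stable across re-use: walkers may take this lock under RCU */
        spin_lock_init(&rq->lock);
        INIT_LIST_HEAD(&rq->execute_cb);
}

/* in slab setup: */
slab_requests = kmem_cache_create("i915_request",
                                  sizeof(struct i915_request),
                                  __alignof__(struct i915_request),
                                  SLAB_HWCACHE_ALIGN |
                                  SLAB_TYPESAFE_BY_RCU,
                                  __i915_request_ctor);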

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk


# 3e7abf81 24-Oct-2019 Andi Shyti <andi@etezian.org>

drm/i915: Extract GT render power state management

i915_irq.c is large. One reason for this is that it has a large chunk of
the GT render power management stashed away in it. Extract that logic
out of i915_irq.c and intel_pm.c and put it under one roof.

Based on a patch by Chris Wilson.

Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024211642.7688-1-chris@chris-wilson.co.uk


# babaab2f 25-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Encapsulate kconfig constant values inside boolean predicates

Avoid angering clang and smatch over the use of a constant value in a
'&&' test by forcing that constant value into a boolean.

E.g.,
drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c:159:13: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
if (!delay && CONFIG_DRM_I915_PREEMPT_TIMEOUT) {
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191025135943.12524-1-chris@chris-wilson.co.uk


# 2871ea85 24-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Split intel_ring_submission

Split the legacy submission backend from the common CS ring buffer
handling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk


# 1dfffa00 17-Oct-2019 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

drm/i915: Don't disable interrupts independently of the lock

The locks (active.lock and rq->lock) need to be taken with disabled
interrupts. This is done in i915_request_retire() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on PREEMPT_RT.
Chris Wilson confirmed that local_irq_disable() was just introduced as
an optimisation to avoid enabling/disabling interrupts during
lock/unlock combo.

Enable/disable interrupts as part of the locking instruction.
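
The shape of the change (pattern only):

/* before: interrupts toggled independently of the lock */
local_irq_disable();
spin_lock(&engine->active.lock);
/* ... retire ... */
spin_unlock(&engine->active.lock);
local_irq_enable();

/* after: the locking instruction manages the interrupt state, which
 * PREEMPT_RT may then substitute with a sleeping lock */
spin_lock_irq(&engine->active.lock);
/* ... retire ... */
spin_unlock_irq(&engine->active.lock);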

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191017161352.e5z3ugse7gxl5ari@linutronix.de


# 19306502 15-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush tasklet submission before sleeping on i915_request_wait

If the system is being slow and userspace is racing ahead of the GPU and
finds itself waiting for the GPU to catch up, before the process sleeps
give the tasklet a kick, bypassing ksoftirqd. If the system is
overloaded, then ksoftirqd may be delayed, incurring additional latency
to our user.

This should not be a frequent problem, but in the past we have observed
several hundred millisecond delays before ksoftirqd services an
interrupt, so burn a few cycles to lend a helping hand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015132606.14349-1-chris@chris-wilson.co.uk


# 89b6d183 13-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Tweak virtual unsubmission

Since commit e2144503bf3b ("drm/i915: Prevent bonded requests from
overtaking each other on preemption") we have restricted requests to run
on their chosen engine across preemption events. We can take this
restriction into account to know that we will want to resubmit those
requests onto the same physical engine, and so can shortcircuit the
virtual engine selection process and keep the request on the same
engine during unwind.

References: e2144503bf3b ("drm/i915: Prevent bonded requests from overtaking each other on preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ramlingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk


# a8385f0c 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only enqueue already completed requests

If we are asked to submit a completed request, just move it onto the
active-list without modifying its payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.

v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
(cherry picked from commit c0bb487dc19fc45dbeede7dcf8f513df51a3cd33)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 5facae4f 18-Sep-2019 Qian Cai <cai@lca.pw>

locking/lockdep: Remove unused @nested argument from lock_release()

Since the following commit:

b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release")

@nested is no longer used in lock_release(), so remove it from all
lock_release() calls and friends.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: alexander.levin@microsoft.com
Cc: daniel@iogearbox.net
Cc: davem@davemloft.net
Cc: dri-devel@lists.freedesktop.org
Cc: duyuyang@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: hannes@cmpxchg.org
Cc: intel-gfx@lists.freedesktop.org
Cc: jack@suse.com
Cc: jlbec@evilplan.or
Cc: joonas.lahtinen@linux.intel.com
Cc: joseph.qi@linux.alibaba.com
Cc: jslaby@suse.com
Cc: juri.lelli@redhat.com
Cc: maarten.lankhorst@linux.intel.com
Cc: mark@fasheh.com
Cc: mhocko@kernel.org
Cc: mripard@kernel.org
Cc: ocfs2-devel@oss.oracle.com
Cc: rodrigo.vivi@intel.com
Cc: sean@poorly.run
Cc: st@kernel.org
Cc: tj@kernel.org
Cc: tytso@mit.edu
Cc: vdavydov.dev@gmail.com
Cc: vincent.guittot@linaro.org
Cc: viro@zeniv.linux.org.uk
Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw
Signed-off-by: Ingo Molnar <mingo@kernel.org>
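
The change at each call site is mechanical; an illustrative example:

    -	lock_release(&tl->mutex.dep_map, 0, _RET_IP_);
    +	lock_release(&tl->mutex.dep_map, _RET_IP_);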


# abf5cdcf 18-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Verify the engine after acquiring the active.lock

When using virtual engines, the rq->engine is not stable until we hold
the engine->active.lock (as the virtual engine may be exchanged with the
sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
we may retire a request concurrently with resubmitting it to HW, we need
to be extra careful to verify we are holding the correct lock for the
request's active list. This is similar to the issue we saw with
rescheduling the virtual requests, see sched_lock_engine().

Or else:

<4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
<4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G U 5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
<4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
<4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
<4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
<4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
<4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
<4> [876.736168] FS: 0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736171] PKRU: 55555554
<4> [876.736172] Call Trace:
<4> [876.736226] __i915_request_submit+0x152/0x370 [i915]
<4> [876.736263] __execlists_submission_tasklet+0x6da/0x1f50 [i915]
<4> [876.736293] ? execlists_submission_tasklet+0x29/0x50 [i915]
<4> [876.736321] execlists_submission_tasklet+0x34/0x50 [i915]
<4> [876.736325] tasklet_action_common.isra.5+0x47/0xb0
<4> [876.736328] __do_softirq+0xd8/0x4ae
<4> [876.736332] ? smpboot_thread_fn+0x23/0x280
<4> [876.736334] ? smpboot_thread_fn+0x6b/0x280
<4> [876.736336] run_ksoftirqd+0x2b/0x50
<4> [876.736338] smpboot_thread_fn+0x1d3/0x280
<4> [876.736341] ? sort_range+0x20/0x20
<4> [876.736343] kthread+0x119/0x130
<4> [876.736345] ? kthread_park+0xa0/0xa0
<4> [876.736347] ret_from_fork+0x24/0x50
<4> [876.736353] irq event stamp: 2290145
<4> [876.736356] hardirqs last enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
<4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [876.736360] softirqs last enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
<4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
<4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
<4> [876.736406] ------------[ cut here ]------------
<4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
<4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G U W 5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90
<4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3
<4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086
<4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002
<4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8
<4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001
<4> [876.736447] FS: 00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736450] PKRU: 55555554
<4> [876.736451] Call Trace:
<4> [876.736488] i915_request_retire+0x224/0x8e0 [i915]
<4> [876.736521] i915_request_create+0x4b/0x1b0 [i915]
<4> [876.736550] nop_virtual_engine+0x230/0x4d0 [i915]

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk
(cherry picked from commit 37fa0de3c137d5f54f7e64f53495c9d501d42a4d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
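
A sketch of the resulting lock-then-verify loop (the helper name is
illustrative):

    static struct intel_engine_cs *
    lock_request_engine(struct i915_request *rq)
    {
            struct intel_engine_cs *engine = READ_ONCE(rq->engine);

            spin_lock(&engine->active.lock);
            while (engine != READ_ONCE(rq->engine)) {
                    /* raced with a virtual-engine swap; retry */
                    spin_unlock(&engine->active.lock);
                    engine = READ_ONCE(rq->engine);
                    spin_lock(&engine->active.lock);
            }

            return engine;
    }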


# 66101975 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move request runtime management onto gt

Requests are run from the gt and are tied into the gt runtime power
management, so pull the runtime request management under gt/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk


# f33a8a51 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Merge wait_for_timelines with retire_request

wait_for_timelines is essentially the same loop as retiring requests
(with an extra timeout), so merge the two into one routine.

v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk


# b1e3177b 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Coordinate i915_active with its own mutex

Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging though, is the rule over updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for active tracker (one for each vma, context,
etc) and a couple of spinlocks for each fence update. The improvement in
fine grained locking allowing for multiple concurrent clients
(eventually!) should be worth it in typical loads.

v2: Add some comments that barely elucidate anything :(

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk


# c0bb487d 22-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only enqueue already completed requests

If we are asked to submit a completed request, just move it onto the
active-list without modifying its payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement, which may advance the head past our
breadcrumb; we would then generate a warning for the emission being
behind the RING_HEAD.

v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk


# 6a79d848 18-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lock signaler timeline while navigating

As we need to take a walk back along the signaler timeline to find the
fence preceding the one upon which we want to wait, we need to lock that timeline
to prevent it being modified as we walk. Similarly, we also need to
acquire a reference to the earlier fence while it still exists!

Though we lack the correct locking today, we are saved by the
overarching struct_mutex -- but that protection is being removed.

v2: Tvrtko made me realise I was being lax and using annotations to
ignore the AB-BA deadlock from the timeline overlap. As it would be
possible to construct a second request that was using a semaphore from the
same timeline as ourselves, we could quite easily end up in a situation
where we deadlocked in our mutex waits. Avoid that by using a trylock
and falling back to a normal dma-fence await if contended.

v3: Eek, the signal->timeline is volatile and must be carefully
dereferenced to ensure it is valid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-2-chris@chris-wilson.co.uk
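
Per the v2 note, a sketch of the contention fallback (arguments
approximate):

    /* avoid the AB-BA deadlock between two overlapping timelines */
    if (!mutex_trylock(&tl->mutex))
            return i915_sw_fence_await_dma_fence(&rq->submit, fence,
                                                 0, GFP_KERNEL);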


# d19d71fc 18-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark i915_request.timeline as a volatile, rcu pointer

The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.

One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, its context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.

v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
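
A sketch of the careful dereference required off the beaten path (kref
field assumed):

    rcu_read_lock();
    tl = rcu_dereference(rq->timeline);
    if (tl && !kref_get_unless_zero(&tl->kref))
            tl = NULL;      /* the timeline was already being freed */
    rcu_read_unlock();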


# 37fa0de3 18-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Verify the engine after acquiring the active.lock

When using virtual engines, the rq->engine is not stable until we hold
the engine->active.lock (as the virtual engine may be exchanged with the
sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
we may retire a request concurrently with resubmitting it to HW, so we
need to be extra careful to verify we are holding the correct lock for the
request's active list. This is similar to the issue we saw with
rescheduling the virtual requests, see sched_lock_engine().

Or else:

<4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
<4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G U 5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
<4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
<4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
<4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
<4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
<4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
<4> [876.736168] FS: 0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736171] PKRU: 55555554
<4> [876.736172] Call Trace:
<4> [876.736226] __i915_request_submit+0x152/0x370 [i915]
<4> [876.736263] __execlists_submission_tasklet+0x6da/0x1f50 [i915]
<4> [876.736293] ? execlists_submission_tasklet+0x29/0x50 [i915]
<4> [876.736321] execlists_submission_tasklet+0x34/0x50 [i915]
<4> [876.736325] tasklet_action_common.isra.5+0x47/0xb0
<4> [876.736328] __do_softirq+0xd8/0x4ae
<4> [876.736332] ? smpboot_thread_fn+0x23/0x280
<4> [876.736334] ? smpboot_thread_fn+0x6b/0x280
<4> [876.736336] run_ksoftirqd+0x2b/0x50
<4> [876.736338] smpboot_thread_fn+0x1d3/0x280
<4> [876.736341] ? sort_range+0x20/0x20
<4> [876.736343] kthread+0x119/0x130
<4> [876.736345] ? kthread_park+0xa0/0xa0
<4> [876.736347] ret_from_fork+0x24/0x50
<4> [876.736353] irq event stamp: 2290145
<4> [876.736356] hardirqs last enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
<4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [876.736360] softirqs last enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
<4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
<4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
<4> [876.736406] ------------[ cut here ]------------
<4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
<4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G U W 5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90
<4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3
<4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086
<4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002
<4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8
<4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001
<4> [876.736447] FS: 00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736450] PKRU: 55555554
<4> [876.736451] Call Trace:
<4> [876.736488] i915_request_retire+0x224/0x8e0 [i915]
<4> [876.736521] i915_request_create+0x4b/0x1b0 [i915]
<4> [876.736550] nop_virtual_engine+0x230/0x4d0 [i915]

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190918145453.8800-1-chris@chris-wilson.co.uk


# c210e85b 17-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/tgl: Extend MI_SEMAPHORE_WAIT

On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.

v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk
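
A sketch of the emission, using the int-not-bool addition from v3
(macro and encoding details approximate):

    const int has_token = INTEL_GEN(i915) >= 12;
    u32 *cs;

    cs = intel_ring_begin(rq, 4 + 2 * has_token); /* token + pad noop */
    if (IS_ERR(cs))
            return PTR_ERR(cs);

    *cs++ = MI_SEMAPHORE_WAIT |
            has_token |                     /* grows the length field */
            MI_SEMAPHORE_POLL |
            MI_SEMAPHORE_SAD_GTE_SDD;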


# ff36c5c4 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold irq-off for the entire fake lock period

Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.

Hopefully this is the last tweak required!

v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.

Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
(cherry picked from commit 6dcb85a0ad990455ae7c596e3fc966ad9c1ba9c5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 0f7dc620 26-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect our local workers against I915_FENCE_TIMEOUT

Trust our own workers to not cause unnecessary delays and disable the
automatic timeout on their asynchronous fence waits. (Along the same
lines that we trust our own requests to complete eventually, if
necessary by force.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190826072149.9447-6-chris@chris-wilson.co.uk
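
A sketch of the change: a worker awaiting one of our own fences now
passes 0, i.e. no automatic timeout, where it previously passed
I915_FENCE_TIMEOUT:

    i915_sw_fence_await_dma_fence(&rq->submit, fence, 0, GFP_KERNEL);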


# 77715906 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep drm_i915_file_private around under RCU

Ensure that the drm_i915_file_private continues to exist as we attempt
to remove a request from its list, which may race with the destruction
of the file.

<6> [38.380714] [IGT] gem_ctx_create: starting subtest basic-files
<0> [42.201329] BUG: spinlock bad magic on CPU#0, kworker/u16:0/7
<4> [42.201356] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [42.201371] CPU: 0 PID: 7 Comm: kworker/u16:0 Tainted: G U 5.3.0-rc5-CI-Patchwork_14169+ #1
<4> [42.201391] Hardware name: Dell Inc. OptiPlex 745 /0GW726, BIOS 2.3.1 05/21/2007
<4> [42.201594] Workqueue: i915 retire_work_handler [i915]
<4> [42.201614] RIP: 0010:spin_dump+0x5a/0x90
<4> [42.201625] Code: 00 48 8d 88 c0 06 00 00 48 c7 c7 00 71 09 82 e8 35 ef 00 00 48 85 db 44 8b 4d 08 41 b8 ff ff ff ff 48 c7 c1 0b cd 0f 82 74 0e <44> 8b 83 e0 04 00 00 48 8d 8b c0 06 00 00 8b 55 04 48 89 ee 48 c7
<4> [42.201660] RSP: 0018:ffffc9000004bd80 EFLAGS: 00010202
<4> [42.201673] RAX: 0000000000000031 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff820fcd0b
<4> [42.201688] RDX: 0000000000000000 RSI: ffff88803de266f8 RDI: 00000000ffffffff
<4> [42.201703] RBP: ffff888038381ff8 R08: 00000000ffffffff R09: 000000006b6b6b6b
<4> [42.201718] R10: 0000000041cb0b89 R11: 646162206b636f6c R12: ffff88802a618500
<4> [42.201733] R13: ffff88802b32c288 R14: ffff888038381ff8 R15: ffff88802b32c250
<4> [42.201748] FS: 0000000000000000(0000) GS:ffff88803de00000(0000) knlGS:0000000000000000
<4> [42.201765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [42.201778] CR2: 00007f2cefc6d180 CR3: 00000000381ee000 CR4: 00000000000006f0
<4> [42.201793] Call Trace:
<4> [42.201805] do_raw_spin_lock+0x66/0xb0
<4> [42.201898] i915_request_retire+0x548/0x7c0 [i915]
<4> [42.201989] retire_requests+0x4d/0x60 [i915]
<4> [42.202078] i915_retire_requests+0x144/0x2e0 [i915]
<4> [42.202169] retire_work_handler+0x10/0x40 [i915]

Recently, in commit 44c22f3f1a0a ("drm/i915: Serialize insertion into the
file->mm.request_list"), we fixed a race on insertion. Now, it appears
we also have a race with destruction!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823181455.31910-1-chris@chris-wilson.co.uk
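
A sketch of the now-safe removal path, assuming the file_priv is freed
via RCU (field names approximate):

    rcu_read_lock();
    file_priv = READ_ONCE(rq->file_priv);
    if (file_priv) {
            spin_lock(&file_priv->mm.lock);
            list_del(&rq->client_link);
            spin_unlock(&file_priv->mm.lock);
    }
    rcu_read_unlock();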


# 6dcb85a0 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold irq-off for the entire fake lock period

Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.

Hopefully this is the last tweak required!

v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.

Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk


# 44c22f3f 20-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Serialize insertion into the file->mm.request_list

Currently, we remove the request from the per-file request list for throttling and
retirement under a dedicated spinlock, but insertion is governed by
struct_mutex. This needs to be the same lock so that the
retirement/insertion of neighbouring requests (at the tail) doesn't
break the list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190820080907.4665-1-chris@chris-wilson.co.uk
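
A sketch of insertion under the same dedicated spinlock that removal
already takes:

    spin_lock(&file_priv->mm.lock);
    list_add_tail(&rq->client_link, &file_priv->mm.request_list);
    spin_unlock(&file_priv->mm.lock);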


# cc337560 19-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use 0 for the unordered context

Since commit 078dec3326e2 ("dma-buf: add dma_fence_get_stub") the 0
fence context became an impossible match as it is used for an always
signaled fence. We can simplify our timeline tracking by knowing that 0
always means no match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819184404.24200-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20190819175109.5241-1-chris@chris-wilson.co.uk
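
A sketch of the simplification this buys; await() is a hypothetical
stand-in for ordering against the fence, and last_context is cached
state across calls:

    u64 last_context = 0;   /* context 0 can never match: no timeline */

    if (fence->context != last_context) {
            last_context = fence->context;
            await(fence);   /* order against the new timeline */
    }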


# ef468849 17-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate fence errors

Errors spread like wildfire, and must eventually be returned to the
user. They need to be captured and passed along the flow of fences,
infecting each in turn with the existing error, until finally they fall
out of a user visible result.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817232511.11391-1-chris@chris-wilson.co.uk
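
A sketch of the propagation step (helper name assumed from this era's
i915_sw_fence API):

    /* an error on the signaler infects the waiting request */
    if (signal->fence.error)
            i915_sw_fence_set_error_once(&rq->submit,
                                         signal->fence.error);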


# 6c69a454 16-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Mark context->active_count as protected by timeline->mutex

We use timeline->mutex to protect modifications to
context->active_count, and the associated enable/disable callbacks.
Due to complications with the engine-pm barrier there is a path where we used
a "superlock" to provide serialised protection and so could not
unconditionally assert with lockdep that it was always held. However,
we can mark the mutex as taken (noting that we may be nested underneath
ourselves) which means we can be reassured the right timeline->mutex is
always treated as held and let lockdep roam free.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-1-chris@chris-wilson.co.uk


# e5dadff4 15-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect request retirement with timeline->mutex

Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk


# 62520e33 14-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move tasklet kicking to __i915_request_queue caller

Since __i915_request_queue() may be called from hardirq (timer) context,
we cannot use local_bh_disable/enable at the lower level. As we do want
to kick the tasklet to speed up initial submission or preemption for
normal client submission, lift it to the normal process context
callpath.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815042031.27750-1-chris@chris-wilson.co.uk
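
A sketch of the caller-side kick in process context:

    local_bh_disable();
    __i915_request_queue(rq, &attr);
    local_bh_enable();      /* the submission tasklet may run here */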


# a79ca656 13-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Push the wakeref->count deferral to the backend

If the backend wishes to defer the wakeref parking, make it responsible
for unlocking the wakeref (i.e. bumping the counter). This allows it to
time the unlock much more carefully in case it happens to need the
wakeref to be active during its deferral.

For instance, during engine parking we may choose to emit an idle
barrier (a request). To do so, we borrow the engine->kernel_context
timeline and, to ensure exclusive access, we keep the
engine->wakeref.count at 0. However, submitting that request to HW may
require an intel_engine_pm_get() (e.g. to keep the submission tasklet
alive) and before we allow that we have to rewake our wakeref to avoid a
recursive deadlock.

<4> [257.742916] IRQs not enabled as expected
<4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100
<4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid
<4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U W 5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1
<4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
<4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100
<4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3
<4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086
<4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302
<4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001
<4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008
<4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430
<4> [257.743074] FS: 0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000
<4> [257.743082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0
<4> [257.743096] Call Trace:
<4> [257.743104] <IRQ>
<4> [257.743265] __i915_request_commit+0x240/0x5d0 [i915]
<4> [257.743427] ? __i915_request_create+0x228/0x4c0 [i915]
<4> [257.743584] __engine_park+0x64/0x250 [i915]
<4> [257.743730] ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4> [257.743878] i915_sample+0x2ee/0x310 [i915]
<4> [257.744030] ? i915_pmu_cpu_offline+0xb0/0xb0 [i915]
<4> [257.744040] __hrtimer_run_queues+0x11e/0x4b0
<4> [257.744068] hrtimer_interrupt+0xea/0x250
<4> [257.744079] ? lockdep_hardirqs_off+0x79/0xd0
<4> [257.744101] smp_apic_timer_interrupt+0x96/0x280
<4> [257.744114] apic_timer_interrupt+0xf/0x20
<4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae

v2: Keep the priority_hint assert
v3: That assert was desperately trying to point out my bug. Sorry, little
assert.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk


# 52791eee 11-Aug-2019 Christian König <christian.koenig@amd.com>

dma-buf: rename reservation_object to dma_resv

Be more consistent with the naming of the other DMA-buf objects.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/323401/


# 75d0a7f3 09-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift timeline into intel_context

Move the timeline from being inside the intel_ring to intel_context
itself. This saves much pointer dancing and makes the relationship of the
context to its timeline much clearer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809182518.20486-4-chris@chris-wilson.co.uk


# a09d9a80 06-Aug-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: avoid including intel_drv.h via i915_drv.h->i915_trace.h

Disentangle i915_drv.h from intel_drv.h, which gets included via
i915_trace.h. This necessitates including i915_trace.h wherever it's
needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ed82bf259d3b725a1a1a3c3e9d6fb5c08bc4d489.1565085691.git.jani.nikula@intel.com


# c29579d2 06-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Make caps.scheduler static

We do not notify userspace when the scheduler capabilities are changed
(due to wedging the driver) and as such userspace will expect the caps
to be static and unchanging. Make it so, and then we only need to compute
our caps once during driver registration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-1-chris@chris-wilson.co.uk


# cb823ed9 12-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use intel_gt as the primary object for handling resets

Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk


# 2a98f4e6 09-Jul-2019 Lionel Landwerlin <lionel.g.landwerlin@intel.com>

drm/i915: add infrastructure to hold off preemption on a request

We want to set this flag in the next commit on requests containing
perf queries so that the result of the perf query can just be a delta
of global counters, rather than doing post processing of the OA
buffer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: add basic selftest for nopreempt]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190709164227.25859-1-chris@chris-wilson.co.uk
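
The infrastructure reduces to a per-request flag; a sketch (flag name
per this era's tree):

    /* checked by the scheduler before permitting preemption */
    __set_bit(I915_FENCE_FLAG_NOPREEMPT, &rq->fence.flags);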


# f0c02c1b 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rename i915_timeline to intel_timeline and move under gt

Move all timeline code under gt and rename to intel_gt prefix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com


# 22b7a426 20-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Preempt-to-busy

When using a global seqno, we required a precise stop-the-world event to
handle preemption and unwind the global seqno counter. To accomplish
this, we would preempt to a special out-of-band context and wait for the
machine to report that it was idle. Given an idle machine, we could very
precisely see which requests had completed and which we needed to feed
back into the run queue.

However, now that we have scrapped the global seqno, we no longer need
to precisely unwind the global counter and only track requests by their
per-context seqno. This allows us to loosely unwind inflight requests
while scheduling a preemption, with the enormous caveat that the
requests we put back on the run queue are still _inflight_ (until the
preemption request is complete). This makes request tracking much more
messy, as at any point we can then see a completed request that we
believe is not currently scheduled for execution. We also have to be
careful not to rewind RING_TAIL past RING_HEAD on preempting to the
running context, and for this we use a semaphore to prevent completion
of the request before continuing.

To accomplish this feat, we change how we track requests scheduled to
the HW. Instead of appending our requests onto a single list as we
submit, we track each submission to ELSP as its own block. Then upon
receiving the CS preemption event, we promote the pending block to the
inflight block (discarding what was previously being tracked). As normal
CS completion events arrive, we then remove stale entries from the
inflight tracker.

v2: Be a tinge paranoid and ensure we flush the write into the HWS page
for the GPU semaphore to pick up in a timely fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
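
A sketch of the two tracking blocks (sizes and field names
approximate):

    struct intel_engine_execlists {
            /* submitted to ELSP, awaiting the CS preemption event */
            struct i915_request *pending[EXECLIST_MAX_PORTS + 1];
            /* promoted on that event: what the HW is really running */
            struct i915_request *inflight[EXECLIST_MAX_PORTS + 1];
    };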


# b87b6c0d 18-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the execution-callbacks on retiring

In the unlikely case the request completes while we regard it as not even
executing on the GPU (see the next patch!), we have to flush any pending
execution callbacks at retirement and ensure that we do not add any
more.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-4-chris@chris-wilson.co.uk


# ce94bef9 18-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Signal fence completion from i915_request_wait

With the upcoming change to automanaged i915_active, the intent is that
whenever we wait on the set of active fences, they are signaled and
collected. The requirement is that all successful returns from
i915_request_wait() signal the fence, so fixup the one remaining path
where we may return before the interrupt has been run.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619112341.9082-3-chris@chris-wilson.co.uk


# 2f530945 18-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop passing I915_WAIT_LOCKED to i915_request_wait()

Since commit eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on
struct_mutex"), the I915_WAIT_LOCKED flags passed to i915_request_wait()
has been defunct. Now go ahead and remove it from all callers.

References: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-3-chris@chris-wilson.co.uk


# 44d89409 18-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make the semaphore saturation mask global

The idea behind keeping the saturation mask local to a context backfired
spectacularly. The premise with the local mask was that we would be more
proactive in attempting to use semaphores after each time the context
idled, and that all new contexts would attempt to use semaphores
ignoring the current state of the system. This turns out to be horribly
optimistic. If the system state is still oversaturated and the existing
workloads have all stopped using semaphores, the new workloads would
attempt to use semaphores and be deprioritised behind real work. The
new contexts would not switch off using semaphores until their initial
batch of low priority work had completed. Given a sufficient backlog
of equal user priority work, this would completely starve the new work of any
GPU time.

To compensate, remove the local tracking in favour of keeping it as
global state on the engine -- once the system is saturated and
semaphores are disabled, everyone stops attempting to use semaphores
until the system is idle again. One of the reasons for preferring local
context tracking was that it worked with virtual engines, so for
switching to global state we could either do a complete check of all the
virtual siblings or simply disable semaphores for those requests. This
takes the simpler approach of disabling semaphores on virtual engines.

The downside is that the decision that the engine is saturated is a
local measure -- we are only checking whether or not this context was
scheduled in a timely fashion, it may be legitimately delayed due to user
priorities. We still have the same dilemma though, that we do not want
to employ the semaphore poll unless it will be used.

v2: Explain why we need to assume the worst wrt virtual engines.

Fixes: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk


# ef78f7b1 18-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use drm_gem_object.resv

Since commit 1ba627148ef5 ("drm: Add reservation_object to
drm_gem_object"), struct drm_gem_object grew its own builtin
reservation_object rendering our own private one bloat. Remove our
redundant reservation_object and point into obj->base.resv instead.

References: 1ba627148ef5 ("drm: Add reservation_object to drm_gem_object")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618125858.7295-1-chris@chris-wilson.co.uk


# 422d7df4 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace engine->timeline with a plain list

To continue the onslaught of removing the assumption of a global
execution ordering, another casualty is the engine->timeline. Without an
actual timeline to track, it is overkill and we can replace it with a
much less grand plain list. We still need a list of requests inflight,
for the simple purpose of finding inflight requests (for retiring,
resetting, preemption etc).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk


# 9db0c5ca 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop retiring along engine

We no longer track the execution order along the engine and so no longer
need to enforce ordering of retire along the engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-2-chris@chris-wilson.co.uk


# ce476c80 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep contexts pinned until after the next kernel context switch

We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch, for
simplicity we use a fresh request from the kernel context.

The sequence of operations for keeping the context pinned until saved is:

- On context activation, we preallocate a node for each physical engine
the context may operate on. This is to avoid allocations during
unpinning, which may be from inside FS_RECLAIM context (aka the
shrinker)

- On context deactivation on retirement of the last active request (which
is before we know the context has been saved), we add the
preallocated node onto a barrier list on each engine

- On engine idling, we emit a switch to kernel context. When this
switch completes, we know that all previous contexts must have been
saved, and so on retiring this request we can finally unpin all the
contexts that were marked as deactivated prior to the switch.

We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.

v2: intel_context_active_acquire/_release

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
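
A sketch of the parking step described above (names approximate):

    /*
     * Emit one final request on the kernel context; retiring it
     * proves every earlier context was saved, at which point the
     * deferred unpins may run.
     */
    rq = __i915_request_create(engine->kernel_context, GFP_NOWAIT);
    if (!IS_ERR(rq))
            __i915_request_commit(rq);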


# 84383d2e 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refine i915_reset.lock_map

We already use a mutex to serialise i915_reset() and wedging, so all we
need it to link that into i915_request_wait() and we have our lock cycle
detection.

v2.5: Take error mutex for selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614071023.17929-3-chris@chris-wilson.co.uk


# 6e4e9708 13-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Execute signal callbacks from no-op i915_request_wait

If we enter i915_request_wait() with an already completed request, but
unsignaled dma-fence, signal the fence before returning. This allows us
to execute any of the signal callbacks at the earliest opportunity.

v2: Also signal after busyspin success

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614111053.25615-2-chris@chris-wilson.co.uk
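
A sketch of the early-outs in i915_request_wait(), so that every
successful return leaves the fence signaled:

    if (dma_fence_is_signaled(&rq->fence))
            return timeout;

    if (i915_request_completed(rq)) {
            dma_fence_signal(&rq->fence);   /* run callbacks now */
            return timeout;
    }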


# 33df8a76 12-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent lock-cycles between GPU waits and GPU resets

We cannot allow ourselves to wait on the GPU while holding any lock as we
may need to reset the GPU. While there is not an explicit lock between
the two operations, lockdep cannot detect the dependency. So let's tell
lockdep about the wait/reset dependency with an explicit lockmap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190612085246.16374-1-chris@chris-wilson.co.uk
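
A sketch of the fake lock (the lockmap's home is an assumption):

    /* teach lockdep about the implicit wait -> reset dependency */
    lock_map_acquire(&i915->gt.reset_lockmap);
    lock_map_release(&i915->gt.reset_lockmap);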


# f4d57d83 10-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow interrupts when taking the timeline->mutex

Before we commit ourselves to writing commands into the
ringbuffer and submitting the request, allow signals to interrupt
acquisition of the timeline mutex. We allow ourselves to be interrupted
at any time later if we need to block for space in the ring, anyway.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610103610.19883-1-chris@chris-wilson.co.uk
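
A sketch of the interruptible acquisition (field path approximate for
this era, where the timeline still lived in the ring):

    err = mutex_lock_interruptible(&ce->ring->timeline->mutex);
    if (err)
            return ERR_PTR(err);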


# 10be98a7 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move more GEM objects under gem/

Continuing the theme of separating out the GEM clutter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk


# f71e01a7 21-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extend execution fence to support a callback

In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk


# 78e41ddd 21-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply an execution_mask to the virtual_engine

Allow the user to direct which physical engines of the virtual engine
they wish to execute on, as sometimes it is necessary to override the
load balancing algorithm.

v2: Only kick the virtual engines on context-out if required

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk


# a491cc8e 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Truly bump ready tasks ahead of busywaits

In commit b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of
busywaits"), I tried cutting a corner in order to not install a signal
for each of our dependencies, and only listened to requests on which we
were intending to busywait. The compromise that was made was that
instead of then being able to promote the request with a full
NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we
had cleared the semaphore chain, we settled for only using the NEWCLIENT
boost. With an over saturated system with multiple NEWCLIENTS in flight
at any time, this was found to be an inadequate promotion and left us
with a much poorer scheduling order than prior to using semaphores.

The outcome of this patch, is that all requests have NOSEMAPHORE
priority when they have no dependencies and are ready to run and not
busywait, restoring the pre-semaphore ordering on saturated systems.

We can demonstrate the effect of poor scheduling order by oversaturating
the system using gem_wsim on a system with multiple vcs engines
(i.e running the same workloads across more clients than required for
peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):

x v5.1 (normalized)
+ tip
* fix
+------------------------------------------------------------------------+
| x |
| x |
| x |
| x |
| %x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %#x |
| %#x |
| %#x |
| %#x |
| %#x |
| + %#xx |
| + %#xx |
| + %%#xx |
| + %%#xx |
| + %%#xx |
| + %%#xx |
| + %%##x |
| +++ %%##x |
| +++ %%##x |
| +++ %%##x |
| ++++ %%##x |
| ++++ %%##x |
| ++++ %%##xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ + %#O#xx |
| ++++ + %#O#xx |
| ++++++ + %#O#xx |
| ++++++++++ %OOOxxx|
| ++++++++++ + %#OOO#xx|
| + ++++++++++++ ++ +++++ + ++ @@OOOO#xx|
| |A_| |
||__________M_______A____________________| |
| |A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 0.99456 1.00628 0.999985 1.0001545 0.0024387139
+ 120 0.873021 1.00037 0.884134 0.90148752 0.039190862
Difference at 99.5% confidence
-0.098667 +/- 0.0110762
-9.86517% +/- 1.10745%
(Student's t, pooled s = 0.0277657)
% 120 0.990207 1.00165 0.9970265 0.99699748 0.0021024
Difference at 99.5% confidence
-0.003157 +/- 0.000908245
-0.315651% +/- 0.0908105%
(Student's t, pooled s = 0.00227678)

Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
(cherry picked from commit 17db337f5098d29415314c4a588b842fc684394b)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# c80274bb 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Downgrade NEWCLIENT to non-preemptive

Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of not allowing a sequence of work that merely
crossed into a new engine the privilege to be promoted to NEWCLIENT
status. It also had the unintended consequence of actually making
NEWCLIENT effective on heavily oversubscribed transcode machines and
impacting upon their throughput.

If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as

rcsA x 30, (rcsB, vcs) x 30

whereas before it would have been

(rcsA, rcsB, vcs) x 30

That is with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines;
accruing a lot of dead time as compared to the previous case where the
vcs engine would be started in parallel to processing the second client.

The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the interengine dependencies, and so preventing NEWCLIENTS
from starving other engines. This is a large nerf to the rrul properties of
the current NEWCLIENT, but it still does give prioritised submission to
new requests from light workloads.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
(cherry picked from commit 68fc728b01fcc93b26d52f6e884e738962a49a66)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 9981927c 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump signaler priority on adding a waiter

The handling of the no-preemption priority level imposes the restriction
that we need to maintain the implied ordering even though preemption is
disabled. Otherwise we may end up with an AB-BA deadlock across multiple
engines due to a real preemption event reordering the no-preemption
WAITs. To resolve this issue we currently promote all requests to WAIT
on unsubmission, however this interferes with the timeslicing
requirement that we do not apply any implicit promotion that will defeat
the round-robin timeslice list. (If we automatically promote the active
request it will go back to the head of the queue and not the tail!)

So we need implicit promotion to prevent reordering around semaphores
where we are not allowed to preempt, and we must avoid implicit
promotion on unsubmission. If, instead of at unsubmit, we apply that
implicit promotion on adding the dependency, we avoid the semaphore
deadlock and also reduce the gains made by the promotion for userspace
waiting. Furthermore, by keeping the earlier dependencies at a
higher level, we reduce the search space for timeslicing without
altering runtime scheduling too badly (no dependencies at all will be
assigned a higher priority for rrul).

v2: Limit the bump to external edges (as originally intended) i.e.
between contexts and out to the user.

Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
(cherry picked from commit 6e7eb7a80769e7250e31652b96918cf7f3e0d285)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
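
A schematic sketch of moving the bump from unsubmit to dependency
creation; all names here are illustrative, and the context comparison
stands in for the v2 external-edge restriction:

    #define SKETCH_PRIORITY_WAIT 1 /* illustrative value */

    struct sketch_request {
            int priority;
            const void *context; /* owning context, for the external-edge test */
    };

    /* Bump the signaler when the waiter is attached, not at unsubmit. */
    static void sketch_add_dependency(struct sketch_request *waiter,
                                      struct sketch_request *signaler)
    {
            /* v2: only external edges (between contexts) receive the bump */
            if (waiter->context != signaler->context &&
                signaler->priority < SKETCH_PRIORITY_WAIT)
                    signaler->priority = SKETCH_PRIORITY_WAIT;
    }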


# 68fc728b 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Downgrade NEWCLIENT to non-preemptive

Commit 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of denying a sequence of work that merely crossed
into a new engine the privilege of being promoted to NEWCLIENT status.
It also had the unintended consequence of actually making NEWCLIENT
effective on heavily oversubscribed transcode machines, impacting their
throughput.

If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as

rcsA x 30, (rcsB, vcs) x 30

whereas before it would have been

(rcsA, rcsB, vcs) x 30

That is, with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines,
accruing a lot of dead time compared to the previous case where the vcs
engine would be started in parallel with processing the second client.

The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the inter-engine dependencies, and so preventing
NEWCLIENTs from starving other engines. This is a large nerf to the rrul
properties of the current NEWCLIENT, but it still gives prioritised
submission to new requests from light workloads.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc0717 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk


# 6e7eb7a8 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump signaler priority on adding a waiter

The handling of the no-preemption priority level imposes the restriction
that we need to maintain the implied ordering even though preemption is
disabled. Otherwise we may end up with an AB-BA deadlock across multiple
engines due to a real preemption event reordering the no-preemption
WAITs. To resolve this issue we currently promote all requests to WAIT
on unsubmission, however this interferes with the timeslicing
requirement that we do not apply any implicit promotion that will defeat
the round-robin timeslice list. (If we automatically promote the active
request it will go back to the head of the queue and not the tail!)

So we need implicit promotion to prevent reordering around semaphores
where we are not allowed to preempt, and we must avoid implicit
promotion on unsubmission. If, instead of at unsubmit, we apply that
implicit promotion on adding the dependency, we avoid the semaphore
deadlock and also reduce the gains made by the promotion for userspace
waiting. Furthermore, by keeping the earlier dependencies at a
higher level, we reduce the search space for timeslicing without
altering runtime scheduling too badly (no dependencies at all will be
assigned a higher priority for rrul).

v2: Limit the bump to external edges (as originally intended) i.e.
between contexts and out to the user.

Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk


# 17db337f 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Truly bump ready tasks ahead of busywaits

In commit b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of
busywaits"), I tried cutting a corner in order not to install a signal
for each of our dependencies, and only listened to requests on which we
intended to busywait. The compromise was that, as we had not ensured the
semaphore chain was cleared, we could not promote the request with a
full NOSEMAPHORE boost like its non-busywaiting brethren and settled for
only the NEWCLIENT boost. On an oversaturated system with multiple
NEWCLIENTs in flight at any time, this proved an inadequate promotion
and left us with a much poorer scheduling order than prior to using
semaphores.

The outcome of this patch, is that all requests have NOSEMAPHORE
priority when they have no dependencies and are ready to run and not
busywait, restoring the pre-semaphore ordering on saturated systems.

We can demonstrate the effect of poor scheduling order by oversaturating
the system using gem_wsim on a system with multiple vcs engines
(i.e. running the same workloads across more clients than required for
peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):

x v5.1 (normalized)
+ tip
* fix
+------------------------------------------------------------------------+
| x |
| x |
| x |
| x |
| %x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %#x |
| %#x |
| %#x |
| %#x |
| %#x |
| + %#xx |
| + %#xx |
| + %%#xx |
| + %%#xx |
| + %%#xx |
| + %%#xx |
| + %%##x |
| +++ %%##x |
| +++ %%##x |
| +++ %%##x |
| ++++ %%##x |
| ++++ %%##x |
| ++++ %%##xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ + %#O#xx |
| ++++ + %#O#xx |
| ++++++ + %#O#xx |
| ++++++++++ %OOOxxx|
| ++++++++++ + %#OOO#xx|
| + ++++++++++++ ++ +++++ + ++ @@OOOO#xx|
| |A_| |
||__________M_______A____________________| |
| |A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 0.99456 1.00628 0.999985 1.0001545 0.0024387139
+ 120 0.873021 1.00037 0.884134 0.90148752 0.039190862
Difference at 99.5% confidence
-0.098667 +/- 0.0110762
-9.86517% +/- 1.10745%
(Student's t, pooled s = 0.0277657)
% 120 0.990207 1.00165 0.9970265 0.99699748 0.0021024
Difference at 99.5% confidence
-0.003157 +/- 0.000908245
-0.315651% +/- 0.0908105%
(Student's t, pooled s = 0.00227678)

Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
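
The gist, as a sketch with made-up field names: a request that is ready
and will not busywait gets the full NOSEMAPHORE bump.

    #define SKETCH_PRIORITY_NOSEMAPHORE (1 << 2) /* illustrative value */

    struct sketch_request {
            int priority;
            bool will_busywait; /* emitted a semaphore spin for a dependency */
    };

    /* Effective priority once the request becomes ready to run. */
    static int sketch_ready_priority(const struct sketch_request *rq)
    {
            if (!rq->will_busywait)
                    return rq->priority | SKETCH_PRIORITY_NOSEMAPHORE;
            return rq->priority;
    }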


# dba5a7f3 15-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark semaphores as complete on unsubmit if the payload was started

Avoid charging us for the presumed busywait if the request was preempted
after successfully using semaphores to reduce inter-engine latency.

v2: Bump the priority to reflect the lack of semaphores now required.

References: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-1-chris@chris-wilson.co.uk
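
Sketch of the idea with hypothetical names: on unsubmission, if the
payload had already begun, the busywait is behind us, so stop
accounting for the semaphore and re-bump the priority (the v2 note).

    #define SKETCH_HAS_SEMAPHORE    (1 << 0) /* illustrative flag */
    #define SKETCH_PRIO_NOSEMAPHORE 4        /* illustrative value */

    struct sketch_request {
            unsigned int sched_flags;
            int priority;
            bool payload_started; /* breadcrumb shows the batch began */
    };

    static void sketch_unsubmit(struct sketch_request *rq)
    {
            if (rq->payload_started &&
                (rq->sched_flags & SKETCH_HAS_SEMAPHORE)) {
                    rq->sched_flags &= ~SKETCH_HAS_SEMAPHORE;
                    if (rq->priority < SKETCH_PRIO_NOSEMAPHORE)
                            rq->priority = SKETCH_PRIO_NOSEMAPHORE;
            }
    }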


# c36beba6 07-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Seal races between async GPU cancellation, retirement and signaling

Currently there is an underlying assumption that i915_request_unsubmit()
is synchronous wrt the GPU -- that is the request is no longer in flight
as we remove it. In the near future that may change, and this may upset
our signaling as we can process an interrupt for that request while it
is no longer in flight.

CPU0 CPU1
intel_engine_breadcrumbs_irq
(queue request completion)
i915_request_cancel_signaling
... ...
i915_request_enable_signaling
dma_fence_signal

Hence in the time it took us to drop the lock to signal the request, a
preemption event may have occurred and re-queued the request. In the
process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
so reused the rq->signal_link that was in use on CPU0, leading to bad
pointer chasing in intel_engine_breadcrumbs_irq.

A related issue was that if someone started listening for a signal on a
completed but no longer in-flight request, we missed the opportunity to
immediately signal that request.

Furthermore, as intel_contexts may be immediately released during
request retirement, in order to be entirely sure that
intel_engine_breadcrumbs_irq may no longer dereference the intel_context
(ce->signals and ce->signal_link), we must wait for the irq spinlock.

In order to prevent the race, we use a bit in the fence.flags to signal
the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
quickly signals to any outside observer that the fence is indeed signaled.

v2: Sketch out potential dma-fence API for manual signaling
v3: Add the test_and_set_bit()

Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
(cherry picked from commit 0152b3b3f49b36b0f1a1bf9f0353dc636f41d8f0)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 0152b3b3 07-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Seal races between async GPU cancellation, retirement and signaling

Currently there is an underlying assumption that i915_request_unsubmit()
is synchronous wrt the GPU -- that is the request is no longer in flight
as we remove it. In the near future that may change, and this may upset
our signaling as we can process an interrupt for that request while it
is no longer in flight.

CPU0 CPU1
intel_engine_breadcrumbs_irq
(queue request completion)
i915_request_cancel_signaling
... ...
i915_request_enable_signaling
dma_fence_signal

Hence in the time it took us to drop the lock to signal the request, a
preemption event may have occurred and re-queued the request. In the
process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
so reused the rq->signal_link that was in use on CPU0, leading to bad
pointer chasing in intel_engine_breadcrumbs_irq.

A related issue was that if someone started listening for a signal on a
completed but no longer in-flight request, we missed the opportunity to
immediately signal that request.

Furthermore, as intel_contexts may be immediately released during
request retirement, in order to be entirely sure that
intel_engine_breadcrumbs_irq may no longer dereference the intel_context
(ce->signals and ce->signal_link), we must wait for the irq spinlock.

In order to prevent the race, we use a bit in the fence.flags to signal
the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
quickly signals to any outside observer that the fence is indeed signaled.

v2: Sketch out potential dma-fence API for manual signaling
v3: Add the test_and_set_bit()

Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508112452.18942-1-chris@chris-wilson.co.uk
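
The one-time transfer guard, sketched around the real
DMA_FENCE_FLAG_SIGNALED_BIT and list helpers. The surrounding request
and context types are the driver's; the exact call site here is
illustrative:

    #include <linux/dma-fence.h>
    #include <linux/list.h>

    /* Inside the irq handler, under the engine's breadcrumb lock:
     * only the first party to set the bit may move the request onto
     * the per-context signal list, so a preempt/re-queue cycle cannot
     * reuse rq->signal_link while it is still live here.
     */
    if (!test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
            list_move_tail(&rq->signal_link, &ce->signals);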


# 25d851ad 07-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only reschedule the submission tasklet if preemption is possible

If we couple the scheduler more tightly with the execlists policy, we
can apply the preemption policy to the question of whether we need to
kick the tasklet at all for this priority bump.

v2: Rephrase it as a core i915 policy and not an execlists foible.
v3: Pull the kick together.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
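
A minimal sketch of the policy (tasklet_hi_schedule() is the real API;
the helper and priority plumbing are illustrative): the bump only kicks
the tasklet when it could actually preempt the running request.

    #include <linux/interrupt.h>

    /* Kick submission only if the new priority beats what is executing. */
    static void sketch_kick_submission(struct tasklet_struct *tasklet,
                                       int bumped_prio, int active_prio)
    {
            if (bumped_prio <= active_prio)
                    return; /* cannot preempt; the bump can wait its turn */

            tasklet_hi_schedule(tasklet);
    }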


# c8a0e2ae 03-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Acquire the signaler's timeline HWSP last

Acquiring the signaler's timeline takes an active reference to their
HWSP that we would like to avoid if possible, so take it after
performing all of our allocations required to set up the fencing. The
acquisition also provides the final check that the target has not
already signaled, allowing us to avoid the semaphore at the last moment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190503140239.32668-1-chris@chris-wilson.co.uk


# 2564fe70 04-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable semaphore busywaits on saturated systems

Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.

The only clue offered by perf is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.

This patch introduces a heuristic to try to detect when more than one
client is submitting to the GPU, pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch, and if we had
planned to use a semaphore but were too late, conclude that the GPU is
overloaded and stop using semaphores in future requests. In practice,
this means we optimistically try to use semaphores for the first frame
of a transcode job split over multiple engines; if multiple clients are
active, that attempt fails and we continue without semaphores for the
subsequent frames in the sequence. Periodically, we optimistically
switch semaphores back on whenever the client waits to catch up with
the transcode results.

With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| * |
| *+ |
| **+ |
| **+ x |
| x * +**+ x |
| x x * * +***x xx |
| x x * * *+***x *x |
| x x* + * * *****x *x x |
| + x xx+x* + *** * ********* x * |
| + x xx+x* * *** +** ********* xx * |
| * + ++++* + x*x****+*+* ***+*************+x* * |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *|
| |__________A_____M_____| |
| |_______________A____M_________| |
| |____________A___M________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399
+ 120 2.3826 3.57077 3.25101 3.1414161 0.28146407
Difference at 95.0% confidence
-0.0729792 +/- 0.0629585
-2.27039% +/- 1.95864%
(Student's t, pooled s = 0.248814)
* 120 2.35536 3.66713 3.2849 3.2059917 0.24618565
No difference proven at 95.0% confidence

With 10 clients over-saturating the pipeline:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| ++ ** |
| ++ ** |
| ++ ** |
| ++ ** |
| ++ xx *** |
| ++ xx *** |
| ++ xxx*** |
| ++ xxx*** |
| +++ xxx*** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| ++++ xx**** |
| +++++ xx**** |
| +++++ x x****** |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xx******** |
| ++++++ xxxx******** |
| ++++++ xxxx******** |
| ++++++++ xxxxx********* |
|+ + + + ++++++++ xxx*xx**********x* *|
| |__A__| |
| |__AM__| |
| |__A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933
+ 120 1.17367 1.77459 1.71977 1.6966782 0.085850697
Difference at 95.0% confidence
-1.02266 +/- 0.0203502
-37.607% +/- 0.748352%
(Student's t, pooled s = 0.0804246)
* 120 2.57868 3.00821 2.80142 2.7923878 0.058646477
Difference at 95.0% confidence
0.0730476 +/- 0.0169791
2.68622% +/- 0.624383%
(Student's t, pooled s = 0.0671018)

Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.

Very similar, but of smaller magnitude, results are observed on both
Skylake (gt2) and Kabylake (gt4). This may be due to the reduced impact
of bus-cycles: where we see a 50% hit on Broxton, it is only 10% on the
big core in this particular test.

One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only in the holistic system-wide view, where the semaphores of one
client impact another and reduce the overall throughput, that we would
choose to disable semaphores.

The most noticeable negative impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.

Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
(cherry picked from commit ca6e56f654e7b241256ffba78cd2abb22aa3bc97)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# e766fde6 30-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Delay semaphore submission until the start of the signaler

Currently we submit the semaphore busywait as soon as the signaler is
submitted to HW. However, we may submit the signaler as the tail of a
batch of requests, and even not as the first context in the HW list,
i.e. the busywait may start spinning far in advance of the signaler even
starting.

If we wait until the request before the signaler has completed before
submitting the busywait, we prevent the busywait from starting too early
if the signaler is not first in the submission port.

To handle the case where the signaler is at the start of the second (or
later) submission port, we will need to delay the execution callback
until we know the context is promoted to port0. A challenge for later.

Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
(cherry picked from commit 0d90ccb70211cbf55140e91bd39db684aa4c16e9)
[Joonas: edited Fixes: tag into single line.]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ca6e56f6 04-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable semaphore busywaits on saturated systems

Asking the GPU to busywait on a memory address, perhaps not unexpectedly
in hindsight for a shared system, leads to bus contention that affects
CPU programs trying to concurrently access memory. This can manifest as
a drop in transcode throughput on highly over-saturated workloads.

The only clue offered by perf is that the bus-cycles (perf stat -e
bus-cycles) jumped by 50% when enabling semaphores. This corresponds
with extra CPU active cycles being attributed to intel_idle's mwait.

This patch introduces a heuristic to try to detect when more than one
client is submitting to the GPU, pushing it into an oversaturated state.
As we already keep track of when the semaphores are signaled, we can
inspect their state on submitting the busywait batch, and if we had
planned to use a semaphore but were too late, conclude that the GPU is
overloaded and stop using semaphores in future requests. In practice,
this means we optimistically try to use semaphores for the first frame
of a transcode job split over multiple engines; if multiple clients are
active, that attempt fails and we continue without semaphores for the
subsequent frames in the sequence. Periodically, we optimistically
switch semaphores back on whenever the client waits to catch up with
the transcode results.

With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| * |
| *+ |
| **+ |
| **+ x |
| x * +**+ x |
| x x * * +***x xx |
| x x * * *+***x *x |
| x x* + * * *****x *x x |
| + x xx+x* + *** * ********* x * |
| + x xx+x* * *** +** ********* xx * |
| * + ++++* + x*x****+*+* ***+*************+x* * |
|*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++ *|
| |__________A_____M_____| |
| |_______________A____M_________| |
| |____________A___M________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.60475 3.50941 3.31123 3.2143953 0.21117399
+ 120 2.3826 3.57077 3.25101 3.1414161 0.28146407
Difference at 95.0% confidence
-0.0729792 +/- 0.0629585
-2.27039% +/- 1.95864%
(Student's t, pooled s = 0.248814)
* 120 2.35536 3.66713 3.2849 3.2059917 0.24618565
No difference proven at 95.0% confidence

With 10 clients over-saturating the pipeline:

x no semaphores
+ drm-tip
* patched
+------------------------------------------------------------------------+
| ++ ** |
| ++ ** |
| ++ ** |
| ++ ** |
| ++ xx *** |
| ++ xx *** |
| ++ xxx*** |
| ++ xxx*** |
| +++ xxx*** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| +++ xx**** |
| ++++ xx**** |
| +++++ xx**** |
| +++++ x x****** |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xxx******* |
| ++++++ xx******** |
| ++++++ xxxx******** |
| ++++++ xxxx******** |
| ++++++++ xxxxx********* |
|+ + + + ++++++++ xxx*xx**********x* *|
| |__A__| |
| |__AM__| |
| |__A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 2.47855 2.8972 2.72376 2.7193402 0.074604933
+ 120 1.17367 1.77459 1.71977 1.6966782 0.085850697
Difference at 95.0% confidence
-1.02266 +/- 0.0203502
-37.607% +/- 0.748352%
(Student's t, pooled s = 0.0804246)
* 120 2.57868 3.00821 2.80142 2.7923878 0.058646477
Difference at 95.0% confidence
0.0730476 +/- 0.0169791
2.68622% +/- 0.624383%
(Student's t, pooled s = 0.0671018)

Indicating that we've recovered the regression from enabling semaphores
on this saturated setup, with a hint towards an overall improvement.

Very similar, but of smaller magnitude, results are observed on both
Skylake (gt2) and Kabylake (gt4). This may be due to the reduced impact
of bus-cycles: where we see a 50% hit on Broxton, it is only 10% on the
big core in this particular test.

One observation to make here is that for a greedy client trying to
maximise its own throughput, using semaphores is the right choice. It is
only in the holistic system-wide view, where the semaphores of one
client impact another and reduce the overall throughput, that we would
choose to disable semaphores.

The most noticeable negative impact this has is on the no-op
microbenchmarks, which are also very notable for having no cpu bus load.
In particular, this increases the runtime and energy consumption of
gem_exec_whisper.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
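
Sketch of the heuristic with made-up names: if the signaler has already
completed by the time we submit the spinner, the semaphore was wasted
effort, so mark the engine saturated and fall back to software waits.

    struct sketch_engine {
            bool saturated; /* stop emitting semaphores when set */
    };

    struct sketch_request {
            bool used_semaphore;
            bool signaler_completed; /* checked at submit time */
    };

    static void sketch_submit(struct sketch_engine *engine,
                              const struct sketch_request *rq)
    {
            /* The busywait would spin on an already-passed seqno: the GPU
             * is oversubscribed, so skip semaphores for later requests.
             */
            if (rq->used_semaphore && rq->signaler_completed)
                    engine->saturated = true;
    }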


# 0d90ccb7 30-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Delay semaphore submission until the start of the signaler

Currently we submit the semaphore busywait as soon as the signaler is
submitted to HW. However, we may submit the signaler as the tail of a
batch of requests, and even not as the first context in the HW list,
i.e. the busywait may start spinning far in advance of the signaler even
starting.

If we wait until the request before the signaler has completed before
submitting the busywait, we prevent the busywait from starting too early
if the signaler is not first in the submission port.

To handle the case where the signaler is at the start of the second (or
later) submission port, we will need to delay the execution callback
until we know the context is promoted to port0. A challenge for later.

Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
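
Schematically, with hypothetical helpers standing in for the driver's
fence plumbing: the busywaiter is gated on the request preceding the
signaler, rather than on the signaler's own submission.

    struct sketch_request; /* opaque for this sketch */

    /* Illustrative helpers, not driver API. */
    void sketch_await_completion(struct sketch_request *waiter,
                                 struct sketch_request *until);
    struct sketch_request *sketch_prev_request(struct sketch_request *rq);

    static void sketch_gate_busywait(struct sketch_request *spinner,
                                     struct sketch_request *signaler)
    {
            /* Do not release the spinner at signaler submission; wait for
             * the request *before* the signaler to complete, so the spin
             * starts no earlier than the signaler itself can.
             */
            sketch_await_completion(spinner, sketch_prev_request(signaler));
    }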


# 46472b3e 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move i915_request_alloc into selftests/

Having transitioned GEM over to using intel_context as its primary means
of tracking the GEM context and engine combined and using
i915_request_create(), we can move the older i915_request_alloc()
helper function into selftests/ where the remaining users are confined.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-9-chris@chris-wilson.co.uk


# 5e2a0419 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Switch back to an array of logical per-engine HW contexts

We switched to a tree of per-engine HW contexts to accommodate the
introduction of virtual engines. However, we plan to also support
multiple instances of the same engine within the GEM context, defeating
our use of the engine as a key to looking up the HW context. Just
allocate a logical per-engine instance and always use an index into the
ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user
specified map.

v2: Add for_each_gem_engine() helper to iterator within the engines lock
v3: intel_context_create_request() helper
v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough.
v5: Push iterator locking to caller

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk


# fa9f6681 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Export intel_context_instance()

We want to pass an intel_context into intel_context_pin(), and that
requires us to first be able to lookup the intel_context!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk


# 8f2a1057 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Explicitly pin the logical context for execbuf

In order to separate the reservation phase of building a request from
its emission phase, we need to pull some of the request alloc activities
from deep inside i915_request to the surface, GEM_EXECBUFFER.

v2: Be frivolous, use a local drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425050143.811-1-chris@chris-wilson.co.uk


# 79ffac85 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Invert the GEM wakeref hierarchy

In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)

Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
power management control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.

Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk


# 2ccdf6a1 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass intel_context to i915_request_create()

Start acquiring the logical intel_context and using that as our primary
means for request allocation. This is the initial step to allow us to
avoid requiring struct_mutex for request allocation along the
perma-pinned kernel context, but it also provides a foundation for
breaking up the complex request allocation to handle different scenarios
inside execbuf.

For the purpose of emitting a request from inside retirement (see the
next patch for engine power management), we also need to lift control
over the timeline mutex to the caller.

v2: Note that the request carries the active reference upon construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk


# 6eee33e8 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce context->enter() and context->exit()

We wish to start segregating the power management into different control
domains, both with respect to the hardware and the user interface. The
first step is that at the lowest level flow of requests, we want to
process a context event (and not a global GEM operation). In this patch,
we introduce the context callbacks that in future patches will be
redirected to per-engine interfaces leading to global operations as
required.

The intent is that this will be guarded by the timeline->mutex, except
that retiring has not quite finished transitioning over from being
guarded by struct_mutex. So at the moment it is protected by
struct_mutex with a reminder to switch.

v2: Rename default handlers to intel_context_enter_engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk


# 112ed2d3 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GraphicsTechnology files under gt/

Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk


# 7ce99d24 19-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Expose the busyspin durations for i915_wait_request

An interesting discussion regarding "hybrid interrupt polling" for NVMe
came to the conclusion that the ideal busyspin before sleeping was half
of the expected request latency (and better if it was already halfway
through that request). This suggested that we too should look again at
our tradeoff between spinning and waiting. Currently, our spin simply
tries to hide the cost of enabling the interrupt, which is good to avoid
penalising nop requests (i.e. test throughput) and not much else.
Studying real world workloads suggests that a spin of up to 500us can
dramatically boost performance, but the suggestion is that this comes
not from avoiding interrupt latency per se, but from secondary effects
of sleeping, such as allowing the CPU to reduce its cstate and context
switch away.

In a truly hybrid interrupt polling scheme, we would aim to sleep until
just before the request completed and then wake up in advance of the
interrupt and do a quick poll to handle completion. This is tricky for
ourselves at the moment as we are not recording request times, and since
we allow preemption, our requests are not on as nicely ordered a
timeline as IO. However, the idea is interesting, for it will certainly
help us decide when busyspinning is worthwhile.

v2: Expose the spin setting via Kconfig options for easier adjustment
and testing.
v3: Don't get caught sneaking in a change to the busyspin parameters.
v4: Explain more about the "hybrid interrupt polling" scheme that we
want to migrate towards.

Suggested-by: Sagar Kamble <sagar.a.kamble@intel.com>
References: http://events.linuxfoundation.org/sites/events/files/slides/lemoal-nvme-polling-vault-2017-final_0.pdf
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Kamble <sagar.a.kamble@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Sagar Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419182625.11186-1-chris@chris-wilson.co.uk
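
A kernel-flavoured sketch of the two-phase wait (the real knobs became
Kconfig options; these constants and helpers are illustrative):

    #include <linux/types.h>

    #define SPIN_BEFORE_IRQ_US 5   /* hide the cost of enabling the irq */
    #define SPIN_AFTER_IRQ_US  500 /* longer, workload-tuned busyspin */

    /* Illustrative helpers standing in for the driver's wait plumbing. */
    bool sketch_spin_for_completion(void *rq, unsigned int us);
    void sketch_enable_irq(void *rq);
    long sketch_sleep_until_signaled(void *rq, long timeout);

    static long sketch_wait_request(void *rq, long timeout)
    {
            if (sketch_spin_for_completion(rq, SPIN_BEFORE_IRQ_US))
                    return timeout; /* completed without touching the irq */

            sketch_enable_irq(rq);
            if (sketch_spin_for_completion(rq, SPIN_AFTER_IRQ_US))
                    return timeout;

            return sketch_sleep_until_signaled(rq, timeout);
    }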


# 0c441cb6 11-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Call i915_sw_fence_fini on request cleanup

As i915_requests are put into an RCU-freelist, they may get reused
before debugobjects notice them as being freed. On cleanup, explicitly
call i915_sw_fence_fini() so that the debugobject is properly tracked.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190411122445.20060-1-chris@chris-wilson.co.uk
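
The shape of the fix (i915_sw_fence_fini() and rq->submit are the real
helper and field; the free path and cache name shown here are
simplified and illustrative):

    #include <linux/slab.h>
    #include "i915_request.h"
    #include "i915_sw_fence.h"

    extern struct kmem_cache *sketch_slab_requests; /* illustrative */

    /* On the request free path: the request may be recycled off an
     * RCU freelist before debugobjects sees the grace period, so the
     * embedded sw_fence's debugobject must be destroyed explicitly.
     */
    static void sketch_request_free(struct i915_request *rq)
    {
            i915_sw_fence_fini(&rq->submit);
            kmem_cache_free(sketch_slab_requests, rq);
    }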


# b7404c7e 09-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump ready tasks ahead of busywaits

Consider two tasks that are running in parallel on a pair of engines
(vcs0, vcs1), but then must complete on a shared engine (rcs0). To
maximise throughput, we want to run the first ready task on rcs0 (i.e.
the first task that completes on either of vcs0 or vcs1). When using
semaphores, however, we will instead queue onto rcs in submission order.

To resolve this incorrect ordering, we want to re-evaluate the priority
queue when each of the request is ready. Normally this happens because
we only insert into the priority queue requests that are ready, but with
semaphores we are inserting ahead of their readiness and to compensate
we penalize those tasks with reduced priority (so that tasks that do not
need to busywait should naturally be run first). However, given a series
of tasks that each use semaphores, the queue degrades into a submission
fifo rather than a readiness fifo, and so to counter this we give a
small boost to semaphore users as their dependent tasks are completed
(and so we no longer require any busywait prior to running the user
task, as they are then ready themselves).

v2: Fixup irqsave for schedule_lock (Tvrtko)

Testcase: igt/gem_exec_schedule/semaphore-codependency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk


# de220cc2 08-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Consolidate the timeline->barrier

The timeline is strictly ordered, so by inserting the timeline->barrier
request into the timeline->last_request it naturally provides the same
barrier. Consolidate the pair of barriers into one as they serve the
same purpose.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190408091728.20207-4-chris@chris-wilson.co.uk


# 696173b0 05-Apr-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: extract intel_pm.h from intel_drv.h

It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.

Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.

No functional changes.

v2: gen6_rps_reset_ei() is in i915_irq.c not intel_pm.c.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/adc6463b95eef3440fba9826793f7d1c5f3b0b4a.1554461791.git.jani.nikula@intel.com


# b66ea2c2 03-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use lockdep_pin_lock() over the construction of the request

During request construction, we take the timeline->mutex to ensure
exclusive access to the ringbuffer (for command emission) and the
timeline itself (for command ordering). The timeline->mutex should not
be dropped by callers until we release it in i915_request_add().

lockdep provides a pin/unpin lock facility to detect accidental unlocks
inside critical sections, so put it to use for request construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190403082132.327-1-chris@chris-wilson.co.uk
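
The pattern, sketched: lockdep_pin_lock()/lockdep_unpin_lock() are the
real lockdep API, while the call sites are condensed from request
construction.

    #include <linux/lockdep.h>
    #include <linux/mutex.h>

    static void sketch_build_request(struct mutex *timeline_mutex)
    {
            struct pin_cookie cookie;

            mutex_lock(timeline_mutex);              /* request alloc */
            cookie = lockdep_pin_lock(timeline_mutex);

            /* ... emit commands into the ring; any accidental unlock
             * in here now trips a lockdep warning ...
             */

            lockdep_unpin_lock(timeline_mutex, cookie); /* i915_request_add() */
            mutex_unlock(timeline_mutex);
    }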


# 7881e605 01-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only emit one semaphore per request

Ideally we only need one semaphore per ring to accommodate waiting on
multiple engines in parallel. However, since we do not know which fences
we will finally be waiting on, we emit a semaphore for every fence. It
turns out to be quite easy to trick ourselves into exhausting our
ringbuffer, causing an error, just by feeding in a batch that depends on
several thousand contexts.

Since we never can be waiting on more than one semaphore in parallel
(other than perhaps the desire to busywait on multiple engines), just
pick the first fence for our semaphore. If we pick the wrong fence to
busywait on, we just miss an opportunity to reduce latency.

An adaptation might be to use sched.flags as either a semaphore counter,
or to track the first busywait on each engine, converting it back to a
single use bit prior to closing the request.

v2: Track first semaphore used per-engine (this caters for our basic
igt that semaphores are working).

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_exec_fence/long-history
Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
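
Sketch with made-up fields of the v2 scheme: record which engines this
request has already emitted a busywait for, and use a plain software
await for every later fence from the same engine.

    typedef unsigned int engine_mask_t; /* illustrative */

    struct sketch_request {
            engine_mask_t busywait_engines; /* engines already spun upon */
    };

    /* Emit at most one busywait per signaling engine; later fences from
     * that engine fall back to an ordinary software await.
     */
    static bool sketch_should_emit_semaphore(struct sketch_request *rq,
                                             engine_mask_t signaler)
    {
            if (rq->busywait_engines & signaler)
                    return false; /* already spinning on this engine */

            rq->busywait_engines |= signaler;
            return true;
    }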


# ea593dbb 22-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow contexts to share a single timeline across all engines

Previously, our view has been always to run the engines independently
within a context. (Multiple engines happened before we had contexts and
timelines, so they always operated independently and that behaviour
persisted into contexts.) However, at the user level the context often
represents a single timeline (e.g. GL contexts) and userspace must
ensure that the individual engines are serialised to present that
ordering to the client (or forgets about this detail entirely and hopes
no one notices - a fair ploy if the client can only directly control one
engine itself ;)

In the next patch, we will want to construct a set of engines that
operate as one, that have a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the virtual
engine, and then we decide which engine to execute on.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.

v2: Move the specialised timeline ordering to its own function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk


# 4daffb66 21-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop storing the context name as the timeline name

The timeline->name is only used for convenience in pretty printing the
i915_request.fence->ops->get_timeline_name() and it is just as
convenient to pull it from the gem_context directly. The few instances
of its use inside GEM_TRACE() has proven more of a nuisance than
helpful, so not worth saving imo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-4-chris@chris-wilson.co.uk


# 65baf0ef 18-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold a ref to the ring while retiring

As the final request on a ring may hold the reference to this ring (via
retiring the last pinned context), we may find ourselves chasing a
dangling pointer on completion of the list.

A quick solution is to hold a reference to the ring itself as we retire
along it so that we only free it after we stop dereferencing it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk


# c6eeb479 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce presumption of request ordering for barriers

Currently we assume that we know the order in which requests run and so
can determine if we need to reissue a switch-to-kernel-context prior to
idling. That assumption does not hold for the future, so instead of
tracking which barriers have been used, simply determine if we have ever
switched away from the kernel context by using the engine, and before
idling ensure that all engines used since the last idle are synchronously
switched back to the kernel context, for safety (and to allow shrinking
of memory while idle).

v2: Use intel_engine_mask_t and ALL_ENGINES

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk


# 103b76ee 05-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use i915_global_register()

Rather than manually adding every new global into each hook, use the
i915_global_register() function and keep a list of registered globals to
invoke instead.

However, I haven't found a way for random drivers to add an .init table
to avoid having to manually add ourselves to i915_globals_init() each
time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# f9e9e9de 01-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prioritise non-busywait semaphore workloads

We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost because later work doesn't wait on other
rings, hence we keep a history of semaphore usage of the dependency chain.

v2: Stop rolling the bits into a chain and just use a flag in case this
request or any of our dependencies use a semaphore. The rolling around
was contagious as Tvrtko was heard to fall off his chair.

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk


# e8861964 01-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+

Having introduced per-context seqno, we now have a means to identify
progress across the system without fear of rollback, as befell the
global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
advance of submission safe in the knowledge that our target seqno and
address is stable.

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).

As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the 3x reduced latency / increased
throughput more than offset the power cost of spinning on a second ring)
and have significant improvement (can be up to ~10%, most see no change)
for single clients that utilize multiple engines (typically media players
and transcoders), without regressing multiple clients that can saturate
the system or changing the power envelope dramatically.

v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.

Testcase: igt/gem_exec_whisper
Testcase: igt/benchmarks/gem_wsim
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
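
The busywait emitted into the waiter's ring looks roughly like this.
The MI_SEMAPHORE_WAIT encoding and flags are real gen8+ command-stream
definitions from i915's command headers; the helper itself is a
condensed sketch:

    #include <linux/types.h>
    #include "intel_gpu_commands.h" /* i915's MI_* definitions */

    static u32 *sketch_emit_semaphore_wait(u32 *cs, u32 seqno, u32 hwsp_offset)
    {
            *cs++ = MI_SEMAPHORE_WAIT |
                    MI_SEMAPHORE_GLOBAL_GTT |
                    MI_SEMAPHORE_POLL |
                    MI_SEMAPHORE_SAD_GTE_SDD; /* spin until *hwsp >= seqno */
            *cs++ = seqno;        /* semaphore data: target value */
            *cs++ = hwsp_offset;  /* address low dword (per-timeline HWSP) */
            *cs++ = 0;            /* address high dword */
            return cs;
    }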


# ebece753 01-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep timeline HWSP allocated until idle across the system

In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until its use across the entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We have to delay both recycling available
cachelines and unpinning the old HWSP until the next idle point.

An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.

v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
v6: Pack cacheline into empty bits of page-aligned vaddr
v7: Use i915_utils to hide the pointer casting around bit manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-2-chris@chris-wilson.co.uk


# 3ef71149 01-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce i915_timeline.mutex

A simple mutex used for guarding the flow of requests in and out of the
timeline. In the short-term, it will be used only to guard the addition
of requests into the timeline, taken on alloc and released on commit so
that only one caller can construct a request into the timeline
(important as the seqno and ring pointers must be serialised). This will
be used by observers to ensure that the seqno/hwsp is stable. Later,
when we have reduced retiring to only operate on a single timeline at a
time, we can then use the mutex as the sole guard required for retiring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301110547.14758-2-chris@chris-wilson.co.uk


# b5773a36 28-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Suppress mere WAIT preemption

WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not already received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we would be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk


# 32eb6bcf 28-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make request allocation caches global

As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potentially has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system), it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk


# b300fde8 26-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915_request.global_seqno

Having weaned the interrupt handling off using a single global execution
queue, we no longer need to emit a global_seqno. Note that we still have
a few assumptions about execution order along engine timelines, but this
removes the most obvious artefact!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk


# 8892f477 26-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove access to global seqno in the HWSP

Stop accessing the HWSP to read the global seqno, and stop tracking the
mirror in the engine's execution timeline -- it is unused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk


# c41166f9 20-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Beware temporary wedging when determining -EIO

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously (where once before they were both
serialised by the struct_mutex), we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset
(caused by either us timing out in our reset handler,
i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare
for a stuck modeset). If we suspect this is the case, that is we see a
wedged driver *and* reset in progress, then wait until the reset is
resolved before reporting upon the wedged status.

v2: might_sleep() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk


# 7f4127c4 18-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use time based guilty context banning

Currently, we accumulate a score each time a context hangs the GPU, offset
against the number of requests it submits, and if that score exceeds a
certain threshold, we ban that context from submitting any more requests
(cancelling any work in flight). In contrast, we use a simple timer on
the file: if we see more than 9 hangs less than 60s apart in
total across all of its contexts, we will ban the client from creating
any more contexts. This leads to a confusing situation where the file
may be banned before the context, so let's use a simple timer scheme for
each.

If the context submits 3 hanging requests within a 120s period, declare
it forbidden to ever send more requests.
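
To make the scheme concrete, a minimal sketch of such a time-based ban
check (names and types hypothetical, not the driver's actual code):

#define FAST_HANG_JIFFIES (120 * HZ)	/* forgiveness window */

struct hang_history {
	/* Timestamps of the two previous hangs; a third hang inside
	 * the window triggers the ban. */
	unsigned long hang_timestamp[2];
};

/* Record a hang; return true if the context should now be banned. */
static bool context_mark_hung(struct hang_history *h)
{
	unsigned long prev_hang = h->hang_timestamp[0];

	h->hang_timestamp[0] = h->hang_timestamp[1];
	h->hang_timestamp[1] = jiffies;

	/* Banned if this is the third hang within the window. */
	return time_before(jiffies, prev_hang + FAST_HANG_JIFFIES);
}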

This has the advantage of not being easy to repair by simply sending
empty requests, but has the disadvantage that if the context is idle
then it is forgiven. However, if the context is idle, it is not
disrupting the system, but a hog can evade the request counting and
cause much more severe disruption to the system.

Updating ban_score from request retirement is dubious as the retirement
is purposely not in sync with request submission (i.e. we try and batch
retirement to reduce overhead and avoid latency on submission), which
leads to surprising situations where we can forgive a hang immediately
due to a backlog of requests from before the hang being retired
afterwards.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-2-chris@chris-wilson.co.uk


# d9e61b66 13-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer application of request banning to submission

As we currently do not check on submission whether the context is banned
in a timely manner, it is possible for some requests to escape
cancellation after their parent context is banned. By moving the ban
into the request submission under the engine->timeline.lock, we
serialise it with the reset and setting of the context ban.

References: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213182737.12695-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 62eb3c24 13-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()

As time goes by, usage of generic ioctls such as drm_syncobj and
sync_file is on the increase, bypassing i915-specific ioctls like
GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as
we track the file/client and account the waitboosting to them. However,
since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for
signaling threads"), we no longer have been applying the client
ratelimiting on waitboosts and so that information has only been used
for debug tracking.

Push the application of waitboosting down to the common
i915_request_wait, and apply it to all foreign fence waits as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213092504.25709-1-chris@chris-wilson.co.uk


# 21950ee7 05-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull i915_gem_active into the i915_active family

Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling: little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk


# 78108584 05-Feb-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Add timeline barrier support

The timeline barrier allows serialization between different timelines.

After calling i915_timeline_set_barrier with a request, all following
submissions on this timeline will be set up as depending on this request,
or barrier. Once the barrier has been completed it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.

v2:
* Assert barrier has been retired on timeline_fini. (Chris Wilson)
* Fix mock_timeline.

v3:
* Improved comment language. (Chris Wilson)

v4:
* Maintain ordering with previous barriers set on the timeline.

v5:
* Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-3-tvrtko.ursulin@linux.intel.com


# 1413b2bc 04-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Trim NEWCLIENT boosting

Limit the NEWCLIENT boost so that its small priority boost is given only
to fresh clients that have no dependencies.

The idea for using NEWCLIENT boosting, commit b16c765122f9 ("drm/i915:
Priority boost for new clients"), is that short-lived streams are often
interactive and require lower latency -- and that by executing those
ahead of the long running hogs, the short-lived clients do little to
interfere with the system throughput by virtue of their short-lived
nature. However, we were only considering the client's own timeline for
determining whether or not it was a fresh stream. This allowed
compositors to wake up before their vblank and bump all of their client
streams. However, in testing with media-bench this results in chaining
all cooperating contexts together, preventing us from being able to
reorder contexts to reduce bubbles (pipeline stalls), overall increasing
latency, and reducing system throughput. The exact opposite of our
intent. The compromise of applying the NEWCLIENT boost to strictly fresh
clients (that do not wait upon anything else) should maintain the
"real-time response under load" characteristics of FQ_CODEL, without
locking together the long chains of dependencies across the system.

References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204150101.30759-1-chris@chris-wilson.co.uk


# 52c0fdb2 29-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace global breadcrumbs with per-context interrupt tracking

A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

Handling certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c725816, the solution was simple. Every client
waiting on a request would be woken on every interrupt and each would do
a heavyweight check to see if their request was complete. Commit
688e6c725816 introduced an rbtree so that only the earliest waiter on
the global timeline would be woken, and would in turn wake the next and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for kthread to provide
a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but does require the HW
being coherent, the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).
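
A minimal sketch of the combined signaling path (types hypothetical;
the real code walks per-context signal lists):

static void signal_irq_sketch(struct engine_sketch *engine)
{
	struct signaler_sketch *s, *sn;
	LIST_HEAD(signal);

	spin_lock(&engine->signal_lock);
	list_for_each_entry_safe(s, sn, &engine->signalers, link) {
		/* The list is kept in submission order, so stop at the
		 * first fence whose seqno the HWSP has not reached. */
		if ((s32)(READ_ONCE(*s->hwsp) - s->seqno) < 0)
			break;

		list_move_tail(&s->link, &signal);
	}
	spin_unlock(&engine->signal_lock);

	/* Client wakeup now rides on the common dma_fence signaling. */
	list_for_each_entry_safe(s, sn, &signal, link)
		dma_fence_signal(s->fence);
}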

Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perform direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to
warm even the coldest of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe
and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.

References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk


# 85474441 29-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Identify active requests

To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)
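
A sketch of the resulting "has it started?" test (field names
approximate), remembering that the breadcrumb emitted at the start of a
request is the previous request's seqno, i.e. seqno - 1:

static bool request_started_sketch(const struct rq_sketch *rq)
{
	u32 hwsp = READ_ONCE(*rq->hwsp_seqno);

	/* The i915_seqno_passed() idiom: true once the HWSP has
	 * advanced to seqno - 1, allowing for u32 wraparound. */
	return (s32)(hwsp - (rq->seqno - 1)) >= 0;
}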

There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk


# 5013eb8c 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track the context's seqno in its own timeline HWSP

Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write from the GPU into the timeline's HWSP of the
per-context seqno as we complete each request. This drops the mirroring
of the per-engine HWSP and allows each context to operate independently.
We do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid check.
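
The matching completion test then reduces to a single stable comparison
(sketch, field names approximate):

static bool request_completed_sketch(const struct rq_sketch *rq)
{
	/* The per-context seqno cannot change mid-check. */
	return (s32)(READ_ONCE(*rq->hwsp_seqno) - rq->seqno) >= 0;
}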

One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially dangling pointer
after we have retired the fence. We also have to guard our access of the
HWSP with RCU, as the release of the obj->mm.pages should already be RCU-safe.

At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.

v2: s/fake_complete/mark_complete/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk


# 3adac468 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce concept of per-timeline (context) HWSP

Supplement the per-engine HWSP with a per-timeline HWSP. That is a
per-request pointer through which we can check a local seqno,
abstracting away the presumption of a global seqno. In this first step,
we point each request back into the engine's HWSP so everything
continues to work with the global timeline.

v2: s/i915_request_hwsp/hwsp_seqno/ to emphasis that this is the current
HW value and that we are accessing it via i915_request merely as a
convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-1-chris@chris-wilson.co.uk


# eb8d0f5a 25-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove GPU reset dependence on struct_mutex

Now that the submission backends are controlled via their own spinlocks,
with a wave of a magic wand we can lift the struct_mutex requirement
around GPU reset. That is, we allow the submission frontend (userspace)
to keep on submitting while we process the GPU reset as we can suspend
the backend independently.

The major change is around the backoff/handoff strategy for performing
the reset. With no mutex deadlock, we no longer have to coordinate with
any waiter, and just perform the reset immediately.

Testcase: igt/gem_mmap_gtt/hang # regresses
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk


# 9fa4973e 24-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove manual breadcrumb counting

Now that we know we measure the size of the engine->emit_breadcrumb()
correctly, we can remove the previous manual counting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125120005.25191-1-chris@chris-wilson.co.uk


# 0e21834e 21-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Tidy common test_bit probing of i915_request->fence.flags

A repeated pattern is to test the signaled bit of our
request->fence.flags. Make this an inline to shorten a few lines and
remove unnecessary line continuations.
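
The helper is essentially:

static inline bool i915_request_signaled(const struct i915_request *rq)
{
	/* The request may live longer than its HWSP, so check flags! */
	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
}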

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190121222117.23305-20-chris@chris-wilson.co.uk


# f1e9c909 19-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent use of global_seqno=0

We are not allowed to assign rq->global_seqno=0 as it has a special
meaning of "inactive" (not executing on HW).

Fixes: 6faf5916e6be ("drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190119143024.26971-1-chris@chris-wilson.co.uk


# 9f58892e 16-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull all the reset functionality together into i915_reset.c

Currently the code to reset the GPU and our state is spread widely
across a few files. Pull the logic together into a common file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190116153304.787-1-chris@chris-wilson.co.uk


# d22ba0cb 09-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce i915_request_alloc retirement to local context

In the continual quest to reduce the amount of global work required when
submitting requests, replace i915_retire_requests() after allocation
failure with retiring just our ring.

v2: Don't forget the list iteration included an early break, so we would
never throttle on the last request in the ring/timeline.
v3: Use the common ring_retire_requests()

References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190109215932.26454-1-chris@chris-wilson.co.uk


# 1216e3c3 28-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop unused engine->irq_seqno_barrier w/a

Now that we have eliminated the CPU-side irq_seqno_barrier by moving the
delays on the GPU before emitting the MI_USER_INTERRUPT, we can remove
the engine->irq_seqno_barrier infrastructure. Though intentionally
slowing down the GPU is nasty, so is the code we can now remove!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-6-chris@chris-wilson.co.uk


# ed2922c0 28-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove redundant trailing request flush

Now that we perform the request flushing inline with emitting the
breadcrumb, we can remove the now redundant manual flush. And we can
also remove the infrastructure that remained only for its purpose.

v2: emit_breadcrumb_sz is in dwords, but rq->reserved_space is in bytes

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-1-chris@chris-wilson.co.uk


# 6faf5916 28-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation

The writing is on the wall for the existence of a single execution queue
along each engine, and as a consequence we will not be able to track
dependencies along the HW queue itself, i.e. we will not be able to use
HW semaphores on gen7 as they use a global set of registers (and unlike
gen8+ we can not effectively target memory to keep per-context seqno and
dependencies).

On the positive side, when we implement request reordering for gen7 we
also can not presume a simple execution queue and would also require
removing the current semaphore generation code. So this brings us another
step closer to request reordering for ringbuffer submission!

The negative side is that using interrupts to drive inter-engine
synchronisation is much slower (4us -> 15us to do a nop on each of the 3
engines on ivb). This is much better than it was at the time of introducing
the HW semaphores, and, equally important, userspace has weaned itself off
intermixing dependent BLT/RENDER operations (the prime culprit was glyph
rendering in UXA). So while we regress the microbenchmarks, it should not
impact the user.

References: https://bugs.freedesktop.org/show_bug.cgi?id=108888
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk


# dd847a70 07-Dec-2018 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Compile fix for 64b dma-fence seqno

Many errs of the form:
drivers/gpu/drm/i915/selftests/intel_hangcheck.c: In function ‘__igt_reset_evict_vma’:
./include/linux/kern_levels.h:5:18: error: format ‘%x’ expects argument of type ‘unsigned int’, but argum

Fixes: b312d8ca3a7c ("dma-buf: make fence sequence numbers 64 bit v2")
Cc: Christian König <christian.koenig@amd.com>
Cc: Chunming Zhou <david1.zhou@amd.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207123428.16257-1-mika.kuoppala@linux.intel.com


# 5f5800a7 07-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Push EMIT_INVALIDATE at request start to backends

Move the common engine->emit_flush(EMIT_INVALIDATE) back to the backends
(where it was once previously) as we seek to specialise it in future
patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207090213.14352-1-chris@chris-wilson.co.uk


# 39e84937 26-Nov-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip engine serialisation for no-op seqno reset

If the engine's seqno is already at our target seqno (most likely it
hasn't been used since the last reset), we can skip serialising the
engine and leave it as is.
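
In sketch form, the serialisation collapses to an early-out:

	/* Already at the target seqno (engine unused since the last
	 * reset): leave it untouched, no idle-barrier required. */
	if (intel_engine_get_seqno(engine) == seqno)
		return 0;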

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181126095610.20962-1-chris@chris-wilson.co.uk


# 1e016a86 24-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Park signaling thread while wrapping the seqno

A danger encountered when resetting the seqno (using
debugfs/i915_next_seqno) is that as we change the breadcrumb stored in
the HWSP, it may be inspected by the signaler thread leading to
confusion in our sanity checks.

<0> [136.331342] i915/sig-347 3..s1 136336154us : execlists_submission_tasklet: rcs0 awake?=1, active=5
<0> [136.331373] i915/sig-347 3d.s2 136336155us : process_csb: rcs0 cs-irq head=5, tail=0
<0> [136.331402] i915/sig-347 3d.s2 136336155us : process_csb: rcs0 csb[0]: status=0x00000018:0x00000002, active=0x5
<0> [136.331434] i915/sig-347 3d.s2 136336156us : process_csb: rcs0 out[0]: ctx=2.1, global=219 (fence 46:8455) (current 219), prio=0
<0> [136.331466] i915/sig-347 3d.s2 136336156us : process_csb: rcs0 completed ctx=2
<0> [136.332027] gem_exec-1049 0.... 136336246us : reset_all_global_seqno.part.5: rcs0 seqno 219 (current 219) -> -43
<0> [136.332056] gem_exec-1049 0.... 136336251us : reset_all_global_seqno.part.5: bcs0 seqno 183 (current 183) -> -43
<0> [136.332085] gem_exec-1049 0.... 136336255us : reset_all_global_seqno.part.5: vcs0 seqno 191 (current 191) -> -43
<0> [136.332114] gem_exec-1049 0.... 136336259us : reset_all_global_seqno.part.5: vcs1 seqno 180 (current 180) -> -43
<0> [136.332143] gem_exec-1049 0.... 136336262us : reset_all_global_seqno.part.5: vecs0 seqno 212 (current 212) -> -43
<0> [136.332174] i915/sig-347 3.... 136336280us : intel_breadcrumbs_signaler: intel_breadcrumbs_signaler:673 GEM_BUG_ON(!i915_request_completed(rq))

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181024104939.2861-1-chris@chris-wilson.co.uk


# 33373258 05-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the global cache shrink & rcu barrier on allocation failure

Earlier, we reasoned that having idled the gpu under mempressure, it
would be a good time to trim our request slabs in order to perform the
next request allocation. We have stopped performing the global operation
on the device (no idling) and wish to make the allocation failure
handling more local, so out with the global barrier that may take a long
time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181005080300.9908-2-chris@chris-wilson.co.uk


# e9eaf82d 01-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Priority boost for waiting clients

Latency is in the eye of the beholder. In the case where a client stops
and waits for the gpu, give that request chain a small priority boost
(not so that it overtakes higher priority clients, to preserve the
external ordering) so that ideally the wait completes earlier.

v2: Tvrtko recommends to keep the boost-from-user-stall as small as
possible and to allow new client flows to be preferred for interactivity
over stalls.

Testcase: igt/gem_sync/switch-default
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk


# e2f3496e 01-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull scheduling under standalone lock

Currently, the backend scheduling code abuses struct_mutex in order to
have a global lock to manipulate a temporary list (without widespread
allocation) and to protect against list modifications. This is an
extraneous coupling to struct_mutex and further can not extend beyond
the local device.

Pull all the code that needs to be under the one true lock into
i915_scheduler.c, and make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk


# b16c7651 01-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Priority boost for new clients

Taken from an idea used for FQ_CODEL, we give the first request of a
new request flow a small priority boost. These flows are likely to
correspond with short, interactive tasks and so be more latency sensitive
than the longer free running queues. As soon as the client has more than
one request in the queue, further requests are not boosted and it settles
down into ordinary steady state behaviour. Such small kicks dramatically
help combat the starvation issue, by allowing each client the opportunity
to run even when the system is under heavy throughput load (within the
constraints of the user selected priority).
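
In sketch form (I915_PRIORITY_NEWCLIENT is the real flag; the
surrounding bookkeeping is simplified):

static int effective_priority(int user_prio, const struct i915_request *prev)
{
	/* No outstanding request on this timeline: treat this as a
	 * new, likely interactive, flow and give it a small nudge. */
	if (!prev || i915_request_completed(prev))
		return user_prio | I915_PRIORITY_NEWCLIENT;

	return user_prio;
}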

v2: Mark the preempted request as the start of a new flow, to prevent a
single client being continually gazumped by its peers.

Testcase: igt/benchmarks/rrul
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-1-chris@chris-wilson.co.uk


# 11abf0c5 14-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit the backpressure for i915_request allocation

If we try and fail to allocate an i915_request, we apply some
backpressure on the clients to throttle the memory allocations coming
from i915.ko. Currently, we wait until completely idle, but this is far
too heavy and leads to some situations where the only escape is to
declare a client hung and reset the GPU. The intent is to only ratelimit
the allocation requests and to allow ourselves to recycle requests and
memory from any long queues built up by a client hog.

Although the system memory is inherently a global resource, we don't
want to overly penalize an unlucky client by making it pay the price of
reaping a hog. To reduce the influence of one client on another, we can,
instead of waiting for the entire GPU to idle, impose a barrier on the local client.
(One end goal for request allocation is for scalability to many
concurrent allocators; simultaneous execbufs.)

To prevent ourselves from getting caught out by long running requests
(requests that may never finish without userspace intervention, which we
are blocking) we need to impose a finite timeout, ideally shorter than
hangcheck. A long time ago Paul McKenney suggested that RCU users should
ratelimit themselves using judicious use of cond_synchronize_rcu(). This
gives us the opportunity to reduce our indefinite wait for the GPU to
idle to a wait for the RCU grace period of the previous allocation along
this timeline to expire, satisfying both the local and finite properties
we desire for our ratelimiting.
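
The RCU ratelimiting idiom, in sketch form (the timeline type is
hypothetical; both RCU calls are the real kernel API):

struct tl_sketch {
	unsigned long retire_rcustate;
};

static void note_retirement(struct tl_sketch *tl)
{
	/* Capture a grace-period cookie at retirement time. */
	tl->retire_rcustate = get_state_synchronize_rcu();
}

static void throttle_allocation(struct tl_sketch *tl)
{
	/* Waits only if a grace period has not already elapsed since
	 * the cookie was captured; otherwise returns immediately. */
	cond_synchronize_rcu(tl->retire_rcustate);
}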

There are still a few global steps (reclaim not least amongst those!)
when we exhaust the immediate slab pool, but at least now the wait is itself
decoupled from struct_mutex for our glorious highly parallel future!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk


# 97f06158 05-Aug-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull seqno started checks together

We have a few instances of checking seqno-1 to see if the HW has started
the request. Pull those together under a helper.

v2: Pull the !seqno assertion higher, as given seqno==1 we may indeed
check to see if we have started using seqno==0.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180806112605.20725-1-chris@chris-wilson.co.uk


# ec625fb9 09-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Provide a timeout to i915_gem_wait_for_idle()

Usually we have no idea about the upper bound we need to wait to catch
up with userspace when idling the device, but in a few situations we
know the system was idle beforehand and can provide a short timeout in
order to very quickly catch a failure, long before hangcheck kicks in.

In the following patches, we will use the timeout to curtail two overly
long waits, where we know we can expect the GPU to complete within a
reasonable time or declare it broken.

In particular, with a broken GPU we expect it to fail during the initial
GPU setup, where we do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. This is a case where any gpu hang is
unacceptable, from both a timeliness and a practical standpoint.

The other improvement is that in selftests, we do not need to arm an
independent timer to inject a wedge, as we can just limit the timeout on
the wait directly.
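
Typical usage after this change, in sketch form (the 200ms budget and
the error handling are illustrative, not the driver's exact values):

	err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, HZ / 5);
	if (err == -ETIME)
		i915_gem_set_wedged(i915); /* fail fast, skip hangcheck */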

v2: Include the timeout parameter in the trace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk


# 890fd185 06-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace nested subclassing with explicit subclasses

In the next patch, we will want a third distinct class of timeline that
may overlap with the current pair of client and engine timeline classes.
Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise
the different timeline classes with an explicit subclass.
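
In sketch form (the class names are illustrative, not the driver's
exact enum):

enum {
	TIMELINE_CLIENT = 0,	/* default subclass */
	TIMELINE_ENGINE,
	TIMELINE_VIRTUAL,	/* room for the overlapping third class */
};

	/* set once at init, instead of per-acquisition nesting markup */
	lockdep_set_subclass(&timeline->lock, TIMELINE_ENGINE);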

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk


# 6dd7526f 06-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Export i915_request_skip()

In the next patch, we will want to start skipping requests on failing to
complete their payloads. So export the utility function currently used to
make requests inoperable following a failed gpu reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk


# e3be4079 27-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only signal from interrupt when requested

Avoid calling dma_fence_signal() from inside the interrupt if we haven't
enabled signaling on the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-4-chris@chris-wilson.co.uk


# 78796877 27-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the irq_counter inside the spinlock

Rather than have multiple locked instructions inside the notify_ring()
irq handler, move them inside the spinlock and reduce their intrinsic
locking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627201304.15817-3-chris@chris-wilson.co.uk


# 697b9a87 12-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make closing request flush mandatory

For symmetry, simplicity and ensuring the request is always truly idle
upon its completion, always emit the closing flush prior to emitting the
request breadcrumb. Previously, we would only emit the flush if we had
started a user batch, but this just leaves all the other paths open to
speculation (do they affect the GPU caches or not?). With mm switching, a
key requirement is that the GPU is flushed and invalidated beforehand,
so for absolute safety, we want that closing flush be mandatory.
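
In sketch form (emit_flush and EMIT_FLUSH are the real engine hook and
mode flag; the breadcrumb wrapper and error handling are simplified):

	/* Unconditionally flush GPU caches before the breadcrumb, so
	 * the request is truly idle once it signals. */
	err = rq->engine->emit_flush(rq, EMIT_FLUSH);
	if (err)
		return err;

	emit_breadcrumb(rq);	/* hypothetical wrapper */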

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk


# b3ee09a4 10-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/ringbuffer: Fix context restore upon reset

The discovery with trying to enable full-ppgtt was that we were
completely failing to load both the mm and context following the
reset. Although we were performing mmio to set the PP_DIR (per-process
GTT) and CCID (context), these were taking no effect (the assumption was
that this would trigger reload of the context and restore the page
tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
following context switch that anything would occur.

Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.

Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previous executing context was on the GPU and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.

Now since that context and mm follow the request, we no longer (and not
for a long time since requests took over!) require a trace point to tell
when we write the switch into the ring, since it always is. (This is
even more important when you remember that simply writing into the ring
bears no relation to the current mm.)

v2: Sandybridge has to agree to use LRI as well.

Testcase: igt/drv_selftests/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk


# 09a4c02e 24-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Look for an active kernel context before switching

We were not very carefully checking to see if an older request on the
engine was an earlier switch-to-kernel-context before deciding to emit a
new switch. The end result would be that we could get into a permanent
loop of trying to emit a new request to perform the switch simply to
flush the existing switch.

What we need is a means of tracking the completion of each timeline
versus the kernel context, that is to detect if a more recent request
has been submitted that would result in a switch away from the kernel
context. To realise this, we need only to look in our syncmap on the
kernel context and check that we have synchronized against all active
rings.

v2: Since all ringbuffer clients currently share the same timeline, we do
have to use the gem_context to distinguish clients.

As a bonus, include all the tracing used to debug the death inside
suspend.

v3: Test, test, test. Construct a selftest to exercise and assert the
expected behaviour that multiple switch-to-contexts do not emit
redundant requests.

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk


# 1fc44d9b 17-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store a pointer to intel_context in i915_request

To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
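
The pointer dance, before and after, in sketch form:

	ce = rq->gem_context->engine[rq->engine->id];	/* before */
	ce = rq->hw_context;				/* after */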

v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk


# 4e0d64db 17-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move request->ctx aside

In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently accessed via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion so first rename the existing
i915_gem_context pointer to i915_request.gem_context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk


# 0adb90d3 08-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Annotate timeline lock nesting

CI noticed

<4>[ 23.430701] ============================================
<4>[ 23.430706] WARNING: possible recursive locking detected
<4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted
<4>[ 23.430720] --------------------------------------------
<4>[ 23.430725] systemd-udevd/169 is trying to acquire lock:
<4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915]
<4>[ 23.430888]
but task is already holding lock:
<4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915]
<4>[ 23.430995]
other info that might help us debug this:
<4>[ 23.431002] Possible unsafe locking scenario:

<4>[ 23.431007] CPU0
<4>[ 23.431010] ----
<4>[ 23.431013] lock(&(&timeline->lock)->rlock);
<4>[ 23.431021] lock(&(&timeline->lock)->rlock);
<4>[ 23.431028]
*** DEADLOCK ***

<4>[ 23.431036] May be due to missing lock nesting notation

<4>[ 23.431044] 5 locks held by systemd-udevd/169:
<4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0
<4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0
<4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915]
<4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915]
<4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915]
<4>[ 23.431369]
stack backtrace:
<4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1
<4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005
<4>[ 23.431394] Call Trace:
<4>[ 23.431403] dump_stack+0x67/0x9b
<4>[ 23.431411] __lock_acquire+0xc67/0x1b50
<4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0
<4>[ 23.431429] ? lock_acquire+0xa6/0x210
<4>[ 23.431435] lock_acquire+0xa6/0x210
<4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915]
<4>[ 23.431540] _raw_spin_lock+0x2a/0x40
<4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915]
<4>[ 23.431730] move_to_timeline+0x48/0x12c [i915]
<4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915]
<4>[ 23.431923] i915_request_submit+0x25/0x40 [i915]
<4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915]
<4>[ 23.432120] submit_notify+0x8d/0x124 [i915]
<4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915]
<4>[ 23.432395] i915_gem_init+0x621/0x630 [i915]
<4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915]
<4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0
<4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915]
<4>[ 23.432574] pci_device_probe+0xa1/0x130
<4>[ 23.432582] driver_probe_device+0x306/0x480
<4>[ 23.432589] __driver_attach+0xb7/0xe0
<4>[ 23.432596] ? driver_probe_device+0x480/0x480
<4>[ 23.432602] ? driver_probe_device+0x480/0x480
<4>[ 23.432609] bus_for_each_dev+0x74/0xc0
<4>[ 23.432616] bus_add_driver+0x15f/0x250
<4>[ 23.432623] ? 0xffffffffa02d7000
<4>[ 23.432629] driver_register+0x52/0xc0
<4>[ 23.432635] ? 0xffffffffa02d7000
<4>[ 23.432642] do_one_initcall+0x58/0x370
<4>[ 23.432653] ? do_init_module+0x1d/0x1ea
<4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80
<4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0
<4>[ 23.432675] do_init_module+0x56/0x1ea
<4>[ 23.432682] load_module+0x2435/0x2b20
<4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0
<4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0
<4>[ 23.432710] do_syscall_64+0x55/0x190
<4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 23.432724] RIP: 0033:0x7fa780782839
<4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839
<4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e
<4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780
<4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
<4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30

but did not report it as an issue as it only occurred during the first
module load on boot. This is due to the removal of the distinct global
timeline, and its separate lock class. So instead mark up the expected
nesting. An alternative would be to define a separate lock class for the
engine, but since we only expect to have a single point of nesting, we
can avoid having multiple lock classes for the struct.

Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Tested-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180508153514.20251-1-chris@chris-wilson.co.uk


# 71ace7ca 07-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable tasklet scheduling across initial scheduling

During request submission, we call the engine->schedule() function so
that we may reorder the active requests as required for inheriting the
new request's priority. This may schedule several tasklets to run on the
local CPU, but we will need to schedule the tasklets again for the new
request. Delay all the local tasklets until the end, so that we only
have to process the queue just once.

v2: Beware PREEMPT_RCU, as local_bh_disable() is then not a
superset of rcu_read_lock().
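
In sketch form (local_bh_disable()/rcu_read_lock() are the real kernel
APIs; the request plumbing is simplified):

	local_bh_disable();	/* defer tasklets on this CPU */
	rcu_read_lock();	/* not implied by the above under PREEMPT_RCU */
	if (engine->schedule)
		engine->schedule(rq, prio);
	rcu_read_unlock();
	submit_request(rq);	/* hypothetical: queue the new request */
	local_bh_enable();	/* all deferred tasklets now run just once */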

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-2-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 43c8c441 04-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove assertion of active_rings must be non-empty if active_requests

"An outstanding request must still be on an active ring somewhere" is
only true if we haven't just been interrupted by the shrinker in the
middle of allocating the request itself. (At the start of
i915_request_alloc() we pin the context and prepare the GT for activity,
marking it as active, and then try to allocate the request. If this
allocation invokes the shrinker, we try to reclaim some space by calling
i915_retire_requests() which may then be confused by the pre-reservation
of active_requests.)

<3>[ 125.472695] i915_retire_requests:1429 GEM_BUG_ON(list_empty(&i915->gt.active_rings))
<2>[ 125.472792] kernel BUG at drivers/gpu/drm/i915/i915_request.c:1429!
<4>[ 125.472822] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
<4>[ 125.498764] Modules linked in: snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb btrtl btbcm btintel cdc_ether snd_hda_codec_realtek bluetooth i915 snd_hda_codec_generic usbnet r8152 mii ecdh_generic lpc_ich mei_me snd_hda_intel snd_hda_codec mei snd_hwdep snd_hda_core snd_pcm prime_numbers
<4>[ 125.498923] CPU: 0 PID: 1115 Comm: gem_exec_create Tainted: G U 4.17.0-rc3-gc49cbe0d1eb8-kasan_32+ #1
<4>[ 125.498955] Hardware name: GOOGLE Peppy/Peppy, BIOS MrChromebox 02/04/2018
<4>[ 125.499074] RIP: 0010:i915_retire_requests+0x3f2/0x590 [i915]
<4>[ 125.499095] RSP: 0018:ffff88004e5dec40 EFLAGS: 00010282
<4>[ 125.499117] RAX: 0000000000000010 RBX: ffff8800458f0000 RCX: 0000000000000000
<4>[ 125.499140] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff880060c2f6f0
<4>[ 125.499164] RBP: ffff88004e5dee30 R08: ffffed000c185ee6 R09: ffffed000c185ee6
<4>[ 125.499187] R10: 0000000000000001 R11: ffffed000c185ee5 R12: ffff8800553da160
<4>[ 125.499210] R13: dffffc0000000000 R14: 0000000000000000 R15: ffff8800458faed0
<4>[ 125.499235] FS: 00007fe18f052980(0000) GS:ffff880065400000(0000) knlGS:0000000000000000
<4>[ 125.499262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 125.499282] CR2: 00007f01df11efb8 CR3: 00000000518d4001 CR4: 00000000000606f0
<4>[ 125.499304] Call Trace:
<4>[ 125.499417] i915_gem_shrink+0x576/0xb50 [i915]
<4>[ 125.499532] ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915]
<4>[ 125.499561] ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[ 125.499671] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915]
<4>[ 125.499782] ? i915_gem_shrinker_scan+0xc4/0x320 [i915]
<4>[ 125.499889] i915_gem_shrinker_scan+0xc4/0x320 [i915]
<4>[ 125.499997] ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915]
<4>[ 125.500021] ? do_raw_spin_unlock+0x4f/0x240
<4>[ 125.500042] ? _raw_spin_unlock+0x29/0x40
<4>[ 125.500149] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915]
<4>[ 125.500177] shrink_slab.part.18+0x23e/0x8f0
<4>[ 125.500202] ? unregister_shrinker+0x1f0/0x1f0
<4>[ 125.500226] ? mem_cgroup_iter+0x379/0xcc0
<4>[ 125.500249] shrink_node+0xa7e/0x1180
<4>[ 125.500276] ? shrink_node_memcg+0x11f0/0x11f0
<4>[ 125.500297] ? __delayacct_freepages_start+0x38/0x80
<4>[ 125.500319] ? __is_insn_slot_addr+0xe3/0x1a0
<4>[ 125.500342] ? recalibrate_cpu_khz+0x10/0x10
<4>[ 125.500361] ? ktime_get+0xb2/0x140
<4>[ 125.500382] do_try_to_free_pages+0x2d3/0xe40
<4>[ 125.500407] ? allow_direct_reclaim.part.23+0x1e0/0x1e0
<4>[ 125.500429] ? shrink_node+0x1180/0x1180
<4>[ 125.500450] ? __read_once_size_nocheck.constprop.4+0x10/0x10
<4>[ 125.500476] try_to_free_pages+0x1af/0x560
<4>[ 125.500497] ? do_try_to_free_pages+0xe40/0xe40
<4>[ 125.500525] __alloc_pages_nodemask+0xadc/0x2130
<4>[ 125.500553] ? gfp_pfmemalloc_allowed+0x150/0x150
<4>[ 125.500654] ? i915_gem_do_execbuffer+0x219d/0x32e0 [i915]
<4>[ 125.500678] ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[ 125.500701] ? __debug_object_init+0x322/0xd90
<4>[ 125.500722] ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[ 125.500827] ? i915_gem_do_execbuffer+0xdc2/0x32e0 [i915]
<4>[ 125.500942] ? i915_request_alloc+0x5b5/0x13f0 [i915]
<4>[ 125.500964] ? page_frag_free+0x170/0x170
<4>[ 125.500984] ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[ 125.501008] new_slab+0x21d/0x5c0
<4>[ 125.501029] ___slab_alloc.constprop.35+0x322/0x3e0
<4>[ 125.501052] ? reservation_object_reserve_shared+0x10b/0x250
<4>[ 125.501074] ? __ww_mutex_lock.constprop.3+0x1104/0x2cf0
<4>[ 125.501097] ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[ 125.501120] ? fs_reclaim_acquire+0x10/0x10
<4>[ 125.501138] ? lock_acquire+0x138/0x3c0
<4>[ 125.501156] ? lock_acquire+0x3c0/0x3c0
<4>[ 125.501176] ? reservation_object_reserve_shared+0x10b/0x250
<4>[ 125.501198] ? __slab_alloc.isra.27.constprop.34+0x3d/0x70
<4>[ 125.501219] __slab_alloc.isra.27.constprop.34+0x3d/0x70
<4>[ 125.501243] ? reservation_object_reserve_shared+0x10b/0x250
<4>[ 125.501265] __kmalloc_track_caller+0x313/0x350
<4>[ 125.501287] krealloc+0x62/0xb0
<4>[ 125.501305] reservation_object_reserve_shared+0x10b/0x250
<4>[ 125.501411] i915_gem_do_execbuffer+0x2040/0x32e0 [i915]
<4>[ 125.501522] ? eb_relocate_slow+0xad0/0xad0 [i915]
<4>[ 125.501544] ? debug_check_no_locks_freed+0x2a0/0x2a0
<4>[ 125.501646] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
<4>[ 125.501755] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
<4>[ 125.501779] ? drm_dev_get+0x20/0x20
<4>[ 125.501803] ? __might_fault+0xea/0x1a0
<4>[ 125.501902] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
<4>[ 125.502012] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[ 125.502116] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[ 125.502218] i915_gem_execbuffer2_ioctl+0x3c5/0x770 [i915]
<4>[ 125.502243] ? drm_dev_enter+0xe0/0xe0
<4>[ 125.502260] ? lock_acquire+0x138/0x3c0
<4>[ 125.502362] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[ 125.502470] ? i915_gem_object_create.part.28+0x570/0x570 [i915]
<4>[ 125.502575] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[ 125.502680] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[ 125.502702] drm_ioctl_kernel+0x151/0x200
<4>[ 125.502721] ? drm_ioctl_permit+0x2a0/0x2a0
<4>[ 125.502746] drm_ioctl+0x63a/0x920
<4>[ 125.502844] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
<4>[ 125.502868] ? drm_getstats+0x20/0x20
<4>[ 125.502886] ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[ 125.502919] do_vfs_ioctl+0x173/0xe90
<4>[ 125.502936] ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[ 125.502957] ? ioctl_preallocate+0x170/0x170
<4>[ 125.502978] ? trace_hardirqs_on_thunk+0x1a/0x1c
<4>[ 125.503002] ? retint_kernel+0x2d/0x2d
<4>[ 125.503024] ksys_ioctl+0x35/0x60
<4>[ 125.503043] __x64_sys_ioctl+0x6a/0xb0
<4>[ 125.503061] do_syscall_64+0x97/0x400
<4>[ 125.503081] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 125.503101] RIP: 0033:0x7fe18e4f65d7
<4>[ 125.503116] RSP: 002b:00007ffe2ffc06a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 125.503145] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe18e4f65d7
<4>[ 125.503168] RDX: 00007ffe2ffc07f0 RSI: 0000000040406469 RDI: 0000000000000003
<4>[ 125.503191] RBP: 00007ffe2ffc07f0 R08: 0000000000000004 R09: 00007ffe2ffcf080
<4>[ 125.503215] R10: 000000000002c7de R11: 0000000000000246 R12: 0000000040406469
<4>[ 125.503238] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
<4>[ 125.503268] Code: e8 18 a0 c9 da 48 8b 35 25 3a 47 00 49 c7 c0 a0 3b 88 c0 b9 95 05 00 00 48 c7 c2 e0 49 88 c0 48 c7 c7 8d 3b 5d c0 e8 ee 7e db da <0f> 0b 48 89 ef e8 a4 26 f5 da e9 51 fe ff ff e8 8a 26 f5 da e9
<1>[ 125.503548] RIP: i915_retire_requests+0x3f2/0x590 [i915] RSP: ffff88004e5dec40

Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180504101147.26286-1-chris@chris-wilson.co.uk


# 7c572e1b 03-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep one request in our ring_list

Don't pre-emptively retire the oldest request in our ring's list if it
is the only request. We keep various bits of state alive using the
active reference from the request and would rather transfer that state
over to a new request rather than the more involved process of retiring
and reacquiring it.
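
In sketch form (list and helper names approximate):

	list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link) {
		if (!i915_request_completed(rq))
			break;

		/* Keep the final request so its active reference can be
		 * transferred to the next request rather than dropped
		 * and reacquired. */
		if (list_is_last(&rq->ring_link, &ring->request_list))
			break;

		i915_request_retire(rq);
	}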

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-2-chris@chris-wilson.co.uk


# ea491b23 02-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reset the hangcheck timestamp before repeating a seqno

In the unusual circumstance where we reuse a seqno (for example, in
igt), make sure that we reset the hangcheck timestamp before it sees the
same seqno again.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106215
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-1-chris@chris-wilson.co.uk


# a89d1f92 02-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split i915_gem_timeline into individual timelines

We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of murky semantics and some clumsy
pointer chasing.

By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.

v2: Tweak wait_for_idle to stop the compiler thinking that ret may be
uninitialised.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
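
The shape of the split, as an illustrative sketch with made-up names
(the real structures carry far more state):

#include <linux/list.h>
#include <linux/types.h>

#define EXAMPLE_NUM_ENGINES 4	/* illustrative */

/* Before (roughly): one container preallocating a fence context per
 * engine, reached through clumsy pointer chasing. */
struct example_engine_timeline { u64 fence_context; u32 seqno; };
struct example_gem_timeline {
	struct example_engine_timeline engine[EXAMPLE_NUM_ENGINES];
};

/* After: a single self-contained timeline that a ring, and hence any
 * combination of engines, can reference directly. */
struct example_timeline {
	u64 fence_context;
	u32 seqno;
	struct list_head requests;
};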


# 65fcb806 02-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move timeline from GTT to ring

In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT and is required for ordering vma access, which
is currently tracked by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings'
timelines map to userspace and the GTT timeline is an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission).

The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.

v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk


# 643b450a 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only track live rings for retiring

We don't need to track every ring for its lifetime as they are managed
by the contexts/engines. What we do want to track are the live rings so
that we can sporadically clean up requests if userspace falls behind. We
can simply restrict the gt->rings list to being only gt->live_rings.

v2: s/live/active/ for consistency with gt.active_requests

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
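
A minimal sketch of the bookkeeping, assuming simplified placeholder
types; the ring joins gt->active_rings with its first request and
leaves once its last request retires:

#include <linux/list.h>

struct example_gt { struct list_head active_rings; };
struct example_ring {
	struct list_head active_link;	/* position in gt->active_rings */
	struct list_head request_list;
};
struct example_request { struct list_head ring_link; };

static void example_add_request(struct example_gt *gt,
				struct example_ring *ring,
				struct example_request *rq)
{
	/* First request makes the ring live/active. */
	if (list_empty(&ring->request_list))
		list_add(&ring->active_link, &gt->active_rings);
	list_add_tail(&rq->ring_link, &ring->request_list);
}

static void example_retire_request(struct example_ring *ring,
				   struct example_request *rq)
{
	list_del(&rq->ring_link);
	/* Last request retired: stop tracking this ring. */
	if (list_empty(&ring->request_list))
		list_del(&ring->active_link);
}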


# b887d615 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Retire requests along rings

In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that the slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline; a fun challenge for the
future will be unifying the two lists to avoid duplication!

v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed up to, as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).

v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
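
A sketch of ring-order retirement, reusing the example_* types and
example_retire_request() from the sketch attached to the previous
entry, plus a hypothetical example_request_completed() test; walking
each ring's list in order keeps ring->head advancing monotonically
even when engines complete out of order:

#include <linux/list.h>

bool example_request_completed(const struct example_request *rq);

static void example_retire_requests(struct example_gt *gt)
{
	struct example_ring *ring, *next_ring;

	list_for_each_entry_safe(ring, next_ring, &gt->active_rings,
				 active_link) {
		struct example_request *rq, *next;

		/*
		 * Retire strictly in ring order: stop at the first
		 * request still executing, so ring->head is never
		 * miscomputed by an out-of-order completion on
		 * another engine.
		 */
		list_for_each_entry_safe(rq, next, &ring->request_list,
					 ring_link) {
			if (!example_request_completed(rq))
				break;
			example_retire_request(ring, rq);
		}
	}
}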


# ab82a063 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->context_pin() and engine->context_unpin()

Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.

v2: Fixup mock_engine to mark the context as pinned on use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
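
The shape of such helpers, sketched with placeholder types; callers go
through the inlines rather than invoking the vfuncs directly, keeping
the indirection in one place:

struct example_gem_context;
struct example_context_state;

struct example_engine {
	struct example_context_state *
		(*context_pin)(struct example_engine *engine,
			       struct example_gem_context *ctx);
	void (*context_unpin)(struct example_engine *engine,
			      struct example_gem_context *ctx);
};

static inline struct example_context_state *
example_context_pin(struct example_engine *engine,
		    struct example_gem_context *ctx)
{
	return engine->context_pin(engine, ctx);
}

static inline void
example_context_unpin(struct example_engine *engine,
		      struct example_gem_context *ctx)
{
	engine->context_unpin(engine, ctx);
}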


# 52d7f16e 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop tracking timeline->inflight_seqnos

In commit 9b6586ae9f6b ("drm/i915: Keep a global seqno per-engine"), we
moved from a global inflight counter to per-engine counters in the
hope that it would be easy to run them concurrently in the future.
However, with the
advent of the desire to move requests between engines, we do need a
global counter to preserve the semantics that no engine wraps in the
middle of a submit. (Although this semantic is now only required for gen7
semaphore support, which only supports greater-than comparisons!)

v2: Keep a global counter of all requests ever submitted and force the
reset when it wraps.

References: 9b6586ae9f6b ("drm/i915: Keep a global seqno per-engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-1-chris@chris-wilson.co.uk
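
A sketch of the wrap guard with invented names: a single counter
covers all engines, and hitting zero after the increment signals that
a wrap occurred and the forced reset must run before submitting.

#include <linux/types.h>

struct example_gt {
	u32 request_serial;	/* all requests ever submitted */
};

/*
 * Returns true when the counter wraps: the caller must idle the GPU
 * and reset the seqnos before continuing, so no engine ever observes
 * a seqno moving backwards mid-submit (gen7 semaphores can only do
 * greater-than comparisons).
 */
static bool example_reserve_seqno(struct example_gt *gt)
{
	return ++gt->request_serial == 0;
}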


# b7268c5e 18-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pack params to engine->schedule() into a struct

Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.

v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
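
The pattern, sketched with placeholder names (the comment about extra
fields is speculative, following the motivation above):

struct example_request;

struct example_sched_attr {
	int priority;
	/*
	 * Adding future knobs (frequency hints, deadlines, ...) now
	 * only grows this struct instead of touching every
	 * schedule() caller again.
	 */
};

struct example_engine {
	void (*schedule)(struct example_request *rq,
			 const struct example_sched_attr *attr);
};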


# 0c7112a0 18-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename priotree to sched

Having moved the priotree struct into i915_scheduler.h, identify it as
the scheduling element and rebrand into i915_sched. This becomes more
useful as we start attaching more information we require to propagate
through the scheduler.

v2: Use i915_sched_node for future distinctiveness

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-2-chris@chris-wilson.co.uk


# 0c5c7df3 06-Apr-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915/execlists: Log fence context & seqno throughout GEM_TRACE

Include fence context and seqno in low level tracing so it is easier to
follow flows of individual requests when things go bad.

Also added tracing on the reset side of things.

v2:
Chris Wilson:
* Standardize global_seqno and seqno as global.
* Include current hws seqno in execlists_cancel_port_requests.

v3:
* Fix port printk format for all builds.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v2
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406123514.5809-1-tvrtko.ursulin@linux.intel.com
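
An illustrative trace helper in the same spirit (trace_printk() is the
generic kernel facility; the format mirrors excerpts later in this
log, while the types are simplified):

#include <linux/kernel.h>
#include <linux/dma-fence.h>

static void example_trace_request(const char *engine_name,
				  const struct dma_fence *fence,
				  u32 global_seqno)
{
	/*
	 * fence->context:fence->seqno identifies the request across
	 * engines; global_seqno is what the hardware breadcrumb will
	 * report, making bad flows much easier to reconstruct.
	 */
	trace_printk("%s fence %llu:%llu -> global %u\n",
		     engine_name, fence->context, fence->seqno,
		     global_seqno);
}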


# d0667e9c 06-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass the set of guilty engines to i915_reset()

Currently, we rely on inspecting the hangcheck state from within the
i915_reset() routines to determine which engines were guilty of the
hang. This is problematic for cases where we want to run
i915_handle_error() and call i915_reset() independently of hangcheck.
Instead of relying on the indirect parameter passing, turn it into an
explicit parameter providing the set of stalled engines which then are
treated as guilty until proven innocent.

While we are removing the implicit stalled parameter, also make the
reason into an explicit parameter to i915_reset(). We still need a
back-channel for i915_handle_error() to hand over the task to the locked
waiter, but let's keep that its own channel rather than incriminate
another.

This leaves stalled/seqno as being private to hangcheck, with no more
nefarious snooping by reset, be it whole-device or per-engine. \o/

The only real issue now is that this makes it crystal clear that we
don't actually do any testing of hangcheck per se in
drv_selftest/live_hangcheck, merely of resets!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk
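
The shape of the API change, sketched with placeholder names:

struct example_gpu;

/*
 * Before: void example_reset(struct example_gpu *gpu);
 * guilt was inferred inside reset by snooping hangcheck state.
 *
 * After: the caller states which engines stalled and why; those
 * engines are treated as guilty until proven innocent.
 */
void example_reset(struct example_gpu *gpu,
		   unsigned int stalled_mask,	/* one bit per engine */
		   const char *reason);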


# e4d2006f 06-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split out parking from the idle worker for reuse

We will want to park GEM before disengaging the drive^W^W^W unwedging.
Since we already do the work for idling, expose the guts as a new
function that we can then reuse.

v2: Just skip if already parked; makes it more forgiving to use by
future callers.
v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to
i915_gem_park so that we can evaluate it for symmetry more easily.
Calling GEM from inside i915_request looks to be a bit of a layering
violation; for the moment I am imagining them as being notify_cb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
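
A skeleton of the park/unpark pair under the forgiving v2 semantics,
with invented types and the real work elided:

#include <linux/mutex.h>
#include <linux/types.h>

struct example_gt {
	struct mutex lock;
	bool awake;
};

static void example_gem_park(struct example_gt *gt)
{
	mutex_lock(&gt->lock);
	if (gt->awake) {	/* forgiving: a no-op if already parked */
		/* ... retire requests, drop the device wakeref ... */
		gt->awake = false;
	}
	mutex_unlock(&gt->lock);
}

static void example_gem_unpark(struct example_gt *gt)
{
	mutex_lock(&gt->lock);
	if (!gt->awake) {
		/* ... take the wakeref, kick the retirement worker ... */
		gt->awake = true;
	}
	mutex_unlock(&gt->lock);
}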


# e7702760 27-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Include the HW breadcrumb whenever we trace the global_seqno

When we include a request's global_seqno in a GEM_TRACE it often helps
to know how that relates to the current breadcrumb as seen by the
hardware.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180327210157.16896-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 4ccfee92 22-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove local timeline var from submit/unsubmit

Both request_submit and request_unsubmit deal with transferring the
request from the client's timeline onto the execution timeline and back
again. As both functions deal with a pair of timelines, using a
shorthand for just one of them is slightly confusing, especially as the
different functions use the shorthand for the alternate timeline.
Instead, use the full version of each timeline so it should be easier to
keep track of the transfer between the request/client and the engine.

v2: Refactor the common lock+list_move
v3: Be clear we require the other timeline list to be locked as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180322131034.6036-1-chris@chris-wilson.co.uk


# 0e59c209 22-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix tracing of submit seqno

We pre-increment the timeline->seqno when handing it to the request, so
make sure the GEM_TRACE takes this into account. Otherwise, it appears
that we go backwards over a preemption point:

1d..1 157681077us : __i915_request_unsubmit: vcs0 fence 75e:3 <- global_seqno 17
0d.s1 157681113us : __i915_request_submit: vcs0 fence 75e:3 -> global_seqno 16

Fixes: d9b13c4dde6c ("drm/i915: Trace GEM steps between submit and wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180322110059.4467-1-chris@chris-wilson.co.uk
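
A sketch of the accounting with simplified types; since the request is
handed the pre-incremented value, the trace must print the value the
request will actually carry:

#include <linux/kernel.h>
#include <linux/types.h>

struct example_timeline { u32 seqno; };

static u32 example_submit(struct example_timeline *tl)
{
	/*
	 * The request receives ++tl->seqno, so trace tl->seqno + 1;
	 * printing the unadvanced counter makes submit appear to run
	 * one step behind unsubmit, as in the excerpt above.
	 */
	trace_printk("submit -> global_seqno %u\n", tl->seqno + 1);
	return ++tl->seqno;
}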


# ce800754 20-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add control flags to i915_handle_error()

Not all callers want the GPU error to be handled in the same way, so expose
a control parameter. In the first instance, some callers do not want the
heavyweight error capture so add a bit to request the state to be
captured and saved.

v2: Pass msg down to i915_reset/i915_reset_engine so that we include the
reason for the reset in the dev_notice(), superseding the earlier option
to not print that notice.
v3: Stash the reason inside the i915->gpu_error to hand over to the direct
reset from the blocking waiter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180320100449.1360-2-chris@chris-wilson.co.uk
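
The flag-bit pattern, sketched with invented names (only the capture
bit is drawn from the text above):

#include <linux/bits.h>

#define EXAMPLE_ERROR_CAPTURE	BIT(0)	/* request heavyweight state capture */

struct example_gpu;

void example_handle_error(struct example_gpu *gpu,
			  unsigned int engine_mask,
			  unsigned long flags,
			  const char *msg);

static void example_report_hang(struct example_gpu *gpu,
				unsigned int engine_mask)
{
	/* The msg is handed down to the reset path for its dev_notice(). */
	example_handle_error(gpu, engine_mask, EXAMPLE_ERROR_CAPTURE,
			     "hang detected");
}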


# 7e9d3a4a 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk
(cherry picked from commit 47650db02dd52267953df81438c93cf8a0eb0e5e)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
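
The locking pattern named in the title, as a hedged sketch with
placeholder types: set-wedge swaps the submission vfuncs under RCU, so
the indirect call is made inside an RCU read-side section.

#include <linux/rcupdate.h>

struct example_request;

struct example_engine {
	void (*schedule)(struct example_request *rq, int prio);
};

static void example_schedule(struct example_engine *engine,
			     struct example_request *rq, int prio)
{
	/*
	 * Hold the RCU read lock across the vfunc so a concurrent
	 * set-wedge cannot tear down the queue midway and lose the
	 * request.
	 */
	rcu_read_lock();
	if (engine->schedule)
		engine->schedule(rq, prio);
	rcu_read_unlock();
}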


# d9b13c4d 15-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Trace GEM steps between submit and wedging

We still have an odd race with wedging/unwedging as shown by igt/gem_eio
that defies expectations. Add some more trace_printks to try and
visualize the flow over the precipice.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315131451.4060-1-chris@chris-wilson.co.uk


# 6f9ec414 08-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the impedance mismatch around intel_engine_enable_signaling

There is some redundancy between dma_fence->ops->enable_signaling (via
i915_fence_enable_signaling) and our backend,
intel_engine_enable_signaling() in that both levels recheck the fence
status multiple times. If we convert intel_engine_enable_signaling() to
return the information desired by dma_fence->ops->enable_signaling, we
can reduce i915_fence_enable_signaling to a simple stub and avoid
trying to reinterpret the same information.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180308140732.25090-1-chris@chris-wilson.co.uk
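
The stub shape described above, sketched against the real
dma_fence_ops callback signature but with a placeholder backend:

#include <linux/dma-fence.h>

/* Placeholder for the engine-level recheck; it returns exactly the
 * boolean that .enable_signaling wants. */
static bool example_backend_enable_signaling(struct dma_fence *fence)
{
	return !dma_fence_is_signaled(fence);
}

static bool example_fence_enable_signaling(struct dma_fence *fence)
{
	/* Reduced to a stub: no duplicate reinterpretation here. */
	return example_backend_enable_signaling(fence);
}

static const struct dma_fence_ops example_fence_ops = {
	.enable_signaling = example_fence_enable_signaling,
	/* ... .get_driver_name, .get_timeline_name, etc. ... */
};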


# 47650db0 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk


# 36620032 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update ring position from request on retiring

When wedged, we do not update the ring->tail as we submit the requests,
causing us to leak the ring->space upon cleaning up the wedged driver.
We can just use the value stored in rq->tail, and keep the submission
backend details away from set-wedge.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-3-chris@chris-wilson.co.uk


# f41d19be 06-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush waiters on seqno wraparound

Previously, we would spin waiting for all waiters to wake up and notice
their request had completed before we would reset the seqno upon
wraparound. However, we can mark their waits as complete and wake them
up directly using the existing machinery for handling the flushing of
missed wakeups when idling.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-2-chris@chris-wilson.co.uk


# 93eef7d6 06-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop kicking the signaling thread on seqno wraparound

Since commit fd10e2ce9905 ("drm/i915/breadcrumbs: Ignore unsubmitted
signalers"), we cancel the signaler when retiring the request and so
upon wraparound, where we wait for all requests to be retired, we no
longer need to spin waiting for the signaling thread to release its
references to the in-flight requests, and can instead assert that the
signaler is idle.

References: fd10e2ce9905 ("drm/i915/breadcrumbs: Ignore unsubmitted signalers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-1-chris@chris-wilson.co.uk


# e532be89 22-Feb-2018 Michel Thierry <michel.thierry@intel.com>

drm/i915: Update missing parts after the rename to i915_request

Mostly doc/print messages that were not updated after commit e61e0f51ba79
("drm/i915: Rename drm_i915_gem_request to i915_request").

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222172405.11386-1-michel.thierry@intel.com


# e61e0f51 21-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename drm_i915_gem_request to i915_request

We want to de-emphasize the link between the request (dependency,
execution and fence tracking) and GEM, and so rename the struct from
drm_i915_gem_request to i915_request. That is, we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:
@@

@@
- struct drm_i915_gem_request
+ struct i915_request

A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space is of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>